Maarten Balliauw {blog}

ASP.NET, ASP.NET MVC, Windows Azure, PHP, ...


Microsoft Web Development Summit 2009

Being in the US twice in one month (PDC09 and the Web Development Summit) is fun, tiring and rewarding. WDS09 was an invite-only event organized by Microsoft, focusing on interaction between Microsoft and the PHP community. I must say: the event was helpful and interesting for both parties!

  • The Heathman Hotel in Kirkland is a nice hotel!
  • Traveling towards the US is far more productive than flying back: I worked on PHPMEF while traveling westbound, and crashed (half asleep/half awake) on the eastbound flight…
  • If you just traveled over 26 hours: do NOT go shopping immediately when you arrive home! It’s really frustrating and tiring.
  • Did a session on Windows Azure SDK for PHP, PHPExcel and PHPLinq.
  • Did an interview for the Connected Show
  • Met a lot of people I knew from Twitter and e-mail, and met a lot of new people, both Microsoft and PHP community. Nice to meet you all!
  • The event focused on feedback between Microsoft and the PHP community. Overall, I think the dialogue was respectful, open and helpful to both parties.

Standing at the Microsoft logo

This was actually my first time at the WDS, which has been around for 5 years already. The Interop team invited me, and I want to thank them for that: it was a great trip, a great event and I got the chance to meet lots of new people.

Attendees were mostly people from the PHP community, like Cal Evans, Rafael Doms, Chris Cornutt, Romain Bourdon (WAMP server anyone?), Alison “snipe” Gianotto, … Next to that, lots of Microsoft people came by during various sessions. Some of them even reserved the whole week and were attending all sessions to make sure they were in the feedback loop all the time.

We’ve seen Microsoft sessions on IIS, Web Platform Installer, Silverlight, SQL Server, Bing, PowerShell (sorry, Scott Hanselman, for disturbing your presentation with a tweet :-)). Interesting sessions, with some info I did not know. There were PHP community sessions as well: WordPress, Joomla, Drupal, the PHP community perspective, feedback sessions, PHPLinq, PHPExcel, interoperability bridges, … A good mix of content with knowledgeable speakers and good communication between speakers, product groups and audience. Well done!

Document Interoperability Workshop, London, May 18 2009

After a pleasant flight with VLM airlines (Antwerp – London City) and a trip underneath half of the city of London, I arrived at the Microsoft offices in Victoria (Cardinal Place) for their third (?) DII workshop. I attended a previous one in Brussels last year.

If you are wondering: “What are you doing there???”, here’s a short intro. I’ve been working on Microsoft interop projects for quite a few years now, like PHPExcel, PHPPowerPoint, PHPLinq, PHPAzure, … When working on PHPExcel and PHPPowerPoint, I ran into the term “document interoperability” quite a lot. OpenXML (the underlying file format) is well documented, but it takes some work to make sure a document generated by any of those tools is fully compatible with the standard. And that’s what these DII workshops are all about.

The previous DII workshop mentioned the OpenXML document viewer, which converted DOCX to HTML. Great to see there’s a new version available today, read more at the interop blog from Microsoft.

This blog post gives an overview of my experience during the DII day.

By the way, here’s a cool blog post about interop on an Excel document between PHP, JAVA and .NET. Nice read!

Validation of OpenXML resources

Some talks on the topic, one of them by Alex Brown, introduced what is needed to make sure a document conforms to the standard. This is quite a complicated topic, because validation should occur at multiple levels: ZIP package level, relationships, XML markup, … Using W3C’s XProc is one of the possible solutions, where a pipeline of different XML validations can be linked and executed. The cool thing is that it is a non-Microsoft approach to validating documents.

Another problem: there are lots of things that are not covered by an XML schema, for example custom XML data in Word documents. How do you validate those? Schematron is the answer to that (nice read).
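To make the “multiple levels” idea a bit more concrete, here is a minimal sketch in plain PHP. It is not XProc or Schematron and not what was shown in the talks. It assumes a package that has already been unzipped into an array of part name => XML string, and all function names (checkRequiredParts, checkWellFormed, checkRelationships, validatePackage) are made up for this illustration.

```php
<?php
// Hypothetical sketch: a package is modelled as [part name => XML string],
// so no ZIP handling is needed to illustrate the layering.

/** Level 1 (package): the parts every OPC package needs must be present. */
function checkRequiredParts(array $parts): array
{
    $errors = [];
    foreach (['[Content_Types].xml', '_rels/.rels'] as $required) {
        if (!array_key_exists($required, $parts)) {
            $errors[] = "missing part: $required";
        }
    }
    return $errors;
}

/** Level 2 (markup): every part must at least be well-formed XML. */
function checkWellFormed(array $parts): array
{
    $errors = [];
    $previous = libxml_use_internal_errors(true);
    foreach ($parts as $name => $xml) {
        $dom = new DOMDocument();
        if ($xml === '' || !$dom->loadXML($xml)) {
            $errors[] = "not well-formed: $name";
        }
        libxml_clear_errors();
    }
    libxml_use_internal_errors($previous);
    return $errors;
}

/** Level 3 (relationships): internal targets in _rels/.rels must exist. */
function checkRelationships(array $parts): array
{
    $rels = $parts['_rels/.rels'] ?? '';
    if ($rels === '') {
        return []; // already reported by level 1 or 2
    }
    $previous = libxml_use_internal_errors(true);
    $dom = new DOMDocument();
    $loaded = $dom->loadXML($rels);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    if (!$loaded) {
        return []; // already reported by level 2
    }
    $errors = [];
    foreach ($dom->getElementsByTagName('Relationship') as $rel) {
        if ($rel->getAttribute('TargetMode') === 'External') {
            continue; // external targets live outside the package
        }
        $target = ltrim($rel->getAttribute('Target'), '/');
        if (!array_key_exists($target, $parts)) {
            $errors[] = "dangling relationship target: $target";
        }
    }
    return $errors;
}

/** Runs the levels in order and collects every problem found. */
function validatePackage(array $parts): array
{
    $errors = [];
    foreach (['checkRequiredParts', 'checkWellFormed', 'checkRelationships'] as $stage) {
        $errors = array_merge($errors, $stage($parts));
    }
    return $errors;
}
```

Each stage only looks at one level and returns a list of messages, so extra stages (schema validation, Schematron rules) could be appended to the list in validatePackage() without touching the others.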

Making sure documents are accessible in the future

Matevz Gacnik had a great presentation on all the problems there are to make sure documents stored in a document management system are accessible in the future. There are some technical issues to this (making sure you do not lose information: keep the text and do not convert everything to TIFF), but there are some legal issues as well: the document should be signed, you can not store alternative copies of a document, …

From legal back to technical: Matevz also showed us some technical implementations of their OpenXML based document management system (eDMS): cool! They parse content, add extra information using custom XML and bookmarks, … Great showoff for what you can do with OOXML.

Discussion: OpenXML SDK

Next, we had a discussion on the OOXML SDK. Some argue that raw XML markup is clearer than the SDK and just as verbose; others point out that there are people in this world who don’t like XML and want to write code anyway. I’m going with the latter idea. One point remains, though: source code written against the SDK is still very verbose, and I don’t like to type a lot. Luckily the SDK also ships a document reflector, which generates most of that code for you based on a document you want to produce.

PHPPowerPoint

Thanks to the people at Microsoft, I also had an opportunity to do a short demo of PHPPowerPoint. The demo scenario was quite simple: I did a short overview of the architecture behind PHPPowerPoint and a demo of the SDK and what it currently can do.

Community interoperability

Gerd Schürmann from Fraunhofer institute did a talk on their role in document interoperability in Germany and how they advise the government using different R&D projects and proof-of-concept projects. Their main purpose is to be a neutral mediator in open-source use. For this, they participate in lots of community projects like SourceForge, BerliOS, … As an example, Gerd showed us a community site demonstrating various scenarios around eID in Germany.

PLANETS and document conversion tools

Wolfgang Keber did his talk on PLANETS & document conversion tools. PLANETS is a tool that is aiming at preserving your digital assets by making sure they can always be converted into other document formats. There are some subprojects available, for example one that characterises a document. It determines what document format a file is in, and also determines if, for example, tables are used. These characteristics can then be used to convert the document into a required format using any conversion tool available (extensibility!). For example, libraries can use PLANETS to automatically characterise and convert old scanned books in, for example, TIFF, to PDF or OOXML.

Extensibility within Standards

One of the great talks at the DII event was Stephen Peront’s talk on extensibility, targeting a lesser-known part of the OpenXML standard: markup compatibility. Basically, this allows you to embed your own custom XML markup inside OpenXML documents without disturbing the application that opens your document (if done right). This presentation led to a discussion about whether this is a good thing or a bad thing. Some say that extending a standard is creating a new standard, while others feel that this markup compatibility way of adding extra information to a document is a good thing. My guess is that it really depends on what you are doing. Adding some extra attributes should be fine. Adding extra nested elements that embed OOXML elements that embed more custom tags may be a road you don’t really want to take.
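For those who have never seen markup compatibility in action, here is a small sketch of the idea, not taken from the talk. A document declares a namespace prefix as ignorable through the mc:Ignorable attribute, and a consumer that does not understand that namespace simply skips it. The extension namespace http://example.com/my-extension, the ext:note element and the two function names are made up for this example. The real rules (ignorable attributes, ProcessContent, AlternateContent) are richer than what this handles.

```php
<?php
const MC_NS = 'http://schemas.openxmlformats.org/markup-compatibility/2006';

/** Returns [prefix => namespace URI] for everything the root marks as mc:Ignorable. */
function ignorableNamespaces(DOMDocument $dom): array
{
    $root = $dom->documentElement;
    $prefixes = preg_split(
        '/\s+/',
        trim($root->getAttributeNS(MC_NS, 'Ignorable')),
        -1,
        PREG_SPLIT_NO_EMPTY
    );
    $result = [];
    foreach ($prefixes as $prefix) {
        $result[$prefix] = $root->lookupNamespaceURI($prefix);
    }
    return $result;
}

/** Drops elements in ignorable namespaces, the way a consumer unaware of them would skip them. */
function stripIgnorable(DOMDocument $dom): void
{
    $ignorable = array_values(ignorableNamespaces($dom));
    $xpath = new DOMXPath($dom);
    // Copy the node list first: removing nodes while iterating a live list is unsafe.
    foreach (iterator_to_array($xpath->query('//*')) as $node) {
        if (in_array($node->namespaceURI, $ignorable, true)) {
            $node->parentNode->removeChild($node);
        }
    }
}

// A WordprocessingML fragment carrying one custom element in an ignorable namespace.
$xml = <<<XML
<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
            xmlns:mc="http://schemas.openxmlformats.org/markup-compatibility/2006"
            xmlns:ext="http://example.com/my-extension"
            mc:Ignorable="ext">
  <w:body>
    <w:p><w:r><w:t>Hello</w:t></w:r></w:p>
    <ext:note>internal review comment</ext:note>
  </w:body>
</w:document>
XML;

$dom = new DOMDocument();
$dom->loadXML($xml);
$declared = ignorableNamespaces($dom);
stripIgnorable($dom);
```

After stripIgnorable() the paragraph is still there and the ext:note element is gone, which is roughly what an application without knowledge of the extension would see.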

Other coverage

Other coverage on the DII event in London:

PHPPowerPoint 0.1.0 (CTP1) released!

People following me on Twitter could have guessed already, but here’s something I probably should not have done to my agenda: next to the well-known PHPExcel class library, I’ve now also started something similar for PowerPoint: PHPPowerPoint.

Just like with PHPExcel, PHPPowerPoint can be used to generate PPTX files from a PHP application. This can be done by creating an in-memory presentation that consists of slides and different shapes, which can then be written to disk using a writer (of which there’s currently only one for PowerPoint 2007).

Here’s some sample code:

[code:c#]

/* Create new PHPPowerPoint object */
$objPHPPowerPoint = new PHPPowerPoint();

/* Create slide */
$currentSlide = $objPHPPowerPoint->getActiveSlide();

/* Create a shape (drawing) */
$shape = $currentSlide->createDrawingShape();
$shape->setName('PHPPowerPoint logo');
$shape->setDescription('PHPPowerPoint logo');
$shape->setPath('./images/phppowerpoint_logo.gif');
$shape->setHeight(36);
$shape->setOffsetX(10);
$shape->setOffsetY(10);
$shape->getShadow()->setVisible(true);
$shape->getShadow()->setDirection(45);
$shape->getShadow()->setDistance(10);

/* Create a shape (text) */
$shape = $currentSlide->createRichTextShape();
$shape->setHeight(300);
$shape->setWidth(600);
$shape->setOffsetX(170);
$shape->setOffsetY(180);
$shape->getAlignment()->setHorizontal( PHPPowerPoint_Style_Alignment::HORIZONTAL_CENTER );
$textRun = $shape->createTextRun('Thank you for using PHPPowerPoint!');
$textRun->getFont()->setBold(true);
$textRun->getFont()->setSize(60);
$textRun->getFont()->setColor( new PHPPowerPoint_Style_Color( 'FFC00000' ) );

/* Save PowerPoint 2007 file */
$objWriter = PHPPowerPoint_IOFactory::createWriter($objPHPPowerPoint, 'PowerPoint2007');
$objWriter->save(str_replace('.php', '.pptx', __FILE__));

[/code]

A more advanced sample is also included in the download, where a complete presentation is rendered using PHPPowerPoint.

Now go grab the fresh sample on CodePlex and be the very first person downloading and experimenting with it. Feel free to post some feature requests or general remarks on CodePlex too.

I want to thank my employer, RealDolmen, for letting me work on this during regular office hours and also the people at DynamicLogic who convinced me to start this new project.

Saving a PHPExcel spreadsheet to Google Documents

As you may know, PHPExcel is built using an extensible model, supporting different input and output formats. The PHPExcel core class library features a spreadsheet engine, which is supported by IReader and IWriter instances used for reading and writing a spreadsheet to/from a file.

PHPExcel architecture

Currently, PHPExcel supports writers for Excel2007, Excel5 (Excel 97+), CSV, HTML and PDF. Wouldn’t it be nice if we could use PHPExcel to store a spreadsheet on Google Documents? Let’s combine some technologies:

Creating a custom GoogleDocs writer

First, we need an implementation of PHPExcel_Writer_IWriter which will support writing stuff to Google Documents. Since Google accepts XLS files and Zend_Gdata provides an upload method, I think an overloaded version of PHPExcel’s integrated PHPExcel_Writer_Excel5 will be a good starting point.

[code:c#]

class PHPExcel_Writer_GoogleDocs extends PHPExcel_Writer_Excel5 implements PHPExcel_Writer_IWriter {
        // ...
}

[/code]

Since Google requires you to log in before you can interact with the documents stored on Google Documents, let’s also add a username and password field.

[code:c#]

class PHPExcel_Writer_GoogleDocs extends PHPExcel_Writer_Excel5 implements PHPExcel_Writer_IWriter {
    private $_username;
    private $_password;

    public function setCredentials($username, $password) {
        $this->_username = $username;
        $this->_password = $password;
    }
}

[/code]

Next, let’s override the save() method. This method will save the document as an XLS spreadsheet somewhere, upload it to Google Docs and afterwards remove it from the file system. Here we go:

[code:c#]

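public function save($pFilename = null) {
    // Let the Excel5 writer produce the XLS file on disk first
    parent::save($pFilename);

    // Authenticate against Google Docs using ClientLogin
    $googleDocsClient = Zend_Gdata_ClientLogin::getHttpClient($this->_username,
        $this->_password, Zend_Gdata_Docs::AUTH_SERVICE_NAME);
    $googleDocsService = new Zend_Gdata_Docs($googleDocsClient);

    // Upload the XLS file, using its file name as the document title
    $googleDocsService->uploadFile($pFilename, basename($pFilename), null,
        Zend_Gdata_Docs::DOCUMENTS_LIST_FEED_URI);

    // Remove the local file again (errors are suppressed on purpose)
    @unlink($pFilename);
}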

[/code]

Nothing more! This should be our new writer class.

Using the GoogleDocs writer

Now let’s try saving a spreadsheet to Google Docs. First of all, we load a document we have stored somewhere on the file system:

[code:c#]

$objReader = PHPExcel_IOFactory::createReader('Excel2007');
$objPHPExcel = $objReader->load("05featuredemo.xlsx");

[/code]

Next, let’s use PHPExcel’s IOFactory class to load our PHPExcel_Writer_GoogleDocs class. We will also set credentials on it. Afterwards, we save.

[code:c#]

$objWriter = PHPExcel_IOFactory::createWriter($objPHPExcel, 'GoogleDocs');
$objWriter->setCredentials('xxxxxxxx@gmail.com', 'xxxxxxxx');
$objWriter->save('somefile.xls');

[/code]

This should be all there is to it. Google Docs will now contain our spreadsheet created using PHPExcel.

Google Docs Image

Note that images are not displayed due to the fact that Google Docs seems to remove them when uploading a document. But hey, it’s a start!

You can download the full example code here (26.29 kb). Make sure you have PHPExcel, Zend Framework and Zend Gdata classes installed on your system.

ECMA-376 implementation notes are out

Last month, Microsoft released the implementation notes for their ODF implementation in Office 2007. These implementation notes are actually the documentation on how Office 2007 treats ODF documents in various cases. Today, Microsoft released the ECMA-376 implementation notes, or in short: they've now documented how Office 2007 handles OpenXML documents. The implementation notes site can be found on www.documentinteropinitiative.org.

I am really enthusiastic about this one, as it actually documents how Excel will handle files created by PHPExcel. While developing this library, there were moments where we really had to dig into what Excel was doing, because something did not work as expected. Now, we will be able to simply check the implementation notes for stuff like this: a huge time saver!

You may wonder what kind of things are mentioned in these implementation notes. I'll give you some examples on SpreadsheetML, as that is the OpenXML format PHPExcel focuses on. Other blog posts by Doug Mahugh and Stephen Peront offer additional insights.

image

Maybe we should add a limit to this in PHPExcel...

image

This means I can now add a new feature to PHPExcel, which of course, will be Excel 2007-only, that automatically picks the correct page scale.

image

This one may actually explain why we are having some issues with PHPExcel, Excel 2007 and negative dates...

Conclusion: I like this stuff! No more searching for why things happen: it's really listed in documentation! Thank you Microsoft for not wasting my valuable evening hours trying to figure things like these out.


Microsoft launches Implementation Notes (for ODF)

Just a quick post: at the Document Interoperability workshop in Brussels on December 2, Microsoft already announced they were going to do something with implementation notes. Here’s a scoop from my blog post on that:

“Here's another scoop: there will be a website containing implementer notes on Office 2007! The file format specification documents all file format features, the implementer notes actually document how a specific application is implementing the file format. Some examples: the file format specifies XLSX documents can have an error message specified in data validation scenarios. The implementer notes will tell that in Excel, the size of this error message is actually limited. Another example is that in Office 2007, the OpenXML can specify custom ribbons related to the document. This is not an OpenXML feature, but it allows documents to be customised for a specific application. Sweet!”

Today, the implementation notes site has been officially launched on www.documentinteropinitiative.org (direct link). These are currently only for the Word 2007 ODF implementation, but mention lots of details on how Word 2007 treats ODF documents. Seeing something strange, unexpected? Check these implementation notes and you'll know why!

Here’s the press release: http://www.microsoft.com/presspass/press/2008/dec08/12-16ImplementationNotesPR.mspx - also check Doug Mahugh's post on this.


OpenXML DII workshop Brussels - Quick summary

A few days ago, I wrote that I would be doing a presentation at the DII workshop in Brussels together with Julien Chable. Apart from heavy traffic from Antwerp to Brussels (80km, almost 3 hours... *sigh*), I think the DII workshop was quite successful! Lots of news around OpenXML and Office, lots of interesting ideas from other community members. It was also great to meet, in person, some people I've been mailing with for 2 years.

Slides of the Redmond DII session can be found here.

Morning sessions

The first session, by Vijay Rajagopalan on Interoperability Principles and the Microsoft DII Vision, gave insight into the efforts Microsoft is currently making regarding document interoperability. One of the projects in this presentation that I did not know before is the OpenXML Document Viewer / HTML Translator, out on CodePlex. I once blogged about converting DOCX files to HTML; Microsoft is now providing their own project for this (and for other OOXML file formats in the future). Note that this can also be installed as a browser plug-in for Firefox 3.0.x on Windows and Linux, which renders OpenXML files in your browser without needing Microsoft Office.

Wolfgang Keber did a talk titled "DII showcase – experience from prior efforts". He explained the Planets project a little, describing the architecture of creating converters between OpenXML, ODF, binary formats, WordPerfect, ... The b2x translator that evolved from this is an alternative to Microsoft's own conversion tool. A cool thing about the Planets conversion programs is that documents can actually be converted between lots of file formats by simply chaining "translator boxes". Even cooler: a hosted version of the tools is on its way!

Paolo Mottadelli presented the Apache POI project (Java), introducing all subprojects for different interoperability scenarios. One that interested me in particular was HSSF, the "Excel" implementation shipped with POI. Good to see that they have implemented both XLS and XLSX and also have a formula calculation engine. Sort of like PHPExcel :-) Make sure to check the examples page featuring different usage scenarios.

Roundtable discussion

After this, we did the first of two roundtables. We discussed participants’ use scenarios and interoperability solutions regarding OpenXML and related tools and SDKs. An interesting question: what are the differences in support for the ECMA and ISO versions of OpenXML? Office 14 will use the ISO version by default and keep support for the ECMA version. There are not many breaking changes between the two standards, only some extra features. The Office 2007 SP2 release will also provide a plug-in mechanism which will support the ISO format in the future.

Speaking of service packs... upcoming SharePoint SP2 will support ODF in document libraries and open/save support will be in place.

Another interesting discussion topic: interoperability certification! Who will actually label a specific OpenXML or ODF solution as compliant with the standards? Microsoft admits they are not the organization to do that; Paolo Mottadelli explained that the Apache Software Foundation might be a good choice. But then, who's to verify Apache? Some kind of chicken-and-egg story... Interesting topic, and I'm guessing more community discussion will follow on this one!

Afternoon sessions


Of course, there was our session on PHPExcel and OPENXML4J. Slides can be found on SlideShare and, afterwards, on the DII site. Quite funny that my calculation engine worked in one command-line window and not in the other. Also good to see Julien's OPENXML4J in action, consuming files generated by PHPExcel. By the way, your English was good, Julien :-)

Peter Amstein's talk on Microsoft implementations of Open XML, ODF, and document format interop testing offered some insight into the ideas behind OpenXML and the architecture of the translator project between different file formats. He also covered some of the decisions made in creating the translator project: what to do with features that are supported in format A and not in format B?

Here's another scoop: there will be a website containing implementer notes on Office 2007! The file format specification documents all file format features, the implementer notes actually document how a specific application is implementing the file format. Some examples: the file format specifies XLSX documents can have an error message specified in data validation scenarios. The implementer notes will tell that in Excel, the size of this error message is actually limited. Another example is that in Office 2007, the OpenXML can specify custom ribbons related to the document. This is not an OpenXML feature, but it allows documents to be customised for a specific application. Sweet!

I'm really enthusiastic about the concept of implementation notes: it offers great transparency! It makes it possible to comply with the file format specification as well as with the specific implementations of third-party applications. I really hope other vendors will publish implementation notes for their products as well. Expect these notes around mid-2009.

Doug Mahugh talked about a Document Test Library that is currently being developed by Microsoft. They are trying to launch a repository of documents that are 100% valid, which custom implementations can use to validate their document input.

It seems Microsoft is also submitting proposals to add features to the Open Document Format: they are really willing to adopt both ODF and OpenXML in their products and consider the two of equal worth. Nice to see all the proposals they have made, too, because there are really valuable additions to ODF in that list.

Another thing is that Microsoft is going to allow public comments on the OpenXML standard: the evolution of the standard will be subject to these comments and proposals.

Unfortunately there was not enough time for the whole presentation as there were some examples on testing and validating OpenXML documents.

Roundtable discussion 2

This roundtable discussion focused on how Microsoft can provide tools to the community to check, test and validate OpenXML documents. A good suggestion was to create a website like BrowserShots.org (rendered example) which renders an uploaded document in Word 2003 - 2007, OpenOffice, mobile devices, ... and allows you to see how a specific implementation renders your document. I really think the implementation notes I referred to earlier in this post would be a great help for this kind of application.

This brought us to unit testing... How do you unit test OpenXML documents? Should unit tests also take other implementations into account? (i.e. should a test be inconclusive when OpenOffice and Word 2007 open a document differently?) More discussion on this coming in the blogosphere soon, I'm sure.

Other bloggers

Note: This list will get updated in the following days...

Check Julien Chable's blog posts on this event too (in French):

Here's Doug Mahugh's post:

Another news item: "Microsoft-led group launches new Open XML-interop tools"

Some pictures of the event


PHPExcel featured in php|architect / November 2008

Nice to see that there's a lot of activity going on related to PHPExcel! This weekend, I received my electronic copy of php|architect / November 2008, featuring an article on generating PDF and Excel files using PHP. Guess which library is being used in the second part of the article...

While reading the article, I noticed a very cool thing: the author, Aaron Wormus, is using a convenient way to generate charts in PHPExcel using Google Charts API. Since PHPExcel currently does not support creating graphs, this is really a good method to use charts in PHPExcel generated Excel files. Here's a screenshot of the generated file in the magazine:

image 

Sweet!
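I don't have the article's code at hand, so what follows is only my own sketch of the idea: build a Google Chart API image URL from the data, download the image and place it on the worksheet as a drawing. The helper name googlePieChartUrl() is made up. The query parameters cht, chs, chd and chl are those of the Google Chart API as it existed at the time, and the text data format expects values between 0 and 100.

```php
<?php
/**
 * Builds a Google Chart API URL for a 3D pie chart.
 * Hypothetical helper: $data maps a label to a value between 0 and 100.
 */
function googlePieChartUrl(array $data, int $width = 400, int $height = 200): string
{
    $query = http_build_query([
        'cht' => 'p3',                                      // chart type: 3D pie
        'chs' => $width . 'x' . $height,                    // size in pixels
        'chd' => 't:' . implode(',', array_values($data)),  // data, text encoding
        'chl' => implode('|', array_keys($data)),           // slice labels
    ]);
    return 'http://chart.apis.google.com/chart?' . $query;
}

$url = googlePieChartUrl(['PHP' => 60, '.NET' => 40]);

// With PHPExcel loaded, the downloaded image could then be attached to a sheet
// (left as comments so the sketch runs without PHPExcel or network access):
//
//   file_put_contents('chart.png', file_get_contents($url));
//   $drawing = new PHPExcel_Worksheet_Drawing();
//   $drawing->setPath('chart.png');
//   $drawing->setCoordinates('D2');
//   $drawing->setWorksheet($objPHPExcel->getActiveSheet());
```

The chart ends up as a static picture in the workbook, so it will not follow later changes to the cell values.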

Presenting at the OpenXML DII workshop Brussels

First things first: WTF is DII??? It's the "Document Interoperability Initiative" that Microsoft launched. The workshop will be a day filled with presentations from a variety of people, including members of the Office product groups at Microsoft and developers and consultants from several other companies (basically like the one held in Redmond a few weeks ago).

Together with Julien Chable, I'll be doing a talk on "Document Interop from an Open Source perspective - PHPExcel and OPENXML4J". Both our open-source products PHPExcel and OPENXML4J will be highlighted with some background and technical demos. We will prove that OpenXML documents provide interoperability between platforms (Windows / Linux) and technologies (.NET, PHP and Java).

Hope to see you there next Tuesday!


OpenXML support in Zend Framework 1.7 Lucene indexer

It has indeed been a long time since I blogged about OpenXML. However, this does not mean I'm doing nothing at all with it! PHPExcel still keeps eating a lot of evening hours, as do some other OpenXML-related projects.

The folks at Zend asked me whether I would be interested in implementing OpenXML indexers for the Zend Framework Lucene Indexer. I've blogged about indexing OpenXML files once before, but since today's Zend Framework 1.7 release, indexing OpenXML documents is as easy as indexing any other natively supported file format. Want to know how? Check the documentation in the manual at http://framework.zend.com/manual/en/zend.search.lucene.html.
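As a rough sketch of what that looks like: the document classes Zend_Search_Lucene_Document_Docx, _Xlsx and _Pptx each expose a static loader method. The two helper functions below (openXmlLoaderFor and indexOpenXmlFiles) are names I made up for this example, and the second one assumes Zend Framework 1.7 is on the include path.

```php
<?php
/**
 * Maps a file name to the Zend Framework document class and static loader
 * method for its OpenXML type. Returns null for unsupported extensions.
 */
function openXmlLoaderFor(string $filename): ?array
{
    $loaders = [
        'docx' => ['Zend_Search_Lucene_Document_Docx', 'loadDocxFile'],
        'xlsx' => ['Zend_Search_Lucene_Document_Xlsx', 'loadXlsxFile'],
        'pptx' => ['Zend_Search_Lucene_Document_Pptx', 'loadPptxFile'],
    ];
    $extension = strtolower(pathinfo($filename, PATHINFO_EXTENSION));
    return $loaders[$extension] ?? null;
}

/**
 * Creates a new index at $indexPath and adds every supported file to it.
 * Returns the number of documents indexed. Requires Zend Framework.
 */
function indexOpenXmlFiles(array $filenames, string $indexPath): int
{
    $index = Zend_Search_Lucene::create($indexPath);
    $count = 0;
    foreach ($filenames as $filename) {
        $loader = openXmlLoaderFor($filename);
        if ($loader === null) {
            continue; // not an OpenXML file we know how to load
        }
        // e.g. Zend_Search_Lucene_Document_Docx::loadDocxFile($filename)
        $index->addDocument(call_user_func($loader, $filename));
        $count++;
    }
    $index->commit();
    return $count;
}
```

Something like indexOpenXmlFiles(array('report.docx', 'budget.xlsx'), '/data/my-index') would then build an index that can be searched with the usual Zend_Search_Lucene query API.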
