Maarten Balliauw {blog}

ASP.NET, ASP.NET MVC, Windows Azure, PHP, ...

NAVIGATION - SEARCH

Reuse Excel business logic with PHPExcel

Price calculationIn many companies, business logic resides in Excel. This business logic is sometimes created by business analysts and sometimes by business users who want to automate parts of their everyday job using Excel. This same Excel-based business logic is often copied into an application (i.e. a website) and is maintained on 2 places: if the Excel logic changes, the application should also be modified. Did you know you can use PHPExcel to take advantage of the Excel-based business logic without having to worry about duplicate business logic?

Here's a scenario: You are working in a company which sells "dream cars". For every model, the company has created an Excel spreadsheet which is used to calculate the car's price based on customer preferences. These spreadsheets are updated frequently in order to reflect the car manufacturer's pricing schemes.

Your manager asks you to create a small website which accepts some input fields (Does the customer want automatic transmission? What colour should the car be painted? Does the customer want leather seats? Does the customer want sports suspension?). Based on these questions, the car's price should be calculated. Make sure all prices on the website are in sync with this Excel sheet!

Download example source code: phpexcel4business.zip (318.74 kb)

1. Create the Excel sheet containing business logic

Defined names First of all, we'll create an Excel sheet containing business logic. If you're lazy, download my example here. To make things easy for yourself when scripting, make sure you add some defined names on each field you want to use as input/output. Of course it's possible to work with the sheet's cell references later on, but if you want to be able to change the location of cells within the worksheet later, these defined names are much easier!

2. Download the latest PHPExcel version

You can find PHPExcel on www.phpexcel.net. If you want a stable version, download an official release. The source code tab on CodePlex reveals the latest Subversion source code if you want to use it.

3. Create the web based front-end

Next thing we'll do is creating a simple webpage containing an HTML form which corresponds woth the parameters you want to pass to the Excel sheet.

4. Let's do some PHP coding!

Since you came here for the good stuff, here it is! What we'll do is load the Excel sheet, pass in the parameters and use some calculated values on our resultign page. First things first: include the necessary class references:

[code:c#]

/** Include path **/
set_include_path(get_include_path() . PATH_SEPARATOR . './PHPExcel/Classes/');

/** Class requirements */
require_once('PHPExcel.php');
require_once('PHPExcel/Reader/Excel2007.php');

[/code]

PHPExcel is the base library which represents an in-memory spreadsheet. Since we need to interface with an Excel2007 file, we also include the required reader class.

Now load the Excel sheet into a PHPExcel object:

[code:c#]

// Load price calculation spreadsheet
$objReader = new PHPExcel_Reader_Excel2007();
$objPHPExcel = $objReader->load("price_calculation.xlsx");

[/code]

All data from the web form is passed in via the $_REQUEST array. Let's pass these to the Excel sheet. I named all form fields equal to my defined names in Excel which makes coincidence of all array keys and cell names being the same intentional.

[code:c#]

// Set active sheet
$objPHPExcel->setActiveSheetIndex(0);

// Assign data
$objPHPExcel->getActiveSheet()->setCellValue('automaticTransmission', $_REQUEST['automaticTransmission']);
$objPHPExcel->getActiveSheet()->setCellValue('carColor', $_REQUEST['carColor']);
$objPHPExcel->getActiveSheet()->setCellValue('leatherSeats', $_REQUEST['leatherSeats']);
$objPHPExcel->getActiveSheet()->setCellValue('sportsSeats', $_REQUEST['sportsSeats']);

[/code]

PHPExcel is great success!This is actually about it. The only thing left is to fetch the formula's calculated values and we're done!

[code:c#]

// Perform calculations
$_VIEWDATA['totalPrice'] = $objPHPExcel->getActiveSheet()->getCell('totalPrice')->getCalculatedValue();
$_VIEWDATA['discount'] = $objPHPExcel->getActiveSheet()->getCell('discount')->getCalculatedValue();
$_VIEWDATA['grandTotal'] = $objPHPExcel->getActiveSheet()->getCell('grandTotal')->getCalculatedValue();

[/code]

You can use these values to print the result on your web page:

[code:c#]

Based on your chosen preferences, your car will cost <?php echo number_format($_VIEWDATA['grandTotal'], 2); ?> EUR.

[/code]

5. Summary

Embedding business logic in Excel and re-using it in PHP is not that hard. The PHPExcel library helps you simplify development: your application logic and business logic is separated. Business logic can be maintained by a business expert or key user in Excel. As an application developer, you can easily pass data in the sheet and make use of PHPExcel's calculation engine.

kick it on DotNetKicks.com 

OpenXML + Silverlight 2.0 = cool!

Mix '08 announced some nice things, among them the release of Silverlight 2.0 (beta), ASP.NET MVC framework (CTP 2). This morning, I saw one very cool thing in my RSS reader: TextGlow.

TextGlow is James Newton-King's newest exciting project which basically combines Silverlight 2.0 and OpenXML into a fancy web-based Word 2007 document viewer. Think about combining this with my own Word 2007 document preview handler...

TextGlow Silverlight OpenXML

kick it on DotNetKicks.com

Indexing Word 2007 (docx) files with Zend_Search_Lucene

You may have noticed Microsoft released their Search Server 2008 a few weeks ago. Search Server delivers search capabilities to your organization quickly and easily. The PHP world currently does not have a full-featured solution like Search Server, but there's a building block which could be used to create something similar: Zend Framework's PHP port of Lucene. There is also a .NET port of Lucene available.

Lucene basically is an indexing and search technology, providing an easy-to-use API to create any type of application that has to do with indexing and searching. If you provide the right methods to extract data from any type of document, Lucene can index it. There are various indexer examples available for different file formats (PDF, HTML, RSS, ...), but none for Word 2007 (docx) files. Sounds like a challenge!

Source code

Want the full code? Download it here.

Prerequisites

Make sure you use PHP version 5.2, have php_zip and php_xml enabled, and have a working Zend Framework installation on your computer. Another useful thing is to have the Lucene manual pages aside along the way.

1. Creating an index

Creating an index Let's start with creating a Zend_Search_Lucene index. We will be needing the Zend Framework classes, so let's start with including them:

[code:c#] 

/** Zend_Search_Lucene */ 
require_once 'Zend/Search/Lucene.php';

[/code]

We will also be needing an index database. The following code snippets checks for an existing database first (in ./lucene_index/). If it exists, the snippets loads the index database, otherwise a new index database is created.

[code:c#]

// Index
$index = null;

// Verify if the index exists. If not, create it.
if (is_dir('./lucene_index/') == 1) {
    $index = Zend_Search_Lucene::open('./lucene_index/'); 
} else {
    $index = Zend_Search_Lucene::create('./lucene_index/');
}

[/code]

Now since the document root we are indexing might have different contents on every indexer run, let's first remove all documents from the existig index. Here's how:

[code:c#]

// Remove old indexed files
for ($i = 0; $i < $index->maxDoc(); $i++) {
    $index->delete($i);
}

[/code]

We'll create an index entry for the file test.docx. We will be adding some fields to the index, like the url where the original document can be found, the text of the document (which will be tokenized, indexed, but not completely stored as the index might grow too big fast!).

[code:c#]

// File to index
$fileToIndex = './test.docx';

// Index file
echo 'Indexing ' . $fileToIndex . '...' . "\r\n";

// Create new indexed document
$luceneDocument = new Zend_Search_Lucene_Document();

// Store filename in index
$luceneDocument->addField(Zend_Search_Lucene_Field::Text('url', $fileToIndex));

// Store contents in index
$luceneDocument->addField(Zend_Search_Lucene_Field::UnStored('contents', DocXIndexer::readDocXContents($fileToIndex)));

// Store document properties
$documentProperties = DocXIndexer::readCoreProperties($fileToIndex);
foreach ($documentProperties as $key => $value) {
    $luceneDocument->addField(Zend_Search_Lucene_Field::UnIndexed($key, $value));
}

// Store document in index
$index->addDocument($luceneDocument);

[/code]

After creating the index, there's one thing left: optimizing it. Zend_Search_Lucene offers a nice method to doing that in one line of code: $index->optimize(); Since shutdown of the index instance is done automatically, the $index->commit(); command is not necessary, but it's good to have it present so you know what happens at the end of the indexing process.

There, that's it! Our index (of one file...) is now ready! I must admit I did not explain all the magic... One piece of the magic is the DocXIndexer class whose method readDocXContents() is used to retrieve all text from a Word 2007 file. Here's how this method is built.

2. Retrieving the full text from a Word 2007 file

The readDocXContents() method mentioned earlier is the actual "magic" in this whole process. It basically reads in a Word 2007 (docx) file, loops trough all paragraphs and extracts all text runs from these paragraphs into one string.

A Word 2007 (docx) is a ZIP-file (container), which contains a lot of XML files. Some of these files describe a document and some of them describe the relationships between these files. Every XML file is validated against an XSD schema, which we'll define first:

[code:c#]

// Schemas
$relationshipSchema    = 'http://schemas.openxmlformats.org/package/2006/relationships';
$officeDocumentSchema     = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument';
$wordprocessingMLSchema = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

[/code]

The $relationshipSchema is the schema name that describes a relationship between the OpenXML package (the ZIP-file) and the containing XML file ("part"). The $officeDocumentSchema is the main document part describing it is a Microsoft Office document. The $wordprocessingMLSchema is the schema containing all Word-specific elements, such as paragrahps, runs, printer settings, ... But let's continue coding. I'll put the entire code snippet here and explain every part later:

[code:c#]

// Returnvalue
$returnValue = array();

// Documentholders
$relations         = null;

// Open file
$package = new ZipArchive(); // Make sure php_zip is enabled!
$package->open($fileName);

// Read relations and search for officeDocument
$relations = simplexml_load_string($package->getFromName("_rels/.rels"));
foreach ($relations->Relationship as $rel) {
    if ($rel["Type"] == $officeDocumentSchema) {
        // Found office document! Now read in contents...
        $contents = simplexml_load_string(
            $package->getFromName(dirname($rel["Target"]) . "/" . basename($rel["Target"]))
        );

        $contents->registerXPathNamespace("w", $wordprocessingMLSchema);
        $paragraphs = $contents->xpath('//w:body/w:p');

        foreach($paragraphs as $paragraph) {
            $runs = $paragraph->xpath('//w:r/w:t');
            foreach ($runs as $run) {
                $returnValue[] = (string)$run;
            }
        }

        break;
    }
}

// Close file
$package->close();

// Return
return implode(' ', $returnValue);

[/code]

The first thing that is loaded, is the main ".rels" document, which contains a reference to all parts in the root of this OpenXML package. This file is parsed using SimpleXML into a local variable $relations. Each relationship has a type ($rel["Type"]), which we compare against the $officeDocumentSchema schema name. When that schema name is found, we dig deeper into the document, parsing it into $contents. Next on our todo list: register the $wordprocessingMLSchema for running an XPath query on the document.

[code:c#]

$contents->registerXPathNamespace("w", $wordprocessingMLSchema);

[/code]

We can now easily run an XPath query "//w:body/w:p", which retrieves all w:p childs (paragraphs) of the document's body:

[code:c#]

$paragraphs = $contents->xpath('//w:body/w:p');

[/code]

The rest is quite easy. In each paragraph, we run a new XPath query "//w:r/w:t", which delivers all text nodes withing the paragraph. Each of these text nodes is then added to the $returnValue, which will represent all text content in the main document part upon completion.

[code:c#]

foreach($paragraphs as $paragraph) {
    $runs = $paragraph->xpath('//w:r/w:t');
    foreach ($runs as $run) {
        $returnValue[] = (string)$run;
    }
}

[/code]

3. Searching the index

Searching the index Searching the index starts the same way as creating the index: you first have to load the database. After loading the index database, you can easily run a query on it. Let's search for the keywords "Code Access Security":

[code:c#]

// Search query
$searchFor = 'Code Access Security';

// Search in index
echo sprintf('Searching for: %s', $searchFor) . "\r\n";
$hits = $index->find( $searchFor );

echo sprintf('Found %s result(s).', count($hits)) . "\r\n";
echo '--------------------------------------------------------' . "\r\n";

foreach ($hits as $hit) {
    echo sprintf('Score: %s', $hit->score) . "\r\n";
    echo sprintf('Title: %s', $hit->title) . "\r\n";
    echo sprintf('Creator: %s', $hit->creator) . "\r\n";
    echo sprintf('File: %s', $hit->url) . "\r\n";
    echo '--------------------------------------------------------' . "\r\n";
}

[/code]

There you go! That's all there is to it. Want the full code? Download it here: LuceneIndexingDOCX.zip (96.03 kb)

Preview Word files (docx) in HTML using ASP.NET, OpenXML and LINQ to XML

Since an image (or even an example) tells more than any text will ever do, here's what I've created in the past few evening hours:

image

Live examples:

Want the source code? Download it here: WordVisualizer.zip (357.01 kb)

Want to know how?

If you want to know how I did this, let me first tell you why I created this. After searching Google for something similar, I found a Sharepoint blogger who did the same using a Sharepoint XSL transformation document called DocX2Html.xsl. Great, but this document can not be distributed without a Sharepoint license. The only option for me was to do something similar myself.

ASP.NET handlers

The main idea of this project was to be able to type in a URL ending in ".docx", which would then render a preview of the underlying Word document. Luckily, ASP.NET provides a system of creating HttpHandlers. A HttpHandler is the class instance which is called by the .NET runtime to process an incoming request for a specific extension. So let's trick ASP.NET into believing ".docx" is an extension which should be handled by a custom class...

Creating a custom handler

A custom handler can be created quite easily. Just create a new class, and make it implement the IHttpHandler interface:

[code:c#]

/// <summary>
/// Word document HTTP handler
/// </summary>

public class WordDocumentHandler : IHttpHandler
{
    #region IHttpHandler Members

    /// <summary>
    /// Is the handler reusable?
    /// </summary>

    public bool IsReusable
    {
        get { return true; }
    }

    /// <summary>
    /// Process request
    /// </summary>
    /// <param name="context">Current http context</param>

     public void ProcessRequest(HttpContext context)
    {
        // Todo...

        context.Response.Write("Hello world!");
    }

    #endregion
}

[/code]

Registering a custom handler

For ASP.NET to recognise our newly created handler, we must register it in Web.config:

image

Now if you are using IIS6, you should also register this extension to be handled by the .NET runtime:

image

In the application configuration, add the extension ".docx" and make it point to the following executable: C:\WINDOWS\Microsoft.NET\Framework\v2.0.50727\aspnet_isapi.dll

This should be it. Fire up your browser, browse to your web site and type anything.docx. You should see "Hello world!" appearing in a nice, white page.

OpenXML

As you may already know, Word 2007 files are OpenXML packages containg WordprocessingML markup. A .docx file can be opened using the System.IO.Packaging.Package class (which is available after adding a project reference to WindowsBase.dll).

The Package class is created for accessing any OpenXML package. This includes all Office 2007 file formats, but also custom OpenXML formats which you can implement for yourself. Unfortunately, if you want to use Package to access an Office 2007 file, you'll have to implement a lot of utility functions to get the right parts from the OpenXML container.

Luckily, Microsoft released an OpenXML SDK (CTP), which I also used in order to create this Word preview handler.

LINQ to XML

As you know, the latest .NET 3.5 release brought us something new & extremely handy: LINQ (Language Integrated Query). On Doug's blog, I read about Eric White's attempts to use LINQ to XML on OpenXML.

LINQ to OpenXML

For implementing my handler, I basically used similar code to Eric's to run query's on a Word document's contents. Here's an example which fetches all paragraphs in a Word document:

[code:c#]

using (WordprocessingDocument document = WordprocessingDocument.Open("test.docx", false))
{
    // Register namespace

    XNamespace w = "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Element shortcuts

    XName w_r = w + "r";
    XName w_ins = w + "ins";
    XName w_hyperlink = w + "hyperlink";

    // Load document's MainDocumentPart (document.xml) in XDocument

    XDocument xDoc = XDocument.Load(
        XmlReader.Create(
            new StreamReader(document.MainDocumentPart.GetStream())
        )
    );

    // Fetch paragraphs

    var paragraphs = from l_paragraph in xDoc
                    .Root
                    .Element(w + "body")
                    .Descendants(w + "p")
         select new
         {
             TextRuns = l_paragraph.Elements().Where(z => z.Name == w_r || z.Name == w_ins || z.Name == w_hyperlink)
         };

    // Write paragraphs

    foreach (var paragraph in paragraphs)
    {
        // Fetch runs

        var runs = from l_run in paragraph.Runs
                   select new
                   {
                       Text = l_run.Descendants(w + "t").StringConcatenate(element => (string)element)
                   };

        // Write runs

        foreach (var run in runs)
        {
            // Use run.Text to fetch a text string

            Console.Write(run.Text);
        }
    }
}

[/code]

Now if you run this code, you will notice a compilation error... This is due to the fact that I used an extension method StringConcatenate.

Extension methods

In the above example, I used an extension method named StringConcatenate. An extension method is, as the name implies, an "extension" to a known class. In the following example, find the extension for all IEnumerable<T> instances:

[code:c#]

public static class IEnumerableExtensions
{
    /// <summary>
    /// Concatenate strings
    /// </summary>
    /// <typeparam name="T">Type</typeparam>
    /// <param name="source">Source</param>
    /// <param name="func">Function delegate</param>
    /// <returns>Concatenated string</returns>

    public static string StringConcatenate<T>(this IEnumerable<T> source, Func<T, string> func)
    {
        StringBuilder sb = new StringBuilder();
        foreach (T item in source)
            sb.Append(func(item));
        return sb.ToString();
    }
}

[/code]

Lambda expressions

Another thing you may have noticed in my example code, is a lambda expression:

[code:c#]

z => z.Name == w_r || z.Name == w_ins || z.Name == w_hyperlink.

[/code]

A lambda expression is actually an anonymous method, which is called by the StringConcatenate extension method. Lambda expressions always accept a parameter, and return true/false. In this case, z is instantiated as an XNode, returning true/false depending on its Name property.

Wrapping things up...

If you read this whole blog post, you may have noticed that I extensively used C# 3.5's new language features. I combined these with OpenXML and ASP.NET to create a useful Word document preview handler. If you want the full source code, download it here: WordVisualizer.zip (357.01 kb).

kick it on DotNetKicks.com

PHP and OpenXML - New API project

It's been a while since I posted something related to PHP and OpenXML, but I fixed that glitch in this blog post.

Browsing CodePlex, I found a new PHP OpenXML project named OpenXML API. From what I can see, this project is just in the "Hello world" stage of creating Word documents in PHP, but not so long ago my PHPExcel project was in that stage too...

While I am talking about PHPExcel... Since today, the Subversion repository I host myself is synchronized 2 times a week with CodePlex's Team Foundation Server. If you want the latest source snapshot, please check PHPExcel's source code tab on CodePlex.

OpenXML in Healthcare in PHP

Here's a cool present just before the weekend... 2 days ago, Wouter posted on his blog about an article he co-operated on for MSDN: OpenXML in Healthcare.

Being both a Microsoft and PHP fan (yes, you can curse me, I don't care), I thought of porting (part of) the sample code from his article into PHP. Except for the document signing, as I did not have many time to write this sample code...

The scenario for the article is quite simple: Contoso provides a central medical records database. Whenever a physician has to register a new patient, he downloads a Word 2007 document from the Contoso server, fills it out, and uploads it back. Contoso then strips out the necessary data and saves it back in their systems.

 

This Word 2007 document is crafted around embedded custom XML data, which is displayed and edited using Word 2007. In short: to do the above exercise, you just need to strip out the custom XML and you're done.

Stripping out the custom XML is also quite easy. First, locate the main relationships part in the OpenXML package. Then, search it for the main document part. From there, loop over the relationships to this document part and look for any relationship of the type "http://schemas.openxmlformats.org/officeDocument/2006/relationships/customXml". When that one's found, you just need to parse the referenced document and you're done!

Want to see a demo? Check this out.
Want the sample code? 2007HealthCareSamplePHP.zip (49.76 kb)
 Want the OpenXML background? Read the original article.

Creating Office2007 documents in C#

I've been too busy implementing SpreadsheetML in PHP, that I completely forgot to write something on OpenXML and C# on my blog. Luckily, Joris did now: Generating Office 2007 documents in C#. Now learn some PHP too, Joris, I can use a hand on PHPExcel [:P]

Here are some additional links to get you started:

PackageExplorer, not only great for OpenXML...

The last few days, I've been working with Wouter to discuss some new features and ideas for his PackageExplorer. PE is an editor for OpenXML packages, enabling you to view the contents and relations of different parts in a package, to validate XML against OpenXML schemes, ...

One of the cool things we discussed was a HTML start page. Since Visual Studio has one, we decided PE should have one too. After creating a simple mock for this, Wouter was first to Google for a way to embed images and HTML in a resource file, and display it in a fancy start page (second price, for slow Google searchers: I designed the page [<:o)]). You can see the code and build steps required over at Wouter's.

Next cool thing: a new splash screen model. Always wanted to do things right with threads? No more white gaps in your screen due to a blocked UI thread? I suggest you read the eventing and threading explanation at (again) Wouter's blog.

Next suggestion: download PE, even if you don't know a thing about OpenXML. Want to learn how to use OO patterns? Always wanted to provide add-in functionality for your own application? PE's code gives a lot of answers on those questions!

Open XML Developer workshop content

Over the last few months, Doug, Wouter, Brian and lots of other people delivered a series of OpenXML workshops around the globe. Those who were there, and those who weren't, can now download all content of these workshops from OpenXMLdeveloper.org.

There are some interesting slideshows available for those of you who never heared about OpenXML before, as well as some nice code samples to get you started developing in C# or Java.

As a side note: I also spotted a good article on OpenXML, explaining the Open Packaging Conventions with a self-created file format (based on OpenXML).

OpenXML news overview

A lot of news around OpenXML these days, so I decided to bundle some things into one big blog post.

1. Microsoft released a Microsoft SDK for Open XML Formats

In .NET 3.0, there's the System.IO.Packaging API, which allows programmatic access to OpenXML packages (amongst them Office2007 files).
Since this API is quite low-level, the Microsoft people introduced a new SDK built on top of System.IO.Packaging, which allows you to use strongly typed classes for document parts. Checkout a code sample on Wouter's blog and see for yourself: this SDK provides access to an OpenXML package in a much easier way than System.IO.Packaging. Download the SDK here.

2. PackageExplorer 3.0 beta

Wouter released a new (beta) version of his PackageExplorer, and I assume he uses the new SDK mentioned above. Main new feature seems to be adding document parts using a template system, allowing you and I to create an OpenXML package using a set pre-defined templates. You can download PackageExplorer on CodePlex.

3. Altova XML Spy supports OpenXML

I saw this on Altova's website: "XMLSpy provides powerful support for accessing, editing, transforming, and querying XML data saved in Microsoft® Office 2007 documents and other zipped files."
Says enough, I think. You can download a free trial to check if this all is true.

4. Trying to compile PHPExcel using Phalanger

The last few days, I've been trying to compile PHPExcel to a .NET class library using Phalanger. Phalanger is a PHP compiler for .NET.
Compiling works quite well, but not all class definitions are compiled into a usable .NET alternative... Creating and saving a Spreadshete currently works, but adding data into cells doesn't. I guess thats a feature that can not be missing :-)
I'll keep you informed on the progress of this. If anyone feels interested in porting this PHP library to C#, please contact me!

5. OpenXML for JAVA

JAVA people now also have an OpenXML library: OpenXML4J. Not production-stable yet, but alpha versions are available.