Maarten Balliauw {blog}

ASP.NET MVC, Microsoft Azure, PHP, web development ...


Data Driven Testing in Visual Studio 2008 - Part 2

This is the second post in my series on Data Driven Testing in Visual Studio 2008. The first post focuses on Data Driven Testing in regular Unit Tests. This part will focus on the same in web testing.

Web Testing

I assume you have read my previous post and saw the cool user interface I created. Let's first add some code to that, focusing on the TextBox_TextChanged event handler that is linked to TextBox1 and TextBox2.

[code:c#]

public partial class _Default : System.Web.UI.Page
{
    // ... other code ...

    protected void TextBox_TextChanged(object sender, EventArgs e)
    {
        if (!string.IsNullOrEmpty(TextBox1.Text.Trim()) && !string.IsNullOrEmpty(TextBox2.Text.Trim()))
        {
            int a;
            int b;
            int.TryParse(TextBox1.Text.Trim(), out a);
            int.TryParse(TextBox2.Text.Trim(), out b);

            Calculator calc = new Calculator();
            TextBox3.Text = calc.Add(a, b).ToString();
        }
        else
        {
            TextBox3.Text = "";
        }
    }
}

[/code]

It is now easy to run this in a browser and play with it. You'll notice 1 + 1 equals 2; if not, you copy-pasted the wrong code. You can now create a web test for this. Right-click the test project, "Add", "Web Test...". If everything goes well, your browser now starts with a giant toolbar named "Web Test Recorder" on the left. This toolbar will record a macro of what you are doing, so let's simply navigate to the web application we created, enter some numbers and watch the calculation engine do the rest:

Web Test Recorder

You'll notice an entry on the left for each request that is being fired. When the result is shown, click "Stop" and let Visual Studio determine what happened behind the curtains of your browser. An overview of this test recording session should now be available in Visual Studio.

Data Driven Web testing

There's our web test! But it's not data driven yet... The first thing to do is link the database we created in part 1 by clicking the "Add Data Source" button. Finish the wizard by selecting the database and the correct table. Afterwards, you can pick one of the Form Post Parameters and bind its value to our newly added data source. Do this for each step in the test: the first step should fill TextBox1, the second should fill TextBox1 and TextBox2.

Bind Form Post Parameters

In the last recorded step of our web test, add a validation rule. We want to check whether our sum is calculated correctly and shown in TextBox3. Pick the following options in the "Add Validation Rule" screen. For the "Expected Value" property, enter the variable name which comes from our data source: {{DataSource1.CalculatorTestAdd.expected}}

image

If you now run the test, you should see success all over the place! But there's one last step to do... Visual Studio 2008 will only run this test for the first data row, not for all other rows! To overcome this problem, select "Run Test (Pause Before Starting)" instead of just "Run Test". You'll notice the following hyperlink in the IDE interface:

Edit Run Settings

Click "Edit Run Settings" and pick "One run per data source row". There you go! Multiple test runs are now validated and should result in an almost green-bulleted screen:

image


Data Driven Testing in Visual Studio 2008 - Part 1

Last week, I blogged about code performance analysis in Visual Studio 2008. Since that topic provoked lots of comments (thank you Bart for associating "hotpaths" with "hotpants"), I thought about doing another post on code quality in .NET.

This post will be the first of two on Data Driven Testing. This part will focus on Data Driven Testing in regular Unit Tests. The second part will focus on the same in web testing.

Data Driven Testing?

We all know unit testing. These small tests are always based on some values, which are passed through a routine you want to test and then validated against a known result. But what if you want to run that same test several times, with different data and different expected values each time?

This is where Data Driven Testing comes in handy. Visual Studio 2008 offers the possibility to use a database with parameter values and expected values as the data source for a unit test. That way, you can run a unit test, for example, for all customers in a database and make sure each customer passes the unit test.

Sounds nice! Show me how!

You are here for the magic, I know. That's why I invented this nifty web application which looks like this:

Example application

This is a simple "Calculator" which provides a user interface that accepts 2 values, then passes these to a Calculator business object that calculates the sum of these two values. Here's the Calculator object:

[code:c#] 

public class Calculator
{
    public int Add(int a, int b)
    {
        return a + b;
    }
}

[/code]

Now right-click the Add method and select "Create Unit Tests...". Visual Studio will pop up a wizard. You can simply click "OK" and have your unit test code generated:

[code:c#]

/// <summary>
///A test for Add
///</summary>
[TestMethod()]
public void AddTest()
{
    Calculator target = new Calculator(); // TODO: Initialize to an appropriate value
    int a = 0; // TODO: Initialize to an appropriate value
    int b = 0; // TODO: Initialize to an appropriate value
    int expected = 0; // TODO: Initialize to an appropriate value
    int actual;
    actual = target.Add(a, b);
    Assert.AreEqual(expected, actual);
    Assert.Inconclusive("Verify the correctness of this test method.");
}

[/code]

As you see, in a normal situation we would now fix these TODO items and have a unit test ready in no time. For this data driven test, however, let's first add a database to our project. Create columns a, b and expected. These names do not have to match anything in the unit test, but it keeps things clear. Also, add some data.
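The designer is the easiest way to create this database, but to make clear what it produces, here's a rough code equivalent (a sketch only; it assumes the SQL Server Compact 3.5 runtime and the Database1.sdf / CalculatorTestAdd names used elsewhere in this post):

[code:c#]

using System.Data.SqlServerCe;

// Create the .sdf database file (sketch; normally the designer does this)
SqlCeEngine engine = new SqlCeEngine("Data Source=Database1.sdf");
engine.CreateDatabase();

using (SqlCeConnection connection = new SqlCeConnection("Data Source=Database1.sdf"))
{
    connection.Open();

    // One column per test parameter, plus the expected result
    SqlCeCommand createTable = connection.CreateCommand();
    createTable.CommandText =
        "CREATE TABLE CalculatorTestAdd (a int, b int, expected int)";
    createTable.ExecuteNonQuery();

    // A first data row: 1 + 1 should equal 2
    SqlCeCommand insertRow = connection.CreateCommand();
    insertRow.CommandText =
        "INSERT INTO CalculatorTestAdd (a, b, expected) VALUES (1, 1, 2)";
    insertRow.ExecuteNonQuery();
}

[/code]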

Data to test

Great, but how will our unit test use these values while running? In the Test View window, select the test you want to bind to data and set its data source and table name properties. Next, read your data from the TestContext.DataRow property. The unit test will now look like this:

[code:c#]

/// <summary>
///A test for Add
///</summary>
[DataSource("System.Data.SqlServerCe.3.5", "data source=|DataDirectory|\\Database1.sdf", "CalculatorTestAdd", DataAccessMethod.Sequential), DeploymentItem("TestProject1\\Database1.sdf"), TestMethod()]
public void AddTest()
{
    Calculator target = new Calculator();
    int a = (int)TestContext.DataRow["a"];
    int b = (int)TestContext.DataRow["b"];
    int expected = (int)TestContext.DataRow["expected"];
    int actual;
    actual = target.Add(a, b);
    Assert.AreEqual(expected, actual);
}

[/code]

Now run this newly created test. After the test run, you will see that the test is run multiple times, once for each data row in the database. You can also drill down further and check which values failed and which were successful. If you do not want Visual Studio to read each data row sequentially, you can also use random access and create a truly random data driven test.
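For reference, switching to random access only requires changing the DataAccessMethod value in the attribute; the test body stays identical:

[code:c#]

[DataSource("System.Data.SqlServerCe.3.5", "data source=|DataDirectory|\\Database1.sdf", "CalculatorTestAdd", DataAccessMethod.Random), DeploymentItem("TestProject1\\Database1.sdf"), TestMethod()]
public void AddTest()
{
    Calculator target = new Calculator();
    int a = (int)TestContext.DataRow["a"];
    int b = (int)TestContext.DataRow["b"];
    int expected = (int)TestContext.DataRow["expected"];
    Assert.AreEqual(expected, target.Add(a, b));
}

[/code]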

Test results

Tomorrow, I'll try to do this with a web test and test our web interface. Stay tuned!


Code performance analysis in Visual Studio 2008

Visual Studio developer, did you know you have a great performance analysis (profiling) tool at your fingertips? In Visual Studio 2008 this profiling tool has been given a separate menu item to increase visibility and usage. Allow me to show what this tool can do for you in this walkthrough.

An application with a smell…

Before we can get started, we need a (simple) application with a “smell”. Create a new Windows application, drag a TextBox on the surface, and add the following code:

[code:c#]

private void Form1_Load(object sender, EventArgs e)
{
    string s = "";
    for (int i = 0; i < 1500; i++)
    {
        s = s + " test";
    }
    textBox1.Text = s;
}

[/code]

You should immediately see the smell in the above code. If you don't: we are calling string.Concat() 1,500 times, because each s = s + " test" statement compiles to a string.Concat() call! Since .NET strings are immutable, a new string is created 1,500 times, and the old, intermediate strings have to be garbage collected again. Smells like a nice memory issue to investigate!

Profiling

The profiling tool is hidden under the Analyze menu in Visual Studio. After launching the Performance Wizard, you will see that two options are available: sampling and instrumentation. In a "real-life" situation, you'll first want to sample the entire application, searching for performance spikes. Afterwards, you can investigate these spikes using instrumentation. Since we only have one simple application, let's instrument immediately.

Upon completing the wizard, the first thing we'll do is change some settings. Right-click the root node and select Properties. Check the "Collect .NET object allocation information" and "Also collect .NET object lifetime information" options to make our profiling session as complete as possible:

Profiling property pages

You can now start the performance session from the tool pane. Note that you have two options to start: Launch with profiling and Launch with profiling paused. The first will immediately start profiling; the latter will first start your application and wait for your sign to start profiling. This can be useful if you do not want to profile your application's startup, but only a certain event that is triggered afterwards.

After the application has run, simply close it and wait for the summary report to appear:

Performance Report Summary 1

WOW! Seems like string.Concat() is taking 97% of the application’s memory! That’s a smell indeed... But where is it coming from? In a larger application, it might not be clear which method is calling string.Concat() this many times. To discover where the problem is situated, there are 2 options…

Discovering the smell – option 1

Option 1 in discovering the smell is quite straightforward. Right-click the item in the summary and pick Show functions calling Concat:

Functions allocating most memory

You are now transferred to the “Caller / Callee” view, where all methods doing a string.Concat() call are shown including memory usage and allocations. In this particular case, it’s easy to see where the issue might be situated. You can now right-click the entry and pick View source to be transferred to this possible performance killer.

Possible performance killer

Discovering the smell – option 2

Visual Studio 2008 introduced a cool new way of discovering smells: hotpath tracking. When you move to the Call Tree view, you’ll notice a small flame icon in the toolbar. After clicking it, Visual Studio moves down the call tree following the high inclusive numbers. Each click takes you further down the tree and should uncover more details. Again, string.Concat() seems to be the problem!

Hotpath tracking

Fixing the smell

We are about to fix the smell. Let’s rewrite our application code using StringBuilder:

[code:c#]

private void Form1_Load(object sender, EventArgs e)
{
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < 1500; i++)
    {
        sb.Append(" test");
    }
    textBox1.Text = sb.ToString();
}

[/code]

In theory, this should perform better. Let’s run our performance session again and have a look at the results:
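If you want a quick sanity check outside the profiler first, a Stopwatch comparison (a quick sketch, not part of our test application) already hints at the difference between the two approaches:

[code:c#]

using System;
using System.Diagnostics;
using System.Text;

class ConcatTiming
{
    static void Main()
    {
        // Time the original, smelly approach
        Stopwatch watch = Stopwatch.StartNew();
        string s = "";
        for (int i = 0; i < 1500; i++)
        {
            s = s + " test"; // allocates a brand new string on every pass
        }
        watch.Stop();
        Console.WriteLine("string.Concat(): {0:F2} ms", watch.Elapsed.TotalMilliseconds);

        // Time the StringBuilder version
        watch = Stopwatch.StartNew();
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1500; i++)
        {
            sb.Append(" test"); // grows one internal buffer instead
        }
        string result = sb.ToString();
        watch.Stop();
        Console.WriteLine("StringBuilder:   {0:F2} ms", watch.Elapsed.TotalMilliseconds);
    }
}

[/code]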

Performance Report Summary 2

Seems like we fixed the glitch! You can now investigate further to see if there are other problems, but for this walkthrough, the application is healthy now. One extra feature though: performance session comparison ("diff"). Simply pick two performance reports, right-click and pick Compare performance reports. This tool will show all delta values (= differences) between the two sessions we ran earlier:

Comparison report 

Update 2008-02-14: Some people commented on not finding the Analyze menu. This is only available in the Developer or Team Edition of Visual Studio. Click here for a full comparison of all versions.

Update 2008-05-29: Make sure to check my post on NDepend as well, as it offers even more insight into your code!


Indexing Word 2007 (docx) files with Zend_Search_Lucene

You may have noticed Microsoft released their Search Server 2008 a few weeks ago. Search Server delivers search capabilities to your organization quickly and easily. The PHP world currently does not have a full-featured solution like Search Server, but there's a building block which could be used to create something similar: Zend Framework's PHP port of Lucene. There is also a .NET port of Lucene available.

Lucene basically is an indexing and search technology, providing an easy-to-use API to create any type of application that has to do with indexing and searching. If you provide the right methods to extract data from any type of document, Lucene can index it. There are various indexer examples available for different file formats (PDF, HTML, RSS, ...), but none for Word 2007 (docx) files. Sounds like a challenge!

Source code

Want the full code? Download it here.

Prerequisites

Make sure you use PHP version 5.2, have php_zip and php_xml enabled, and have a working Zend Framework installation on your computer. It's also useful to keep the Lucene manual pages at hand along the way.

1. Creating an index

Let's start with creating a Zend_Search_Lucene index. We will be needing the Zend Framework classes, so let's start by including them:

[code:php]

/** Zend_Search_Lucene */ 
require_once 'Zend/Search/Lucene.php';

[/code]

We will also be needing an index database. The following code snippet checks for an existing database first (in ./lucene_index/). If it exists, the snippet loads the index database; otherwise a new index database is created.

[code:php]

// Index
$index = null;

// Verify if the index exists. If not, create it.
if (is_dir('./lucene_index/')) {
    $index = Zend_Search_Lucene::open('./lucene_index/'); 
} else {
    $index = Zend_Search_Lucene::create('./lucene_index/');
}

[/code]

Since the document root we are indexing might have different contents on every indexer run, let's first remove all documents from the existing index. Here's how:

[code:php]

// Remove old indexed files
for ($i = 0; $i < $index->maxDoc(); $i++) {
    $index->delete($i);
}

[/code]

We'll create an index entry for the file test.docx. We will be adding some fields to the index, like the url where the original document can be found and the text of the document (which will be tokenized and indexed, but not stored, as the index might otherwise grow too big too fast!).

[code:php]

// File to index
$fileToIndex = './test.docx';

// Index file
echo 'Indexing ' . $fileToIndex . '...' . "\r\n";

// Create new indexed document
$luceneDocument = new Zend_Search_Lucene_Document();

// Store filename in index
$luceneDocument->addField(Zend_Search_Lucene_Field::Text('url', $fileToIndex));

// Store contents in index
$luceneDocument->addField(Zend_Search_Lucene_Field::UnStored('contents', DocXIndexer::readDocXContents($fileToIndex)));

// Store document properties
$documentProperties = DocXIndexer::readCoreProperties($fileToIndex);
foreach ($documentProperties as $key => $value) {
    $luceneDocument->addField(Zend_Search_Lucene_Field::UnIndexed($key, $value));
}

// Store document in index
$index->addDocument($luceneDocument);

[/code]

After creating the index, there's one thing left: optimizing it. Zend_Search_Lucene offers a nice method to do that in one line of code: $index->optimize(); Since shutdown of the index instance is done automatically, the $index->commit(); call is not strictly necessary, but it's good to have it present so you know what happens at the end of the indexing process.

There, that's it! Our index (of one file...) is now ready! I must admit I did not explain all the magic... One piece of it is the DocXIndexer class, whose readDocXContents() method is used to retrieve all text from a Word 2007 file. Here's how this method is built.

2. Retrieving the full text from a Word 2007 file

The readDocXContents() method mentioned earlier is the actual "magic" in this whole process. It basically reads in a Word 2007 (docx) file, loops through all paragraphs and extracts all text runs from these paragraphs into one string.

A Word 2007 (docx) file is a ZIP container which holds a lot of XML files. Some of these files describe a document, and some of them describe the relationships between these files. Each of these XML files is identified by a schema (namespace) name, which we'll define first:

[code:php]

// Schemas
$relationshipSchema    = 'http://schemas.openxmlformats.org/package/2006/relationships';
$officeDocumentSchema     = 'http://schemas.openxmlformats.org/officeDocument/2006/relationships/officeDocument';
$wordprocessingMLSchema = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main';

[/code]

The $relationshipSchema is the schema name that describes a relationship between the OpenXML package (the ZIP file) and the XML files ("parts") it contains. The $officeDocumentSchema is the relationship type that points to the main document part, marking it as a Microsoft Office document. The $wordprocessingMLSchema is the schema containing all Word-specific elements, such as paragraphs, runs, printer settings, ... But let's continue coding. I'll put the entire code snippet here and explain every part later:

[code:php]

// Return value
$returnValue = array();

// Document holders
$relations = null;

// Open file
$package = new ZipArchive(); // Make sure php_zip is enabled!
$package->open($fileName);

// Read relations and search for officeDocument
$relations = simplexml_load_string($package->getFromName("_rels/.rels"));
foreach ($relations->Relationship as $rel) {
    if ($rel["Type"] == $officeDocumentSchema) {
        // Found office document! Now read in contents...
        $contents = simplexml_load_string(
            $package->getFromName(dirname($rel["Target"]) . "/" . basename($rel["Target"]))
        );

        $contents->registerXPathNamespace("w", $wordprocessingMLSchema);
        $paragraphs = $contents->xpath('//w:body/w:p');

        foreach ($paragraphs as $paragraph) {
            // Use a relative query; '//w:r/w:t' would match runs in the entire document
            $runs = $paragraph->xpath('.//w:r/w:t');
            foreach ($runs as $run) {
                $returnValue[] = (string)$run;
            }
        }

        break;
    }
}

// Close file
$package->close();

// Return
return implode(' ', $returnValue);

[/code]

The first thing that is loaded, is the main ".rels" document, which contains a reference to all parts in the root of this OpenXML package. This file is parsed using SimpleXML into a local variable $relations. Each relationship has a type ($rel["Type"]), which we compare against the $officeDocumentSchema schema name. When that schema name is found, we dig deeper into the document, parsing it into $contents. Next on our todo list: register the $wordprocessingMLSchema for running an XPath query on the document.

[code:php]

$contents->registerXPathNamespace("w", $wordprocessingMLSchema);

[/code]

We can now easily run the XPath query "//w:body/w:p", which retrieves all w:p children (paragraphs) of the document's body:

[code:php]

$paragraphs = $contents->xpath('//w:body/w:p');

[/code]

The rest is quite easy. In each paragraph, we run the XPath query ".//w:r/w:t" relative to that paragraph, which delivers all text nodes within it. Each of these text nodes is then added to $returnValue, which will contain all text content of the main document part upon completion.

[code:php]

foreach ($paragraphs as $paragraph) {
    $runs = $paragraph->xpath('.//w:r/w:t');
    foreach ($runs as $run) {
        $returnValue[] = (string)$run;
    }
}

[/code]

3. Searching the index

Searching the index starts the same way as creating it: you first have to load the index database. After loading it, you can easily run a query on it. Let's search for the keywords "Code Access Security":

[code:php]

// Search query
$searchFor = 'Code Access Security';

// Search in index
echo sprintf('Searching for: %s', $searchFor) . "\r\n";
$hits = $index->find( $searchFor );

echo sprintf('Found %s result(s).', count($hits)) . "\r\n";
echo '--------------------------------------------------------' . "\r\n";

foreach ($hits as $hit) {
    echo sprintf('Score: %s', $hit->score) . "\r\n";
    echo sprintf('Title: %s', $hit->title) . "\r\n";
    echo sprintf('Creator: %s', $hit->creator) . "\r\n";
    echo sprintf('File: %s', $hit->url) . "\r\n";
    echo '--------------------------------------------------------' . "\r\n";
}

[/code]

There you go! That's all there is to it. Want the full code? Download it here: LuceneIndexingDOCX.zip (96.03 kb)