Understanding How to Use XSL Transforms

2. January 2003 15:11 by Chris in dev

Note that this article was first published on 02/01/2003. The original article is available on DotNetJohn, where the code is also available for download and execution.

Original abstract: XSLT, XPATH and how to apply the concepts in .NET. Examines the concept of transformation and how an XSLT stylesheet defines a transformation by describing the relationship between an input tree and an output tree. Continues to look at the structure of a stylesheet, its main sub-components and introduces examples of what you might expect to see therein. Finally, the article examines how to utilise XSLT stylesheets in .NET.

Knowledge assumed: reasonable understanding of XML and ASP.NET / VB.NET.

Introduction

XML represents a widely accepted mechanism for representing data in a platform-neutral manner. XSLT is the XML-based language that has been designed to allow transformation of XML into other structures and formats, such as HTML or other XML documents. XSLT is a template-based language that works in collaboration with the XPath language to transform XML documents.

Note that not all applications are suited to such an approach, though there are benefits to be derived in all but the simplest problem domains. Suitable applications for implementation with XML/XSLT are:

  • those that require different views of the same data – hence delivering economies of scale to the developer/ organisation.
  • those where maintaining the distinction between data and User Interface elements (UI) is an important consideration – for example for facilitating productivity through specialisation within a development team.

 

.NET provides an XSLT processor which can take as input XML and XSLT documents and, via matching nodes with specified output templates, produce an output document with the desired structure and content.

I’ll examine the processor and the supporting classes as far as XSLT within .NET is concerned in the latter half of this article. First, XSLT:

XSLT

I’m only going to be able to scratch the surface of the XSLT language here but I shall attempt to highlight a few of the key concepts. It is important to remember that XSLT is a language in its own right and, further, it is one in transition only having been around for a few years now. It’s also a little different in mechanism to most you may have previously come across. XSLT is basically a declarative pattern matching language, and as such requires a different mindset and a little getting used to. It’s (very!) vaguely like SQL or Prolog in this regard. Saying that, there are ways to ‘hook in’ more conventional procedural code.

If it’s not too late, now would be a good time to get round to stating what the acronym XSLT stands for: eXtensible Stylesheet Language: Transformations. XSLT grew from a bigger language called XSL. As it developed, the decision was made to split XSL into XSLT, for defining the structural transformations, and ‘the rest’, which is the formatting process of rendering the output. This may commonly be as pixels on a screen, for example, but could also be one of several alternatives. ‘The rest’ is still officially called XSL, though it has also come to be known as XSL-FO (XSL Formatting Objects). That’s the last time we’ll mention XSL-FO.

As XSLT developed it became apparent that there was overlap between the expression syntax in XSLT for selecting parts of a document (XPath), and the XPointer language being developed for linking one document to another. The sensible decision was made to define a single language to undertake both purposes. XPath acts as a sub-language within an XSLT stylesheet. An XPath expression may be used for a variety of functions but typically it is employed to identify parts of the input XML document for subsequent processing. I’ll make no significant further effort in the following discourse to emphasise the somewhat academic distinction between XPath and XSLT, the former being such an important, and integral, component of the latter.

A typical XSLT stylesheet consists of a sequence of template rules, defining how elements should be processed when encountered in the XML input file. In keeping with the declarative nature of the XSLT language, you specify what outputs should be produced by particular input patterns, as distinct from a procedural model where you define the sequence of tasks to be performed.

A tree model similar to the XML DOM is employed by both XSLT and XPath. The different kinds of item in an XML document are represented by different types of node in the tree. In XPath the root node is not an element: the root is the parent of the outermost element and represents the document as a whole. The XSLT tree model can represent every well-formed XML document, as well as some structures that are not well-formed documents according to the W3C, for example a root with several element children or with text nodes as children.

An XPath tree is made up of 7 types of node, largely corresponding to the constructs in the XML source document: root, element, text, attribute, comment, processing instruction and namespace. Each node carries information taken from the source document, in accordance with the type of node under consideration. Considering the node types in a little more detail:

As already mentioned, the root node is a singular node that should not be confused with the document element, which is the outermost element containing all other elements in a well-formed XML document.

Element and attribute nodes correspond to the elements and attributes in your XML, e.g.

 <product id="1" type="book">XSLT for Beginners</product>

product is an element and id and type are attributes.

Comment nodes represent comments in the XML source written between <!-- and -->. Similarly, processing instructions are written in the XML source between <? and ?>. Note, however, that the XML declaration commonly found on the first line of an XML document (<?xml version="1.0"?>) is only impersonating a processing instruction: it is not represented as a node in the tree.

A text node is a sequence of characters in the PCDATA (‘parsed character’ data) part of an element.
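
To make the node types concrete, here is a minimal sketch of a stylesheet with one template rule per node type, which simply reports what it finds as plain text. The output labels are my own choice for illustration, and xsl:strip-space is included on the assumption that whitespace-only text nodes are not of interest. Namespace nodes are absent because a match pattern cannot match them; they can only be selected via the namespace axis.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="text"/>
  <!-- Assumption: ignore text nodes that contain only whitespace -->
  <xsl:strip-space elements="*"/>

  <!-- Root node: the parent of the document element -->
  <xsl:template match="/">
    <xsl:text>root&#10;</xsl:text>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Element nodes: report the name, then visit attributes and children -->
  <xsl:template match="*">
    <xsl:text>element: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
    <xsl:apply-templates select="@*"/>
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Attribute nodes: only reached because of select="@*" above -->
  <xsl:template match="@*">
    <xsl:text>attribute: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Text nodes -->
  <xsl:template match="text()">
    <xsl:text>text: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

  <!-- Comment nodes -->
  <xsl:template match="comment()">
    <xsl:text>comment&#10;</xsl:text>
  </xsl:template>

  <!-- Processing instruction nodes: name() gives the target -->
  <xsl:template match="processing-instruction()">
    <xsl:text>processing instruction: </xsl:text>
    <xsl:value-of select="name()"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>

</xsl:stylesheet>
```

Note that attributes have to be requested explicitly: a plain xsl:apply-templates visits child nodes only, and attributes are not children of their element.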

The XML source tree is converted to a result tree via transformation by the XSLT processor using the instructions of the XSLT stylesheet. Time for an example or two:

Most stylesheets contain a number of template rules of the form:

 <xsl:template match="/">
   <xsl:message>Started!</xsl:message>
   <html>
     . . . do other stuff . . .
   </html>
 </xsl:template> 

where the . . . do other stuff . . . might contain further template bodies to undertake further processing, e.g.

 <xsl:template match="/">
   <xsl:message>Started!</xsl:message>
   <html>
     <head></head>
     <body>
       <xsl:apply-templates/>
     </body>
   </html>
 </xsl:template> 

As previously stated, both the input document and output document are represented by a tree structure. So, the <body> element above is a literal element that is simply copied over from the stylesheet to the result tree.

<xsl:apply-templates/> means: select all the children of the current node in the source tree, find the matching template rule for each one in the stylesheet, and apply it. The results depend on the content of both the stylesheet and the XML document under consideration. Actually, if there is no template for the root node, the built-in template is invoked, which processes all the children of the root node.

Thus, the simplest way to process an XML document is to write a template rule for each kind of node that might be encountered, or at least for each kind that we are interested in and want to process. This is an example of ‘push’ processing and can be considered logically similar to Cascading Style Sheets (CSS), where one document defines the structure (HTML/XML) and the second (the stylesheet) defines the appearance within this structure. The output is conditional on the structure of the XML document.

Push processing works well when the output is to have the same structure and sequence of data as the input, and the input data is predictable.

Listing 1: simple XML file: books.xml

 <?xml version="1.0"?>
 <Library>
   <Book>
     <Title>XSLT Programmers Reference</Title>
     <Publisher>Wrox</Publisher>
     <Edition>2</Edition>
     <Authors>
       <Author>Kay, Michael</Author>
     </Authors>
     <PublishedDate>April 2001</PublishedDate>
     <ISBN>1-816005-06-7</ISBN>
   </Book>
   <Book>
     <Title>Dynamical systems and fractals</Title>
     <Publisher>Cambridge University Press</Publisher>
     <Authors>
       <Author>Becker, Karl-Heinz</Author>
       <Author>Dorfler, Michael</Author>
       <Author>David Sussman</Author>
     </Authors>
     <PublishedDate>1989</PublishedDate>
     <ISBN>0-521-36910-X</ISBN>
   </Book>
 </Library> 

Listing 2: Example of push processing of books.xml: example1.xslt

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
 <xsl:template match="Library">
   <html>
   <head></head>
   <body>
   <h1>Library</h1>
   <table border="1">
   <tr>
   <td><b>Title</b></td>
   <td><b>PublishedDate</b></td>
   <td><b>Publisher</b></td>
   </tr>
   <xsl:apply-templates/>
   </table>
   </body>
   </html>
 </xsl:template>
 
 <xsl:template match="Book">
   <tr>
     <xsl:apply-templates select="Title"/>
     <xsl:apply-templates select="PublishedDate"/>
     <xsl:apply-templates select="Publisher"/>
   </tr>
 </xsl:template>
 
 <xsl:template match="Title | PublishedDate | Publisher ">
   <td><xsl:value-of select="."/></td>
 </xsl:template>
 
 </xsl:stylesheet>

Note that in the Book template I’ve used <xsl:apply-templates select="…"/> rather than just <xsl:apply-templates/>. This is because there is data in the source XML in which we are not interested, and if we just let the built-in template rules do their stuff the additional data would be copied across to the output tree. I’ve already mentioned the existence of built-in template rules: when apply-templates is invoked to process a node and there is no matching template rule in the stylesheet, a built-in template rule is used, according to the type of the node. For example, for elements apply-templates is called on the child nodes, and for text and attribute nodes the text is copied over to the result tree. Try making the modification and viewing the results.
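
The built-in rules described above can be written out as ordinary templates. The sketch below follows the equivalents given in the XSLT 1.0 Recommendation; the real built-in rules are not part of your stylesheet and always lose out to any rule you write yourself, so this is only an illustration of their behaviour.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Root and element nodes: carry on processing the children -->
  <xsl:template match="*|/">
    <xsl:apply-templates/>
  </xsl:template>

  <!-- Text and attribute nodes: copy the text to the result tree -->
  <xsl:template match="text()|@*">
    <xsl:value-of select="."/>
  </xsl:template>

  <!-- Comments and processing instructions: produce nothing -->
  <xsl:template match="processing-instruction()|comment()"/>

</xsl:stylesheet>
```

This is why a plain <xsl:apply-templates/> in the Book template would let the text of unwanted children such as Edition and ISBN leak into the output: no rule matches them, so the element rule descends into them and the text rule copies their content.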

Using the select attribute of apply-templates is one solution: being more careful about which nodes to process (rather than just saying ‘process all children of the current node’). Another is to be more precise about how to process them (rather than just saying ‘choose the best-fit template rule’). This is termed ‘pull’ processing and is achieved using the value-of instruction:

 <xsl:value-of select="price"/>

In this alternative pull model the stylesheet provides the structure and the document acts wholly as a data source. Thus a ‘pull’ version of the above example would be:

Listing 3: Example of pull processing of books.xml: example2.xslt

 <?xml version="1.0"?>
 <xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
 
 <xsl:template match="/">
   <xsl:apply-templates/>
 </xsl:template>
 
 <xsl:template match="Library">
   <html>
   <head></head>
   <body>
   <h1>Library</h1>
   <table border="1">
   <tr>
   <td><b>Title</b></td>
   <td><b>PublishedDate</b></td>
   </tr>
     <xsl:apply-templates/>
   </table>
   </body>
   </html>
 </xsl:template>
 
 <xsl:template match="Book">
   <tr>
   <td><xsl:value-of select="Title"/></td>
   <td><xsl:value-of select="PublishedDate"/></td>
   </tr>
 </xsl:template>
 
 </xsl:stylesheet> 

These two examples are not hugely different, but it is worth understanding the distinction for when you encounter more complex source documents and stylesheets. You can either rely on the structure of the XML source document using template matching (push), or explicitly select elements, pulling them into the output document.

Other commands/points worthy of note at this juncture (there are many more for you to explore) are:

<xsl:for-each> which, as you might guess, performs explicit processing of each of the specified nodes in turn.

<xsl:call-template> invokes a specific template by name, rather than relying on pattern matching.

<xsl:apply-templates> can also take a mode attribute, which allows you to make multiple passes through the XML data representation, processing the same nodes with different template rules on each pass.
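
The three instructions above can be sketched together against books.xml from Listing 1. The mode names toc and detail and the template name author-list are my own, chosen purely for illustration.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/Library">
    <html>
      <body>
        <!-- First pass over the books: a short list of titles -->
        <ul>
          <xsl:apply-templates select="Book" mode="toc"/>
        </ul>
        <!-- Second pass over the same books: one section per book -->
        <xsl:apply-templates select="Book" mode="detail"/>
      </body>
    </html>
  </xsl:template>

  <!-- Used only when Book is processed in mode "toc" -->
  <xsl:template match="Book" mode="toc">
    <li><xsl:value-of select="Title"/></li>
  </xsl:template>

  <!-- Used only when Book is processed in mode "detail" -->
  <xsl:template match="Book" mode="detail">
    <h2><xsl:value-of select="Title"/></h2>
    <!-- Invoke a template by name; the current node is still this Book -->
    <xsl:call-template name="author-list"/>
  </xsl:template>

  <!-- Named template (hypothetical name): lists the authors of the current Book -->
  <xsl:template name="author-list">
    <p>
      <xsl:for-each select="Authors/Author">
        <xsl:value-of select="."/>
        <!-- Separator after every author except the last -->
        <xsl:if test="position() != last()">; </xsl:if>
      </xsl:for-each>
    </p>
  </xsl:template>

</xsl:stylesheet>
```

Because call-template does not change the current node, the relative path Authors/Author inside the named template is evaluated from whichever Book was being processed when it was called.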

I’ve briefly introduced the basic XSLT concepts and, in particular, the push and pull models. The pull model is characterised by a few large templates and use of the <xsl:value-of> element so that the stylesheet controls the order of items in the output. In comparison the push model tends more towards smaller templates with the output largely following the structure of the XML source document.

I mentioned earlier that XSLT is often thought of as a declarative language. However, it also contains the flow control and looping instructions consistent with a procedural language. Typically, a push model stylesheet emphasizes the declarative aspects of the language, while the pull model emphasizes the procedural aspects.

Note the use of the word ‘typical’ - most stylesheets will contain elements of both push and pull models. However, it is useful to keep the two models in mind as it can make your stylesheet development simpler.

There you have it – we’ve scratched the surface of the XSLT and XPath languages and I’ll leave you to explore further. Both Wrox and O’Reilly have several books on the subject that have been well reviewed … take your pick if you want to delve deeper. Let me know if you’d like me to write another article on XSLT, building on what I’ve introduced here.

Time to see what .NET has to offer.

XSLT in .NET

First point of note: you can perform XSLT processing on the server or client (assuming your client browser has an XSLT processor). The usual client vs. server arguments pervade here: chiefly you’d like to utilise the processing power of the client machine rather than tying up server resources but can you be sure the client browser population is fit for purpose? If the answer to the latter is yes – the main requirement being that the XSLT you’ve written doesn’t generate errors in the client browser processor – then you can simply reference the XSLT stylesheet from the XML file and the specified transformation will be undertaken.
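
For the client-side option, referencing the stylesheet from the XML file is done with an xml-stylesheet processing instruction placed before the document element. A sketch of how books.xml might be modified follows, assuming example2.xslt sits in the same directory and that the target browsers accept the text/xsl type.

```xml
<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="example2.xslt"?>
<Library>
  <!-- Book elements exactly as in Listing 1 -->
</Library>
```

A browser with an XSLT processor that opens this file should then display the transformed HTML rather than the raw XML.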

Returning to the server-side processing options: you won’t be surprised to learn that it is the System.Xml namespace where the classes and other namespaces relating to XSLT are found. The main ones are:

1. XPathDocument (System.Xml.XPath)
This provides the faster option for XSLT transformation (compared with XmlDocument) as it provides read-only, cursor-style access to XML data held in memory. It has few members to remember (CreateNavigator is the main one) but does have several constructors, which accept an XmlReader, a TextReader, a Stream or a string path to an XML document.

2. XslTransform (System.Xml.Xsl)
This is the XSLT processor and hence the key class of interest to us. There are three main steps to utilise it: instantiate the transform object, load the XSLT document into it and then transform the required XML document (accessed via the XPathDocument object created for the purpose).

3. XsltArgumentList (System.Xml.Xsl)
Allows provision of parameters to XslTransform. XSLT defines an xsl:param element that can be used to hold information passed into the stylesheet from the XSLT processor. XsltArgumentList is used to achieve this.
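
On the stylesheet side, a parameter is declared with a top-level xsl:param and then referenced with a $ prefix. The sketch below is illustrative only: the parameter name publisher and its default value are my own choices, used here to filter books.xml from Listing 1.

```xml
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Top-level parameter; the default applies when the caller supplies no value -->
  <xsl:param name="publisher" select="'Wrox'"/>

  <xsl:template match="/">
    <html>
      <body>
        <h1>Books published by <xsl:value-of select="$publisher"/></h1>
        <ul>
          <!-- Only books whose Publisher equals the parameter value -->
          <xsl:for-each select="Library/Book[Publisher = $publisher]">
            <li><xsl:value-of select="Title"/></li>
          </xsl:for-each>
        </ul>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>
```

On the .NET side the value would be supplied by adding it to an XsltArgumentList with AddParam (name, namespace URI, value) and passing that list as the second argument to Transform in place of Nothing.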

Also of direct relevance are: XmlDocument and XmlDataDocument but I won’t be considering them further here … I’ll leave this to your own investigation.

Let’s go straight to a simple example showing 1 and 2 above in action:

Listing 4: .NET example: Transform.aspx

 <%@ Page language="vb" trace="false" debug="false"%>
 <%@ Import Namespace="System.Xml" %>
 <%@ Import Namespace="System.Xml.Xsl" %>
 <%@ Import Namespace="System.Xml.XPath" %>
 <%@ Import Namespace="System.IO" %>
 
 <script language="VB" runat="server">
   public sub Page_Load(sender as Object, e as EventArgs)
     Dim xmlPath as string = Server.MapPath("books.xml")
     Dim xslPath as string = Server.MapPath("example2.xslt")
 
     Dim fs as FileStream = new FileStream(xmlPath,FileMode.Open, FileAccess.Read)
     Dim reader as StreamReader = new StreamReader(fs,Encoding.UTF8)
     Dim xmlReader as XmlTextReader = new XmlTextReader(reader)
 
     'Instantiate the XPathDocument Class
     Dim doc as XPathDocument = new XPathDocument(xmlReader)
 
     'Instantiate the XslTransform Class
     Dim xslDoc as XslTransform = new XslTransform()
     xslDoc.Load(xslPath)
     xslDoc.Transform(doc,nothing,Response.Output)
 
     'Close Readers
     reader.Close()
     xmlReader.Close()
   end sub
 </script> 

As you can see, this example uses the stylesheet example2.xslt introduced earlier. Describing the code briefly: on page load, strings are defined as the paths to the input files in the local directory. A FileStream object is instantiated to open the XML document. From this a StreamReader object is instantiated, and in turn an XmlTextReader from that. The XPathDocument object can then build its in-memory tree from the XML source via the objects so far defined. We then need to instantiate the XslTransform object, load the stylesheet identified by the string xslPath, and call the Transform method. The parameters are the XPathDocument object holding the tree constructed from the XML document, any parameters passed to the stylesheet (none in this case), and the output destination of the result tree.

ASP.NET also comes complete with the asp:Xml web control, making it easy to perform simple XSLT transformations in your ASP.NET pages. Use is as per any other web control: you simply supply the two input properties (DocumentSource and TransformSource), either declaratively or programmatically. Here’s an example that does both, for demonstration and clarification purposes:

Listing 5: ASP:xml web control: Transform2.aspx

 <%@ Page language="vb" trace="true" debug="true"%>
 
 <script language="vb" runat="server">
 sub page_load()
   xslTrans.DocumentSource="books.xml"
   xslTrans.TransformSource="example2.xslt"
 end sub
 </script>
 
 <html>
 <body>
 <asp:xml id="xslTrans" runat="server" 
             DocumentSource="books.xml" TransformSource="example2.xslt" />
 
 </body>
 </html> 

Lastly, just to leave you with the thought that the place of XML/XSLT technology in the ASP.NET model is not clear-cut, as the server controls generate their own HTML. Does this leave XSLT redundant? Well, no … but we may need to be a little more creative in our thinking. For example, the flexibility of XML/XSLT can be combined with the power of ASP.NET server controls by using XSLT to generate the server controls dynamically, thus leveraging the best of both worlds. Perhaps I’ll leave this for another article. Let me know if you are interested.

References:

ASP.NET: Tips, tutorials and code Sams
XSLT Programmers Reference 2nd Edition Wrox
Professional XSL Wrox
Various Online Sources


About the author

I am Dr Christopher Sully (MCPD, MCSD) and I am a Cardiff, UK based IT Consultant/ Developer and have been involved in the industry since 1996 though I started programming considerably earlier than that. During the intervening period I've worked mainly on web application projects utilising Microsoft products and technologies: principally ASP.NET and SQL Server and working on all phases of the project lifecycle. If you might like to utilise some of the aforementioned experience I would strongly recommend that you contact me. I am also trying to improve my Welsh so am likely to blog about this as well as IT matters.
