XML Resources


This file gives an inventory of XML tools and information about XML that are available at CERN. This local repository is continuously evolving as I add more useful information on XML and update interesting XML applications to their latest versions. This work is a vastly expanded version of an earlier collection by Sebastian Rahtz for an XML Seminar that he organized at the Oxford Computing Services in Summer 2000.

Highlights include:


Bookmark these sites

And locally here...

Specifications Tutorial Software Examples

XML Recommendation (1st and 2nd editions)
Annotated XML
Canonical XML
XML Infoset
XML Base
XML Include
OMDoc: A Standard for Open Mathematical Documents
XML Namespace
TEI Guidelines
Unicode material

Crane XSL training
XSLT Tutorial
XML and the Web
XML intro
XML Schema
XSLT & XPath
XML overview
XML Basics
XML intro
XML and Java
XLinks and XPointers
XML Fundamentals
Java futures
XSLT and Beyond
Advanced XML
Cuttingedge XML
XML Namespaces
XML Schemas
XML Hypertext
SVG (Chris Lilley)
SVG (Eurographics 2001)
Multilingual Web (tutorial)
WWW Character Model
XML talks
XML Quick Reference
Peter Flynn's XML FAQ
David Pawson's XSL FAQ
XSLT Quick Reference
ZVON XML and other tutorials

Dave Pawson's Docbook FAQ
LTG XML tools
XML Multi Schema Validator
TEI Lite
TEI Pizza Chef
TEI XSL stylesheets
Unicode entity mappings
XSL-FO Composer
XSLT Standard Library

Oxford Text Archive
XSLT examples
DTD overviews (LiveDTD)

Resource Catalogue

Annotated XML

The official XML 1.0 specification, with detailed explanatory and historical annotations by one of its editors, Tim Bray

Web site http://www.xml.com/axml/axml.html copied locally.

Crane XSL training

Description of an extensive tutorial covering all of the XSLT specification, by Ken Holman.

Web site http://www.cranesoftwrights.com/shareware/ copied locally.


Cascading Style Sheets, version 2, W3C Recommendation (May 1998).

Web site http://www.w3.org/TR/REC-CSS2/ and copied locally (HTML, PDF).


XML Information Set, W3C Recommendation 24 October 2001.

Web site http://www.w3.org/TR/xml-infoset/ and copied locally (HTML)

XML Base

XML Base, W3C Recommendation 27 June 2001.

Web site http://www.w3.org/TR/xmlbase/ and copied locally (HTML)

Canonical XML

Canonical XML Version 1.0, W3C Recommendation 15 March 2001.

Web site http://www.w3.org/TR/xml-c14n/ and copied locally (HTML)

XML Include

XML Inclusions (XInclude) Version 1.0, W3C Working Draft 16 May 2001.

Web site http://www.w3.org/TR/xinclude/ and copied locally (HTML)


The Document Object Model (DOM) is the official W3C specification for a standard Application Programming Interface (API), defining how XML documents should be processed.

The specification has two levels. The Level 1 Specification (Version 1) appeared as a W3C Recommendation on 1 October 1998. Web site http://www.w3.org/TR/REC-DOM-Level-1/ copied locally.

The Level 2 Specification appeared in September 2000 as a Proposed Recommendation consisting of six parts:

  1. Core Specification (locally).
  2. Events Specification (locally).
  3. HTML Specification (locally).
  4. Style Specification (locally).
  5. Traversal-Range Specification (locally).
  6. Views Specification (locally).
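
To make the idea of a standard tree API concrete, here is a minimal sketch that uses Python's built-in xml.dom.minidom module, which offers a lightweight DOM-style interface. It is not one of the tools installed here; the sample document and the function name tool_names are invented for illustration only.

```python
from xml.dom import minidom

# Invented sample data, used only for illustration.
SAMPLE = """<catalogue>
  <tool name="FOP" language="Java"/>
  <tool name="Libxml" language="C"/>
</catalogue>"""


def tool_names(xml_text):
    """Return the name attribute of every <tool> element, in document order."""
    doc = minidom.parseString(xml_text)        # the whole tree is built in memory
    tools = doc.getElementsByTagName("tool")   # method defined by DOM Level 1 Core
    return [el.getAttribute("name") for el in tools]


if __name__ == "__main__":
    print(tool_names(SAMPLE))
```

Because the calls used (getElementsByTagName, getAttribute) come from the DOM Core interfaces, the same logic should carry over with little change to other DOM implementations such as the one in Xerces.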

Navigate through a few DTDs

The LiveDTD tool by Robert Stayton (written in Perl) converts a Document Type Definition (DTD) into a hypertext document. See the manual.


FOA is the first XSL-FO Authoring tool. It is a Java application that gives users a graphical interface to author XSL-FO stylesheets. With FOA you can generate pages and page sequences and fill them with content provided in one or more XML files. FOA will generate the XSLT stylesheet that transforms the XML content into an XSL-FO document. From the FOA GUI it is also possible to invoke an XSLT processor and an XSL-FO renderer, so you can see what the document looks like.

Web site: Formatting Objects Authoring tool. FOA is copied locally, see next. A tutorial is here. FOA can be run by typing foa.sh (foa on Windows).

XSLT Extension library

EXSLT is an open community initiative to standardise and document extensions to XSLT. There are implementations of the various extension elements and functions.

The Web site should be consulted for the latest developments. Locally, we always keep a recent version with documentation as well as the implementations in the subdirectories of www.exslt.org/. To learn how to use the extensions, read the How To.


FOP is the world's first print formatter driven by XSL formatting objects. It is a Java 1.1 application that reads a formatting object tree and then turns it into a PDF document. The formatting object tree can be in the form of an XML document (output by an XSLT engine like Xalan) or can be passed in memory as a DOM Document or (in the case of Xalan) SAX events. However, FOP is still a work in progress and does not implement the full XSL formatting objects specification.

Web site http://xml.apache.org/fop/. FOP is copied locally. Global documentation (not the API interface) is here. FOP can be run by typing fop.sh (fop on Windows).


Libxml is the XML C library developed for the Gnome project. On the Web it is at http://www.xmlsoft.org, with a copy locally here. The command xmllint.sh will let you use the XML parser in an interactive way (documentation is here). XML catalogs are supported with the command xmlcatalog.sh, which has a short description here.


Libxslt is an XSLT C library developed for the Gnome project. On the Web it is at http://www.xmlsoft.org, with a copy locally here. A tutorial is here. The command xmltproc.sh will let you use the XSLT processor in an interactive way (documentation is here).

LTG XML tools

Tools from Edinburgh University's Language Technology Group for processing arbitrary XML documents, including sggrep (an XML-aware version of the popular grep utility) and others.

Web site http://www.ltg.ed.ac.uk/software/xml/ copied locally.


MathML is an XML document type definition for the representation of mathematics, now supported by the W3C. It covers most aspects of mathematical typesetting and representation.

Web site http://www.w3.org/TR/REC-MathML/ copied locally.

MathML (second edition, W3C recommendation) MathML 2 copied locally. You also have the XML sources (see the readme file for more details).

OMDoc: A Standard for Open Mathematical Documents

OMDoc proposes a standard format for Open Mathematical Documents on the Web, including support for interactivity. OMDoc differs from the presentation-based approaches surveyed in that it concentrates on representing the meaning of mathematical formulae instead of their appearance. OMDoc is an extension of the OpenMath and MathML standards, and in particular of the content part of MathML.

More information is at the OMDoc home Page, of which a few files have been copied locally.

Math on the Web is a status report that reviews progress in the math, science, and education community for dealing with math on the Web.

Web site http://www.w3.org/TR/REC-MathML/ copied locally.

XML Multiple Schema Validator

The Sun Multi-Schema XML Validator is a Java tool to validate XML documents against several kinds of XML schemata. It supports DTD, RELAX Namespace, RELAX Core, RELAX NG, TREX, and a subset of W3C XML Schema Part 1.

The Web site is at http://www.sun.com/software/xml/developers/multischema/ (you need to register to download). A runnable version of the validator is available locally via the command xsv.sh. Documentation on how to run the validator is here.

Namespaces in XML

An XML namespace is a collection of names, identified by a URI reference, which are used in XML documents as element types and attribute names.

Web site http://www.w3.org/TR/1999/REC-xml-names-19990114/ copied locally.
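
As an illustration of how a namespace-aware parser treats such names, the following sketch uses Python's built-in xml.etree.ElementTree module (an assumption for the example, not a tool from this repository). The sample document and the function names are invented; the two namespace URIs are those of SVG and MathML.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
MATHML_NS = "http://www.w3.org/1998/Math/MathML"

# Invented sample document mixing two vocabularies.
DOC = f"""<root xmlns:svg="{SVG_NS}" xmlns:m="{MATHML_NS}">
  <svg:circle r="5"/>
  <m:mi>x</m:mi>
</root>"""


def expanded_names(xml_text):
    """Return the child element names in '{namespace-uri}local-name' form."""
    root = ET.fromstring(xml_text)
    return [child.tag for child in root]


def circle_radius(xml_text):
    """Look up the circle using a prefix of our own choosing ('s')."""
    root = ET.fromstring(xml_text)
    circle = root.find("s:circle", {"s": SVG_NS})
    return circle.get("r")


if __name__ == "__main__":
    print(expanded_names(DOC))
    print(circle_radius(DOC))
```

The second function shows that the prefix is only a local abbreviation: the lookup uses "s" although the document uses "svg", and it still matches because both are bound to the same URI.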

Oxford Text Archive

The Oxford Text Archive has a very large archive of SGML and XML literary and linguistic texts, from which we have selected a few titles here, to demonstrate how the TEI Guidelines can be used in practice:


PassiveTeX is a system developed by Sebastian Rahtz for printing XSL formatting objects, using the TeX document processing system.

Web site http://users.ox.ac.uk/~rahtz/passivetex/ copied locally.


Resource Description Framework (RDF) is a W3C specification for the high level representation of metadata, currently expressed as an XML DTD.

Web site http://www.w3.org/RDF/ copied locally.


A small and very fast XML parser developed by Richard Tobin, and available for Windows or Unix.

Web site http://www.ltg.ed.ac.uk/~richard/rxp.html. There is a local copy of the command syntax. There is also some background documentation about the LT XML Library and tools (they are installed only on Linux).

At CERN on Linux and Solaris, to run just type rxp.


SAX is a simple, event-driven, applications programming interface for XML developed by David Megginson.

Web site http://www.megginson.com/SAX/index.html copied locally.
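
The event-driven style can be sketched with Python's built-in xml.sax module, which follows the SAX model (the original SAX API is a Java interface; this Python rendering is only an illustration). The class name ElementCounter and the helper count_elements are invented for the example.

```python
import xml.sax


class ElementCounter(xml.sax.ContentHandler):
    """Count how often each element name occurs, one event at a time."""

    def __init__(self):
        super().__init__()
        self.counts = {}

    def startElement(self, name, attrs):
        # Called by the parser each time an opening tag is read;
        # no document tree is ever built in memory.
        self.counts[name] = self.counts.get(name, 0) + 1


def count_elements(xml_text):
    """Parse xml_text and return a dict mapping element name to count."""
    handler = ElementCounter()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.counts


if __name__ == "__main__":
    print(count_elements("<list><item/><item/><note/></list>"))
```

In contrast to a DOM tree, the handler sees each element only once as it streams past, which is why this style suits very large documents.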


SAXON is a fast, lightweight, and 100% conformant XSLT processor developed by Michael Kay and available for Windows or Unix. It is packaged with a copy of Aelfred, a small and fast XML parser.

Web site copied locally. It can be run by typing saxon.sh with the directory containing the executables of the XML tools in your PATH (i.e., /afs/cern.ch/sw/XML/XMLBIN/bin/i386-linux/ on Linux at CERN). Documentation is here.

A very experimental and preliminary release of Saxon, which includes developments for the new 2.0 versions of the XPath and XSLT specifications that are currently under development, is available via the command saxon7.sh. Documentation is here.


This W3C specification defines a new schema language, designed to replace the DTD language inherited by XML from SGML.

Tools for processing Schema language are starting to appear, but the standard does not yet appear to be sufficiently mature to warrant their inclusion here.

Web site http://www.w3.org/XML/Schema.html copied locally. The W3C Recommendation comes in three parts:


SMIL (Synchronized Multimedia Integration Language) defines an XML-based language that allows authors to write interactive multimedia presentations.

With SMIL 2.0, an author can describe the temporal behavior of a multimedia presentation, associate hyperlinks with media objects and describe the layout of the presentation on a screen.

SMIL syntax and semantics can be reused in other XML-based languages, in particular those that need to represent timing and synchronization. For example, SMIL 2.0 components are used for integrating timing into XHTML and into SVG.

Synchronized Multimedia Integration Language (SMIL 2.0, second edition, W3C proposed recommendation) SMIL 2 copied locally.


SMIL Animation (W3C Proposed Recommendation 19 July 2001), copied locally.



SP is the most widely used SGML toolkit. Originally developed by James Clark, it includes a powerful parser, capable of processing both SGML and XML documents, and a suite of utilities for normalising SGML, converting it to XML, etc.

Web site http://www.jclark.com/sp/ copied locally (documentation); Web site ftp://ftp.jclark.com/pub/sp/ copied locally (software distribution).


The Scalable Vector Graphics (SVG) language is a W3C standard for the representation of vector graphics using XML.

Web site http://www.w3.org/TR/SVG/ copied locally.
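
Since an SVG image is just an XML document, it can be generated with any XML library. The sketch below uses Python's built-in xml.etree.ElementTree module (an assumption for the example, not a tool from this repository) to build a one-circle drawing; the function name make_svg and the chosen sizes and colour are invented for illustration.

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"


def make_svg():
    """Return a small SVG document (one red circle) as a string."""
    # Serialise SVG elements without a prefix, using a default namespace.
    ET.register_namespace("", SVG_NS)
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"width": "100", "height": "100"})
    ET.SubElement(
        svg,
        f"{{{SVG_NS}}}circle",
        {"cx": "50", "cy": "50", "r": "40", "fill": "red"},
    )
    return ET.tostring(svg, encoding="unicode")


if __name__ == "__main__":
    print(make_svg())
```

Saving the printed text to a file with the extension .svg should give something an SVG viewer can display.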

Apache's Batik SVG interpreter offers a series of jar libraries. Information about the Batik tool is here.

At CERN, you can type batik.sh -browse (batik -browse on Windows) to view SVG files.

At CERN samples of SVG files are available in the Apache Batik sub-directory share/java/xml.apache.org/batik-1.5/samples under the XMLBIN root (/afs/cern.ch/sw/XML/XMLBIN on AFS, and \\cern.ch\dfs\experiments\sw\xml on Nice 2000).


The Text Encoding Initiative is an international scholarly effort which has published an extensive and detailed suite of SGML DTDs capable of handling a wide range of scholarly textual resources. We include here the most recently revised version, as published in 2002.

TEI Website http://www.tei-c.org/ or local.

TEI Guidelines

The Text Encoding Initiative Guidelines were published in book form in 1994. We include here the full text of the Revised Reprint published in 1998 and in 2002, which documents and exemplifies the full SGML and XML DTD suite, in HTML format.

Web site http://www.tei-c.org/Guidelines copied locally.

TEI Lite

TEI Lite is a widely used application of the TEI's DTD suite available in XML form.

Web site http://www.tei-c.org/Lite/ copied locally.

TEI Pizza Chef

The PizzaChef is a tool for constructing customised views of the full TEI DTD Suite. Custom DTDs can be produced in SGML or in XML format.

Web site http://www.tei-c.org/pizza.html copied locally.

TEI XSL stylesheets

This is a modular suite of XSL stylesheets developed by Sebastian Rahtz which can be used to format TEI conformant XML documents as print or HTML.

Web site http://www.tei-c.org/Stylesheets/index.html copied locally.


Tidy is a utility developed by Dave Raggett which can be used to clean up HTML files, and also to convert them to XML.

Web site http://www.w3.org/People/Raggett/tidy/ with the documentation copied locally. Tidy is installed locally; to run it, just type tidy.


Notes on Unicode Transcoders, written by Rick Jelliffe. This is essential reading for people interested in non-Latin scripts.

Web site http://www.ascc.net/xml/en/utf-8/transcode-index.html copied locally.

Unicode Material

We have a local copy of Code Charts in PDF. There is also a Unicode Character Name Index, as well as an Online Edition of The Unicode Standard, Version 3.0.

For other material one should consult the Unicode home page on the Web directly.

Unicode entity mappings

A set of entity files which map the ISO entity sets to their Unicode equivalents; there is also an XML document (unicode.xml) which provides detailed mapping information for much of the western language and symbol part of Unicode.

Web site http://www.tei-c.org/XML/ copied locally.


Xalan provides high-performance XSLT stylesheet processing. Xalan fully implements the W3C XSLT and XPath recommendations. Xalan is currently available in Java, and in C++ (the latter is not installed at CERN).

Web sites: Java, C++. The Java version is copied locally. Documentation is in the Readme file. It contains details about the API, extension libraries, and about xsltc, the Apache/Xalan XSLT Compiler.

At CERN Xalan can be run by typing xalan.sh (xalan on Windows).


XED is a lightweight validating XML editor developed by Henry Thompson, available for Windows and Unix.

Web site http://www.cogsci.ed.ac.uk/~ht/xed.html copied locally. Can be run by typing xed.sh.


Xerces provides XML parsing and generation. Fully-validating parsers are available for both Java and C++, implementing the W3C XML and DOM (Level 1 and 2) standards, as well as the de facto SAX (version 2) standard. The parsers are highly modular and configurable. Initial support for XML Schema (draft W3C standard) is provided.

Web sites: Java, C++, Perl wrapper. The Java version is copied locally; the C++ version is not installed at CERN.

Some documentation on how to use the XML parser is here. That Web page gives you access to samples for using DOM, SAX, sockets, and the UI and XNI interfaces. It also features the API javadoc and JNI manual, as well as a FAQ.

Xerces can be run at CERN by typing xerces.sh (xerces on Windows).

IBM XSL-FO Composer

The IBM XSL FO Composer (XFC) is a partial implementation of XSL Formatting Objects described in the W3C Specification Extensible Stylesheet Language (XSL) Version 1.0 (see HTML, PDF).

The two tables linked below summarize elements of the Recommendation supported by the installed release.

Web site http://www.alphaworks.ibm.com/tech/xfc/. Examples are copied locally here. xfc can be run by typing xfc.sh (xfc on Windows).

XSLT Standard Library

The XSLT Standard Library, xsltsl, provides the XSLT developer with a set of XSLT templates for commonly used functions. These are implemented purely in XSLT, that is, they do not use any extensions.

On the Web this library is in the SourceForge repository, with a local copy here. After reading this documentation you can exercise the stylesheets themselves, which are in the directory sourceforge.net/projects/xsltsl-1.1, with test examples in the subdirectory test.


This W3C Recommendation specifies the reformulation of HTML 4 as an XML DTD.

XHTML 1 is available on the Web site http://www.w3.org/TR/xhtml1/ copied locally.

XHTML Basic is available on the Web site http://www.w3.org/TR/2000/PR-xhtml-basic-20001103 copied locally.


XHTML 1.1 - Module-based XHTML is available on the Web site http://www.w3.org/TR/2001/REC-xhtml11-20010531/ copied locally.

Modularization of XHTML is available on the Web site http://www.w3.org/TR/xhtml-modularization, copied locally.


XML Linking Language (XLink) Version 1.0. W3C Recommendation 27 June 2001

Web site http://www.w3.org/TR/xlink/ copied locally.


The 1998 XML recommendation is at the Web site http://www.w3.org/TR/1998/REC-xml-19980210.html and copied locally (HTML in English, HTML in French, PDF in English).

Recently (October 2000) a second edition (including all the errata collected since 1998) was published on the Web at http://www.w3.org/TR/2000/REC-xml-20001006.html and copied locally (plain HTML, differences highlighted, PDF).

XSLT Tutorial

Updates of a few chapters, including the one on XSLT, from Elliotte Rusty Harold's XML Bible, generously made freely available as a sampler by its author.

Web site http://www.ibiblio.org/xml/books/bible/ copied locally.


This W3C Proposal specifies a range of standardized hyperlinking functionalities in XML.

Web site http://www.w3.org/TR/xptr/ copied locally.

XML Quick Reference

This is a handy reference card for the essential details of XML made available by Mulberry Technologies.

Web site http://www.mulberrytech.com/quickref/XMLquickref.pdf copied locally.


This W3C Recommendation documents the XML Path Language (XPath), Version 1.0, which is used to address arbitrary parts of an XML document, and underlies an increasingly large number of XML retrieval applications.

Web site http://www.w3.org/TR/1999/REC-xpath-19991116.xml copied locally (XML); Web site http://www.w3.org/TR/1999/REC-xpath-19991116.html copied locally (HTML).
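
To give a flavour of how a path expression addresses parts of a document, here is a small sketch using Python's built-in xml.etree.ElementTree module. Note the assumption: ElementTree supports only a limited subset of XPath 1.0, so this is an illustration rather than a full XPath processor. The sample data and the function name titles_in are invented.

```python
import xml.etree.ElementTree as ET

# Invented sample data, used only for illustration.
SAMPLE = """<shelf>
  <book lang="en"><title>First Title</title></book>
  <book lang="fr"><title>Deuxieme Titre</title></book>
  <book lang="en"><title>Third Title</title></book>
</shelf>"""


def titles_in(xml_text, lang):
    """Return the titles of all books whose lang attribute equals lang."""
    root = ET.fromstring(xml_text)
    # .//book       : any <book> descendant of the context node
    # [@lang='..']  : predicate keeping only books with that attribute value
    # /title        : the <title> child of each selected book
    path = f".//book[@lang='{lang}']/title"
    return [title.text for title in root.findall(path)]


if __name__ == "__main__":
    print(titles_in(SAMPLE, "en"))
```

The same expression (written as //book[@lang='en']/title) could be used in the match or select attributes of an XSLT stylesheet processed by the XSLT tools listed in this catalogue.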


This W3C Candidate Recommendation documents the Extensible Stylesheet Language (XSL), which describes a standard language for formatting XML documents. It is still unfinished, but is already partially implemented by several suppliers (see PassiveTeX and FOP).

Web site http://www.w3.org/TR/xsl/ copied locally (HTML, PDF).


A list of Frequently Asked Questions about the Extensible Markup Language, maintained by Peter Flynn.

Web site http://www.ucc.ie/xml copied locally.


A handy summary of XSL Frequently Asked Questions edited by David Pawson.

Web site http://www.dpawson.co.uk/xsl/xslfaq.html.

Dave Pawson, Nikolai Grigoriev, Karen Lease, and Arved Sandstrom started putting together a book on XSL-FO. It can be found at Dave's Web site http://www.dpawson.co.uk/xsl/sect3/bk.html.


This W3C Recommendation documents the XSL Transformation Language (XSLT), Version 1.0

Web site http://www.w3.org/TR/1999/REC-xslt-19991116.xml copied locally (XML); Web site http://www.w3.org/TR/1999/REC-xslt-19991116.html copied locally (HTML).

XSLT examples

This is a collection of source files demonstrating how to use XSLT, taken from Michael Kay's XSLT Programmer's Reference.

Web site ftp://ftp.wrox.co.uk/professional/3129.zip copied locally.

XSLT Quick Reference

This is a handy reference card for the essential details of XSLT made available by Mulberry Technologies.

Web site http://www.mulberrytech.com/quickref/XSLTquickref.pdf copied locally.

XP and XT

XT is a fast and efficient XSLT processor developed by James Clark available for Windows or Unix. XP is Clark's XML parser which can be used with XT.

Web site http://www.jclark.com/xml/xt.html copied locally (documentation); Web site ftp://ftp.jclark.com/pub/xml/xt.zip is installed locally and can be run by typing xt.sh (xt on Windows). Similarly, XP is available at ftp://ftp.jclark.com/pub/xml/xp.zip and can be run with the script xp.sh. Documentation is here.

Publicly available tutorials locally

XML and the Web by Tim Berners-Lee (HTML).

Extensible Markup Language (XML) - An Introduction by Michel Goossens (HTML).

XML Schemas by Roger L. Costello (directory with MS Powerpoint).

XSLT & XPath by Roger L. Costello (directory with MS Powerpoint).

XSL Tutorial by Norman Walsh (HTML Frames, HTML as single file).

XSL Concepts and Practical Use by Norman Walsh (HTML Frames).

XML, a new start for the Web by Michel Goossens (PDF).

XML Basics by Elliotte Rusty Harold (HTML).

Intro to XML by Elliotte Rusty Harold (HTML).

Processing XML with Java by Elliotte Rusty Harold (HTML).

XML DOM by Elliotte Rusty Harold (HTML).

XML SAX by Elliotte Rusty Harold (HTML).

XML DTDs by Elliotte Rusty Harold (HTML).

XLinks and XPointers by Elliotte Rusty Harold (HTML).

XML Fundamentals by Elliotte Rusty Harold (HTML).

JDOM by Elliotte Rusty Harold (HTML).

Java 1.4 and Beyond by Elliotte Rusty Harold (HTML).

XSL Transformations by Elliotte Rusty Harold (HTML).

XSLT 2.0 and Beyond by Elliotte Rusty Harold (HTML).

Advanced XML by Elliotte Rusty Harold (HTML and HTML).

Cuttingedge XML by Elliotte Rusty Harold (HTML).

XML Namespaces by Elliotte Rusty Harold (HTML).

XML Schemas by Elliotte Rusty Harold (HTML).

Xinclude by Elliotte Rusty Harold (HTML).

XML Hypertext by Elliotte Rusty Harold (HTML).

SMIL: Multimedia for Everyone by Philipp Hoschka (HTML).

SVG: Scalable Vector Graphics by Chris Lilley (HTML).

SVG: Scalable Vector Graphics by I Herman, DA Duce, FRA Hopgood at Eurographics 2001 (SVG; requires an SVG-capable browser or application).

PassiveTeX: from XML to PDF by Michel Goossens and Sebastian Rahtz (paper presented at TUG2000 in Oxford in August 2000) (PS).

Weaving the Multilingual Web by François Yergeau and Martin Dürst (HTML).

Towards Accessible Multimedia by Ian Jacobs, Marja-Riitta Koivunen, and Charles McCathieNevile (HTML).

WWW Character Model by Martin Dürst (HTML).

W3C Work on XHTML by Dave Raggett (HTML).

XLink by Daniel Veillard (HTML).

XPointer Character Model by Daniel Veillard (HTML).

XML Presentations at the Oxford University Computing Center

ZVON XML and related tutorials

Last updated: 30 August 2002 (Michel Goossens).