XML files are human-readable text files, so it is easy to search them from the command line using grep or from within a text editor. But if you want to do something a little more sophisticated (count the number of elements, for example), you'll need to take a different approach. You could write a transformation style sheet to extract such information, but this would be overkill. It is much easier to use xmllint from the command line to find out this kind of information.
This command is available on Mac OS X and Linux. It is installed by default on Mac OS X. On Linux, if it isn't already installed, you can add it quickly by installing the libxml2 package (on Debian and Ubuntu the command is packaged separately as libxml2-utils).
One of the primary uses for the xmllint command is to check that an XML file is well formed and that it conforms to a specific DTD or schema. Well-formedness is always checked; the --valid option adds validation against the document's DTD, and the --schema and --relaxng options validate against a W3C XML Schema or a Relax NG schema. If your XML file contains other XIncluded files, you can also use xmllint in the following way to resolve included files and output the result to a file:
shell> xmllint --xinclude manual.xml --output tmp.xml
In the output file tmp.xml, each xi:include element is replaced by the content it refers to.
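The XInclude step described above can be tried on a pair of throwaway files. This is only a sketch: it assumes xmllint is on your PATH, and the content of manual.xml and chapter.xml is invented for illustration.

```shell
# Work in a throwaway directory; all file contents below are illustrative.
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A chapter that will be pulled into the main document.
cat > chapter.xml <<'EOF'
<chapter><title>Included chapter</title></chapter>
EOF

# The main document refers to the chapter with an xi:include element.
cat > manual.xml <<'EOF'
<book xmlns:xi="http://www.w3.org/2001/XInclude">
  <xi:include href="chapter.xml"/>
</book>
EOF

# Resolve the include and write the merged document to tmp.xml.
xmllint --xinclude manual.xml --output tmp.xml

# Count the lines that hold the chapter title and any leftover xi:include.
included=$(grep -c 'Included chapter' tmp.xml)
remaining=$(grep -c 'xi:include' tmp.xml)
echo "lines with chapter title: $included, lines with xi:include: $remaining"
```

Comparing the two counts is a quick way to confirm that the include was resolved rather than copied through unchanged.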
Also, the --format option is very useful for quickly formatting files from the command line. However, the most interesting option is the --shell option.
For a complete list of all the options available, view the xmllint man page.
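The validation and formatting options mentioned above can be exercised on a small file. This is a sketch under stated assumptions: xmllint is on your PATH, and the document and its internal DTD are made up for the example.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A compact document with an internal DTD (illustrative content).
cat > options.xml <<'EOF'
<!DOCTYPE options [
  <!ELEMENT options (option*)>
  <!ELEMENT option (#PCDATA)>
  <!ATTLIST option name CDATA #REQUIRED>
]>
<options><option name="a">1</option><option name="b">2</option></options>
EOF

# --valid checks the document against its DTD; --noout suppresses the
# serialized copy, so only errors (if any) are printed.
xmllint --valid --noout options.xml
valid_status=$?

# --format re-indents the document; here the result goes to a new file.
xmllint --format options.xml --output formatted.xml

# After formatting, each option element sits on its own line.
option_lines=$(grep -c '<option ' formatted.xml)
echo "validation exit status: $valid_status, option lines: $option_lines"
```

An exit status of zero from the first command means no validation errors were reported.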
Use xmllint with the --shell option in the following way:
shell> xmllint --shell file_name.xml
You can use other options with the --shell option. For example, if you wish to resolve included files, use the --xinclude option as well.
You can display the list of the commands available from the shell by typing help. You should see output similar to the following:
base             display XML base of the node
setbase URI      change the XML base of the node
bye              leave shell
cat [node]       display node or current node
cd [path]        change directory to path or to root
dir [path]       dumps informations about the node (namespace, attributes, content)
du [path]        show the structure of the subtree under path or the current node
exit             leave shell
help             display this help
free             display memory usage
load [name]      load a new document with name
ls [path]        list contents of path or the current directory
set xml_fragment replace the current node content with the fragment parsed in context
xpath expr       evaluate the XPath expression in that context and print the result
setns nsreg      register a namespace to a prefix in the XPath evaluation context
                 format for nsreg is: prefix=[nsuri] (i.e. prefix= unsets a prefix)
setrootns        register all namespace found on the root element
                 the default namespace if any uses 'defaultns' prefix
pwd              display current working directory
quit             leave shell
save [name]      save this document to name or the original name
write [name]     write the current node to the filename
validate         check the document for errors
relaxng rng      validate the document against the Relax-NG schemas
grep string      search for a string in the subtree
There are a number of relatively trivial but necessary commands such as help and exit. All the commands are useful but this article deals primarily with the following commands:
cat node - output all nodes below the current node
cd path - change to another node; you can only use this command with unique nodes
dir - dump information about the current node
xpath expression - evaluate and print the XPath expression
setns - register a namespace
write filename - write the current node to file
If you want to write your complete shell session to a file, first issue the script command in your terminal and then start the xmllint shell. This can be particularly useful on Mac OS X, where the write command does not work.
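Another way to keep a record of a session, offered here only as a sketch, is to feed the shell its commands on standard input and redirect the output to a file. The example assumes xmllint is on your PATH; options.xml and session.txt are illustrative names, and the exact prompt text captured in the file may vary between builds.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# A small stand-in for a real options file.
cat > options.xml <<'EOF'
<options>
  <option name="a" type="sending">1</option>
  <option name="b" type="receiving">2</option>
  <option name="c" type="sending">3</option>
</options>
EOF

# Feed two shell commands on standard input and keep the whole session.
printf '%s\n' 'xpath count(//option)' 'bye' \
  | xmllint --shell options.xml > session.txt 2>&1

# Pull the count line back out of the captured session.
count_line=$(grep 'Object is a number' session.txt)
echo "$count_line"
```

Because the commands come from a pipe, the same approach works inside scripts where an interactive prompt is not available.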
When you first open the xmllint shell, the prompt, / >, indicates that you are at the root node. You will likely want to navigate to specific nodes and view the file contents below that node. You can do this with the cd and cat commands.
/ > cd /options/option[@name = 'address_metrics_lifetime']
option >
On success the prompt changes to the name of the current node. To view the current node, use the cat command, which displays its output on the screen. To create a text file of the output of cat, use write file_name.xml.
You can only use cd to navigate to unique nodes. Attempt to navigate to a non-unique node and you will see output such as the following:
/ > cd /options/option
/options/option is a 353 Node Set
If there is no unique identifier for the node that you wish to navigate to, you can use a subscript in the following way:
/ > cd /options/option[1]
option >
To output information about the current node use the dir command:
option > dir
ELEMENT option
  ATTRIBUTE name
    TEXT
      content=address_metrics_cleanse_interval
  ATTRIBUTE type
    TEXT
      content=sending
option >
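The navigation steps above can be replayed without typing them interactively. The following is a sketch that assumes xmllint is on your PATH; the three-option file is invented, so the node-set size reported for the first cd is 3 rather than 353.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > options.xml <<'EOF'
<options>
  <option name="a" type="sending">1</option>
  <option name="b" type="receiving">2</option>
  <option name="c" type="sending">3</option>
</options>
EOF

# 1. cd to a node set (rejected because it is not unique)
# 2. cd to a unique node selected by attribute
# 3. cat the current node
# 4. cd to a node selected by position
# 5. dir to dump details of the current node
session=$(printf '%s\n' \
  'cd /options/option' \
  "cd /options/option[@name = 'b']" \
  'cat' \
  'cd /options/option[1]' \
  'dir' \
  'bye' | xmllint --shell options.xml 2>&1)
echo "$session"
```

Standard error is merged into the capture because the message about the non-unique node may be written there rather than to standard output.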
You can open the xmllint shell specifying multiple files, but the behaviour is not intuitive. In the following example, the shell is opened with two different files that have the same structure. The options.xml file has a root element <options> with 353 <option> elements, while the smpp_options.xml file has a root element <options> containing only 57 <option> elements.
shell> xmllint --shell options.xml smpp_options.xml
/ > base
options.xml
/ > xpath count(//option)
Object is a number : 353
/ > bye
/ > base
smpp_options.xml
/ > xpath count(//option)
Object is a number : 57
/ > setbase options.xml
/ > base
options.xml
If you invoke help from the shell, the bye command is tersely described as leave shell. As this sequence of commands shows, when more than one file is passed to the --shell option, bye leaves only the shell for the first file, and a new shell then opens on the next file.
Once you have exited the first shell, you cannot return to it by using setbase, even though the command seems to have performed its function, as the output of base erroneously indicates. For this reason it is perhaps less confusing to open the shell specifying only one file and then use the load command to switch to a different file:
shell> xmllint --shell options.xml
/ > base
options.xml
/ > xpath count(//option)
Object is a number : 353
/ > load smpp_options.xml
/ > base
smpp_options.xml
/ > xpath count(//option)
Object is a number : 57
The second count indicates that the load command executed successfully.
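The load approach can be checked on two small files. This is a sketch assuming xmllint is on your PATH; the file names mirror the ones above, but their contents are invented, so the counts are 3 and 1 rather than 353 and 57.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

# Two files with the same structure but different numbers of options.
cat > options.xml <<'EOF'
<options><option/><option/><option/></options>
EOF
cat > smpp_options.xml <<'EOF'
<options><option/></options>
EOF

# Count in the first file, switch with load, then count again.
counts=$(printf '%s\n' \
  'xpath count(//option)' \
  'load smpp_options.xml' \
  'xpath count(//option)' \
  'bye' | xmllint --shell options.xml 2>&1 \
  | grep -o 'Object is a number : [0-9]*' \
  | sed 's/.*: //')
echo "$counts"
```

The two numbers printed, one per line, are the counts before and after the load command.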
To this point none of the examples use namespaces. To evaluate XPath expressions against an XML file that uses namespaces, you must first register a prefix with the setns command. Use it in the following way:
shell> xmllint --xinclude --shell manual.xml
/ > setns x=http://docbook.org/ns/docbook
/ > dir
DOCUMENT
version=1.0
URL=manual.xml
standalone=true
namespace xml href=http://www.w3.org/XML/1998/namespace
/ > cd /x:book/x:chapter[@xml:id='apis']
chapter > dir
ELEMENT chapter
  ATTRIBUTE id
    TEXT
      content=apis
The dir command shown above confirms that you have navigated to the specified node. From that node you can execute xpath commands using absolute or relative paths.
chapter > xpath count(/x:book/x:chapter[@xml:id='apis']/x:section)
Object is a number : 15
chapter > xpath count(/x:book/x:chapter[@xml:id='apis']/x:section/x:refentry)
Object is a number : 135
chapter > xpath count(/x:book/x:chapter[@xml:id='structs']/x:section/x:section)
Object is a number : 18
chapter > xpath count(//x:chapter[@xml:id='apis']/x:section/x:refentry)
Object is a number : 135
chapter > xpath count(//x:section/x:refentry)
Object is a number : 140
chapter > xpath count(x:section/x:refentry)
Object is a number : 135
There are 15 sections in the apis chapter, and these 15 sections contain 135 refentries. Note the difference in output between the paths //x:section/x:refentry and x:section/x:refentry. The first matches refentries anywhere in the document (140), while the second counts only those below the current node (135), which shows that only the latter is relative to the current node.
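The effect of setns and the difference between the two kinds of path can be reproduced on a tiny namespaced file. This is a sketch assuming xmllint is on your PATH; the document is a made-up stand-in for a DocBook manual, with three refentries in the apis chapter and four in total.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > manual.xml <<'EOF'
<book xmlns="http://docbook.org/ns/docbook">
  <chapter xml:id="apis">
    <section><refentry/><refentry/></section>
    <section><refentry/></section>
  </chapter>
  <chapter xml:id="structs">
    <section><refentry/></section>
  </chapter>
</book>
EOF

# Register the prefix x, move into the apis chapter, then count refentries
# with a document-wide path (//) and with a path relative to the chapter.
counts=$(printf '%s\n' \
  'setns x=http://docbook.org/ns/docbook' \
  "cd /x:book/x:chapter[@xml:id='apis']" \
  'xpath count(//x:section/x:refentry)' \
  'xpath count(x:section/x:refentry)' \
  'bye' | xmllint --shell manual.xml 2>&1 \
  | grep -o 'Object is a number : [0-9]*' \
  | sed 's/.*: //')
echo "$counts"
```

The first number covers the whole document and the second only the current chapter, mirroring the 140 versus 135 result above.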
When your XML file uses IDs, an easier way to navigate is to use the id function:
chapter > cd /
/ > xpath id('apis')
Object is a Node Set :
Set contains 1 nodes:
1  ELEMENT chapter
    ATTRIBUTE id
      TEXT
        content=apis
/ > cd id('apis')
chapter > xpath count(x:section/x:refentry)
Object is a number : 135
chapter > cd /
/ > xpath count(id('apis')/x:section/x:refentry)
Object is a number : 135
For files that use namespaces, you must set the namespace before you can use the
As indicated above, the id function can also be used inside the count function.
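Navigation by ID can be tried on a small file as well. This sketch assumes xmllint is on your PATH and that the parser treats xml:id attributes as IDs without a DTD, as the manual.xml session above suggests; the document itself is invented and uses no namespace, to keep the example short.

```shell
workdir=$(mktemp -d)
cd "$workdir" || exit 1

cat > manual.xml <<'EOF'
<book>
  <chapter xml:id="apis">
    <section><refentry/><refentry/></section>
    <section><refentry/></section>
  </chapter>
  <chapter xml:id="structs">
    <section><refentry/></section>
  </chapter>
</book>
EOF

# Jump to the chapter by ID and count relative to it, then return to the
# root and run the same count with id() inside the count() call.
counts=$(printf '%s\n' \
  "cd id('apis')" \
  'xpath count(section/refentry)' \
  'cd /' \
  "xpath count(id('apis')/section/refentry)" \
  'bye' | xmllint --shell manual.xml 2>&1 \
  | grep -o 'Object is a number : [0-9]*' \
  | sed 's/.*: //')
echo "$counts"
```

Both counts should agree, since the two expressions select the same refentries by different routes.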
Peter Lavin is a technical writer who has been published in a number of print and online magazines. He is the author of Object Oriented PHP, published by No Starch Press and a contributor to PHP Hacks by O'Reilly Media.
Please do not reproduce this article in whole or part, in any form, without obtaining written permission.