Using XPath to find XML Elements with Inline/Default Namespace and Null Prefix

In the last SDL Tridion Community Webinar, Dominic Cronin suggested a great alternative to regex for finding, adding, removing, or replacing certain elements or attributes, such as Component Links within Components (or Pages). The more data-safe approach is to treat the Component source as an XML document and manipulate it via XPath rather than regex. I gave it a shot, except that getting (what seems like) a simple XPath query to work took way more effort than I anticipated. All of this was due to a little-known detail about XPath queries for elements that sit in a default namespace without a prefix.

In this scenario, the Tridion XML source for a Component has elements with an inline (or default) XHTML namespace declared right on the element that we’re searching for, and no prefix. For example, in the XML below such elements are the div and paragraph tags (the anchor tag inherits the namespace from its parent div):

<?xml version="1.0" encoding="UTF-16"?>
<Content xmlns="uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B">
  <Title>Hello Internets!</Title>
  <Introduction>
    <div style="TEXT-ALIGN: left" xmlns="http://www.w3.org/1999/xhtml">Welcome to our website. This is some random content that isn’t doing very much, except <a href="http://google.com">this link</a>
    </div>
  </Introduction>
  <Body>
    <p xmlns="http://www.w3.org/1999/xhtml">This is some more body content that has been put into the page.</p>
  </Body>
</Content>

A basic XPath query such as myXmlDocument.SelectNodes("//p") should return all the <p> elements in the document. However, since they’re part of the default "http://www.w3.org/1999/xhtml" namespace, you get nothing back from your search. Very annoying: thanks, XPath engine, for making it so hard! If only there had been a prefix, e.g. <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">, then the XPath query would be more obvious: myXmlDocument.SelectNodes("//xhtml:p") (given a namespace manager that knows the xhtml prefix), and we’d select all the <p> nodes in the document within that namespace.
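
The empty result is easy to reproduce outside Tridion. The following is a minimal sketch, assuming a trimmed-down copy of the XML above loaded from a string; the class and method names are made up for illustration and are not part of any Tridion API.

```csharp
using System;
using System.Xml;

public static class DefaultNamespaceDemo
{
    // Trimmed-down copy of the Component XML shown above (illustrative only).
    public const string SampleXml =
        "<Content xmlns=\"uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B\">" +
        "<Body><p xmlns=\"http://www.w3.org/1999/xhtml\">Some body content.</p></Body>" +
        "</Content>";

    public static XmlDocument Load()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    public static void Main()
    {
        XmlDocument doc = Load();

        // An unprefixed name in an XPath 1.0 query only matches elements
        // that are in no namespace, so this finds nothing.
        Console.WriteLine(doc.SelectNodes("//p").Count);   // prints 0

        // The element is there; it simply lives in the XHTML namespace.
        XmlNode p = doc.GetElementsByTagName("p")[0];
        Console.WriteLine(p.NamespaceURI);                 // prints http://www.w3.org/1999/xhtml
    }
}
```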

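Actually, even though you don’t see a prefix in the above example, the elements still belong to a namespace, and XPath never assumes a default namespace: an unprefixed name in a query only matches elements in no namespace at all. I think of the missing prefix as a NULL prefix. The way to select such nodes in C# is to map the namespace URI to a prefix of your own (I call mine “null”) and use that prefix in the query: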

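// Requires: using System.Linq; using System.Xml;
XmlDocument componentXml = componentItem.GetAsXmlDocument();
XmlNamespaceManager nsMgr = new XmlNamespaceManager(componentXml.NameTable);
// The prefix is only a label inside this manager; the namespace URI is what has to match.
nsMgr.AddNamespace("null", "http://www.w3.org/1999/xhtml");
// Copy the matches into a list first: the result of SelectNodes is not reliable once the document changes.
foreach (XmlElement pNode in componentXml.SelectNodes("//null:p", nsMgr).Cast<XmlElement>().ToList())
{
    // Do stuff with the pNode,
    // such as (random example) remove it completely from the document.
    pNode.ParentNode.RemoveChild(pNode);
}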

Notice above that the XHTML namespace is added to the XmlNamespaceManager with the “null” prefix, and the XPath query then selects the <p> tags with that prefix. That’s it!
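
The prefix handed to AddNamespace is only a label inside the namespace manager, so any valid prefix works as long as the query uses the same one. Here is a small sketch of that, assuming the same trimmed-down sample XML loaded from a string; the helper name CountParagraphs is hypothetical.

```csharp
using System;
using System.Xml;

public static class PrefixIsJustALabel
{
    private const string Xhtml = "http://www.w3.org/1999/xhtml";

    public static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<Content xmlns=\"uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B\">" +
            "<Body><p xmlns=\"http://www.w3.org/1999/xhtml\">Some body content.</p></Body>" +
            "</Content>");
        return doc;
    }

    // Counts <p> elements in the XHTML namespace, using whatever prefix the caller picks.
    public static int CountParagraphs(XmlDocument doc, string prefix)
    {
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace(prefix, Xhtml);
        return doc.SelectNodes("//" + prefix + ":p", nsMgr).Count;
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();

        // Different labels, same namespace URI, same result.
        Console.WriteLine(CountParagraphs(doc, "null"));   // prints 1
        Console.WriteLine(CountParagraphs(doc, "xhtml"));  // prints 1
    }
}
```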

Happy coding!

5 thoughts on “Using XPath to find XML Elements with Inline/Default Namespace and Null Prefix”

  1. Nice tip.
    I’ve just discovered this myself during some XSLT transformation work of Tridion Component XML. Also, I think you can call that prefix whatever you like – it doesn’t have to be “null”.

  2. Even if the xhtml namespace had used a prefix, you’d still have to specify a namespace when selecting nodes from it using XPath. The important thing to always remember about namespaces is that the prefix doesn’t matter; all that matters is the namespace it resolves to.

    So in your case, you could have added the namespace to your manager like this:
    nsMgr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

    And then select the paragraphs using:
    componentXml.SelectNodes("//xhtml:p", nsMgr)

    Better yet, add two namespaces:

    nsMgr.AddNamespace("comp", "uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B");
    nsMgr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");

    And select the paragraphs with a more specific XPath (meaning it will need to investigate fewer nodes and thus be faster):

    componentXml.SelectNodes("/comp:Content/comp:Introduction//xhtml:p", nsMgr)

    If you start using the Component namespace regularly, you might want to consider specifying a more readable value for it in the Schema that the Components are based on. All Tridion cares about is that the Schema’s namespace is unique; it doesn’t care what the exact value is. Remember though: if you change the Schema’s target namespace after you have created some Components, you’ll have to update the namespace in those Components too.

  3. Well there you are Nickoli: you are now officially up the learning curve for this stuff. It’s actually not too bad as long as you make sure you are looking at the raw XML of the component when you do this. XPath’s behaviour does make sense – imagine if you wanted to distinguish between two elements with the same name and different namespaces… well that’s the whole point of namespaces, isn’t it?

    If you want to write more generic code – for example to process stuff in an embedded schema (which would be in the content namespace of the schema which embeds it) then the namespaceURI property of a Tridion schema is useful.

    Alternatively, you can write ugly and less accurate XPaths with stuff like

    /tcm:Component/tcm:Data/tcm:Content/*[local-name()='banana']/*[local-name()='cabbage']

  4. It’s funny, but many XPath generator tools out there fail when it comes to the scenario in this post. Thanks for your comments, which help to thoroughly explain the concept.

  5. Thanks, that helped me!
    It’s great to have Tridion tips out there on the internet… it was so difficult to find information just one or two years ago!!
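
The suggestions in comments 2 and 3 can be tried against a trimmed-down copy of the sample XML from the post. This is a minimal sketch rather than anything Tridion-specific: the class and method names are made up for illustration, the XML is loaded from a string instead of a real Component, and the paths target the anchor and paragraph that actually exist in that sample.

```csharp
using System;
using System.Xml;

public static class CommentSuggestionsDemo
{
    public static XmlDocument LoadSample()
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(
            "<Content xmlns=\"uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B\">" +
            "<Introduction><div xmlns=\"http://www.w3.org/1999/xhtml\">Welcome, see " +
            "<a href=\"http://google.com\">this link</a></div></Introduction>" +
            "<Body><p xmlns=\"http://www.w3.org/1999/xhtml\">Some body content.</p></Body>" +
            "</Content>");
        return doc;
    }

    // Registers one prefix for the Component content namespace and one for XHTML.
    public static XmlNamespaceManager CreateManager(XmlDocument doc)
    {
        XmlNamespaceManager nsMgr = new XmlNamespaceManager(doc.NameTable);
        nsMgr.AddNamespace("comp", "uuid:7F1745CA-CCFE-4032-BAD7-5699F4149D3B");
        nsMgr.AddNamespace("xhtml", "http://www.w3.org/1999/xhtml");
        return nsMgr;
    }

    public static void Main()
    {
        XmlDocument doc = LoadSample();
        XmlNamespaceManager nsMgr = CreateManager(doc);

        // Specific path: only links inside the Introduction field.
        // The <a> inherits the XHTML namespace from its parent <div>.
        Console.WriteLine(
            doc.SelectNodes("/comp:Content/comp:Introduction//xhtml:a", nsMgr).Count); // prints 1

        // Specific path: the paragraph directly under the Body field.
        Console.WriteLine(
            doc.SelectNodes("/comp:Content/comp:Body/xhtml:p", nsMgr).Count);          // prints 1

        // Namespace-agnostic fallback: matches <p> in any namespace, no manager needed.
        Console.WriteLine(doc.SelectNodes("//*[local-name()='p']").Count);             // prints 1
    }
}
```

The prefixed paths are stricter, since they only match elements in the intended namespaces, while the local-name() form trades that accuracy for not needing a namespace manager at all.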
