In the last SDL Tridion Community Webinar, Dominic Cronin suggested a great alternative to regex for finding, adding, removing, or replacing certain elements or attributes, such as Component Links within Components (or Pages). The more data-safe approach is to manipulate the Component source as an XML document [via XPath] rather than with regex. I gave it a shot, but getting what seemed like a simple XPath query to work took far more effort than I anticipated. All of this was due to a little-known detail about XPath queries for elements that sit in a default namespace without a prefix.
In this scenario, the Tridion XML source for a Component has elements in the XHTML namespace, declared as a default namespace (no prefix) either right on the element we're searching for or on one of its parents. For example, in the XML below such elements are the anchor and paragraph tags:
<?xml version="1.0" encoding="UTF-16"?>
<div style="TEXT-ALIGN: left" xmlns="http://www.w3.org/1999/xhtml">Welcome to our website. This is some random content that isn't doing very much, except <a href="http://google.com">this link</a>
<p xmlns="http://www.w3.org/1999/xhtml">This is some more body content that has been put into the page.</p>
</div>
A basic XPath query such as myXmlDocument.SelectNodes("//p") looks like it should return all the <p> elements in the document. However, in XPath 1.0 an unprefixed name only matches elements that are in no namespace at all, and these <p> elements are in the default "http://www.w3.org/1999/xhtml" namespace, so you get nothing back from your search. Very annoying: thanks, XPath engine, for making it so hard! If only the elements had used a prefix, e.g. <xhtml:p xmlns:xhtml="http://www.w3.org/1999/xhtml">, the query would be a no-brainer: //xhtml:p (with the xhtml prefix registered for the query) would select all the <p> nodes in that namespace.
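Even though you don't see a prefix in the above example, the elements still belong to a namespace; their prefix is simply empty. XPath has no way to write an empty prefix, so the trick is to register a prefix of your own for that namespace URI and use it in the query. I called mine "null", but the name is arbitrary. In C# it looks like this: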
XmlDocument componentXml = componentItem.GetAsXmlDocument();
XmlNamespaceManager nsMgr = new XmlNamespaceManager(componentXml.NameTable);
// Bind the "null" prefix to the XHTML namespace so the query can reference it.
nsMgr.AddNamespace("null", "http://www.w3.org/1999/xhtml");
foreach (XmlElement pNode in componentXml.SelectNodes("//null:p", nsMgr))
{
    //do stuff with the pNode,
    //such as (random example) remove it completely from the document.
}
Notice that the XHTML namespace is added to the XmlNamespaceManager under the "null" prefix, and the XPath query then selects the <p> elements using that prefix. That's it!
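If you want to try this outside of Tridion, here is a minimal, self-contained sketch. It assumes a hard-coded sample string in place of the real Component source, and the class and method names (DefaultNamespaceDemo, Load, RemoveParagraphs) are made up for illustration. It copies the matches into a list before removing them, on the assumption that modifying the document while walking the query results is best avoided.

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class DefaultNamespaceDemo
{
    private const string XhtmlNs = "http://www.w3.org/1999/xhtml";

    // Hypothetical stand-in for the Component source shown earlier.
    public const string SampleXml =
        "<div xmlns=\"http://www.w3.org/1999/xhtml\">Welcome, see " +
        "<a href=\"http://google.com\">this link</a>" +
        "<p xmlns=\"http://www.w3.org/1999/xhtml\">Some more body content.</p>" +
        "</div>";

    public static XmlDocument Load()
    {
        var doc = new XmlDocument();
        doc.LoadXml(SampleXml);
        return doc;
    }

    // Removes every <p> in the XHTML namespace; returns how many were removed.
    public static int RemoveParagraphs(XmlDocument doc)
    {
        var nsMgr = new XmlNamespaceManager(doc.NameTable);
        // The prefix name is arbitrary; only the namespace URI has to match.
        nsMgr.AddNamespace("null", XhtmlNs);

        // Snapshot the matches first, then change the document.
        var matches = new List<XmlNode>();
        foreach (XmlNode node in doc.SelectNodes("//null:p", nsMgr))
        {
            matches.Add(node);
        }
        foreach (XmlNode node in matches)
        {
            node.ParentNode?.RemoveChild(node);
        }
        return matches.Count;
    }

    public static void Main()
    {
        XmlDocument doc = Load();
        // The unprefixed query finds nothing, because <p> is in a namespace.
        Console.WriteLine(doc.SelectNodes("//p").Count);
        // The prefixed query finds the paragraph and removes it.
        Console.WriteLine(RemoveParagraphs(doc));
        Console.WriteLine(doc.OuterXml);
    }
}
```

Run against the sample, the first line printed is the count for the unprefixed query and the second is the number of paragraphs removed via the prefixed one, which makes the difference between the two queries easy to see.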