Search engine sitemap.xml generation using SDL Tridion

If you’re looking to generate a sitemap.xml file, this post contains a handy TBB to use as a starting point for your development.

Below you’ll find a C# template building block that does the following:

  • Loops through all the structure groups and pages from the root
  • Checks if the item is published to the correct target
  • Checks page and structure group metadata for a flag ‘show_in_searchengine_sitemap’

Because this version needs to read metadata values from each object, the code uses the GetItems() function to obtain a list of objects. If you don’t need the metadata, it’s better to use the GetListItems() function instead: it returns XML rather than full objects, which performs much better.
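To give a feel for what working with the list XML looks like, here is a rough sketch that pulls the item IDs out of a list fragment. It deliberately has no Tridion dependency so you can run it anywhere. The class and method names (ListItemsReader, ReadIds) are made up for this example, and I’m assuming the usual tcm:ListItems / tcm:Item shape; the attributes you actually get back depend on the filter you pass to GetListItems().

```csharp
using System;
using System.Collections.Generic;
using System.Xml;

public static class ListItemsReader
{
    // Namespace used by Tridion list XML (assumed here; check your own output).
    private const string TcmNamespace = "http://www.tridion.com/ContentManager/5.0";

    // Hypothetical helper: returns the ID attribute of every tcm:Item
    // directly under the given list element, without loading any objects.
    public static List<string> ReadIds(XmlElement listXml)
    {
        var ns = new XmlNamespaceManager(listXml.OwnerDocument.NameTable);
        ns.AddNamespace("tcm", TcmNamespace);

        var ids = new List<string>();
        foreach (XmlElement item in listXml.SelectNodes("tcm:Item", ns))
        {
            ids.Add(item.GetAttribute("ID"));
        }
        return ids;
    }
}
```

In a template you would pass in the XmlElement returned by GetListItems(); for a quick test you can load a hand-written fragment into an XmlDocument and pass its DocumentElement.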

Anyway, onto the code. I’ll highlight the important bits afterwards.

</p>
<pre>
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Xml;
using Tridion.Extensions.ContentManager.Templating;
using Tridion.ContentManager.Templating;
using Tridion.ContentManager.ContentManagement;
using Tridion.ContentManager.CommunicationManagement;
using Tridion.ContentManager.ContentManagement.Fields;
using Tridion.ContentManager.Templating.Assembly;
using Tridion.ContentManager;
using Tridion.ContentManager.Publishing;

namespace Tridion.Templates.Library.Common
{

    class SearchEngineSiteMap : ITemplate
    {
        private Engine engine;
        private Package package;
        private static readonly TemplatingLogger log = TemplatingLogger.GetLogger(typeof(SearchEngineSiteMap));

        public void Transform(Engine engine, Package package)
        {
            //Initialize
            this.engine = engine;
            this.package = package;
            StringBuilder siteMapContent = new StringBuilder();
            Page page = null;
            Item pageItem = package.GetByType(ContentType.Page);
            if (pageItem != null)
            {
                page = engine.GetObject(pageItem.GetAsSource().GetValue("ID")) as Page;
            }
            else
            {
                throw new InvalidOperationException("No Page found.  Verify that this template is used with a Page.");
            }

            // get the publication object
            Publication publication = page.ContextRepository as Publication;

            // use the root sg to generate the sitemap
            siteMapContent.Append("<?xml version=\"1.0\" encoding=\"utf-8\"?><urlset xmlns=\"http://www.sitemaps.org/schemas/sitemap/0.9\">");
            siteMapContent.Append(GenerateSitemap(publication.RootStructureGroup));
            siteMapContent.Append("</urlset>");

            // Push the generated sitemap XML into the package as the template output
            package.PushItem(Package.OutputName, package.CreateStringItem(ContentType.Text, siteMapContent.ToString()));

        }

        private string GenerateSitemap(RepositoryLocalObject repositoryObj)
        {
            // skip this item (and, for a structure group, everything below it)
            if (!ShowInOutput(repositoryObj))
            {
                return string.Empty;
            }

            StringBuilder siteMapString = new StringBuilder();

            StructureGroup sg = repositoryObj as StructureGroup;
            if (sg != null)
            {
                // process this structure group
                siteMapString.Append("<url>");
                siteMapString.AppendFormat("<loc>{0}</loc>", sg.PublishLocationUrl);
                // sitemap dates must be in W3C format (YYYY-MM-DD), not a culture-specific short date
                siteMapString.AppendFormat("<lastmod>{0}</lastmod>", sg.RevisionDate.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture));
                siteMapString.AppendFormat("<priority>{0}</priority>", "0.5"); // 0.5 is default - add your logic here
                siteMapString.Append("</url>");

                // recurse into every page and structure group inside this one
                foreach (RepositoryLocalObject repObject in sg.GetItems())
                {
                    //log.Debug("SiteMap processing id: " + repObject.Id);
                    siteMapString.Append(GenerateSitemap(repObject));
                }
            }
            else
            {
                // process the page
                Page page = (Page)repositoryObj;
                siteMapString.Append("<url>");
                siteMapString.AppendFormat("<loc>{0}</loc>", page.PublishLocationUrl);
                siteMapString.AppendFormat("<lastmod>{0}</lastmod>", page.RevisionDate.ToString("yyyy-MM-dd", System.Globalization.CultureInfo.InvariantCulture));
                siteMapString.AppendFormat("<priority>{0}</priority>", "0.5"); // 0.5 is default - add your logic here
                siteMapString.Append("</url>");
            }
            return siteMapString.ToString();
        }

        private bool ShowInOutput(RepositoryLocalObject pageOrSg)
        {
            log.Debug(engine.RenderMode.ToString());

            // The published check needs a publication target, which is only available while publishing
            if (engine.RenderMode == ContentManager.Publishing.RenderMode.Publish
                && !PublishEngine.IsPublished(pageOrSg, engine.PublishingContext.PublicationTarget))
            {
                return false;
            }

            XmlElement xMeta = pageOrSg.Metadata;
            if (xMeta != null)
            {
                XmlNode showMetaNode = xMeta.SelectSingleNode("//show_in_searchengine_sitemap");
                if (showMetaNode != null)
                {
                    log.Debug("meta found : " + showMetaNode.OuterXml);
                    // an explicit "no" in the metadata excludes the item
                    if (showMetaNode.InnerText.ToLower() == "no") return false;
                }
            }
            return true;
        }
    }

}
</pre>
<p>

So here are some of the things you might like (or need) to change:
 
The two priority lines in GenerateSitemap() (one for structure groups, one for pages):
siteMapString.AppendFormat("<priority>{0}</priority>", "0.5");
I’ve hard-coded the priority. In a real environment it might be worth adding some logic to give priority to certain pages or sections.
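As one possible starting point for that logic, here is a small sketch that derives the priority from how deep the URL sits in the site. This is just an illustration, not something from my implementation: the SitemapPriority class, the FromUrl name and the 0.2-per-level step are all made up, so tune them to your own site.

```csharp
using System;
using System.Globalization;

public static class SitemapPriority
{
    // Hypothetical helper: the root gets 1.0 and every path segment
    // (folder or page file name) lowers the priority by 0.2, never below 0.1.
    public static string FromUrl(string publishLocationUrl)
    {
        string[] segments = (publishLocationUrl ?? string.Empty)
            .Split(new[] { '/' }, StringSplitOptions.RemoveEmptyEntries);

        double priority = Math.Max(0.1, 1.0 - 0.2 * segments.Length);

        // Invariant culture so the decimal separator is always a dot
        return priority.ToString("0.0", CultureInfo.InvariantCulture);
    }
}
```

You could then swap the hard-coded "0.5" for something like SitemapPriority.FromUrl(page.PublishLocationUrl).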
 
The metadata lookup in ShowInOutput():
XmlNode showMetaNode = xMeta.SelectSingleNode("//show_in_searchengine_sitemap");
The value ‘show_in_searchengine_sitemap’ is the name of the metadata schema field in my environment. You’ll need to change this to match the schema field in your setup or, if you don’t need the check, remove this code altogether.
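One thing to watch out for: if the metadata XML in your environment carries a default namespace, a plain XPath such as //show_in_searchengine_sitemap won’t match anything, because XPath treats unprefixed names as having no namespace. Whether that applies to you is an assumption on my part, so check what your Metadata element actually looks like. Below is a sketch of a namespace-agnostic lookup using local-name(); the MetadataFlag class and IsExcluded name are invented for this example.

```csharp
using System;
using System.Xml;

public static class MetadataFlag
{
    // Hypothetical helper: returns true when the named metadata field
    // exists and is set to "no" (case-insensitive), whatever its namespace.
    public static bool IsExcluded(XmlElement metadata, string fieldName)
    {
        if (metadata == null) return false;

        // local-name() compares only the element name and ignores the namespace
        XmlNode node = metadata.SelectSingleNode(
            ".//*[local-name()='" + fieldName + "']");

        return node != null
            && string.Equals(node.InnerText.Trim(), "no", StringComparison.OrdinalIgnoreCase);
    }
}
```

Inside ShowInOutput() this could be called as MetadataFlag.IsExcluded(pageOrSg.Metadata, "show_in_searchengine_sitemap").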
 
 
 
