Some thoughts on CMS/CDN Integration

Creating an architecture presentation the other day helped to crystallise some thoughts in my head on integrating a CMS like Tridion with Content Delivery Networks (CDNs).

I have been involved with such matters before (see my SDL Tridion World article on how to technically integrate a CDN through Storage Extensions), but I thought it was worth sharing my ideas on the considerations when working with a CMS and a CDN.

SDL Tridion’s Enterprise Content Management features are a good match for companies with a truly global digital presence and audience. Such companies are also those most likely to benefit from the scaling features offered by a global Content Delivery Network, so Tridion + CDN is a hot topic.

I think the problem boils down to the following three questions:

  1. What assets do you want to cache using a CDN?
  2. How do the assets get into the CDN cache?
  3. How do we invalidate the cache when assets are (re/un)published?

Let's address them one by one.

What assets do you want to cache using a CDN?

The benefits of using a CDN are many. A CDN puts your content physically closer to the visitor, so page load times are faster and visitors have a more positive web experience. The reduction of traffic to your own servers means you have a less extensive and expensive infrastructure to maintain in-house or with your hosting provider. Finally, CDNs are designed to cope with fluctuating demand – they are better equipped to deal with traffic flashpoints caused by an event or campaign.

Understanding this is key to establishing what to cache. There is little point in caching your whole site: archived or infrequently visited pages can just as easily be served from your own servers, so focus on the parts of your site(s) that carry significant load. Maybe 10% of your pages drive 90% of the site traffic, or perhaps only campaign micro-sites create a heavy load. This kind of traffic analysis gives you a good place to start.

If we look at the assets themselves, there are some no-brainers: static design files such as CSS, JavaScript, logos and other design images – these don't change much and are requested every time a new visitor comes to your site, regardless of the page viewed.

Then you start to move into content: video, images and other high-bandwidth assets which are for the most part static (they do not change often, and do not differ depending on who is accessing them).

Then you have the pages themselves. You can probably divide these into static (containing content that does not depend on the context in which it is viewed), dynamic (for example a list of items that can be filtered, is dynamically generated, or depends on personalization) and transactional (forms and other pages that have an explicit interaction with the user).

How do the assets get into the CDN cache?

There are two approaches here. The simplest is the Pull approach, whereby you publish the asset as normal to your own servers, and the CDN fetches it with an ordinary HTTP request to your website. The second approach is to Push content to the CDN's infrastructure. Here you hook into the publishing process and additionally send the assets to the CDN (using whatever API or protocol the CDN vendor makes available).

Wherever possible, I would recommend the Pull approach. You do not need to understand the inner workings of the CDN, and you retain control over how your assets are rendered (suppose you are publishing pages containing application code – for these to work when published on the CDN you would need to deploy your web app to the CDN, which may be complicated or even impossible).

The main advantage of the Push approach is that cache invalidation can be simpler, but as we see in the next section, this can also be implemented easily in conjunction with the Pull approach.

How do we invalidate the cache when assets are (re/un)published?

As important as getting the content cached by the CDN is knowing when to expire it. Expiring too often negates the effect of using the CDN, while not expiring often enough means dead or stale content is shown to visitors.
 
Cache invalidation for pages can be complex: even what might be considered a static page contains dynamic elements, such as navigation, personalization, lists of related content and dynamic links.
 
Invalidation logic can get complicated, so an element of pragmatism is needed to avoid the problem becoming unmanageable. The main methods to invalidate the cache are as follows:
 
  • Have the CDN cache expire based on HTTP header information (check out the Expires, Cache-Control and Vary headers)
  • Notify the CDN that something has changed at the point of publishing (or re/un-publishing) an asset
  • Have the whole site or sections of the site invalidated on a schedule (e.g. nightly)
  • Manually flush the Cache using the GUI of the CDN

The first approach is the simplest and represents a no-code integration with the CDN if you are using the Pull approach. If you wish, you can allow the expiration to be controlled by business logic in the web application and/or configured by content editors specifying metadata on the assets or their organizational items.
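
To make this concrete, here is a minimal sketch of a servlet filter that sets Cache-Control headers so a Pull-based CDN knows how long it may cache each response. The path rules and TTL values are illustrative assumptions only – derive the real rules from your own traffic analysis, business logic or asset metadata.

    import java.io.IOException;
    import javax.servlet.*;
    import javax.servlet.http.HttpServletRequest;
    import javax.servlet.http.HttpServletResponse;

    // Sketch: set cache headers on origin responses for a Pull CDN.
    // Path patterns and TTLs below are assumptions, not recommendations.
    public class CdnCacheHeaderFilter implements Filter {

        public void init(FilterConfig config) throws ServletException {
        }

        public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
                throws IOException, ServletException {
            HttpServletRequest request = (HttpServletRequest) req;
            HttpServletResponse response = (HttpServletResponse) res;
            String path = request.getRequestURI();

            if (path.matches(".*\\.(css|js|png|jpg|gif)")) {
                // Static design assets: cache aggressively (24 hours)
                response.setHeader("Cache-Control", "public, max-age=86400");
            } else if (path.startsWith("/campaign/")) {
                // Hypothetical high-traffic section: short TTL (10 minutes)
                response.setHeader("Cache-Control", "public, max-age=600");
            } else {
                // Everything else: tell the CDN to revalidate with the origin
                response.setHeader("Cache-Control", "no-cache");
            }
            chain.doFilter(req, res);
        }

        public void destroy() {
        }
    }

Registered in web.xml like any other filter, this keeps all caching policy in one place on the origin and requires no CDN-specific code at all.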

As mentioned previously, you get the second approach for free if you implement a Push method of integration – when you push the asset to the CDN, it knows it needs to flush any old version from the cache (be sure to handle un-publishing as well). This approach can also be used with Pull integrations by implementing a Storage extension: here you can hook into the publish transaction and add a CDN notification for the assets that were published (or re/un-published). The technical details on how to do this are given in my SDL Tridion World article.

It's worth noting that I would abstract any functionality that connects to a CDN to push assets or make notifications into a separate web service, and then hook this into your storage extension. The benefits are that you can code this in whatever technology makes sense (Storage extensions must be Java) and you can also call it from other parts of your system architecture if required (for example, hook up an external DAM, PIM or eCommerce system to trigger page invalidation when updates are made in those systems).
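
As an illustration, here is a minimal sketch of the kind of stand-alone notification client you might wrap in such a web service. The purge endpoint, JSON payload and authentication header are hypothetical – every CDN vendor exposes its own invalidation API, so substitute the real one from your vendor's documentation.

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;

    // Sketch: notify a CDN that an asset has changed and should be flushed.
    // The endpoint, payload and auth scheme are assumed for illustration.
    public class CdnPurgeClient {

        private final String purgeEndpoint;
        private final String apiToken;

        public CdnPurgeClient(String purgeEndpoint, String apiToken) {
            this.purgeEndpoint = purgeEndpoint;
            this.apiToken = apiToken;
        }

        // Ask the CDN to flush the given public URL from its cache.
        public void purge(String assetUrl) throws IOException {
            HttpURLConnection conn = (HttpURLConnection) new URL(purgeEndpoint).openConnection();
            conn.setRequestMethod("POST");
            conn.setDoOutput(true);
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setRequestProperty("Authorization", "Bearer " + apiToken);

            byte[] body = ("{\"url\":\"" + assetUrl + "\"}").getBytes(StandardCharsets.UTF_8);
            try (OutputStream out = conn.getOutputStream()) {
                out.write(body);
            }

            int status = conn.getResponseCode();
            if (status < 200 || status >= 300) {
                throw new IOException("CDN purge failed for " + assetUrl + ": HTTP " + status);
            }
        }
    }

The storage extension (or any other system – DAM, PIM, eCommerce) then only needs to call purge() with the public URLs of the assets that changed, keeping all CDN-specific detail out of the publishing code.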

With a dynamic website where content is shared and linked across many pages, it is worth considering the third approach. High-volume and volatile sections of the site could be flushed on a short schedule (perhaps every 10 minutes to an hour), whereas the whole site might be flushed on a nightly basis.
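
A minimal sketch of such a schedule, reusing the hypothetical CdnPurgeClient from the previous example (whether wildcard purges are supported is vendor-specific, so treat the URL patterns as assumptions):

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    // Sketch: flush volatile sections frequently, the whole site nightly.
    // Sections, intervals and wildcard support are illustrative assumptions.
    public class ScheduledCdnFlush {

        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newScheduledThreadPool(1);
            CdnPurgeClient client = new CdnPurgeClient("https://cdn-vendor.example/purge", "my-token");

            // Volatile section: flush every 10 minutes
            scheduler.scheduleAtFixedRate(
                    () -> purgeQuietly(client, "https://www.example.com/news/*"),
                    10, 10, TimeUnit.MINUTES);

            // Whole site: flush once every 24 hours
            scheduler.scheduleAtFixedRate(
                    () -> purgeQuietly(client, "https://www.example.com/*"),
                    24, 24, TimeUnit.HOURS);
        }

        private static void purgeQuietly(CdnPurgeClient client, String urlPattern) {
            try {
                client.purge(urlPattern);
            } catch (Exception e) {
                System.err.println("Scheduled purge failed: " + e.getMessage());
            }
        }
    }

In practice the schedule would live in whatever job scheduler your infrastructure already uses; the point is simply that the volatile 10-minute flush and the nightly full flush are separate jobs with separate scopes.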

The last approach is always useful as a backup. There will always be a time when parts of the site need to be flushed immediately, and the CDN's GUI can be the quickest way to do it.

Some final thoughts

The above considerations have hopefully given you plenty of food for thought if you are looking at CDN integration. What initially seems like a simple concept can quickly become complex, and perhaps there are more questions than answers, as every site has different content and different requirements and restrictions on caching possibilities.

The best advice is to be pragmatic – as with any form of caching, there is always a compromise to be made and you will never find a perfect solution (if you do – let me know!).

Below I list some final thoughts on the subject which I didn’t manage to fit into the sections above.

  • Avoid tying your publishing model and web application architecture to a particular CDN – there are many CDN providers out there, and you may well want to switch in the future (for performance, functionality or financial reasons)
  • Make sure your site works normally without a CDN. This is related to the point above, but it also makes it much easier to manage your staging or preview websites, plus other environments (dev, test etc.) which probably will not use a CDN.
  • When notifying a CDN of updates to the site and using a Pull integration, CDNs often offer either an instant update of the cache (the CDN requests the asset immediately) or an on-demand update (the CDN waits until the next time the asset is requested by a visitor). If you are doing bulk publishing, the first approach can result in a high volume of traffic to your own web servers (from the CDN, requesting all the updated pages). The second approach distributes the load on your servers better over time.
  • For static assets like JS and CSS, consider putting the version number as part of the file name (e.g. styles-v23.css). As well as making CDN cache invalidation easier, this will also help you overcome issues with browser caching.
 

2 thoughts on “Some thoughts on CMS/CDN Integration”

  1. Excellent breakdown of the who, what, how and whys for CDN with Tridion, Will. I like the points on complexity and practicality. Sometimes it’s hard to realize the trade-offs and impossibilities of having competing requirements (e.g. cached/fast but instantaneously update-able/removable).

  2. Nice post Will. Thanks for sharing.

    I’ve seen customers work with the same CDN vendor using the same approach (for example Pull), yet for one customer it works perfectly while for another it doesn’t work well at all. My conclusion was that the master services contract the client has with the CDN vendor often dictates what you must use with Tridion (I’m talking about the case where Tridion is being migrated to from another technology).

    So often you’re tied to one specific model, and the beauty of Tridion is that it is flexible enough to hook up to the CDN any way the customer wants (or needs).

