The schemas are the easy bit… right?

After you have been developing with SDL Tridion for a while, it's easy to be a bit blasé about Schemas. That's the easy bit, right? Bung a few fields in, decide whether each is mandatory or multi-value, configure the RTF fields a bit and off we go… on to the tricky parts: templates, integrations with third-party systems, automation, etc.

Making a schema is so simple that it is easy to forget how complex good schema design is. In this article I hope to remind implementors of all levels of ability just what makes for good schema design.

Let's start with a simple example. Below you see a news article from the Emirates website. Grab a pen and paper, take a minute to look at it, then scribble down a schema design (you know the kind of thing: a table with a row for each schema field, giving the field name, field type, whether it is multi-value or mandatory, and any other notes or configuration). Hopefully it's obvious, but I have highlighted the bit of the page we are interested in with a red box.

[Screenshot: Emirates News]

Are you done? How did it go? Did you come up with something like this?

Name       | Type     | Notes
Headline   | Text     | Mandatory, single value
Date       | Datetime | Mandatory, single value
Subheading | Text     | Non-mandatory, single value
Body       | Text     | RTF, mandatory, single value

Or did you find yourself wanting to ask some questions before you felt comfortable putting something down on paper? If you did not, then you are already slithering down the muddy slope of bad schema design. There is nothing wrong with using an HTML or graphic design as a starting point for schema design; however, it is just that: a starting point.

So what are the questions I would be asking, and why?

Question 1. Where else is this content shown?

The schema above will serve reasonably well for the page shown, but as we all know, the advantage of using a CMS like SDL Tridion, with its Blueprinting features and separation of content from design, is that the same piece of content can be reused in different formats across a single website or a group of websites, and even in other channels. We most likely have a news listing or summary page which shows each article with a piece of summary text. We may also want to use this news item in an email newsletter, showing a short intro which then links through to the website. The website search may also display a summary. So let's add a Summary RTF field. That's a good start, but will it handle all of the above? Perhaps three different fields are required; to know for sure, you need to engage in dialogue with the content owners.
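To see why the number of summary fields matters, here is a minimal sketch in plain Python (not Tridion template code) of how a rendering layer might choose a summary per channel. The field names `email_intro` and `search_snippet`, the channel names and the fallback rule are all hypothetical; the point is that a single Summary field forces every channel to share one text, while optional per-channel fields can fall back to it when an editor leaves them empty.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class NewsArticle:
    # Hypothetical fields: one shared Summary plus optional per-channel variants.
    headline: str
    summary: str
    email_intro: Optional[str] = None
    search_snippet: Optional[str] = None


# Hypothetical mapping: which dedicated field each channel prefers
# (None means the channel always uses the shared summary).
CHANNEL_FIELDS: Dict[str, Optional[str]] = {
    "listing": None,
    "email": "email_intro",
    "search": "search_snippet",
}


def summary_for(article: NewsArticle, channel: str) -> str:
    """Use the channel's own field if the editor filled it in, else the shared Summary."""
    field_name = CHANNEL_FIELDS[channel]
    value = getattr(article, field_name) if field_name else None
    return value or article.summary


article = NewsArticle(
    headline="Example headline",
    summary="Short summary for listings.",
    email_intro="Teaser written for the newsletter.",
)
print(summary_for(article, "email"))   # Teaser written for the newsletter.
print(summary_for(article, "search"))  # Short summary for listings. (fallback)
```

Whether the fallback is acceptable, or each channel really needs its own mandatory text, is exactly the question to put to the content owners.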

Question 2. What ‘invisible’ information is there?

Typically web content has metadata: at the very least a description and/or a set of keywords for SEO purposes. So let's add two metadata fields, Description (Text) and Keywords (Text), both mandatory. But wait a minute: most content we create for the web will use these, so wouldn't it be useful to have a common definition of our standard web metadata? Better to create an embeddable schema which we can reuse to define metadata for all our web content schemas, not just news. What other invisible information is there? Look at navigation: do we display this content in a navigation menu or breadcrumb? Does it make sense to use the Headline? Much of the time it may, but for a lengthy Headline we might want an override. OK, we can simply add a Navigation Override (Text) field to the embeddable schema, and it's there for use across all our web content schemas.
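One way to picture the reusable metadata schema is as a single shared type that every content type includes. The Python sketch below uses hypothetical classes (`WebMetadata`, `NewsArticle`, `CityGuide`) and a hypothetical `navigation_title` helper to show the Navigation Override falling back to the item's own title when left empty; it models the idea only, not how Tridion actually stores embedded fields.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class WebMetadata:
    """Hypothetical model of the shared 'standard web metadata' definition."""
    description: str = ""
    keywords: str = ""
    navigation_override: Optional[str] = None


@dataclass
class NewsArticle:
    headline: str
    metadata: WebMetadata = field(default_factory=WebMetadata)


@dataclass
class CityGuide:
    # A different content type reusing the same metadata definition.
    title: str
    metadata: WebMetadata = field(default_factory=WebMetadata)


def navigation_title(default_title: str, metadata: WebMetadata) -> str:
    """Navigation/breadcrumb text: the override if set, otherwise the item's own title."""
    return metadata.navigation_override or default_title


news = NewsArticle(
    headline="A very long headline that would not fit in a breadcrumb",
    metadata=WebMetadata(navigation_override="Short nav title"),
)
guide = CityGuide(title="Washington DC")
print(navigation_title(news.headline, news.metadata))  # Short nav title
print(navigation_title(guide.title, guide.metadata))   # Washington DC
```

Adding a field to `WebMetadata` makes it available to every type that embeds it, which is the same payoff the embeddable schema gives you.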

Question 3. Who is creating the content?

We just added Description and Keywords metadata fields. In an organization that takes SEO particularly seriously there may be a specialist editor whose task is to ensure all content has consistent and effective SEO metadata. Perhaps we don’t want our normal editors to fill in this information, but rather leave it to the SEO guy. Better make those fields non-mandatory, then. If we want to block publishing to the live site when no values are set, we can put a check in the template code instead. Over-reliance on mandatory fields can be frustrating for an organization which creates its content over separate sessions, with different roles, or in an (online or offline) workflow.
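Here is a sketch of the kind of check that could replace a mandatory flag. Real Tridion template code would be written against the Tridion templating APIs, so treat this Python as illustration only: `check_before_render`, `PublishBlockedError` and the target name `"live"` are all hypothetical. The check fails the render for the live target when Description or Keywords is empty, and lets everything else through, so normal editors can still save and preview without the SEO fields.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class WebMetadata:
    description: str = ""
    keywords: str = ""


class PublishBlockedError(Exception):
    """Raised to stop a render when required SEO metadata is missing."""


def missing_seo_fields(metadata: WebMetadata) -> List[str]:
    missing = []
    if not metadata.description.strip():
        missing.append("description")
    if not metadata.keywords.strip():
        missing.append("keywords")
    return missing


def check_before_render(metadata: WebMetadata, target: str) -> None:
    """Block only the (hypothetical) 'live' target; other targets stay permissive."""
    missing = missing_seo_fields(metadata)
    if target == "live" and missing:
        raise PublishBlockedError("Missing SEO metadata: " + ", ".join(missing))


# Editors can publish to staging before the SEO editor has been in.
check_before_render(WebMetadata(), "staging")
try:
    check_before_render(WebMetadata(description="Route news"), "live")
except PublishBlockedError as err:
    print(err)  # Missing SEO metadata: keywords
```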

Question 4. Is there a content classification strategy?

This is related to Question 2. Perhaps Emirates has a set of standard tags used to classify content. This piece of news is about a destination (Washington DC). There may be other content related to destinations (special offers, hotels, city guides, multimedia, etc.). SDL Tridion’s taxonomy functionality provides an excellent way of tagging content with a common taxonomy, which can be used to create a richer experience for website visitors. This could be as simple as showing related images in the Emirates Media Player in the centre of the page, or as complex as personalization logic that tracks the types of news article a user reads and delivers targeted special offers on other pages. We could add a few fields here, based on the classification strategy, but for now let's just add a Destination field (Text, multi-value, selected from a list based on the taxonomy). Is this classification important to your website functionality? If so, consider making it mandatory, and make it a content field near the top, rather than hiding it on the metadata tab or at the bottom of the field list.
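Once content is tagged from a shared taxonomy, "related content" becomes a simple overlap question. The Python below is a minimal sketch over an in-memory list, assuming a hypothetical `ContentItem` type and a Destinations taxonomy represented as plain strings; on a real site a lookup like this would typically run against the published taxonomy data on the delivery side.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class ContentItem:
    title: str
    content_type: str
    # Keywords picked from a shared (hypothetical) Destinations taxonomy.
    destinations: Set[str] = field(default_factory=set)


def related_items(item: ContentItem, catalogue: List[ContentItem],
                  limit: int = 5) -> List[ContentItem]:
    """Other items sharing at least one destination, most shared keywords first."""
    scored = []
    for other in catalogue:
        if other is item:
            continue
        overlap = len(item.destinations & other.destinations)
        if overlap:
            scored.append((overlap, other))
    # sort() is stable (also with reverse=True), so ties keep catalogue order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [other for _, other in scored[:limit]]


news = ContentItem("New route announced", "news", {"Washington DC"})
catalogue = [
    news,
    ContentItem("Washington DC city guide", "city guide", {"Washington DC"}),
    ContentItem("Dubai stopover offer", "special offer", {"Dubai"}),
    ContentItem("US East Coast fares", "special offer", {"Washington DC", "New York"}),
]
print([i.title for i in related_items(news, catalogue)])
# ['Washington DC city guide', 'US East Coast fares']
```

None of this works if the Destination field is optional and usually empty, which is why its position and mandatory status in the schema matter.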

Question 5. Are parts of the content reused or localized?

Look at the Emirates news above again. Maybe you noticed this already, but at the bottom there is a section with the heading ‘About Emirates’. Is this a standard footer for news articles (and perhaps other content types)? If so, we need to separate it out in the schema design. Let's create a Footer (Component Link, non-mandatory) field. Now we can manage one or more standard footers separately from the news articles themselves. Apply the same thinking to localization. It is not particularly relevant to our example, but it may be that only part of the content and metadata needs to be localized for a local website or market, in which case the content model should support that: split the localizable parts into a separate schema which is linked from the main schema. Hotel information is a good example. Some characteristics remain unchanged, such as the hotel name, address and number of bedrooms, while other elements, such as the description and list of amenities, need translating. If we modelled this with a single schema, updating all the localized versions whenever the name, address or number of rooms changed would be a big headache.
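The hotel example can be sketched in Python to show why the split pays off. `Hotel` holds the facts that are the same in every market and links to its translatable text by a hypothetical ID; the `texts` dictionary is only a rough stand-in for the localized copies each market would hold (in Tridion, via Blueprinting). All names here are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HotelText:
    """Translatable part, localized per market (hypothetical schema)."""
    description: str
    amenities: List[str]


@dataclass
class Hotel:
    """Main, non-translatable part; text_id stands in for a component link."""
    name: str
    address: str
    bedrooms: int
    text_id: str


# Hypothetical per-market copies of the linked text, keyed by locale, then ID.
texts: Dict[str, Dict[str, HotelText]] = {
    "en": {"hotel-text-1": HotelText("A hotel by the sea.", ["Pool", "Gym"])},
    "fr": {"hotel-text-1": HotelText("Un hôtel au bord de la mer.",
                                     ["Piscine", "Salle de sport"])},
}


def render(hotel: Hotel, locale: str) -> str:
    # Resolve the link to the text version for this market.
    text = texts[locale][hotel.text_id]
    return f"{hotel.name} ({hotel.bedrooms} rooms): {text.description}"


hotel = Hotel("Example Hotel", "1 Example Street", 200, "hotel-text-1")
hotel.bedrooms = 250  # one edit to the shared part...
print(render(hotel, "en"))  # Example Hotel (250 rooms): A hotel by the sea.
print(render(hotel, "fr"))  # Example Hotel (250 rooms): Un hôtel au bord de la mer.
```

With a single combined schema, the room count would live inside every localized copy and would have to be changed in each one.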

Question 6. What functionality do we want to support?

It's possible that you want to show a filterable, sortable list of news items on your site to allow visitors to browse current or perhaps even archived news articles. Perhaps you want an RSS feed containing 10 news items ordered by article date. The content model needs to support this. If our Date field is a content field rather than a metadata field, our options to implement dynamic functionality on the website are limited: standard content fields cannot be used for sorting or filtering (the exception is taxonomy fields, like the Destination field we added earlier). Let's make that Date field metadata.
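With the date in metadata, the RSS feed boils down to filter, sort and limit. In Tridion that would be a query on the metadata value on the delivery side; rather than guess at that API, the sketch below shows only the logic in plain Python, with a hypothetical `NewsItem` type and `latest_news` function.

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List, Optional


@dataclass
class NewsItem:
    headline: str
    article_date: date  # held in metadata so it can be queried


def latest_news(items: List[NewsItem], count: int = 10,
                since: Optional[date] = None) -> List[NewsItem]:
    """Optionally filter by date, sort newest first, keep the first `count`."""
    selected = [i for i in items if since is None or i.article_date >= since]
    return sorted(selected, key=lambda i: i.article_date, reverse=True)[:count]


start = date(2012, 1, 1)
items = [NewsItem(f"Item {n}", start + timedelta(days=n)) for n in range(15)]
feed = latest_news(items)
print(len(feed), feed[0].headline, feed[-1].headline)  # 10 Item 14 Item 5
```

The same function serves the archive page (a different `since`, a larger `count`), which is the point: once the date is queryable, each new listing is a new query, not a schema change.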

Question 7. When and what to RTF?

We have a big RTF field for our content. This is a very free-format field where editors can add all sorts of markup (images, bullets, links, tables and more). Do we want this? Do our editors have enough skill to manage it properly? Does the HTML design require knowledge of particular CSS classes and HTML elements to provide the required look and feel? Should we really have this kind of markup in our content at all? It is very easy to cross the boundary which separates content from design in an RTF field, and there is no single right answer for how to handle this. On the one hand, one big RTF field can be very flexible and efficient to work with (for an editor with the right skill level); on the other hand, you could easily end up with a big mess on your page, as you have less control over the way the content is presented. Deeper analysis is needed to find the right balance, but at the very least define carefully which elements are allowed in the RTF (there are a lot of configurable options) and any filtering required (especially if content is copy-pasted from other web pages or MS Word), and consider creating an embeddable schema for your RTF field so you can define this configuration once and reuse it across other schemas.
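In Tridion this belongs in the RTF field's own configuration, but the whitelist idea behind it is easy to illustrate. This is a minimal sketch using Python's standard `html.parser`, not the Tridion filtering mechanism, and the `ALLOWED` tag/attribute list is hypothetical. Disallowed tags such as `<font>` or `<span>` are unwrapped (their text kept), `<script>` and `<style>` are dropped with their content, and any attribute not on the list is removed.

```python
from html import escape
from html.parser import HTMLParser
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical whitelist: tag -> attributes allowed on it.
ALLOWED: Dict[str, Set[str]] = {
    "p": set(), "strong": set(), "em": set(),
    "ul": set(), "ol": set(), "li": set(),
    "a": {"href", "title"},
}
# Tags whose content should be dropped entirely, not just unwrapped.
DROP_WITH_CONTENT = {"script", "style"}


class WhitelistFilter(HTMLParser):
    """Rebuilds the markup, keeping only whitelisted tags and attributes."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out: List[str] = []
        self.skip_depth = 0  # > 0 while inside <script>/<style>

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        if tag in DROP_WITH_CONTENT:
            self.skip_depth += 1
        elif tag in ALLOWED and not self.skip_depth:
            kept = "".join(
                f' {name}="{escape(value or "", quote=True)}"'
                for name, value in attrs
                if name in ALLOWED[tag]
            )
            self.out.append(f"<{tag}{kept}>")
        # Any other tag (font, span, div, ...) is unwrapped: its text survives.

    def handle_endtag(self, tag: str) -> None:
        if tag in DROP_WITH_CONTENT:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag in ALLOWED and not self.skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data: str) -> None:
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def filter_rtf(html: str) -> str:
    parser = WhitelistFilter()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


# Typical MS Word paste: class and style noise around plain text.
pasted = '<p class="MsoNormal"><span style="font-family:Arial">Hello &amp; welcome</span></p>'
print(filter_rtf(pasted))  # <p>Hello &amp; welcome</p>
```

Unwrapping rather than deleting unknown tags is a deliberate choice here: an editor pasting from Word loses the formatting but not the words.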

Are we done? Well, we have a better vision of a solid content model for right now, but what about the future? A good content model should also be capable of adapting to future requirements and organizational changes. So think about where the organization might be in a year's time and revisit all the questions above. Is it likely that content will be used in new channels (email, mobile web, mobile apps)? Are you going to implement workflows in later phases? If so, and you will have different workflows, make sure you have one schema per workflow, even if the fields are the same. Are you likely to want to implement content queries on the website, with filtering/sorting or taxonomy-based browsing? Are you going to implement personalization? If so, how does it relate to your content? If you get a solid content model from the start, you may never need to revisit and republish all your content.

So did you have all that covered in your original scribble? I expect there are some things I have missed, and some statements that not everyone agrees with, so please share your experiences and ideas.

One thought on “The schemas are the easy bit… right?”

  1. Great explanation on schema design, I especially like the reminder on metadata. I’ve seen schemas that skip the metadata tab because the content is being rendered to XML (for the time being).
