When Virus Scanning Goes Wrong

Recently I’ve been working with a client that has fairly heavy publishing requirements. The blueprint contains over 300 publications, each containing a couple hundred pages. This means a release with widespread impacts can easily lead to 10-20 thousand published items. Luckily, we’ve scaled out publishing to the point where this ordinarily isn’t a problem. However, we were still noticing intermittent problems with our deployer. These came up in two main ways:

  • Publishing getting stuck in a “throttled” state, even when all other tasks in the queue had either published successfully or were waiting to publish
  • The publishing queue showing all publishes stuck in a “ready for transfer” state, even though the files on the file system were being updated

As I mentioned, this seemed to happen intermittently and randomly. Sometimes we could get 5000 items published before running into issues, other times it was only a few hundred. Of course, the first thing we did was to check the deployer logs. In there, we found the following error around the times publishing would fail:

com.tridion.deployer.ProcessingException: Unable to read the Schema file

This error was a new one to our team, and there was no sign of a solution on Stack Exchange, so we had to do a little investigation. Our first thought was that maybe a file was getting missed by Tridion as it was preparing its publish packages, and that these missing files were breaking the deployer. We were skeptical of this explanation because we thought it was unlikely that Tridion was randomly omitting key files when publishing. Our skepticism was confirmed when we turned off cleanup of the incoming folder and had a chance to examine the packages: packages that triggered errors and packages that didn’t were identical. So, what else could prevent the deployer from finding what it needed? For a while this stumped us. Then we happened to be looking through the Tridion wish list our client keeps for us, and one entry caught our eye:

Check if real time virus scanning is happening on the incoming folder on the deployer

I had always assumed this was our client being keen on security and wanting to make sure the incoming folder was being scanned, but we started wondering: what if this was actually the cause of our publishing woes? What if each incoming publish package was already being scanned for viruses, and what if the virus scanner was locking files just as the deployer needed them? So we did check whether real-time virus scanning was happening on the incoming folder, and lo and behold, it was. After some extensive discussion with the client’s security team, we convinced them to let us add our incoming folder to the exception list for virus scanning. Testing left us optimistic that we had found the source of our issues. Now, after one release to our staging environment and one release to our production environment, each with over 20,000 pages published and not a single pause in publishing (a new record for unbroken publishing, by some 35,000 pages), we feel quite confident the virus scanning was our problem.

Once we had fixed this we got to thinking, could virus scanning be causing any other issues in our environments? Naturally, we thought of Content Porter, which actually works similarly to the publishing process with Tridion, with zip files being unzipped and processed in a temporary folder. Also similarly to publishing, Content Porter had been a sore point for this client for a long time, specifically how long it took on the unzipping and processing steps. This sounded so much like the same issue we’d just solved that we convinced the security team to add the Content Porter temp folder to the virus scanner exemption list. Since then, Content Porting has only been the usual adventure we’re accustomed to with other environments, with no special hangups for this client. Mission accomplished again!

Obviously, virus scanning is still important, and it shouldn’t be disabled in random places throughout your environment. However, it is worth being aware that it can cause some unexpected issues if you’re not careful about when and where it’s running.

2 thoughts on “When Virus Scanning Goes Wrong”

  1. This is indeed something to watch out for. The problem is that the transport package file is written to disk with a .busy extension so that the deployer does not see it as interesting before it’s finished writing. When the file is finished writing, the file lock is relinquished, and the uploader immediately tries to rename the file without the .busy extension. In between these two moments, the virus scanner takes a lock on the file, which it still has when the rename attempt takes place.
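The write-then-rename pattern described in this comment, and one way a writer could tolerate a short-lived lock from a scanner, can be sketched in Python. This is only an illustration under stated assumptions, not Tridion's actual transport code: the function names `publish_package` and `rename_with_retry`, the retry count, and the delay are all hypothetical, and it assumes the lock surfaces as a `PermissionError`, which is typically how a sharing violation appears on Windows.

```python
import os
import time


def rename_with_retry(src, dst, attempts=5, delay=0.2,
                      rename=os.replace, sleep=time.sleep):
    """Rename src to dst, retrying if the file is temporarily locked.

    Returns the number of attempts used. The rename and sleep callables
    are parameters only so the behavior can be simulated in a test.
    """
    for attempt in range(1, attempts + 1):
        try:
            rename(src, dst)
            return attempt
        except PermissionError:
            # Assumption: another process (e.g. a virus scanner) holds
            # the file. Give up only after the final attempt.
            if attempt == attempts:
                raise
            sleep(delay * attempt)  # simple linear backoff


def publish_package(data, final_path):
    """Write a package under a .busy name, then rename it into place."""
    busy_path = final_path + ".busy"
    with open(busy_path, "wb") as handle:
        handle.write(data)
    # The writer's handle is closed here. A scanner that opens the file
    # between this point and the rename is the race the comment describes.
    return rename_with_retry(busy_path, final_path)
```

Retrying only narrows the window; excluding the folder from real-time scanning, as described in the post, removes the competing lock altogether.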
