SI4T-Solr 1.2 for Web 8

Looking at code you wrote three years ago can be a confronting experience, and the code I wrote for SI4T was no different. After you go through the six stages of debugging and cringe at the aesthetics of the code base, the only thing you can do is rewrite it with the knowledge of today, and take solace in the fact that your code is actually used in production environments without breaking too much.

The main reason to revisit SI4T-Solr and its brother, the SI4T Storage Extension, is the fact that in Web 8, subtle changes have been made in how it loads overridden DAO Factories. While working on a release of si4t-se, for which the changes are described in this post, we took the chance to overhaul SI4T-Solr a bit as well.

The biggest change in version 1.2 is that SI4T-Solr now requires a minimum JRE version of 1.8, mainly because the SDL Web 8 CD stack also uses that as its minimum version. The main consequence is that, while previous versions of SI4T-Solr were built on Solr versions 4.0.0 to 4.4.0, the minimum version to build against in order to use JRE 1.8 is 4.8.0. Luckily, Solr and its Java client SolrJ are very good at keeping things compatible between major and minor versions, so reworking that took a minimal amount of time. It does mean that upgrading your Solr instance to at least that version is recommended, though not required, to guarantee proper indexing.
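For a Maven build, the Java 1.8 baseline can be pinned in the POM. The fragment below is a minimal sketch using the standard maven-compiler-plugin user properties. It is not copied from the SI4T-Solr POM, which may configure this differently.

```xml
<properties>
    <!-- Assumption: standard maven-compiler-plugin properties, not taken from the SI4T POM -->
    <maven.compiler.source>1.8</maven.compiler.source>
    <maven.compiler.target>1.8</maven.compiler.target>
</properties>
```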

The other big change in SI4T-Solr is that the embedded option has been removed. This option allowed SI4T to connect to a Solr instance directly on the filesystem to index items. It was thought to be handy for bulk indexing, but in practice it turned out to be rarely needed; bulk indexing over HTTP and REST is just as good. The embedded option also added a lot of dependency overhead: you had to have pretty much the entire Solr stack in your lib directory. As of SI4T-Solr version 1.2, you no longer need a dependency on solr-core, which is great because the list of dependencies you have to put into the Deployer service then becomes minimal:

    <dependency>
        <groupId>org.si4t</groupId>
        <artifactId>si4t-se</artifactId>
        <version>1.2</version>
    </dependency>
    <dependency>
        <groupId>org.apache.solr</groupId>
        <artifactId>solr-solrj</artifactId>
        <!-- anything over SolrJ 4.8 goes -->
        <version>4.10.2</version>
    </dependency>
    <dependency>
        <groupId>commons-io</groupId>
        <artifactId>commons-io</artifactId>
        <version>2.4</version>
    </dependency>
    <dependency>
        <groupId>org.apache.httpcomponents</groupId>
        <artifactId>httpclient</artifactId>
        <version>4.3.3</version>
    </dependency>
    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_model</artifactId>
        <version>${tridion.version}</version>
    </dependency>
    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_core</artifactId>
        <version>${tridion.version}</version>
    </dependency>
    <!-- Web 8 new -->
    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_common_util</artifactId>
        <version>${tridion.version}</version>
    </dependency>

    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_common_config_legacy</artifactId>
        <version>${tridion.version}</version>
    </dependency>

One other change caused by Web 8 is that the good old com.tridion.configuration package was moved out to a new CD dependency called cd_common_config_legacy. This dependency in turn depends on cd_common_util, and both are required to build. If you deploy SI4T into your Deployer, they are already present, as are cd_core and cd_model.
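Because the Tridion jars are already present in the Deployer, one way to express this in a Maven build is the `provided` scope, which makes a dependency available at compile time without packaging it. The fragment below is a sketch of that idea, not the project's actual POM. The `tridion.version` value is a placeholder, and it assumes the CD jars have been installed in a Maven repository you can reach.

```xml
<properties>
    <!-- Hypothetical value: set this to the version of your own Web 8 CD jars -->
    <tridion.version>8.1.0</tridion.version>
</properties>

<dependencies>
    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_common_config_legacy</artifactId>
        <version>${tridion.version}</version>
        <!-- provided: needed to compile, supplied by the Deployer at runtime -->
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.tridion</groupId>
        <artifactId>cd_common_util</artifactId>
        <version>${tridion.version}</version>
        <scope>provided</scope>
    </dependency>
</dependencies>
```

The same scope could be applied to cd_core and cd_model, since the Deployer supplies those as well.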

The only jar files you will have to put in the Deployer’s services/deployer-service directory are:
– org.si4t:si4t-se:1.2
– org.si4t:si4t-solr:1.2
– org.apache.solr:solr-solrj:4.8.0+
– org.apache.httpcomponents:httpclient:4.3.3
– commons-io:commons-io:2.4
– org.apache.httpcomponents:httpcore:4.3
– org.apache.httpcomponents:httpmime:4.3.1
– org.apache.zookeeper:zookeeper:3.4.6
– org.codehaus.woodstox:wstx-asl:3.2.7
– org.noggit:noggit:0.5

Finally, we also took some time to make development on SI4T a bit easier. We converted the codebase to a Maven project and introduced Git Flow in order to have the release strategy a bit more in line with common practice. Combined with finally having a proper releases page and not having jar files released in git branches anymore, it’s safe to say SI4T has matured a little more this year.
