Something more about storing triples
October 27, 2010
First of all, it should be said that all the available triplestores, both the open source packages and the commercial services, do their job. (In the picture: a triple) It’s a matter of preference and of what you expect from the system. The most common open source systems in wide use as of May 2009 are:
- Openlink Virtuoso (Virtuoso)
- Jena SDB (ARQ)
- Mulgara (Native)
- Sesame (OpenRDF – Native)
The systems that use native stores performed best, as discussed in a BioPortal report: Mulgara outperforms Jena SDB, which is backed by another database. This is of course to be expected, since native stores do not depend on a third-party storage system such as MySQL. Mulgara has recently added a connection to the Jena API and integrates with Sesame, so it seems to support a broader range of frameworks. If you choose Sesame, it’s better to also use the Sesame Native store, although it can be used with Mulgara as well. We discussed Virtuoso in a previous post.
A W3C test of the SPARQL support in many RDF storage systems shows that ARQ, OpenRDF and RDF::Query passed all the tests. At the moment it’s not exactly clear to me what a bad score means here, but I assume it means that compliance with standard SPARQL queries isn’t that good. That would affect the RDF interoperability and cross-compatibility of the systems.
For web development, ARC is the most widely adopted system, thanks to the popular CMS Drupal. At first sight it’s the friendliest one, but we would still like to see how it performs. We don’t quite expect any scalability issues: PHP with MySQL is a well-proven combination, supported by many web hosts and easy to deploy yourself.
If I find that I need a more performant system, I will have to take a look at Sesame and at a combination of Jena and Mulgara. Jena also has a native store now. So first we should find out the API specifics. In another related post I wrote about an article by Passant et al. on microblogging, in which they tested several SPARUL implementations for their platform. They chose ARC, Jena and Openlink Virtuoso.
In any case it’s already obvious that all systems, RDF storage APIs and databases alike, are moving forward and becoming more user-friendly. However, it doesn’t seem like a good idea to nest the implementation and architecture of my semantic profiling application too deep in any of them. I’m considering using a storage layer that solves this issue and allows a smooth switch between different systems. Due to time restrictions I might be forced to make a choice based merely on the advertising and some proven cases. The latter shouldn’t worry me, because all systems, including those I mentioned in this post, have good cases.
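To make the idea of a switchable storage layer concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `TripleStore` interface, the `InMemoryStore` stand-in backend, the `interests_of` helper and the `ex:` identifiers are hypothetical names, not part of Sesame, Jena, Mulgara, ARC or Virtuoso. A real adapter for one of those systems would implement the same two methods against that system's own API.

```python
from abc import ABC, abstractmethod
from typing import Iterator, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore(ABC):
    """Hypothetical interface; the application only ever talks to this."""

    @abstractmethod
    def add(self, triple: Triple) -> None:
        """Store one triple."""

    @abstractmethod
    def match(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> Iterator[Triple]:
        """Yield triples matching the pattern; None acts as a wildcard."""


class InMemoryStore(TripleStore):
    """Stand-in backend used here instead of a real triplestore adapter."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, triple: Triple) -> None:
        self._triples.add(triple)

    def match(self, s=None, p=None, o=None):
        pattern = (s, p, o)
        # Sorted only to make the iteration order deterministic.
        for triple in sorted(self._triples):
            if all(want is None or want == got
                   for want, got in zip(pattern, triple)):
                yield triple


def interests_of(store: TripleStore, person: str) -> List[str]:
    """Application-level code: depends on the interface, not on a backend."""
    return [o for _, _, o in store.match(s=person, p="foaf:interest")]


store = InMemoryStore()
store.add(("ex:alice", "foaf:interest", "ex:SemanticWeb"))
store.add(("ex:alice", "foaf:name", "Alice"))
store.add(("ex:bob", "foaf:interest", "ex:SPARQL"))
print(interests_of(store, "ex:alice"))
```

With this shape, switching systems would mean writing another `TripleStore` subclass and changing the one line that constructs the store; `interests_of` and the rest of the application stay untouched. Whether such a thin interface is enough in practice depends on how much of each system's query language the application needs.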
Related articles
- The D2RQ Plattform v0.7 – User Manual (wiwiss.fu-berlin.de)
- Semantic Microblogging Architecture (laurensgoessemantic.wordpress.com)
- Is Twitter RDFization or triplification by Virtuoso usable? (laurensgoessemantic.wordpress.com)
