Something more about storing triples

6-10. RDF Graph Data Model of Triples
Image by Peter Morville via Flickr

First of all, it should be said that all the available triplestores, both the open source packages and the commercial services, do their job. (In the picture: a triple.) Which one to pick is a matter of preference and of what you expect from the system. The most common open source systems in wide use as of May 2009 are:

The systems that use native stores performed best, as discussed in a BioPortal report. Mulgara outperforms Jena SDB, which is backed by a separate relational database. This is to be expected, since native stores do not depend on a third-party storage system such as MySQL. Mulgara has recently added support for the Jena API and integration with Sesame, so it seems to support a broader range of frameworks. If you choose Sesame, it’s better to also use the Sesame Native store, although it can be used with Mulgara as well. We discussed Virtuoso in a previous post.

A W3C test of SPARQL support in many RDF storage systems shows that ARQ, OpenRDF and RDF::Query passed all the tests. At the moment it’s not exactly clear what a bad score means here, but I assume it means that compliance with standard SPARQL queries isn’t that good. That would affect the RDF interoperability and cross-compatibility of the systems.

For web development, ARC is the most widely adopted system, thanks to the popular CMS Drupal. At first sight it’s the friendliest one, but we would still like to see how it performs. We don’t really expect scalability issues: PHP with MySQL is a well-proven combination, supported by many web hosts, and easy to deploy yourself.

In case I find that I need a more performant system, I will have to take a look at Sesame and at a Jena and Mulgara combination. Jena also has a native store now. So first we should find out the API specifics. In another related post I wrote about an article by Passant et al. on microblogging, in which they tested several SPARUL implementations for their platform. They chose ARC, Jena and OpenLink Virtuoso.

In any case, it’s already obvious that all these systems, RDF storage APIs and databases alike, are moving forward and becoming more user-friendly. However, it doesn’t seem like a good idea to tie the implementation and architecture of my semantic profiling application too deeply to any one of them. I’m considering using a storage layer that solves this issue and allows a smooth switch between different systems. Due to time restrictions I might be forced to make a choice based merely on advertising and some proven cases. That shouldn’t worry me, because all systems, including those I mentioned in this post, have good cases.
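
To make the storage layer idea a bit more concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the TripleStore interface, the InMemoryStore stand-in, the interests_of helper and the ex: identifiers are hypothetical names, not part of ARC, Sesame, Jena, Mulgara or Virtuoso. The point is only that the application talks to one small interface, and each real system would get its own adapter behind it.

```python
from typing import Iterator, List, Optional, Protocol, Set, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)


class TripleStore(Protocol):
    """Hypothetical minimal interface that the application codes against."""

    def add(self, subject: str, predicate: str, obj: str) -> None: ...

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]: ...


class InMemoryStore:
    """Stand-in backend. A real adapter would wrap an actual triplestore."""

    def __init__(self) -> None:
        self._triples: Set[Triple] = set()

    def add(self, subject: str, predicate: str, obj: str) -> None:
        self._triples.add((subject, predicate, obj))

    def match(
        self,
        subject: Optional[str] = None,
        predicate: Optional[str] = None,
        obj: Optional[str] = None,
    ) -> Iterator[Triple]:
        # None acts as a wildcard, like a variable in a triple pattern.
        pattern = (subject, predicate, obj)
        for triple in sorted(self._triples):
            if all(p is None or p == t for p, t in zip(pattern, triple)):
                yield triple


def interests_of(store: TripleStore, person: str) -> List[str]:
    """Application code: depends only on the interface, not on a backend."""
    return [o for _, _, o in store.match(subject=person, predicate="foaf:interest")]


store = InMemoryStore()
store.add("ex:laurens", "foaf:interest", "ex:SemanticWeb")
store.add("ex:laurens", "foaf:name", "Laurens")
print(interests_of(store, "ex:laurens"))
```

Switching systems would then mean writing another class with the same two methods and passing it to the same application code, which is the smooth switch I’m after.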


About laurensdv
Computer Science student, interested in creating more innovative user experiences for information access. Fond of travelling around Europe!
