SPARQL Endpoint set-up and load any twitter profile into the RDF Store

This weekend I optimized the triplification and annotation process for every Twitter user. From now on it is possible to load any Twitter user and store the annotated triples in the ARC2 TripleStore. A SPARQL endpoint allows querying the stored triples.

For now you can load your own Twitter account and its associated tweets into the system with this URL:

http://linkeddata.semanticprofiling.net/interlinking/provider.php?user=your_username
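
The load URL can also be built programmatically, which helps when a username contains characters that need escaping. This is a minimal Python sketch, assuming the script takes only the `user` parameter shown above; the helper name `provider_url` is made up for illustration.

```python
from urllib.parse import urlencode

# Base URL of the loader script, taken from the post.
PROVIDER_URL = "http://linkeddata.semanticprofiling.net/interlinking/provider.php"


def provider_url(username: str) -> str:
    """Return the URL that triggers loading of one Twitter user (hypothetical helper)."""
    # urlencode escapes characters that are not safe in a query string.
    return PROVIDER_URL + "?" + urlencode({"user": username})


print(provider_url("your_username"))
```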

Contribute to the semantic web and do it NOW! 🙂 If you have any questions or run into extreme load times, I will happily look into it and fix it!

The SPARQL Endpoint can be accessed on:

http://linkeddata.semanticprofiling.net/interlinking/endpoint_handler.php

An example query selects all persons who live in Austria:

[Image: SPARQL Example Query]

It produces the following result:

[Image: SPARQL Query Result]
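
Since the query itself was only shown as a screenshot, here is a rough sketch of what such a request could look like from Python. The vocabulary is an assumption: it guesses that persons are typed `foaf:Person` and linked to a place through `foaf:based_near`, which may not match what the store actually uses. It also assumes the endpoint accepts the standard `query` parameter of the SPARQL protocol; whether it honours the `Accept` header is not confirmed.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Endpoint URL taken from the post.
ENDPOINT = "http://linkeddata.semanticprofiling.net/interlinking/endpoint_handler.php"

# Assumption: persons are typed foaf:Person and linked to a place via
# foaf:based_near; the vocabulary actually used in the store may differ.
QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT DISTINCT ?person ?place WHERE {
  ?person a foaf:Person ;
          foaf:based_near ?place .
  FILTER regex(str(?place), "Austria", "i")
}
LIMIT 50
"""


def build_request(query: str) -> Request:
    """Build (but do not send) a SPARQL-protocol GET request."""
    url = ENDPOINT + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(QUERY)
print(request.full_url)
# To actually run the query: urllib.request.urlopen(request).read()
```

The request is only built, not sent, so the sketch can be tried without touching the server.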

For sure there are more users that actually have a reference to Austria, but to find them we must first optimize the interlinking between the different locations. This will be the subject of the next iteration. Ideally we also want to get rid of the URIs that are actually data queries and rather point them to an RDF resource or a resource that exists in the local namespace.

[Image: Not optimal URI]

Such referencing is already implemented for persons and tags.

[Image: Better referenced URI for tags and persons]

Furthermore, it has been made sure that these URLs have at least a basic representation.

[Image: Local referenced tags URI html representation]

The first 90 Grabeeter users took more than an hour and a half to load (about 102 minutes), which is roughly 68 seconds per user. This is quite slow. Based on current studies, a representative data source for profiling should contain about 10 000 users, so at this rate it would take about 200 hours (8 full days) to load.
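
The estimate can be checked with a few lines of Python. The sketch assumes load time scales linearly with the number of users, which is a simplification.

```python
# Figures from the measurement above.
loaded_users = 90
elapsed_minutes = 102
target_users = 10_000

# Assumption: load time grows linearly with the number of users.
minutes_per_user = elapsed_minutes / loaded_users      # about 1.13
total_hours = minutes_per_user * target_users / 60     # about 189
total_days = total_hours / 24                          # about 7.9

print(f"{minutes_per_user:.2f} min/user, {total_hours:.0f} h, {total_days:.1f} days")
# prints "1.13 min/user, 189 h, 7.9 days"
```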

[Image: Loading users into triplestore]

Dynamic loading of users based on a random path has proven useful in some cases. Alternatively, a path could start from one user and then perform some kind of heuristic search through that user's friends, their friends' friends, and so on. A lightweight "sneak peek" profile would be fetched first to calculate the node weights and so determine an optimal path. But then search time would increase significantly. This problem is a matter for the next iteration.
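
One way to picture the heuristic search is a best-first crawl over the friend graph. The sketch below is not the implementation used in this project; the function names, the toy graph, and the weights are all invented for illustration, and `weight` simply stands in for whatever score a sneak peek profile would yield.

```python
import heapq
from typing import Callable, Dict, Iterable, List


def crawl_order(
    start: str,
    friends_of: Callable[[str], Iterable[str]],
    weight: Callable[[str], float],
    limit: int,
) -> List[str]:
    """Return up to `limit` users in the order a best-first crawl would load them."""
    # heapq is a min-heap, so weights are negated to pop the heaviest user first.
    frontier = [(-weight(start), start)]
    seen = {start}
    order: List[str] = []
    while frontier and len(order) < limit:
        _, user = heapq.heappop(frontier)
        order.append(user)  # here the full profile would be loaded
        for friend in friends_of(user):
            if friend not in seen:
                seen.add(friend)
                # weight() stands in for the cheap "sneak peek" profile score.
                heapq.heappush(frontier, (-weight(friend), friend))
    return order


# Toy data, invented for illustration only.
GRAPH: Dict[str, List[str]] = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["erin"],
    "dave": [],
    "erin": [],
}
WEIGHTS = {"alice": 1.0, "bob": 0.2, "carol": 0.9, "dave": 0.8, "erin": 0.5}

print(crawl_order("alice", lambda u: GRAPH.get(u, []), WEIGHTS.__getitem__, limit=4))
# prints ['alice', 'carol', 'erin', 'bob']
```

Every discovered friend costs one `weight` call, which is exactly where the extra search time mentioned above would come from.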

Currently there are about 100 users loaded in the TripleStore, and they are the source of 1.8M triples of annotated data (about 500 MB of storage in the database). Eventually, at the 10 000 users mentioned above, we will need an estimated 50 GB of data. Originally, performance bottlenecks and physical storage issues were not a part of this project, in the sense that the project is not going to look into fine-tuning and storage optimization. But it is important to find at least some more intelligent way to load the users without the need for a 200-hour update cycle.
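
The storage figure follows from the same kind of extrapolation. This short Python sketch uses the numbers above and assumes every user contributes about as many triples as the first 100 did.

```python
# Figures from the current state of the TripleStore.
loaded_users = 100
triples_loaded = 1_800_000
storage_mb = 500
target_users = 10_000

# Assumption: every user contributes about as much data as the first 100 did.
triples_per_user = triples_loaded / loaded_users          # 18,000
mb_per_user = storage_mb / loaded_users                   # 5 MB
target_triples = triples_per_user * target_users          # 180 million
target_storage_gb = mb_per_user * target_users / 1000     # 50 GB

print(f"{target_triples / 1e6:.0f}M triples, {target_storage_gb:.0f} GB")
# prints "180M triples, 50 GB"
```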
About laurensdv
Computer Science student, interested in creating more innovative user experiences for information access. Fond of travelling around Europe!
