The Twitter data extraction begins!
November 9, 2010
Today I started implementing the extraction package. The package contains two models: one for the user’s profile and one for the user’s tweets. Their output will then be annotated in the “Annotator” module and converted into a list of simple triples in the “Triplifier” module. The package is being implemented in PHP. There is a very good API to help with this task: PHP Twitter with OAuth.
I started with the User Profile Model. First I had to get familiar with the Twitter API and brush up on my PHP basics. So I created a page that simply returns all the information about a user and applies some old-fashioned HTML formatting, just to make it readable. You can try the script yourself with any public Twitter user profile. I added a screenshot with the result for my own Twitter profile.
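To give an idea of the "old-fashioned HTML formatting", here is a minimal sketch of how a decoded profile could be turned into a table. It is not the actual script: the function name `renderProfileTable` and the sample profile array are made up for illustration, and a real run would use the decoded response from the Twitter API instead of hard-coded data.

```php
<?php
// Minimal sketch: render a decoded user profile (an associative array)
// as a plain HTML table with one row per field.
// renderProfileTable() is a hypothetical helper, not part of any library.
function renderProfileTable(array $profile): string
{
    $rows = '';
    foreach ($profile as $field => $value) {
        // Nested structures are shown as JSON so every field fits in one cell.
        $text = is_scalar($value) ? (string) $value : json_encode($value);
        $rows .= '<tr><th>' . htmlspecialchars((string) $field) . '</th><td>'
               . htmlspecialchars($text) . "</td></tr>\n";
    }
    return "<table border=\"1\">\n" . $rows . "</table>\n";
}

// Hard-coded sample data (an assumption); field names mimic a profile response.
$profile = [
    'screen_name'     => 'example_user',
    'name'            => 'Example User',
    'followers_count' => 42,
    'status'          => ['text' => 'Hello <world>'],
];

echo renderProfileTable($profile);
```

Escaping every value with `htmlspecialchars` matters here, because profile fields such as the bio or the latest tweet can contain arbitrary characters.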
http://linkeddata.semanticprofiling.net/test/example_noauth.php?user=USERNAME
The next step is to convert this table into triples. In the extraction phase it is not yet necessary to format them as RDF. I created a simple PHP class, TripleTree, to represent them. Basically it contains a subject and a mapping from each of its properties to the corresponding objects. This makes it a lot easier to collect all the properties that belong to the same subject node. The TripleTree is then rendered as a table: the subject is written in bold at the top, the left column contains the properties and the right column the object nodes. In the screenshot you can see the result for my Twitter profile.
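For readers who want to picture the data structure, here is a minimal sketch of what such a class could look like. Only the class name TripleTree comes from my code; the method names (`add`, `toHtml`), the subject URI and the property names below are placeholders chosen for this example.

```php
<?php
// Sketch of a TripleTree: one subject plus a map "property => list of objects".
// Method names and sample values are illustrative assumptions.
class TripleTree
{
    private $subject;
    private $properties = [];

    public function __construct(string $subject)
    {
        $this->subject = $subject;
    }

    // Attach one more object to the given property of the subject.
    public function add(string $property, string $object): void
    {
        $this->properties[$property][] = $object;
    }

    public function getSubject(): string
    {
        return $this->subject;
    }

    public function getProperties(): array
    {
        return $this->properties;
    }

    // Subject in bold on top, properties on the left, objects on the right.
    public function toHtml(): string
    {
        $html = '<table><tr><th colspan="2"><b>'
              . htmlspecialchars($this->subject) . "</b></th></tr>\n";
        foreach ($this->properties as $property => $objects) {
            foreach ($objects as $object) {
                $html .= '<tr><td>' . htmlspecialchars((string) $property)
                       . '</td><td>' . htmlspecialchars($object) . "</td></tr>\n";
            }
        }
        return $html . "</table>\n";
    }
}

$tree = new TripleTree('http://twitter.com/example_user');
$tree->add('name', 'Example User');
$tree->add('location', 'Somewhere');
$tree->add('friend', 'http://twitter.com/friend_one');
$tree->add('friend', 'http://twitter.com/friend_two');

echo $tree->toHtml();
```

Keeping a list of objects per property means that a repeated property (two friends in the example) stays grouped under one key of the same subject.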
http://linkeddata.semanticprofiling.net/test/tripleview_noauth.php?user=USERNAME
The next step is to grab the tweets from Grabeteer and connect them to the user profile with another triple. For this we will need a TripleTree for each post and a TripleTree for the user’s timeline. The user’s timeline will form the connection between the user and all of their posts. This will be done inside the Annotator module. Then all the TripleTrees will be converted into a list of simple triples. This list will be sent to the Interlinking layer.
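The plan above can be sketched as follows. This is only an illustration of the idea, not the final code: each TripleTree is modelled as a plain array to keep the example self-contained, and the function name `flattenTrees`, the URIs and the property names (`hasTimeline`, `hasPost`, `text`) are all assumptions.

```php
<?php
// Sketch: flatten several trees (subject + "property => objects" map)
// into one list of simple [subject, property, object] triples.
// flattenTrees() and all sample names are hypothetical.
function flattenTrees(array $trees): array
{
    $triples = [];
    foreach ($trees as $tree) {
        foreach ($tree['properties'] as $property => $objects) {
            foreach ($objects as $object) {
                $triples[] = [$tree['subject'], (string) $property, $object];
            }
        }
    }
    return $triples;
}

$user     = 'http://twitter.com/example_user';
$timeline = $user . '/timeline';
$posts    = [$user . '/status/1', $user . '/status/2'];

$trees = [
    // The user points to the timeline ...
    ['subject' => $user, 'properties' => [
        'name'        => ['Example User'],
        'hasTimeline' => [$timeline],
    ]],
    // ... the timeline points to every post ...
    ['subject' => $timeline, 'properties' => ['hasPost' => $posts]],
    // ... and each post has its own tree.
    ['subject' => $posts[0], 'properties' => ['text' => ['First tweet']]],
    ['subject' => $posts[1], 'properties' => ['text' => ['Second tweet']]],
];

foreach (flattenTrees($trees) as [$s, $p, $o]) {
    echo "$s  $p  $o\n";
}
```

The timeline node is what keeps the user tree small: the profile only needs one extra triple, however many posts are attached later.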
Good luck with that. “RDFizing” Twitter data sounds like an incredibly complicated challenge due to the real-time aspect of a tweet. However, targeting only a fraction of it sounds like a fair start.
What are your views on the use of a semantic Twitter?
Michael