Are hashtags a good choice as linked data identifiers?

In this project, hashtags are the most important identifiers for linking users. They are considered good identifiers because users attach the hash deliberately, to join a conversation in which others use the same hashtag. But does it follow that they are also good identifiers in linked data?

In the paper “Making Sense of Twitter” [1], David Laniado and Peter Mika examined whether hashtags behave as strong identifiers, and thus whether they could serve as identifiers for the Semantic Web. Twitter users have adopted the convention of adding a hash at the beginning of a word to turn it into a hashtag. Hashtags are meant to be identifiers for discussions that revolve around the same topic. When used appropriately, a search on a hashtag returns messages that belong to the same conversation (even if they don’t contain the same keywords), thereby solving the aggregation problem. This is the same function that strong identifiers (URIs) play in the Semantic Web. The questions they asked are which hashtags, if any, behave as strong identifiers, and whether they could be mapped to concept identifiers in the Semantic Web.

According to the authors, there are a number of desirable criteria that a hashtag should fulfill in this role, similar to how ‘cool URIs’ are differentiated from poor URIs: frequency, specificity, consistency in usage and stability over time. In line with previous work on the analysis of folksonomy systems [2], they capture the semantics of hashtags through their usage in the social media system. In particular, they represent the meaning of hashtags using a Vector Space Model (VSM) [3].

For this study they relied on a dataset of 539,432,680 messages, collected over the whole month of November 2009 (about 18 million per day).
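To make the Vector Space Model idea more concrete, here is a minimal sketch in Python. It is not the authors' implementation: the function names `hashtag_vectors` and `cosine`, the whitespace tokenization, the raw term counts and the three sample messages are all assumptions made for illustration. Each hashtag is represented by the words it co-occurs with, and two hashtags are compared by the cosine of the angle between their vectors.

```python
import math
from collections import Counter, defaultdict


def hashtag_vectors(messages):
    """Map each hashtag to a bag-of-words vector of the words it co-occurs with.

    Assumption: naive lowercase whitespace tokenization and raw counts;
    a real pipeline would likely normalize and weight terms differently.
    """
    vectors = defaultdict(Counter)
    for text in messages:
        tokens = text.lower().split()
        tags = [t for t in tokens if t.startswith("#") and len(t) > 1]
        words = [t for t in tokens if not t.startswith("#")]
        for tag in tags:
            vectors[tag].update(words)
    return vectors


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical sample messages, invented for this example.
messages = [
    "great talk on rdf #iswc",
    "rdf talk today #semweb",
    "pizza for lunch #food",
]
vectors = hashtag_vectors(messages)
related = cosine(vectors["#iswc"], vectors["#semweb"])
unrelated = cosine(vectors["#iswc"], vectors["#food"])
```

In this toy data, `related` is higher than `unrelated` because `#iswc` and `#semweb` share the words "rdf" and "talk". The same comparison could in principle be applied to vectors of one hashtag built from different time slices, which is one way to think about stability over time.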

To assess how well their metrics indicate which hashtags represent stable concepts with a unique identity, they performed a manual evaluation on a random sample of 257 hashtags. Slightly more than half of the tags (137) could be associated with a Freebase entry; this is higher than the number of named entities because Freebase also contains some general terms. As expected, most application and sentiment tags could not be mapped to Freebase. Only 33% of application tags and 14% of sentiment tags could be resolved, and many of these mappings are rough approximations of the intended meaning.

Laniado and Mika found that not all hashtags are used in the same way: not all of them aggregate messages around a community or a topic, not all of them endure over time, and not all of them have an actual meaning. Their work addresses the evaluation of Twitter hashtags as strong identifiers, as a first step towards bridging the gap between Twitter and the Semantic Web. The first contribution of the paper lies in the formalization of the problem and in the elaboration of a number of desired properties for a good hashtag to serve as a URI. Based on the evaluation data, they tested the results obtained with the algorithms described in the paper, showing how a combination of the proposed measures can help in assessing which tags are more likely to represent valuable identifiers. These results are promising, both for anchoring Twitter hashtags to Semantic Web URIs and for detecting concepts and entities that are worth treating as new identifiers.



  1. Laniado, D., Mika, P.: Making sense of Twitter. In: The Semantic Web – ISWC 2010 (2010)
  2. Cattuto, C., Benz, D., Hotho, A., Stumme, G.: Semantic grounding of tag relatedness in social bookmarking systems. In: Sheth, A.P., Staab, S., Dean, M., Paolucci, M., Maynard, D., Finin, T., Thirunarayan, K. (eds.) ISWC 2008. LNCS, vol. 5318, pp. 615–631. Springer, Heidelberg (2008)
  3. Raghavan, V.V., Wong, S.K.M.: A critical analysis of vector space model for information retrieval. Journal of the American Society for Information Science (1986)



About laurensdv
Computer Science student, interested in creating more innovative user experiences for information access. Fond of travelling around Europe!
