Open Innovation Problem Solver Search

This project presents a case in which scientists in technical fields are linked to one another, with the intention of connecting scientists who have similar interests. In “Open Innovation”, experts from different institutions and companies collaborate to increase the rate of technological innovation. M. Stankovic wrote a paper [1] investigating whether Linked Data can contribute to these efforts.

According to Stankovic, many potential sources of evidence about users’ interests and expertise (e.g., research papers, blogs, activities) are becoming ubiquitously available as Linked Data. The paper presented a research effort aimed at suggesting the right way to search for potential Open Innovation problem solvers in Linked Data sources by looking at the structure of the available data sources. In addition, the author sought ways of suggesting domains of expertise that are relevant to the domain of the Open Innovation problem, in order to enable cross-domain solution transfer.

Earlier expert finding approaches mostly took a legacy data set (research paper PDF files [2], e-mails, documents, etc.) and tried to extract structured data containing traces of users’ knowledge and interests. These data were then used to rank users by their level of expertise. Each of those approaches makes an implicit assumption about what it takes to be an expert and what makes one expert better than another, and the data set used partially determines this assumption. An approach that uses research papers as a corpus, for example, implicitly assumes that people who wrote a research paper on a certain topic are experts on that topic. The author called such assumptions expertise hypotheses. An example expertise hypothesis would be: “authors of two or more research papers on a particular topic can be considered as experts on that topic”. Different expert search approaches use different expertise hypotheses.
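The example expertise hypothesis is simple enough to express directly in code. The following is a minimal Python sketch, not Stankovic’s implementation: the `Paper` record, the `experts_on` function and the `min_papers` threshold are hypothetical names chosen for illustration, and topics are assumed to be already attached to each paper (for instance by a topic extractor such as the one described in [2]).

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Paper:
    # Hypothetical record: a paper, its authors and the topics it covers.
    title: str
    authors: tuple[str, ...]
    topics: frozenset[str]


def experts_on(topic: str, papers: list[Paper], min_papers: int = 2) -> set[str]:
    """Expertise hypothesis: authors of >= min_papers papers on `topic` are experts."""
    counts = Counter(
        author
        for paper in papers
        if topic in paper.topics
        for author in set(paper.authors)  # count each author at most once per paper
    )
    return {author for author, n in counts.items() if n >= min_papers}


if __name__ == "__main__":
    corpus = [
        Paper("P1", ("alice", "bob"), frozenset({"linked data"})),
        Paper("P2", ("alice",), frozenset({"linked data", "semantic web"})),
        Paper("P3", ("bob",), frozenset({"semantic web"})),
    ]
    print(experts_on("linked data", corpus))  # {'alice'}
```

Changing `min_papers` or the counting rule yields a different expertise hypothesis over the same data, which is exactly the kind of choice the paper is concerned with.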

The expertise hypothesis used in this project is: “Twitter users who tweet about a certain conference have an interest in that conference’s research topic”. Or, more precisely: “Two Twitter users who constantly tweet about similar scientific conferences are experts in a similar research topic”.
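One way to make this hypothesis operational is to describe each user by the set of conferences they tweet about and then compare those sets. The sketch below is an illustration under stated assumptions, not the framework’s actual code: tweets are assumed to be already reduced to (user, conference) pairs, “constantly” is modelled as a hypothetical minimum of `min_tweets` tweets per conference, and Jaccard similarity is an assumed choice of comparison measure.

```python
from collections import Counter

# Assumed input format: one (user, conference) pair per tweet, e.g. obtained
# by matching a conference hashtag in the tweet text (hypothetical preprocessing).
Tweet = tuple[str, str]


def conference_profile(tweets: list[Tweet], user: str, min_tweets: int = 3) -> set[str]:
    """Conferences that `user` tweeted about at least `min_tweets` times (assumed threshold)."""
    counts = Counter(conf for author, conf in tweets if author == user)
    return {conf for conf, n in counts.items() if n >= min_tweets}


def jaccard(a: set[str], b: set[str]) -> float:
    """Size of the intersection divided by size of the union; 0.0 if both sets are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_similarity(tweets: list[Tweet], user_a: str, user_b: str, min_tweets: int = 3) -> float:
    """Similarity of two users' conference profiles, used as a proxy for shared expertise."""
    return jaccard(
        conference_profile(tweets, user_a, min_tweets),
        conference_profile(tweets, user_b, min_tweets),
    )


if __name__ == "__main__":
    tweets = (
        [("ann", "ISWC2010")] * 3 + [("ann", "WWW2010")] * 3
        + [("ben", "ISWC2010")] * 4 + [("ben", "ESWC2010")] * 1
    )
    print(topic_similarity(tweets, "ann", "ben"))  # 0.5
```

Here ben’s single ESWC2010 tweet falls below the threshold, so only ISWC2010 is shared out of two conferences in total.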

The author noted a difference from the time before Linked Data (LD). Back then, expert search approaches had to focus on a particular hypothesis determined by the data available to them. LD-based approaches can benefit from many different kinds of traces and can choose among many expertise hypotheses. The choice of an appropriate expertise hypothesis therefore becomes a challenge for LD-based expert finding. The author proposed a way to suggest appropriate expertise hypotheses for a given problem topic, based on the type of user trace those hypotheses rely on. The argument is that experts from different domains use different communication channels (e.g., one domain mostly tweets, another mostly blogs) and therefore leave different user traces. Detecting such patterns would make it possible to choose the expertise hypotheses that rely on traces significant for the given domain. M. Stankovic proposed exploring the structure of LD and establishing LD metrics that help identify good evidence types for particular topics. These metrics might also help in choosing the right data set when running distributed queries over several data sets.

Because our framework is based on Linked Data, it could easily support more hypotheses as well, as stated before. We nevertheless limit the evaluation to a proof of concept for a single hypothesis.

A simple metric was defined as the number of available instances. Going further, it would be interesting to know the number of instances of a certain type that have a particular concept as a property. The author therefore defined another metric that takes into account a set of concepts identifying the topics associated with the instances to be counted. A similar metric is already used by systems that rely on data summaries to accelerate query execution, such as [3]; those graph summaries could directly serve as a source for the metric. If the data taken into account is a representative subset of the world’s data, then higher values of this metric should indicate that most interactions around a particular topic happen on a particular type of medium, and thus the use of such sources might result in higher precision.

This project’s framework uses a similar metric: to calculate the relevance of a certain item, it counts the unweighted links pointing to that item. Considering weighted links could improve the results, but this is outside the scope of this thesis.
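The unweighted relevance measure amounts to counting incoming links per item. Below is a minimal sketch, assuming links are given as (source, target) pairs such as (Twitter user, conference); the function name, the choice to count repeated identical pairs only once, and the alphabetical tie-breaking are assumptions for illustration rather than details of the framework.

```python
from collections import Counter


def rank_by_inlinks(links: list[tuple[str, str]]) -> list[tuple[str, int]]:
    """Rank link targets by the number of unweighted incoming links.

    Assumption: identical (source, target) pairs are counted once.
    Ties are broken alphabetically so the ranking is deterministic.
    """
    counts = Counter(target for _source, target in set(links))
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    links = [
        ("ann", "ISWC2010"),
        ("ben", "ISWC2010"),
        ("ann", "WWW2010"),
        ("ann", "ISWC2010"),  # duplicate pair, counted once
    ]
    print(rank_by_inlinks(links))  # [('ISWC2010', 2), ('WWW2010', 1)]
```

A weighted variant would sum a per-link weight instead of counting each link as 1.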

M. Stankovic assumed that the prevailing association of particular topics with particular types of trace instances could positively influence the effectiveness of expert search. To test this, a sufficiently rich sample data set with different types of traces and many different topics was constructed. Such a data set enables experiments that evaluate the correlation of particular LD metrics with the precision and recall of expert identification. First, existing public LD data about user traces (mostly publications) was imported. In addition, Sindice.com provided a number of public data sources containing blog and publication data. To enrich the set further, an extractor for conference event tweets was built.

The author identified the main challenges for Semantic Web technologies in supporting Open Innovation on the Web, and designed an approach for finding potential solvers on LD by picking the right expertise hypothesis.

References

  1. Stankovic, M. (2010). Open Innovation and Semantic Web: Problem Solver Search on Linked Data. ISWC 2010. iswc2010.semanticweb.org
  2. Buitelaar, P., & Eigner, T. (2008). Topic Extraction from Scientific Literature for Competency Management. The 7th International Semantic Web Conference (ISWC 2008). Karlsruhe, Germany.
  3. Harth, A., Hose, K., Karnstedt, M., Polleres, A., Sattler, K.-U., & Umbrich, J. (2010). Data Summaries for On-Demand Queries over Linked Data. In Proceedings of the 19th International Conference on World Wide Web, WWW 2010 (pp. 411-420). Raleigh, NC, USA: ACM Press.

About laurensdv
Computer Science student, interested in creating more innovative user experiences for information access. Fond of travelling around Europe!