-
A Linked-Data-driven and Semantically-enabled Journal Portal for Scientometrics
Yingjie Hu, Krzysztof Janowicz, Grant McKenzie, Kunal Sengupta and Pascal Hitzler
113-128
[OpenAccess] [Publisher]
The Semantic Web journal by IOS Press follows a unique open and transparent process during which each submitted manuscript is available online together with the full history of its successive decision statuses, assigned editors, solicited and voluntary reviewers, their full text reviews, and in many cases also the authors’ response letters. Combined with a highly-customized, Drupal-based journal management system, this provides the journal with semantically rich manuscript timelines and networked data about authors, reviewers, and editors. These data are now exposed using a SPARQL endpoint, an extended Bibo ontology, and a modular Linked Data portal that provides interactive scientometrics based on established and new analysis methods. The portal can be customized for other journals as well.
-
Cross-language Semantic Retrieval and Linking of E-gov Services
Fedelucio Narducci, Matteo Palmonari and Giovanni Semeraro
129-144
[OpenAccess] [Publisher]
Public administrations are aware of the advantages of sharing Open Government Data in terms of transparency, development of improved services, collaboration between stakeholders, and spurring new economic activities. Initiatives for the publication and interlinking of government service catalogs as Linked Open Data (LOD) support the interoperability among European administrations and improve the capability of foreign citizens to access services across Europe. However, linking service catalogs to reference LOD catalogs requires a significant effort from local administrations, preventing the uptake of interoperable solutions at a large scale. The web application presented in this paper is named CroSeR (Cross-language Service Retriever) and supports public bodies in the process of linking their own service catalogs to the LOD cloud. CroSeR supports different European languages and adopts a semantic representation of e-gov services based on Wikipedia. CroSeR tries to overcome problems related to the short textual descriptions associated with a service by embodying a semantic annotation algorithm that enriches service labels with emerging Wikipedia concepts related to the service. An experimental evaluation carried out on e-gov service catalogs in five different languages shows the effectiveness of our model.
-
Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis
Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher and Johanna Völker
17-32
[OpenAccess] [Publisher]
More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data as well as the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
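The kind of per-page detection such a study performs can be sketched as follows. This is only a toy string-matching illustration of the idea; real extraction pipelines (such as those used on Web crawls) parse the full DOM, and the attribute lists below are a small, non-exhaustive selection.

```python
import re

# Toy detector for the three embedding syntaxes the study measures.
# The attribute sets are illustrative, not complete.
MARKUP_PATTERNS = {
    "RDFa":         re.compile(r'\b(?:typeof|property|vocab)\s*='),
    "Microdata":    re.compile(r'\b(?:itemscope|itemtype|itemprop)\b'),
    "Microformats": re.compile(r'class\s*=\s*"[^"]*\b(?:vcard|hrecipe|hreview)\b'),
}

def detect_markup(html: str) -> set:
    """Return the set of markup syntaxes apparently present in a page."""
    return {name for name, pat in MARKUP_PATTERNS.items() if pat.search(html)}

page = ('<div itemscope itemtype="http://schema.org/Product">'
        '<span itemprop="name">X</span></div>')
```

Aggregating `detect_markup` over all pages of a crawl, grouped by website, yields deployment counts per standard of the kind the paper reports.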
-
Entity Recommendations in Web Search
Roi Blanco, Berkant Barla Cambazoglu, Peter Mika and Nicolas Torzec
33-48
[OpenAccess] [Publisher]
While some web search users know exactly what they are looking for, others are willing to explore topics related to an initial interest. Often, the user’s initial interest can be uniquely linked to an entity in a knowledge base. In this case, it is natural to recommend the explicitly linked entities for further exploration. In real-world knowledge bases, however, the number of linked entities may be very large and not all related entities may be equally relevant. Thus, there is a need for ranking related entities. In this paper, we describe Spark, a recommendation engine that links a user’s initial query to an entity within a knowledge base and provides a ranking of the related entities. Spark extracts several signals from a variety of data sources, including Yahoo! Web Search, Twitter, and Flickr, using a large cluster of computers running Hadoop. These signals are combined with a machine-learned ranking model in order to produce a final recommendation of entities to user queries. This system is currently powering Yahoo! Web Search result pages.
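The final scoring step can be illustrated with a minimal sketch. The feature names and weights below are invented for illustration, and a simple linear model stands in for Spark's actual machine-learned ranker:

```python
# Hypothetical sketch: each candidate entity carries a vector of
# signals, and a weight vector (learned in the real system, fixed
# here) scores and ranks them.
def rank_entities(candidates, weights):
    """Return candidates sorted by a linear model score, best first."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sorted(candidates, key=lambda c: score(c["features"]), reverse=True)

weights = {"co_click": 0.6, "twitter_mentions": 0.3, "flickr_tags": 0.1}
candidates = [
    {"id": "Brad_Pitt",  "features": {"co_click": 0.9, "twitter_mentions": 0.4}},
    {"id": "Fight_Club", "features": {"co_click": 0.5, "twitter_mentions": 0.9}},
]
ranking = [c["id"] for c in rank_entities(candidates, weights)]
```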
-
Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery
Carole Goble, Alasdair J. G. Gray, Lee Harland, Karen Karapetyan, Antonis Loizou, Ivan Mikhailov, Yrjana Rankka, Stefan Senger, Valery Tkachenko, Antony Williams and Egon Willighagen
65-80
[OpenAccess] [Publisher]
The Open PHACTS Discovery Platform aims to provide an integrated information space to advance pharmacological research in the area of drug discovery. Effective drug discovery requires comprehensive data coverage, i.e., integrating all available sources of pharmacology data. While many relevant data sources are available on the linked open data cloud, their content needs to be combined with that of commercial datasets and the licensing of these commercial datasets respected when providing access to the data. Additionally, pharmaceutical companies have built up their own extensive private data collections that they require to be included in their pharmacological dataspace. In this paper we discuss the challenges of incorporating private and commercial data into a linked dataspace, focusing on the modelling of these datasets and their interlinking. We also present the graph-based access control mechanism that ensures commercial and private datasets are only available to authorized users.
-
Integrating NLP using Linked Data
Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer
97-112
[OpenAccess] [Publisher]
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
-
Publishing the Norwegian Petroleum Directorate’s FactPages as Semantic Web Data
Martin G. Skjæveland, Espen H. Lian and Ian Horrocks
161-176
[OpenAccess] [Publisher]
This paper motivates, documents and evaluates the process and results of converting the Norwegian Petroleum Directorate’s FactPages, a well-known and diverse set of tabular data, but with little and incomplete schema information, stepwise into other representations where in each step more semantics is added to the dataset. The different representations we consider are a regular relational database, a linked open data dataset, and an ontology. For each conversion step we explain and discuss necessary design choices which are due to the specific shape of the dataset, but also those due to the characteristics and idiosyncrasies of the representation formats. We additionally evaluate the output, performance and cost of querying the different formats using questions provided by users of the FactPages.
-
Real-time Urban Monitoring in Dublin using Semantic and Stream Technologies
Simone Tallevi-Diotallevi, Spyros Kotoulas, Luca Foschini, Freddy Lecue and Antonio Corradi
177-192
[OpenAccess] [Publisher]
Several sources of information, from people, systems, and things, are already available in most modern cities. Processing these continuous flows of information and capturing insight poses unique technical challenges that span from response time constraints to data heterogeneity, in terms of format and throughput. To tackle these problems, we focus on a novel prototype to ease real-time monitoring and decision-making processes for the City of Dublin with three main original technical aspects: (i) an extension to SPARQL to support efficient querying of heterogeneous streams; (ii) a query execution framework and runtime environment based on IBM InfoSphere Streams, a high-performance, industrial strength, stream processing engine; (iii) a hybrid RDFS reasoner, optimized for our stream processing execution framework. Our approach has been validated with real data collected in the field, as shown in our Dublin City video demonstration. Results indicate that real-time processing of city information streams based on semantic technologies is indeed not only possible, but also efficient, scalable and low-latency.
-
Reasoning on crowd-sourced semantic annotations to facilitate cataloguing of 3D artefacts in the cultural heritage domain
Chih-Hao Yu, Tudor Groza and Jane Hunter
225-240
[OpenAccess] [Publisher]
The 3D Semantic Annotation (3DSA) system expedites the classification of 3D digital surrogates from the cultural heritage domain, by leveraging crowd-sourced semantic annotations. More specifically, the 3DSA system generates high-level classifications of 3D objects by applying rule-based reasoning across community-generated annotations and low-level shape and size attributes. This paper describes a particular use of the 3DSA system – cataloguing Greek pottery. It also describes our novel approach to rule-based reasoning that is modelled on concepts inspired by Markov logic networks. Our evaluation of this approach demonstrates its efficiency, accuracy and versatility, compared to classical rule-based reasoning.
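The Markov-logic-inspired idea of weighted rules can be sketched as follows. The rules, weights, and feature names are invented for illustration; the paper's actual rule language and scoring are richer:

```python
# Hedged sketch: each rule carries a weight, and the classification
# accumulating the highest total weight of satisfied rules wins.
def classify(annotations, weighted_rules):
    """annotations: set of low-level facts;
    weighted_rules: list of (required_facts, label, weight)."""
    scores = {}
    for required, label, weight in weighted_rules:
        if required <= annotations:  # rule fires if all its facts hold
            scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

rules = [
    ({"two_handles", "narrow_neck"}, "amphora", 1.5),
    ({"two_handles"},                "kylix",   0.5),
]
label = classify({"two_handles", "narrow_neck", "red_figure"}, rules)
```

Unlike classical hard rules, conflicting rules may both fire here; the weights arbitrate between them.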
-
Semantic Data and Models Sharing in Systems Biology: The Just Enough Results Model and the SEEK Platform
Katherine Wolstencroft, Stuart Owen, Olga Krebs, Quyen Nguyen, Jacky L. Snoep, Wolfgang Mueller and Carole Goble
209-224
[OpenAccess] [Publisher]
Research in Systems Biology involves integrating data and knowledge about the dynamic processes in biological systems in order to understand and model them. Semantic web technologies should be ideal for exploring the complex networks of genes, proteins and metabolites that interact, but much of this data is not natively available to the semantic web. Data is typically collected and stored with free-text annotations in spreadsheets, many of which do not conform to existing metadata standards and are often not publicly released. Along with initiatives to promote more data sharing, one of the main challenges is therefore to semantically annotate and extract this data so that it is available to the research community. Data annotation and curation are expensive and undervalued tasks that have enormous benefits to the discipline as a whole, but fewer benefits to the individual data producers. By embedding semantic annotation into spreadsheets, however, and automatically extracting this data into RDF at the time of repository submission, the process of producing standards-compliant data that is available for semantic web querying can be achieved without adding additional overheads to laboratory data management. This paper describes these strategies in the context of semantic data management in the SEEK. The SEEK is a web-based resource for sharing and exchanging Systems Biology data and models that is underpinned by the JERM ontology (Just Enough Results Model), which describes the relationships between data, models, protocols and experiments. The SEEK was originally developed for SysMO, a large European Systems Biology consortium studying micro-organisms, but it has since had widespread adoption across European Systems Biology.
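The extraction step described above can be sketched minimally. The column-to-term mapping and all URIs below are made up for illustration; the real SEEK pipeline maps spreadsheet templates to JERM ontology terms:

```python
# Minimal sketch: rows whose columns carry embedded semantic
# annotations (here, a column -> ontology-property map) are turned
# into RDF triples at submission time. All URIs are invented.
def rows_to_rdf(rows, column_annotations, base="http://example.org/sample/"):
    triples = []
    for i, row in enumerate(rows):
        subject = f"<{base}{i}>"
        for column, value in row.items():
            predicate = column_annotations.get(column)
            if predicate:  # only semantically annotated columns are extracted
                triples.append((subject, f"<{predicate}>", f'"{value}"'))
    return triples

annotations = {"organism": "http://purl.org/jerm#hasOrganism"}
rows = [{"organism": "Lactococcus lactis", "notes": "free text"}]
triples = rows_to_rdf(rows, annotations)
```

Unannotated free-text columns simply fall through, which is how standards compliance is achieved without changing how scientists fill in their spreadsheets.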
-
Social listening of City Scale Events using the Streaming Linked Data Framework
Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Themis Palpanas, Mikalai Tsytsarau and Cristian Confalonieri
1-16
[OpenAccess] [Publisher]
City-scale events may easily attract half a million visitors in hundreds of venues over just a few days. Which are the most attended venues? What do visitors think about them? How do they feel before, during and after the event? These are a few of the questions a city-scale event manager would like to see answered in real-time. In this paper, we report on our experience in social listening of two city-scale events (London Olympic Games 2012, and Milano Design Week 2013) using the Streaming Linked Data Framework.
-
The Energy Management Adviser at EDF
Pierre Chaussecourte, Birte Glimm, Ian Horrocks, Boris Motik and Laurent Pierre
49-64
[OpenAccess] [Publisher]
The EMA (Energy Management Adviser) aims to produce personalised energy saving advice for EDF’s customers. The advice takes the form of one or more ‘tips’, and personalisation is achieved using semantic technologies: customers are described using RDF, an OWL ontology provides a conceptual model of the relevant domain (housing, environment, and so on) and the different kinds of tips, and SPARQL query answering is used to identify relevant tips. The current prototype provides tips to more than 300,000 EDF customers in France at least twice a year. The main challenges for our future work include providing a timely service for all of the 35 million EDF customers in France, simplifying the system’s maintenance, and providing new ways for interacting with customers such as via a Web site.
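The selection step can be sketched as follows. In the real system customers are RDF descriptions and tips are identified by SPARQL query answering over an OWL ontology; the dict-based matching and all tip names below are illustrative stand-ins:

```python
# Toy version of tip selection: customers are property maps, and each
# tip declares the conditions under which it applies. All names are
# invented for illustration.
def applicable_tips(customer, tips):
    """Return the names of tips whose conditions the customer satisfies."""
    return [name for name, conditions in tips
            if all(customer.get(k) == v for k, v in conditions.items())]

tips = [
    ("insulate_attic",  {"housing": "house", "heating": "electric"}),
    ("defrost_freezer", {"has_freezer": True}),
]
customer = {"housing": "house", "heating": "electric", "has_freezer": True}
```

Expressing the conditions as queries over a conceptual model, rather than hard-coding them, is what lets EDF maintain the tip catalogue independently of the matching engine.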
-
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model
Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon
241-256
[OpenAccess] [Publisher]
Europe has a high impact on the global biomedical literature, contributing a growing number of research articles and a significant citation impact. However, the impact of research and development generated by European countries on economic, educational and healthcare performance is poorly understood. The recent Linking Open Data (LOD) project has made a lot of data sources publicly available and in human-readable formats. In this paper, we demonstrate the utility of LOD in assessing the impact of Research and Development (R&D) on the economic, education and healthcare performance in Europe. We extract relevant variables from two LOD datasets, namely World Bank and Eurostat. We analyze the data for 20 out of the 27 European countries over a span of 10 years (1999 to 2009). We use a Structural Equation Modeling (SEM) approach to quantify the impact of R&D on the different measures. We perform different exploratory and confirmatory factorial analysis evaluations which give rise to four latent variables that are included in the model: (i) Research and Development (R&D), (ii) Economic Performance (EcoP), (iii) Educational Performance (EduP), (iv) Healthcare performance (HcareP) of the European countries. Our results indicate the importance of R&D to the overall development of the European educational and healthcare performance (directly) and economic performance (indirectly). The results also show the practical applicability of LOD to estimate this impact.
-
Using Semantic Web in ICD-11: Three Years Down the Road
Tania Tudorache, Csongor I. Nyulas, Natasha F. Noy and Mark Musen
193-208
[OpenAccess] [Publisher]
The World Health Organization is using Semantic Web technologies in the development of the 11th revision of the International Classification of Diseases (ICD-11). Health officials use ICD in all United Nations member countries to compile basic health statistics, to monitor health-related spending, and to inform policy makers. In 2010, we published a paper in the ISWC In Use track reporting on our experience in the first six months with building and deploying iCAT, a Semantic Web platform to support the collaborative authoring of ICD-11. Three years since our original publication, 270 domain experts around the world have used iCAT to author more than 45,000 classes, to perform more than 260,000 changes, and to create more than 17,000 links to external medical terminologies. During the last three years, the collaboration processes, modeling and tooling have evolved significantly, and we have learned important lessons, which we will report in this paper. We describe the benefits of using semantic technologies as an infrastructure, which proved to be critical in making support for this rapid evolution possible. To our knowledge, this effort is the only real-world project supporting the collaborative authoring of ontologies at this scale, and which, at the same time, has a high visibility and impact for health care around the world. We believe that the insights that we gained and the lessons that we learned after four years into this large-scale project will be useful to others who need to support similar collaborative projects.
-
Using the past to explain the present: interlinking current affairs with archives via the Semantic Web
Yves Raimond, Michael Smethurst, Andrew McParland and Christopher Lowis
145-160
[OpenAccess] [Publisher]
The BBC has a very large archive of programmes, covering a wide range of topics. This archive holds a significant part of the BBC’s institutional memory and is an important part of the cultural history of the United Kingdom and the rest of the world. These programmes, or parts of them, can help provide valuable context and background for current news events. However, the BBC’s archive catalogue is not a complete record of everything that was ever broadcast. For example, it excludes the BBC World Service, which has been broadcasting since 1932. This makes the discovery of content within these parts of the archive very difficult. In this paper we describe a system based on Semantic Web technologies which helps us to quickly locate content related to current news events within those parts of the BBC’s archive with little or no pre-existing metadata. This system is driven by automated interlinking of archive content with the Semantic Web, user validations of the resulting data and topic extraction from live BBC News subtitles. The resulting inter-links between live news subtitles and the BBC’s archive are used in a dynamic visualisation enabling users to quickly locate relevant content. This content can then be used by journalists and editors to provide historical context, background information and supporting content around current affairs.
-
When History Matters - Assessing Reliability for the Reuse of Scientific Workflows
José Manuel Gómez-Pérez, Esteban García-Cuesta, Aleix Garrido and José Enrique Ruiz
81-96
[OpenAccess] [Publisher]
Scientific workflows play an important role in computational research as essential artifacts for communicating the methods used to produce research findings. We are witnessing a growing number of efforts that treat workflows as first-class artifacts for sharing and exchanging scientific knowledge, either as part of scholarly articles or as standalone objects. However, workflows are not born to be reliable, which can seriously damage their reusability and trustworthiness as knowledge exchange instruments. Scientific workflows are commonly subject to decay, which consequently undermines their reliability over their lifetime. The reliability of workflows can be notably improved by encouraging scientists to preserve a minimal set of information that is essential to assist the interpretations of these workflows and hence improve their potential for reproducibility and reusability. In this paper we show how, by measuring and monitoring the completeness and stability of scientific workflows over time, we are able to provide scientists with a measure of their reliability, supporting the reuse of trustworthy scientific knowledge.
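The combination of completeness and stability into a single reliability measure can be sketched like this. The formula is a stand-in invented for illustration, not the authors' actual metric:

```python
# Illustrative only: derive a reliability score from a workflow's
# recorded history of completeness measurements (values in [0, 1]).
def reliability(completeness_history):
    """Combine the latest completeness score with its stability,
    taken here as 1 minus the largest observed drop between
    consecutive measurements."""
    latest = completeness_history[-1]
    drops = [max(0.0, a - b)
             for a, b in zip(completeness_history, completeness_history[1:])]
    stability = 1.0 - (max(drops) if drops else 0.0)
    return latest * stability
```

A workflow that is currently complete but has decayed sharply in the past is thus penalised relative to one that has stayed stable, which captures the paper's "history matters" intuition.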
-
A Confidentiality Model for Ontologies
Piero Bonatti and Luigi Sauro
17-32
[OpenAccess] [Publisher]
We illustrate several novel attacks to the confidentiality of knowledge bases (KB). Then we introduce a new confidentiality model, sensitive enough to detect those attacks, and a method for constructing secure KB views. We identify safe approximations of the background knowledge exploited in the attacks; they can be used to reduce the complexity of constructing secure KB views.
-
A Graph-Based Approach to Learn Semantic Descriptions of Data Sources
Mohsen Taheriyan, Craig Knoblock, Pedro Szekely and José Luis Ambite
593-608
[OpenAccess] [Publisher]
Semantic models of data sources and services provide support to automate many tasks such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is a tedious and time-consuming task. Most of the related work focuses on automatic annotation with classes or properties of source attributes or input and output parameters. However, constructing a source model that includes the relationships between the attributes in addition to their semantic types remains a largely unsolved problem. In this paper, we present a graph-based approach to hypothesize a rich semantic description of a new target source from a set of known sources that have been modeled over the same domain ontology. We exploit the domain ontology and the known source models to build a graph that represents the space of plausible source descriptions. Then, we compute the top k candidates and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic descriptions of future data sources. Our evaluation shows that our method produces models that are twice as accurate as the models produced using a state-of-the-art system that does not learn from prior models.
-
A Query Tool for EL with Non-monotonic Rules
Vadim Ivanov, Matthias Knorr and Joao Leite
209-224
[OpenAccess] [Publisher]
We present the Protégé plug-in NoHR that allows the user to take an EL+⊥ ontology, add a set of non-monotonic (logic programming) rules – suitable, e.g., for expressing defaults and exceptions – and query the combined knowledge base. Our approach uses the well-founded semantics for MKNF knowledge bases as underlying formalism, so no restriction other than DL-safety is imposed on the rules that can be written. The tool itself builds on the procedure SLG(O) and, with the help of the OWL 2 EL reasoner ELK, pre-processes the ontology into rules, whose result together with the non-monotonic rules serves as input for the top-down querying engine XSB Prolog. With the resulting plug-in, even queries to very large ontologies, such as SNOMED CT, augmented with a large number of rules, can be processed at an interactive response time after one initial brief pre-processing period. At the same time, our system is able to deal with possible inconsistencies between the rules and an ontology that alone is consistent.
-
A decision procedure for SHOIQ with transitive closure of roles
Chan Le Duc, Myriam Lamolle and Olivier Curé
257-272
[OpenAccess] [Publisher]
The Semantic Web makes extensive use of the OWL DL ontology language, underpinned by the SHOIQ description logic, to formalize its resources. In this paper, we propose a decision procedure for this logic extended with the transitive closure of roles in concept axioms, a feature needed in several application domains. The most challenging issue we have to deal with when designing such a decision procedure is to represent infinitely non-tree-shaped models, which are different from those of SHOIQ ontologies. To address this issue, we introduce a new blocking condition for characterizing models which may have an infinite non-tree-shaped part.
-
A snapshot of the OWL Web
Nicolas Matentzoglu, Samantha Bail and Bijan Parsia
321-336
[OpenAccess] [Publisher]
Tool development for and empirical experimentation in OWL ontology engineering require a wide variety of suitable ontologies as input for testing and evaluation purposes and detailed characterisations of real ontologies. Empirical activities often resort to (somewhat arbitrarily) hand-curated corpora available on the web, such as the NCBO BioPortal and the TONES Repository, or manually selected sets of well-known ontologies. Findings of surveys and results of benchmarking activities may be biased, even heavily, towards these datasets. Sampling from a large corpus of ontologies, on the other hand, may lead to more representative results. Current large scale repositories and web crawls are mostly uncurated, suffer from duplication and from small and (for many purposes) uninteresting ontology files, and contain large numbers of ontology versions, variants, and facets, and therefore do not lend themselves to random sampling. In this paper, we survey ontologies as they exist on the web and describe the creation of a corpus of OWL DL ontologies using strategies such as web crawling, various forms of de-duplication and manual cleaning, which allows random sampling of ontologies for a variety of empirical applications.
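One basic cleaning step mentioned above, collapsing byte-identical files collected by a crawl, can be sketched as follows. This only covers exact duplicates; the paper's pipeline additionally handles versions, variants, and facets of the same ontology:

```python
import hashlib

# Sketch of one de-duplication step: byte-identical ontology files are
# collapsed to a single representative before sampling.
def deduplicate(files):
    """files: mapping of filename -> raw bytes;
    returns one filename per distinct content, sorted."""
    seen = {}
    for name, content in sorted(files.items()):
        digest = hashlib.sha256(content).hexdigest()
        seen.setdefault(digest, name)  # keep the first file with this content
    return sorted(seen.values())

corpus = {"a.owl": b"<Ontology/>", "copy_of_a.owl": b"<Ontology/>",
          "b.owl": b"<Other/>"}
```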
-
Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics
Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov and Evgeniy Birialtsev
369-384
[OpenAccess] [Publisher]
We present our work on developing a software platform for mining mathematical scholarly papers to obtain a Linked Data representation. Currently, the Linking Open Data (LOD) cloud lacks up-to-date and detailed information on professional level mathematics. In our view, the main reason is the absence of appropriate tools that could analyze the underlying semantics in mathematical papers and effectively build their consolidated representation. We have developed a holistic approach to analysis of mathematical documents, including ontology based extraction, conversion of the article body as well as its metadata into RDF, integration with some existing LOD data sets, and semantic search. We argue that the platform may be helpful for enriching user experience on modern online scientific collections.
-
Complete Query Answering Over Horn Ontologies Using a Triple Store
Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau and Ian Horrocks
703-718
[OpenAccess] [Publisher]
In our previous work, we showed how a scalable OWL 2 RL reasoner can be used to compute both lower and upper bound query answers over very large datasets and arbitrary OWL 2 ontologies. However, when these bounds do not coincide, there still remain a number of possible answer tuples whose status is not determined. In this paper, we show how in the case of Horn ontologies one can exploit the lower and upper bounds computed by the RL reasoner to efficiently identify a subset of the data and ontology that is large enough to resolve the status of these tuples, yet small enough so that the status can be computed using a fully-fledged OWL 2 reasoner. The resulting hybrid approach has enabled us to compute exact answers to queries over datasets and ontologies where previously only approximate query answering was possible.
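The hybrid control flow can be sketched abstractly. Here the bounds are simply given as sets of tuples and the fully-fledged reasoner is a stand-in callback; the paper's contribution is computing a small relevant fragment to make that expensive call feasible:

```python
# Toy illustration: given lower/upper bounds on the certain answers
# (lower <= answers <= upper), only the undetermined tuples are handed
# to the expensive full reasoner.
def resolve_answers(lower, upper, full_reasoner):
    """lower/upper: sets of candidate answer tuples; full_reasoner:
    callable deciding whether a tuple is a certain answer."""
    undetermined = upper - lower
    confirmed = {t for t in undetermined if full_reasoner(t)}
    return lower | confirmed

lower = {("alice",)}
upper = {("alice",), ("bob",), ("carol",)}
answers = resolve_answers(lower, upper, lambda t: t == ("bob",))
```

When the bounds coincide, `undetermined` is empty and the full reasoner is never invoked, which is the cheap common case.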
-
Completeness Statements about RDF Data Sources and Their Use for Query Answering
Fariz Darari, Werner Nutt, Giuseppe Pirrò and Simon Razniewski
65-80
[OpenAccess] [Publisher]
With thousands of RDF data sources available on the Web covering disparate and possibly overlapping knowledge domains, the problem of providing high-level descriptions (in the form of metadata) of their content becomes crucial. In this paper we introduce a theoretical framework for describing data sources in terms of their completeness. We show how existing data sources can be described with completeness statements expressed in RDF. We then focus on the problem of the completeness of query answering over plain and RDFS data sources augmented with completeness statements. Finally, we present an extension of the completeness framework for federated data sources.
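A much-simplified version of the reasoning task can be sketched as follows, assuming completeness statements and query patterns are both plain triple patterns with `None` as a wildcard; the paper's framework is considerably more general (RDFS entailment, federation):

```python
# Simplified sketch: a completeness statement declares the source
# complete for all triples matching a pattern (None = wildcard). A
# query whose triple patterns are each covered by some statement can
# be answered completely from the source.
def covers(statement, pattern):
    """A statement position covers a pattern position if it is a
    wildcard or matches it exactly."""
    return all(s is None or s == p for s, p in zip(statement, pattern))

def query_is_complete(query_patterns, statements):
    return all(any(covers(st, qp) for st in statements)
               for qp in query_patterns)

statements = [(None, "population", None)]   # complete for all population facts
query = [("Bolzano", "population", None)]
```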
-
Controlled Query Evaluation over OWL 2 RL Ontologies
Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev and Dmitriy Zheleznyakov
49-64
[OpenAccess] [Publisher]
We study confidentiality enforcement in ontology-based information systems where ontologies are expressed in OWL 2 RL, a profile of OWL 2 that is becoming increasingly popular in Semantic Web applications. We formalise a natural adaptation of the Controlled Query Evaluation (CQE) framework to ontologies. Our goal is to provide CQE algorithms that (i) ensure confidentiality of sensitive information; (ii) are efficiently implementable by means of RDF triple store technologies; and (iii) ensure maximality of the answers returned by the system to user queries (thus restricting access to information as little as possible). We formally show that these requirements are in conflict and cannot be satisfied without imposing restrictions on ontologies. We propose a fragment of OWL 2 RL for which all three requirements can be satisfied. For the identified fragment, we design a CQE algorithm that has the same computational complexity as standard query answering and can be implemented by relying on state-of-the-art triple stores.
-
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus and Manfred Hauswirth
561-576
[OpenAccess] [Publisher]
Over the last few years the Web of Data has developed into a large compendium of interlinked data sets from multiple domains. Due to the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be directly combined with existing federated query engines in order to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines – DARQ, SPLENDID, and FedX – with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints, while keeping high query recall values. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises the query recall, when the query processing is limited to a subset of the sources.
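The data-summary side of this idea can be sketched with a MinHash signature; salted cryptographic hashes below approximate true min-wise independent permutations, and the triple identifiers are invented:

```python
import hashlib

# Sketch: each source's set of triples is summarised by a k-slot
# MinHash signature, so the overlap between two sources can be
# estimated without comparing the triples themselves.
def minhash(items, k=32):
    """Return a k-value MinHash signature for a set of items."""
    return [min(hashlib.md5(f"{salt}:{item}".encode()).hexdigest()
                for item in items)
            for salt in range(k)]

def estimated_overlap(sig_a, sig_b):
    """Fraction of matching signature slots ~ Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash({"t1", "t2", "t3", "t4"})
b = minhash({"t1", "t2", "t3", "t4"})   # a source duplicating the first
c = minhash({"u1", "u2", "u3", "u4"})   # a disjoint source
```

A federated engine can then skip querying a source whose signature shows it would contribute (almost) only triples already covered by sources it has selected.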
-
Discovering Missing Semantic Relations between Entities in Wikipedia
Mengling Xu, Zhichun Wang, Rongfang Bie, Juanzi Li, Chen Zheng, Wantian Ke and Mingquan Zhou
657-670
[OpenAccess] [Publisher]
Wikipedia’s infoboxes contain rich structured information about various entities, which the DBpedia project has exploited to generate large-scale Linked Data sets. Among all the infobox attributes, those with hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia’s instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which leaves many relations between entities missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia’s infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the likelihood that a given attribute value links to a candidate entity. A learning model is used to obtain the weights of the different features and to predict the destination entity for each attribute value. We evaluated our approach on English Wikipedia data; the experimental results show that our approach can effectively find the missing relations between entities, and that it significantly outperforms the baseline methods in terms of both precision and recall.
-
DynamiTE: Parallel Materialization of Dynamic RDF Data
,
Jacopo Urbani,Alessandro Margara,Ceriel Jacobs,Frank Van Harmelen and Henri Bal
,
641-656
,
[OpenAccess]
,
[Publisher]
One of the main advantages of using semantically annotated data is that machines can reason on it, deriving implicit knowledge from explicit information. In this context, materializing every possible implicit derivation from a given input can be computationally expensive, especially when considering large data volumes. Most of the solutions that address this problem rely on the assumption that the information is static, i.e., that it does not change, or changes very infrequently. However, the Web is extremely dynamic: online newspapers, blogs, social networks, etc., change frequently, with outdated information removed and replaced with fresh data. This demands a materialization that is not only scalable but also reactive to changes. In this paper, we consider the problem of incremental materialization, that is, how to update the materialized derivations when data is added or removed. To this purpose, we consider the ρdf RDFS fragment [12], and present a parallel system that implements a number of algorithms to quickly recalculate the derivation. When new data is added, our system uses a parallel version of the well-known semi-naive evaluation of Datalog. For removals, we have implemented two algorithms: one based on previous theoretical work, and another that is more efficient since it does not require a complete scan of the input. We have evaluated the performance using a prototype system called DynamiTE, which organizes the knowledge bases with a number of indices to facilitate the query process and exploits parallelism to improve performance. The results show that our methods are indeed capable of recalculating the derivation in a short time, opening the door to reasoning on much more dynamic data than is currently possible.
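The addition case can be pictured with a minimal, sequential sketch of semi-naive evaluation for one ρdf-style rule, subclass transitivity (a generic Datalog illustration, not DynamiTE's parallel implementation): each round joins only the facts newly derived in the previous round with the existing closure, so an update never recomputes the full materialization from scratch.

```python
def seminaive_closure(facts, delta):
    """Semi-naive maintenance of the transitive closure of a
    subClassOf-like relation. facts: already-materialized (sub, super)
    pairs; delta: newly added pairs. Each round joins only last
    round's new facts, avoiding re-derivation of known facts."""
    closure = set(facts)
    delta = set(delta) - closure
    while delta:
        closure |= delta
        new = set()
        # rule: (a, b) and (b, c) entail (a, c); join delta on both sides
        for (a, b) in delta:
            for (x, y) in closure:
                if b == x:
                    new.add((a, y))
                if y == a:
                    new.add((x, b))
        delta = new - closure
    return closure

base = {("A", "B"), ("B", "C")}
closed = seminaive_closure(set(), base)            # initial materialization
updated = seminaive_closure(closed, {("C", "D")})  # incremental addition
# ("A", "D") is derived without recomputing the closure from scratch
```

Removals are harder, as the abstract notes: a retracted fact may invalidate derivations that are still supported by other premises, which is why the paper's deletion algorithms are a separate contribution.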
-
Elastic and scalable processing of Linked Stream Data in the Cloud
,
Danh Le Phuoc,Hoan Nguyen Mau Quoc,Chan Le Van and Manfred Hauswirth
,
273-288
,
[OpenAccess]
,
[Publisher]
Linked Stream Data extends the Linked Data paradigm to dynamic data sources. It enables the integration and joint processing of heterogeneous stream data with quasi-static data from the Linked Data Cloud in near-real-time. Several Linked Stream Data processing engines exist, but their scalability still needs to be improved in terms of (static and dynamic) data sizes, number of concurrent queries, stream update frequencies, etc. So far, none of them supports parallel processing in the Cloud, i.e., elastic load profiles in a hosted environment. To remedy these limitations, this paper presents an approach for elastically parallelizing the continuous execution of queries over Linked Stream Data. For this, we have developed novel, highly efficient, and scalable parallel algorithms for continuous query operators. Our approach and algorithms are implemented in our CQELS Cloud system and we present extensive evaluations of their superior performance on Amazon EC2 demonstrating their high scalability and excellent elasticity in a real deployment.
-
Empirical Study of Logic-Based Modules: Cheap Is Cheerful
,
Chiara Del Vescovo,Pavel Klinov,Bijan Parsia,Ulrike Sattler,Thomas Schneider and Dmitry Tsarkov
,
81-96
,
[OpenAccess]
,
[Publisher]
For ontology reuse and integration, a number of approaches have been devised that aim at identifying modules, i.e., suitably small sets of “relevant” axioms from ontologies. Here we consider three logically sound notions of modules: MEX modules, only applicable to inexpressive ontologies; modules based on semantic locality, a sound approximation of the first; and modules based on syntactic locality, a sound approximation of the second (and thus the first), widely used since these modules can be extracted from OWL DL ontologies in time polynomial in the size of the ontology. In this paper we investigate the quality of both approximations over a large corpus of ontologies, using our own implementation of semantic locality, which is the first to our knowledge. In particular, we show with statistical significance that, in most cases, there is no difference between the two module notions based on locality; where they differ, the additional axioms can either be easily ruled out or their number is relatively small. We classify the axioms that explain the rare differences into four kinds of “culprits” and discuss which of those can be avoided by extending the definition of syntactic locality. Finally, we show that differences between MEX and locality-based modules occur for a minority of ontologies from our corpus and largely affect (approximations of) expressive ontologies – this conclusion relies on a much larger and more diverse sample than existing comparisons between MEX and syntactic locality-based modules.
-
Exploring Scholarly Data with Rexplore
,
Francesco Osborne,Enrico Motta and Paul Mulholland
,
449-464
,
[OpenAccess]
,
[Publisher]
Despite the large number and variety of tools and services available today for exploring scholarly data, current support is still very limited in the context of sensemaking tasks, which go beyond standard search and ranking of authors and publications, and focus instead on i) understanding the dynamics of research areas, ii) relating authors ‘semantically’ (e.g., in terms of common interests or shared academic trajectories), or iii) performing fine-grained academic expert search along multiple dimensions. To address this gap we have developed a novel tool, Rexplore, which integrates statistical analysis, semantic technologies, and visual analytics to provide effective support for exploring and making sense of scholarly data. Here, we describe the main innovative elements of the tool and we present the results from a task-centric empirical evaluation, which shows that Rexplore is highly effective at providing support for the aforementioned sensemaking tasks. In addition, these results are robust both with respect to the background of the users (i.e., expert analysts vs. ‘ordinary’ users) and also with respect to whether the tasks are selected by the evaluators or proposed by the users themselves.
-
FedSearch: efficiently combining structured queries and full-text search in a SPARQL federation
,
Andriy Nikolov,Andreas Schwarte and Christian Hütter
,
417-432
,
[OpenAccess]
,
[Publisher]
Combining structured queries with full-text search provides a powerful means to access distributed linked data. However, executing hybrid search queries in a federation of multiple data sources presents a number of challenges due to data source heterogeneity and lack of statistical data about keyword selectivity. To address these challenges, we present FedSearch – a novel hybrid query engine based on the SPARQL federation framework FedX. We extend the SPARQL algebra to incorporate keyword search clauses as first-class citizens and apply novel optimization techniques to improve query processing efficiency while maintaining a meaningful ranking of results. By performing on-the-fly adaptation of the query execution plan and intelligent grouping of query clauses, we are able to significantly reduce the communication costs, making our approach suitable for top-k hybrid search across multiple data sources. In experiments we demonstrate that our optimization techniques can lead to a substantial performance improvement, reducing the execution time of hybrid queries by more than an order of magnitude.
-
Federated Entity Search using On-The-Fly Consolidation
,
Daniel M. Herzig,Roi Blanco,Peter Mika and Thanh Tran
,
161-176
,
[OpenAccess]
,
[Publisher]
Nowadays, search on the Web goes beyond the retrieval of textual Web sites and increasingly takes advantage of the growing amount of structured data. Of particular interest is entity search, where the units of retrieval are structured entities instead of textual documents. These entities reside in different sources, which may provide only limited information about their content and are therefore called “uncooperative”. Further, these sources capture complementary but also redundant information about entities. In this environment of uncooperative data sources, we study the problem of federated entity search, where redundant information about entities is reduced on-the-fly through entity consolidation performed at query time. We propose a novel method for entity consolidation that is based on language models and is completely unsupervised, and hence more suitable for this on-the-fly, uncooperative setting than state-of-the-art methods that require training data. Further, we apply the same language model technique to deal with the federated search problem of ranking results returned from different sources. Particularly novel are the mechanisms we propose to incorporate consolidation results into this ranking. We perform experiments using real Web queries and data sources. Our experiments show that our approach for federated entity search with on-the-fly consolidation improves upon the performance of a state-of-the-art preference aggregation baseline and also benefits from consolidation.
-
Getting Lucky in Ontology Search: A Data-Driven Evaluation Framework for Ontology Ranking
,
Natasha F. Noy,Paul Alexander,Rave Harpaz,Trish Whetzel,Raymond Fergerson and Mark Musen
,
433-448
,
[OpenAccess]
,
[Publisher]
With hundreds, if not thousands, of ontologies available today in many different domains, ontology search and ranking has become an important and timely problem. When a user searches a collection of ontologies for her terms of interest, there are often dozens of ontologies that contain these terms. How does she know which ontology is the most relevant to her search? Our research group hosts BioPortal, a public repository of more than 330 ontologies in the biomedical domain. When a term that a user searches for is available in multiple ontologies, how do we rank the results and how do we measure how well our ranking works? In this paper, we develop an evaluation framework that enables developers to compare and analyze the performance of different ontology-ranking methods. Our framework is based on processing search logs and determining how often users select the top link that the search engine offers. We evaluate our framework by analyzing the data on BioPortal searches. We explore several different ranking algorithms and measure the effectiveness of each ranking by measuring how often users click on the highest ranked ontology. We collected log data from more than 4,800 BioPortal searches. Our results show that regardless of the ranking, in more than half the searches, users select the first link. Thus, it is even more critical to ensure that the ranking is appropriate if we want to have satisfied users. Our further analysis demonstrates that ranking ontologies based on page view data significantly improves the user experience, with an approximately 26% increase in the number of users who select the highest ranked ontology for the search.
-
Incremental Reasoning in OWL EL without Bookkeeping
,
Yevgeny Kazakov and Pavel Klinov
,
225-240
,
[OpenAccess]
,
[Publisher]
We describe a method for updating the classification of ontologies expressed in the EL family of Description Logics after some axioms have been added or deleted. While incremental classification modulo additions is relatively straightforward, handling deletions is more problematic since it requires retracting logical consequences that are no longer valid. Known algorithms address this problem using various forms of bookkeeping to trace the consequences back to premises. But such additional data can consume memory and place an extra burden on the reasoner during application of inferences. In this paper, we present a technique that avoids this extra cost while being very efficient for small incremental changes in ontologies. The technique is freely available as part of the open-source EL reasoner ELK and its efficiency is demonstrated on naturally occurring and synthetic data.
-
Indented Tree or Graph? A Usability Study of Ontology Visualization Techniques in the Context of Class Mapping Evaluation
,
Bo Fu,Natalya F. Noy and Margaret-Anne Storey
,
113-128
,
[OpenAccess]
,
[Publisher]
Research effort in ontology visualization has largely focused on developing new visualization techniques. At the same time, researchers have paid less attention to investigating the usability of common visualization techniques that many practitioners regularly use to visualize ontological data. In this paper, we focus on two popular ontology visualization techniques: indented tree and graph. We conduct a controlled usability study with an emphasis on the effectiveness, efficiency, workload and satisfaction of these visualization techniques in the context of assisting users during evaluation of ontology mappings. Findings from this study have revealed both strengths and weaknesses of each visualization technique. In particular, while the indented tree visualization is more organized and familiar to novice users, subjects found the graph visualization to be more controllable and intuitive without visual redundancy, particularly for ontologies with multiple inheritance.
-
Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds
,
Xingjian Zhang,Dezhao Song,Sambhawa Priya and Jeff Heflin
,
671-686
,
[OpenAccess]
,
[Publisher]
In this paper we present the infrastructure of the contextual tag cloud system which can execute large volumes of queries about the number of instances that use particular ontological terms. The contextual tag cloud system is a novel application that helps users explore a large scale RDF dataset: the tags are ontological terms (classes and properties), the context is a set of tags that defines a subset of instances, and the font sizes reflect the number of instances that use each tag. It visualizes the patterns of instances specified by the context a user constructs. Given a request with a specific context, the system needs to quickly find what other tags the instances in the context use, and how many instances in the context use each tag. The key question we answer in this paper is how to scale to Linked Data; in particular we use a dataset with 1.4 billion triples and over 380,000 tags. This is complicated by the fact that the calculation should, when directed by the user, consider the entailment of taxonomic and/or domain/range axioms in the ontology. We combine a scalable preprocessing approach with a specially-constructed inverted index and use three approaches to prune unnecessary counts for faster intersection computations. We compare our system with a state-of-the-art triple store, examine how pruning rules interact with inference and analyze our design choices.
-
Knowledge Graph Identification
,
Jay Pujara,Hui Miao,Lise Getoor and William Cohen
,
529-544
,
[OpenAccess]
,
[Publisher]
Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify coreferent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time.
-
ORCHID – Reduction-Ratio-Optimal Computation of Geo-Spatial Distances for Link Discovery
,
Axel-Cyrille Ngonga Ngomo
,
385-400
,
[OpenAccess]
,
[Publisher]
The discovery of links between resources within knowledge bases is of crucial importance to realize the vision of the Semantic Web. Addressing this task is especially challenging when dealing with geo-spatial datasets due to their sheer size and the potential complexity of single geo-spatial objects. Yet, so far, little attention has been paid to the characteristics of geo-spatial data within the context of link discovery. In this paper, we address this gap by presenting Orchid, a reduction-ratio-optimal link discovery approach designed especially for geo-spatial data. Orchid relies on a combination of the Hausdorff and orthodromic metrics to compute the distance between geo-spatial objects. We first present two novel approaches for the efficient computation of Hausdorff distances. Then, we present the space tiling approach implemented by Orchid and prove that it is optimal with respect to the reduction ratio that it can achieve. The evaluation of our approaches is carried out on three real datasets of different size and complexity. Our results suggest that our approaches to the computation of Hausdorff distances require two orders of magnitude fewer orthodromic distance computations to compare geographical data. Moreover, they require two orders of magnitude less time than a naive approach to achieve this goal. Finally, our results indicate that Orchid scales to large datasets while outperforming the state of the art significantly.
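The two metrics that Orchid combines can be sketched with a naive quadratic baseline (the coordinates and function names below are illustrative); Orchid's space tiling exists precisely to avoid most of these pairwise orthodromic computations.

```python
from math import radians, sin, cos, asin, sqrt

def orthodromic(p, q):
    """Great-circle distance in km between (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def hausdorff(a, b):
    """Naive symmetric Hausdorff distance between two geo objects given
    as point lists: O(|a| * |b|) orthodromic distance computations."""
    directed = lambda xs, ys: max(min(orthodromic(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))

# Two small hypothetical polygons (vertex lists) near the same city:
poly1 = [(52.52, 13.40), (52.53, 13.41)]
poly2 = [(52.52, 13.40), (52.51, 13.39)]
d = hausdorff(poly1, poly2)  # km; 0.0 only if each set covers the other
```

With real geo-spatial objects holding thousands of vertices, this quadratic inner loop is exactly the cost the paper's reduction-ratio-optimal tiling prunes.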
-
On the Status of Experimental Research on the Semantic Web
,
Heiner Stuckenschmidt,Michael Schuhmacher,Johannes Knopp,Christian Meilicke and Ansgar Scherp
,
577-592
,
[OpenAccess]
,
[Publisher]
Experimentation is an important way to validate results of Semantic Web and Computer Science research in general. In this paper, we investigate the development and the current status of experimental work on the Semantic Web. Based on a corpus of 500 papers collected from the International Semantic Web Conferences (ISWC) over the past decade, we analyse the importance and the quality of experimental research conducted and compare it to general Computer Science. We observe that the amount and quality of experiments are steadily increasing over time. Contrary to what we hypothesised, we cannot confirm a statistically significant correlation between a paper’s citations and the amount of experimental work reported. Our analysis, however, shows that papers comparing themselves to other systems are more often cited than other papers.
-
One License to Compose Them All: a deontic logic approach to data licensing on the Web of Data
,
Guido Governatori,Antonino Rotolo,Serena Villata and Fabien Gandon
,
145-160
,
[OpenAccess]
,
[Publisher]
In the domain of Linked Open Data, a need is emerging for automated frameworks able to generate the licensing terms associated with data coming from heterogeneous distributed sources. This paper proposes and evaluates a deontic logic semantics which allows us to define the deontic components of licenses, i.e., permissions, obligations, and prohibitions, and to generate a composite license compliant with the licensing terms of the different licenses being composed. Some heuristics are proposed to support the data publisher in choosing the license composition strategy that best suits her needs w.r.t. the data she is publishing.
-
Ontology-Based Data Access: Ontop of Databases
,
Mariano Rodriguez-Muro,Roman Kontchakov and Michael Zakharyaschev
,
545-560
,
[OpenAccess]
,
[Publisher]
We present the architecture and technologies underpinning the OBDA system Ontop, which takes full advantage of storing data in relational databases. We discuss the theoretical foundations of Ontop: the tree-witness query rewriting, T-mappings, and optimisations based on database integrity constraints and SQL features. We analyse the performance of Ontop in a series of experiments and demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality.
-
Pattern Based Knowledge Base Enrichment
,
Lorenz Bühmann and Jens Lehmann
,
33-48
,
[OpenAccess]
,
[Publisher]
Although an increasing number of RDF knowledge bases are published, many of them consist primarily of instance data and lack sophisticated schemata. Having such schemata allows more powerful querying, consistency checking and debugging as well as improved inference. One of the reasons why schemata are still rare is the effort required to create them. In this article, we propose a semi-automatic schema construction approach addressing this problem: first, the frequency of axiom patterns in existing knowledge bases is determined; afterwards, those patterns are converted into SPARQL-based pattern detection algorithms, which can be used to enrich knowledge base schemata. We argue that ours is the first scalable knowledge base enrichment approach based on real schema usage patterns. The approach is evaluated on a large set of knowledge bases with a quantitative and qualitative analysis of the results.
-
Personalized Best Answer Computation in Graph Databases
,
Michael Ovelgönne,Noseong Park,V.S. Subrahmanian,Elizabeth K. Bowman and Kirk A. Ogaard
,
465-480
,
[OpenAccess]
,
[Publisher]
Though subgraph matching has been extensively studied as a query paradigm in Semantic Web and social network data environments, a user can get a large number of answers in response to a query. Just as Google does for web search, these answers can be shown to the user in accordance with an importance ranking. In this paper, we present scalable algorithms to find the top-k answers to a practically important subset of SPARQL queries, denoted importance queries, via a suite of pruning techniques. We test our algorithms on multiple real-world graph data sets, showing that our algorithms are efficient even on networks with up to 6M vertices and 15M edges and far more efficient than popular triple stores.
-
ProSWIP: Property-based Data Access for Semantic Web Interactive Programming
,
Silviu Homoceanu,Philipp Wille and Wolf-Tilo Balke
,
177-192
,
[OpenAccess]
,
[Publisher]
The Semantic Web has matured from a mere theoretical vision into a variety of ready-to-use linked open data sources currently available on the Web. Still, with respect to application development, the Web community is only beginning to develop new paradigms in which data, as the main driver of applications, is promoted to first-class status. Property-based typing, which relies on the properties of resources as an indicator of their type, is such a paradigm. In this paper, we inspect the feasibility of property-based typing for accessing data from the linked open data cloud. Problems with the transparency and quality of the selected data were noticeable. To alleviate these problems, we developed an iterative approach that builds on human feedback.
-
QODI: Query as Context in Automatic Data Integration
,
Aibo Tian,Juan F. Sequeda and Daniel Miranker
,
609-624
,
[OpenAccess]
,
[Publisher]
QODI is an automatic ontology-based data integration (OBDI) system. QODI is distinguished in that its ontology mapping algorithm dynamically determines a partial mapping specific to the reformulation of each query. The query provides application context not available in the ontologies alone; thereby, the system is able to disambiguate mappings for different queries. The mapping algorithm decomposes the query into a set of paths and compares this set with a similar decomposition of a source ontology. Using test sets from three real-world applications, QODI achieves favorable results compared with AgreementMaker, a leading ontology matcher, and an ontology-based implementation of the mapping methods detailed for Clio, the state-of-the-art relational data integration and data exchange system.
-
Real-time RDF extraction from unstructured data streams
,
Daniel Gerber,Sebastian Hellmann,Lorenz Bühmann,Tommaso Soru,Axel-Cyrille Ngonga Ngomo and Ricardo Usbeck
,
129-144
,
[OpenAccess]
,
[Publisher]
The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.
-
Secure Manipulation of Linked Data
,
Sabrina Kirrane,Ahmed Abdelrahman,Alessandra Mileo and Stefan Decker
,
241-256
,
[OpenAccess]
,
[Publisher]
When it comes to publishing data on the web, the level of access control required (if any) is highly dependent on the type of content exposed. Up until now, RDF data publishers have focused on exposing and linking public data. With the advent of SPARQL 1.1, the linked data infrastructure can be used not only as a means of publishing open data but also as a general mechanism for managing distributed graph data. However, such a decentralised architecture brings with it a number of additional challenges with respect to both data security and integrity. In this paper, we propose a general authorisation framework that can be used to deliver dynamic query results based on user credentials and to cater for the secure manipulation of linked data. Specifically, we describe how graph patterns, propagation rules, conflict resolution policies and integrity constraints can together be used to specify and enforce consistent access control policies.
-
Semantic Message Passing for Generating Linked Data from Tables
,
Varish Mulwad,Tim Finin and Anupam Joshi
,
353-368
,
[OpenAccess]
,
[Publisher]
We describe work on automatically inferring the intended meaning of tables and representing it as RDF linked data, making it available for improving search, interoperability and integration. We present implementation details of a joint inference module that uses knowledge from the linked open data (LOD) cloud to jointly infer the semantics of column headers, table cell values (e.g., strings and numbers) and relations between columns. We also implement a novel Semantic Message Passing algorithm which uses LOD knowledge to improve existing message passing schemes. We evaluate our implemented techniques on tables from the Web and Wikipedia.
-
Semantic Rule Filtering for Web-Scale Relation Extraction
,
Andrea Moro,Hong Li,Sebastian Krause,Feiyu Xu,Roberto Navigli and Hans Uszkoreit
,
337-352
,
[OpenAccess]
,
[Publisher]
Web-scale relation extraction is a means for building and extending large repositories of formalized knowledge. This type of automated knowledge building requires a decent level of precision, which is hard to achieve with automatically acquired rule sets learned from unlabeled data by means of distant or minimal supervision. This paper shows how precision of relation extraction can be considerably improved by employing a wide-coverage, general-purpose lexical semantic network, i.e., BabelNet, for effective semantic rule filtering. We apply Word Sense Disambiguation to the content words of the automatically extracted rules. As a result a set of relation-specific relevant concepts is obtained, and each of these concepts is then used to represent the structured semantics of the corresponding relation. The resulting relation-specific subgraphs of BabelNet are used as semantic filters for estimating the adequacy of the extracted rules. For the seven semantic relations tested here, the semantic filter consistently yields a higher precision at any relative recall value in the high-recall range.
-
Simplified OWL Ontology Editing for the Web: Is WebProtégé Enough?
,
Matthew Horridge,Tania Tudorache,Jennifer Vendetti,Csongor Nyulas,Mark Musen and Natasha F. Noy
,
193-208
,
[OpenAccess]
,
[Publisher]
Ontology engineering is a task that is notorious for its difficulty. As the group that developed Protégé, the most widely used ontology editor, we are keenly aware of how difficult the users perceive this task to be. In this paper, we present the new version of WebProtégé that we designed with two main goals in mind: (1) create a tool that will be easy to use while still accounting for commonly used OWL constructs; (2) support collaboration and social interaction around distributed ontology editing as part of the core tool design. We designed this new version of the WebProtégé user interface empirically, by analysing the use of OWL constructs in a large corpus of publicly available ontologies. Since the beta release of this new WebProtégé interface in January 2013, our users from around the world have created and uploaded 519 ontologies on our server. In this paper, we describe the key features of the new tool and our empirical design approach. We evaluate language coverage in WebProtégé by assessing how well it covers the OWL constructs that are present in ontologies that users have uploaded to WebProtégé. We evaluate the usability of WebProtégé through a usability survey. Our analysis validates our empirical design, suggests additional language constructs to explore, and demonstrates that an easy-to-use web-based tool that covers most of the frequently used OWL constructs is sufficient for many users to start editing their ontologies.
-
Simplifying Description Logic Ontologies
,
Nadeschda Nikitina and Sven Schewe
,
401-416
,
[OpenAccess]
,
[Publisher]
We discuss the problem of minimizing TBoxes expressed in the light-weight description logic EL, which forms the basis of some large ontologies like SNOMED, the Gene Ontology, NCI and Galen. We show that the minimization of TBoxes is intractable (NP-complete). While this looks like a negative result, we also provide a heuristic technique for minimizing TBoxes. We prove the correctness of the heuristic and show that it provides optimal results for a class of ontologies, which we define through an acyclicity constraint over a reference relation between equivalence classes of concepts. To establish the feasibility of our approach, we have implemented the algorithm and evaluated its effectiveness on a small suite of benchmarks.
-
Statistical Knowledge Patterns: Identifying Synonymous Relations in Large Linked Datasets
,
Ziqi Zhang,Anna Lisa Gentile,Eva Blomqvist,Isabelle Augenstein and Fabio Ciravegna
,
687-702
,
[OpenAccess]
,
[Publisher]
The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property, or may not be able to find the right objects to annotate with. This decreases the quality of data and may eventually hamper its usability at large scale. This paper describes Statistical Knowledge Patterns (SKPs) as a means to address this issue. SKPs encapsulate key information about ontology classes, including synonymous properties in (and across) datasets, and are automatically generated based on statistical data analysis. SKPs can be effectively used to automatically normalise data, and hence increase recall in querying. Both pattern extraction and pattern usage are completely automated. The main benefits of SKPs are that: (1) their structure allows for both accurate query expansion and restriction; (2) they are context dependent, hence they describe the usage and meaning of properties in the context of a particular class; and (3) they can be generated offline, hence the equivalence among relations can be used efficiently at run time.
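The query-expansion benefit described in the abstract can be sketched very simply: once synonymous properties are known for a class, a triple pattern can be rewritten into a UNION over the synonyms. The properties and synonym table below are illustrative, not the paper's actual data.

```python
# Hypothetical SKP-style query expansion: expand one triple pattern into a
# SPARQL UNION over synonymous properties so recall increases without
# changing the query's intent. Property names here are invented examples.

SYNONYMS = {  # property -> synonymous properties discovered for a class
    "dbo:birthPlace": ["dbp:placeOfBirth", "dbp:birthplace"],
}

def expand_pattern(subj: str, pred: str, obj: str) -> str:
    """Rewrite a single triple pattern into a UNION over synonyms."""
    alternatives = [pred] + SYNONYMS.get(pred, [])
    branches = [f"{{ {subj} {p} {obj} . }}" for p in alternatives]
    return " UNION ".join(branches)

query_body = expand_pattern("?person", "dbo:birthPlace", "?city")
print(query_body)
```

Because the synonym tables are generated offline, this rewriting adds essentially no cost at query time, which matches benefit (3) in the abstract.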
-
TRM – Learning Dependencies between Text and Structure with Topical Relational Models
,
Veli Bicer,Thanh Tran and Yongtao Ma
,
1-16
,
[OpenAccess]
,
[Publisher]
Text-rich structured data are becoming increasingly ubiquitous on the Web and in enterprise databases, encoding heterogeneous structural information between entities such as people, locations, or organizations together with associated textual information. For analyzing this type of data, existing topic modeling approaches, which are highly tailored toward document collections, require manually defined regularization terms to exploit and bias topic learning towards structure information. We propose the Topical Relational Model as a principled approach for automatically learning topics from both textual and structure information. We show that our approach is effective in exploiting heterogeneous structure information, outperforming a state-of-the-art approach that requires manually tuned regularization.
-
TRank: Ranking Entity Types Using the Web of Data
,
Alberto Tonon,Michele Catasta,Gianluca Demartini,Philippe Cudré-Mauroux and Karl Aberer
,
625-640
,
[OpenAccess]
,
[Publisher]
Much of Web search and browsing activity is today centered around entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on the SERPs and that is instrumental for many applications is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge bases but rather with a set of more specific types, which may be relevant or not given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All those types are correct, but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or may be irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods to find the most relevant entity type based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs, etc.) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end-user while still being highly scalable.
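One hierarchy-based signal that such methods can use is type specificity, which can be approximated by depth in the type hierarchy. The toy hierarchy below is invented for illustration and is not the paper's actual ranking model, which also uses collection statistics and graph structure.

```python
# Illustrative sketch of one hierarchy-based ranking signal: prefer more
# specific entity types, approximated by depth in a toy type hierarchy.

HIERARCHY = {  # child type -> parent type (invented example)
    "Actor": "Person",
    "Person": "Thing",
    "PersonFromConcord": "Person",
}

def depth(t: str) -> int:
    """Number of steps from type t up to the hierarchy root."""
    d = 0
    while t in HIERARCHY:
        t = HIERARCHY[t]
        d += 1
    return d

def rank_types(types):
    """Sort candidate types from most to least specific."""
    return sorted(types, key=depth, reverse=True)

print(rank_types(["Person", "Actor", "Thing"]))  # → ['Actor', 'Person', 'Thing']
```

In the Tom Hanks example from the abstract, such a signal would push "person" down the ranking; the context-dependence (actor vs. person from Concord) requires the additional contextual features the paper evaluates.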
-
The Combined Approach to OBDA: Taming Role Hierarchies using Filters
,
Carsten Lutz,Inanc Seylan,David Toman and Frank Wolter
,
305-320
,
[OpenAccess]
,
[Publisher]
The basic idea of the combined approach to query answering in the presence of ontologies is to materialize the consequences of the ontology in the data and then use a limited form of query rewriting to deal with infinite materializations. While this approach is efficient and scalable for ontologies that are formulated in the basic version of the description logic DL-Lite, it incurs an exponential blowup during query rewriting when DL-Lite is extended with the popular role hierarchies. In this paper, we show how to replace the query rewriting with a filtering technique. This is natural from an implementation perspective and allows us to handle role hierarchies without an exponential blowup. We also carry out an experimental evaluation that demonstrates the scalability of this approach.
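A much-simplified illustration of the filtering idea: materialization introduces auxiliary ("anonymous") individuals, and one necessary condition for a genuine answer is that answer variables bind only to named individuals. The data and the naming convention below are invented; the paper's actual filter handles further conditions.

```python
# Simplified sketch of post-materialization answer filtering: drop answer
# tuples that bind an answer variable to an auxiliary individual introduced
# during materialization. Prefixes and data are invented for the example.

AUX_PREFIX = "_aux:"  # marker for individuals introduced by materialization

def filter_answers(answers):
    """Keep only tuples whose answer-variable bindings are named individuals."""
    return [row for row in answers
            if not any(v.startswith(AUX_PREFIX) for v in row.values())]

raw = [{"x": "ex:Alice"}, {"x": "_aux:r1"}]
print(filter_answers(raw))  # the spurious binding to _aux:r1 is dropped
```

The point of the paper is that such a filter can replace query rewriting, avoiding the exponential blowup that role hierarchies cause for rewriting-based variants of the combined approach.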
-
The Logic of Extensional RDFS
,
Enrico Franconi,Claudio Gutierrez,Alessandro Mosca,Giuseppe Pirrò and Riccardo Rosati
,
97-112
,
[OpenAccess]
,
[Publisher]
The normative version of RDF Schema (RDFS) gives non-standard (intensional) interpretations to some standard notions such as classes and properties, thus departing from standard set-based semantics. In this paper we develop a standard set-based (extensional) semantics for the RDFS vocabulary while preserving the simplicity and computational complexity of deduction of the intensional version. This result can positively impact current implementations, as reasoning in RDFS can be implemented following common set-based intuitions and be compatible with OWL extensions.
-
Towards Constructive Evidence of Data Flow-oriented Web Service Composition
,
Freddy Lecue
,
289-304
,
[OpenAccess]
,
[Publisher]
Automation of service composition is one of the most interesting challenges facing the Semantic Web and the Web of services today. While existing approaches are able to infer a partial order of services, the data flow remains implicit and difficult to generate automatically. Enhanced with formal representations, the semantic links between output and input parameters of services can then be exploited to infer their data flow. This work addresses the problem of effectively inferring the data flow between services based on their representations. To this end, we introduce the non-standard Description Logic reasoning task join, aiming to provide "constructive evidence" of why services can be connected and how non-trivial links (many-to-many parameters) can be inferred in the data flow. A preliminary evaluation provides evidence in favor of our approach regarding the completeness of the data flow.
-
Towards an automatic creation of localized versions of DBpedia
,
Alessio Palmero Aprosio,Claudio Giuliano and Alberto Lavelli
,
481-496
,
[OpenAccess]
,
[Publisher]
DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes onto the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia. Consequently, the same procedure has been applied to other languages to create localized versions of DBpedia. However, the number of accomplished mappings is still small and limited to the most frequent infoboxes. Furthermore, mappings need maintenance due to the constant and quick changes of Wikipedia articles. In this paper, we focus on the problem of automatically mapping infobox attributes to properties of the DBpedia ontology, either to extend the coverage of the existing localized versions or to build versions from scratch for languages not yet covered. The evaluation has been performed on the Italian mappings. We compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to those obtained by a human annotator in terms of precision, while our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available. The resource is made available in an open format.
-
Type Inference on Noisy RDF Data
,
Heiko Paulheim and Christian Bizer
,
497-512
,
[OpenAccess]
,
[Publisher]
Type information is very valuable in knowledge bases. However, most large open knowledge bases are incomplete with respect to type information and, at the same time, contain noisy and incorrect data. That makes classic type inference by reasoning difficult. In this paper, we propose SDType, a heuristic link-based type inference mechanism which can handle noisy and incorrect data. Instead of leveraging T-Box information from the schema, SDType takes the actual use of a schema into account and is thus also robust to misused schema elements.
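The link-based idea can be sketched as a weighted vote: each property pointing at an entity suggests types according to the type distribution observed for that property across the dataset. All properties, types, and probabilities below are invented for illustration; SDType's actual scoring also weights properties by their discriminative power.

```python
# Sketch of link-based type inference: incoming properties "vote" for an
# entity's types with the type distribution observed for each property.
# All numbers and names are invented example data.
from collections import defaultdict

# P(type | entity appears as object of property), from corpus statistics
PROP_TYPE_DIST = {
    "dbo:birthPlace": {"dbo:Place": 0.9, "dbo:City": 0.7},
    "dbo:employer":   {"dbo:Organisation": 0.8, "dbo:Place": 0.1},
}

def infer_types(incoming_properties):
    """Average the per-property type votes into a confidence per type."""
    scores = defaultdict(float)
    for prop in incoming_properties:
        for t, p in PROP_TYPE_DIST.get(prop, {}).items():
            scores[t] += p
    n = len(incoming_properties)
    return {t: s / n for t, s in scores.items()}

print(infer_types(["dbo:birthPlace", "dbo:employer"]))
```

Because the votes come from how a schema is actually used rather than from T-Box axioms, a misused property only contributes a noisy vote instead of producing a hard (and wrong) logical entailment.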
-
What's in a 'nym'? Synonyms in Biomedical Ontology Matching
,
Catia Pesquita,Cosmin Stroe,Daniel Faria,Emanuel Santos,Isabel Cruz and Francisco Couto
,
513-528
,
[OpenAccess]
,
[Publisher]
To bring the Life Sciences domain closer to a Semantic Web realization it is fundamental to establish meaningful relations between biomedical ontologies. The successful application of ontology matching techniques is strongly tied to an effective exploration of the complex and diverse biomedical terminology contained in biomedical ontologies. In this paper, we present an overview of the lexical components of several biomedical ontologies and investigate how different approaches for their use can impact the performance of ontology matching techniques. We propose novel approaches for exploring the different types of synonyms encoded by the ontologies and for extending them based both on internal synonym derivation and on external ontologies. We evaluate our approaches using AgreementMaker, a successful ontology matching platform that implements several lexical matchers, and apply them to a set of four benchmark biomedical ontology matching tasks. Our results demonstrate the impact that an adequate consideration of ontology synonyms can have on matching performance, and validate our novel approach for combining internal and external synonym sources as a competitive and in many cases improved solution for biomedical ontology matching.
-
A Distributed Reasoning Platform to Preserve Energy in Wireless Sensor Networks
,
Femke Ongenae,Stijn Verstichel,Maarten Wijnants and Filip De Turck
A distributed reasoning platform is presented to reduce the energy consumption of Wireless Sensor Networks (WSNs) offering geospatial services by minimizing the amount of wireless communication. It combines local, rule-based reasoning on the sensors and gateways with global, ontology-based reasoning on the back-end servers. The Semantic Sensor Network (SSN) ontology was extended to model the WSN energy consumption. Two prototypes are presented: the Personal Parking Assistant (PPA) and the Garbage Bin Tampering Monitor (GBTM).
-
A Hybrid Natural Language Approach to Manage Semantic Interoperability for Public Health Analytics
,
Maxime Lavigne,Arash Shaban-Nejad,Anya Okhmatovskaia,Luke Mondor and David L. Buckeridge
This paper discusses the integration of an ontology with a natural language query engine to calculate and interpret epidemiological indicators for population health assessments. In this paper, we discuss the application of this approach to one type of possible query, which retrieves health determinants causally associated with diabetes mellitus.
-
A Machine Reader for the Semantic Web
,
Aldo Gangemi,Valentina Presutti,Francesco Draicchio and Andrea Giovanni Nuzzolese
FRED is a machine reading tool for converting text into internally well-connected and quality linked-data-ready ontologies in web-service-acceptable time. It implements a novel approach for ontology design from natural language sentences, combining Discourse Representation Theory (DRT), linguistic frame semantics, and Ontology Design Patterns (ODP). The tool is based on Boxer which implements a DRT-compliant deep parser. The logical output of Boxer is enriched with semantic data from VerbNet (or FrameNet) frames and transformed into RDF/OWL by means of a mapping model and a set of heuristics following best practices of OWL ontologies and RDF data design. The current version of the tool includes Earmark-based markup, and enrichment with WSD and NER off-the-shelf components.
-
A Protein Annotation Framework Empowered with Semantic Reasoning
,
Jemma Wu,Edmond Breen,Xiaomin Song,Brett Cooke and Mark Molloy
This paper presents an association discovery framework for proteins based on semantic annotations from biomedical literature. An automatic ontology-based annotation method is used to create a semantic protein annotation knowledge base. A semantic reasoning service enables realisation reasoning on original annotations to infer more accurate associations and executes semantic query transformation. A case study on protein-disease association discovery on a real-world colorectal cancer dataset is presented.
-
A Restful Interface for RDF Stream Processors
,
Marco Balduini and Emanuele Della Valle
This poster proposes a minimal, backward-compatible, and combinable RESTful interface for RDF stream engines.
-
A Search Interface for Researchers to Explore Affinities in a Linked Data Knowledge Base
,
Laurens De Vocht,Selver Softic,Erik Mannens,Rik Van de Walle and Martin Ebner
Research information is widely available on the Web, both as peer-reviewed research publications and as resources shared via (micro)blogging platforms or other social media. Usually the platforms supporting this information exchange have an API that allows access to the structured content. This opens a new way to search and explore research information. In this paper, we present an approach that interactively visualizes an aligned knowledge base of these resources. We show that visualizing resources, such as conferences, publications and proceedings, exposes affinities between researchers and those resources. We characterize each affinity between researchers and resources by the amount of shared interests and other commonalities.
-
A Study on the Correspondence between FCA and $\mathcal{ELI}$ Ontologies
,
Melisachew Wudage Chekol and Amedeo Napoli
The description logic $\mathcal{EL}$ has been used to support ontology design in various domains, especially in biology and medicine. $\mathcal{EL}$ is known for its efficient reasoning and query answering capabilities. By contrast, ontology design and query answering can be supported and guided within an FCA framework. Accordingly, in this paper, we propose a formal transformation of $\mathcal{ELI}$ (an extension of $\mathcal{EL}$ with \textit{inverse roles}) ontologies into an FCA framework, i.e. $K_\mathrm{\mathcal{ELI}}$, and we provide a formal characterization of this transformation. Then we show that SPARQL query answering over $\mathcal{ELI}$ ontologies can be reduced to lattice query answering over $K_\mathrm{\mathcal{ELI}}$ concept lattices. This simplifies the query answering task and shows that some basic semantic web tasks can be improved when considered from an FCA perspective.
-
A user interface to build interactive visualizations for the semantic web
,
Miguel Ceriani,Paolo Bottoni and Simona Valentini
While the web of linked data gets increasingly richer in size and complexity, its use is still constrained by the lack of applications consuming this data. We propose a Web-based tool to build and execute complex applications to transform, integrate and visualize Semantic Web data. Applications are composed as pipelines of a few basic components and completely based on Semantic Web standards, including SPARQL Construct for data transformation and SPARQL Update for state transition. The main novelty of the approach lies in the support for interaction, through the availability of user interface event streams as pipeline inputs.
-
ActiveRaUL: A Web form-based User Interface to create and maintain RDF data
,
Anila Sahar Butt,Armin Haller,Shepherd Liu and Lexing Xie
With the advent of Linked Data, the amount of automatically generated machine-readable data on the Web, often obtained by means of mapping relational data to RDF, has risen significantly. However, manually created, quality-assured and crowd-sourced data based on ontologies is not available in the quantities that would realise the full potential of the Semantic Web. One of the barriers for Semantic Web novices to create machine-readable data is the lack of easy-to-use Web publishing tools that separate the schema modelling from the data creation. In this demonstration we present ActiveRaUL, a Web service that supports the automatic generation of Web form-based user interfaces from any input ontology. The resulting Web forms are unique in supporting users inexperienced in Semantic Web technologies to create and maintain RDF data modelled according to an ontology. We report on a use case based on the Sensor Network Ontology that supports the viability of our approach.
-
Adding Time to Linked Data: A Generic Memento proxy through PROV
,
Miel Vander Sande,Sam Coppens,Ruben Verborgh,Erik Mannens and Rik Van de Walle
Linked Data resources change rapidly over time, making it difficult to maintain a valid consistent state. As a solution, the Memento framework offers content negotiation in the datetime dimension. However, due to a lack of formally described versioning, every server needs a costly custom implementation. In this poster paper, we exploit published provenance of Linked Data resources to implement a generic Memento service. Based on the W3C PROV standard, we propose a loosely coupled architecture that offers a Memento interface to any Linked Data service publishing provenance.
-
An FCA Framework for Knowledge Discovery in SPARQL Query Answers
,
Melisachew Wudage Chekol and Amedeo Napoli
Formal concept analysis (FCA) is used for knowledge discovery within data. In FCA, concept lattices are very good tools for the classification and organization of data; they enable users to visualize the answers of their SPARQL queries as concept lattices instead of the usual answer formats such as RDF/XML, JSON, CSV, and HTML. Consequently, in this work, we apply FCA to reveal hidden relations within SPARQL query answers by means of concept lattices.
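The construction can be sketched on a toy formal context: SPARQL answer rows become objects, their values become attributes, and a formal concept is a maximal (extent, intent) pair closed under derivation. The context below is invented; real FCA tools use far more efficient algorithms (e.g. Next Closure) than this brute-force enumeration.

```python
# Toy FCA sketch: enumerate all formal concepts of a small binary context
# by closing every object subset. Example data is invented.
from itertools import combinations

CONTEXT = {  # object -> set of attributes
    "paper1": {"topic:FCA", "lang:EL"},
    "paper2": {"topic:FCA"},
    "paper3": {"lang:EL"},
}

def intent(objs):
    """Attributes shared by all given objects (all attributes if none given)."""
    sets = [CONTEXT[o] for o in objs]
    return set.intersection(*sets) if sets else {a for s in CONTEXT.values() for a in s}

def extent(attrs):
    """Objects having all given attributes."""
    return {o for o, s in CONTEXT.items() if attrs <= s}

def concepts():
    """All formal concepts as (extent, intent) pairs of frozensets."""
    found = set()
    for r in range(len(CONTEXT) + 1):
        for combo in combinations(CONTEXT, r):
            a = intent(set(combo))          # closed attribute set
            found.add((frozenset(extent(a)), frozenset(a)))
    return found

for ext, inte in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```

On this context, four concepts emerge; the lattice over them is what the user browses instead of a flat CSV of bindings.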
-
Assisted Policy Management for SPARQL Endpoints Access Control
,
Luca Costabello,Serena Villata,Iacopo Vagliano and Fabien Gandon
Shi3ld is a context-aware authorization framework for protecting SPARQL endpoints. It assumes the definition of access policies using RDF and SPARQL, and the specification of named graphs to identify the protected resources. These assumptions make the framework hard to use for users who are not familiar with such languages and technologies. In this paper, we present a graphical user interface that supports dataset administrators in defining access policies and the target elements protected by such policies.
-
Best-effort Linked Data Query Processing with time constraints using ADERIS-Hybrid
,
Steven Lynden,Isao Kojima,Akiyoshi Matono and Akihito Nakamura
Answering SPARQL queries over the Web of Linked Data is a challenging problem. Approaches based on distributed query processing provide up-to-date results but can suffer from delayed response times; indexing-based approaches provide fast response times, but results can be out of date and the costs of indexing the growing Web of Linked Data are potentially huge. Hybrid approaches try to offer the best of both. In this demo paper we describe a system for answering SPARQL queries within fixed time constraints by accessing SPARQL endpoints and the Web of Linked Data directly.
-
CEDAR: a Fast Taxonomic Reasoner Based on Lattice Operations
,
Samir Amir and Hassan Aït-Kaci
Taxonomy classification and query answering are the core reasoning services provided by most Semantic Web (SW) reasoners. However, the algorithms used by those reasoners are based on the tableau method or on rules. These well-known methods have already shown their limitations for large-scale reasoning. In this demonstration, we present the CEDAR system for classifying and reasoning over very large taxonomies using a technique based on lattice operations. This technique makes the CEDAR reasoner perform on a par with the best systems for concept classification and several orders of magnitude more efficiently in terms of response time for query answering. The experiments were carried out using very large taxonomies (Wikipedia: 111,599 sorts; MeSH: 286,381 sorts; NCBI: 903,617 sorts; Biomodels: 182,651 sorts). The results achieved by CEDAR were compared to those obtained by well-known Semantic Web reasoners, namely FaCT++, Pellet, HermiT, TrOWL, SnoRocket and RacerPro.
-
Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications
,
Bernardo Pereira Nunes,Besnik Fetahu,Stefan Dietze and Marco Antonio Casanova
Cite4Me is a Web application that leverages Semantic Web technologies to provide a new perspective on search and retrieval of bibliographical data. The Web application presented in this work focuses on: (i) semantic recommendation of papers; (ii) novel semantic search & retrieval of papers; (iii) data interlinking of bibliographical data with related data sources from LOD; (iv) innovative user interface design; and (v) sentiment analysis of extracted paper citations. Finally, as this work also targets some educational aspects, our application provides an in-depth analysis of the data that guides users in their research field.
-
Comparing ontologies with ecco
,
Rafael S. Gonçalves,Bijan Parsia and Uli Sattler
The detection and presentation of changes between OWL ontologies (in the form of a diff) is an important service for ontology engineering, being an active research topic. We present here a diff tool that incorporates structural and semantic techniques in order to, firstly, distinguish effectual and ineffectual changes between ontologies and, secondly, align and categorise those changes according to their impact. Such a categorisation of changes is shown to facilitate the navigation through, and analysis of change sets. The tool is made available as a web-based application, as well as a standalone command-line tool. Both of these output an XML change set file and a transformation into HTML, which allows users to browse through and focus on those changes of utmost interest using any web-browser.
-
Context Aware Sensor Configuration Model for Internet of Things
,
Charith Perera,Arkady Zaslavsky,Michael Compton,Peter Christen and Dimitrios Georgakopoulos
We propose a Context Aware Sensor Configuration Model (CASCoM) to address the challenge of automated context-aware configuration of filtering, fusion, and reasoning mechanisms in IoT middleware according to the problems at hand. We incorporate semantic technologies in solving the above challenges.
-
Coordinating Social Care and Healthcare using Semantic Web Technologies
,
Spyros Kotoulas,Vanessa Lopez,Martin Stephenson,Pierpaolo Tommasi,Wei Jia Shen,Gang Hu,Marco Luca Sbodio,Veli Bicer,Anastasios Kementsietsidis,M. Mustafa Rafique,Jason Ellis,Thomas Erickson,Kavitha Srinivas,Kevin McAuliffe,Guo Tong Xie and Pol Mac Aonghusa
Social care and Healthcare are unique in terms of cultural importance, economic size and domain complexity. Combining information systems from both domains poses unique scientific and technical challenges with regard to information representation, access, integration and retrieval granularity. We present a semantics-based approach that is uniquely positioned to access information across domains using a combination of business rules and contextual exploration. A proof of concept illustrates that semantic technologies can cope in a scenario where traditional data integration approaches are too costly and reduce the addressable information space.
-
Curating Semantic Linked Open Datasets for Software Engineering
,
Kavi Mahesh,Aparna Nagarajan,Apoorva Rao Balevalachilu and Karthik Rajendra Prasad
A typical software engineer spends a significant amount of time and effort reading technical manuals to find answers to questions, especially those related to features, versions, compatibilities and dependencies of software and hardware components, languages, standards, modules, libraries and products. It is currently not possible to provide a semantic solution to this problem, primarily due to the non-availability of comprehensive semantic datasets in the domain of information technology. In this work, we have extracted, integrated and curated a linked open dataset (LOD) called LOaD-IT exclusively on this domain from a variety of sources, including other LODs such as Freebase and DBpedia, and technical documentation such as JavaDocs. Further, we have built a technical helpdesk system using a semantic query engine that derives answers from LOaD-IT. Our system demonstrates how the productivity of the software engineer can be improved by eliminating the need to read through lengthy technical manuals. We expect LOaD-IT to become more comprehensive in the future and to find other related practical applications.
-
D-SPARQ: Distributed, Scalable and Efficient RDF Query Engine
,
Raghava Mutharaju,Sherif Sakr,Alessandra Sala and Pascal Hitzler
We present D-SPARQ, a distributed RDF query engine that combines the MapReduce processing framework with a NoSQL distributed data store, MongoDB. The performance of processing SPARQL queries mainly depends on the efficiency of handling the join operations between the RDF triple patterns. Our system features two unique characteristics that enable efficiently tackling this challenge: 1) Identifying specific patterns of the input queries that enable improving the performance by running different parts of the query in a parallel mode. 2) Using the triple selectivity information for reordering the individual triples of the input query within the identified query patterns. The preliminary results demonstrate the scalability and efficiency of our distributed RDF query engine.
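The second optimization described above, selectivity-based reordering, can be sketched as a simple sort: triple patterns expected to match fewer triples run first so intermediate join results stay small. The statistics and patterns below are invented for illustration; D-SPARQ applies this within the query patterns it identifies, not globally.

```python
# Minimal sketch of selectivity-based triple-pattern reordering: execute
# the most selective patterns (fewest estimated matches) first.
# Selectivity estimates here are invented example statistics.

SELECTIVITY = {  # predicate -> estimated number of matching triples
    "rdf:type": 1_000_000,
    "foaf:name": 50_000,
    "ex:isbn": 120,
}

def reorder(patterns):
    """Order (s, p, o) triple patterns from most to least selective."""
    return sorted(patterns, key=lambda p: SELECTIVITY.get(p[1], float("inf")))

query = [("?b", "rdf:type", "ex:Book"),
         ("?b", "ex:isbn", '"0123456789"'),
         ("?b", "foaf:name", "?n")]
print(reorder(query))
```

Running the `ex:isbn` pattern first shrinks the candidate set to a handful of bindings before the expensive `rdf:type` scan is joined in.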
-
DRETa: Extracting RDF from Wikitables
,
Emir Muñoz,Aidan Hogan and Alessandra Mileo
Tables are widely used in Wikipedia articles to display relational information – they are inherently concise and information rich. However, aside from info-boxes, there are no automatic methods to exploit the integrated content of these tables. We thus present DRETa: a tool that uses DBpedia as a reference knowledge-base to extract RDF triples from generic Wikipedia tables.
-
Demo: Swip, a Semantic Web Interface using Patterns
,
Camille Pradel,Ollivier Haemmerlé and Nathalie Hernandez
Our purpose is to provide end-users with a means to query ontology-based knowledge bases using natural language queries, thus hiding the complexity of formulating a query expressed in a graph query language such as SPARQL. The main originality of our approach lies in the use of query patterns. Our contribution is materialized in a system named SWIP, standing for Semantic Web Interface using Patterns. The demo will present use cases of this system.
-
Demonstrating The Entity Registry System: Implementing 5-Star Linked Data Without the Web
,
Marat Charlaganov,Philippe Cudré-Mauroux,Christian Dinu,Christophe Guéret,Martin Grund and Teodor Macicas
Linked Data applications often assume that connectivity to data repositories and entity resolution services are always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Web access. Many data-driven applications may have a critical impact on the life of those people, but are inaccessible to those populations due to the architecture of today's data registries. In this demonstration, we show a new open-source system that can be used as a general-purpose entity registry suitable for deployment in poorly-connected or ad-hoc environments.
-
Demonstration: Semantic Web Enabled Smart Farm with GSN
,
Raj Gaire,Laurent Lefort,Michael Compton,Gregory Falzon,David Lamb and Kerry Taylor
GSN is an open source middleware for managing data produced by sensors deployed in a sensor network. We have extended and enhanced GSN to enable (i) semantically aware preparation, exchange and processing of the data, (ii) user-specified event processing for alerts, and (iii) association of sensor data with 'things'. Here, we demonstrate our smart farm as a use case of a semantically aware sensor network for better integration of sensor data.
-
Denoting Data in the Grounded Annotation Framework
,
Marieke Van Erp,Antske Fokkens,Piek Vossen,Sara Tonelli,Willem Robert Van Hage,Luciano Serafini,Rachele Sprugnoli and Jesper Hoeksema
Semantic Web applications are integrating data about events from more and more different types of sources. However, most data annotation frameworks do not translate well to the Semantic Web. We present the Grounded Annotation Framework (GAF), a two-layered framework that aims to build a bridge between mentions of events in a data source, such as a text document, and their formal representation as instances. By choosing a two-layered approach, neither the mention layer nor the semantic layer needs to compromise on what can be represented. We demonstrate the strengths of GAF in flexibility and reasoning through a use case on earthquakes in Southeast Asia.
-
DiTTO: Diagrams Transformation inTo OWL
,
Aldo Gangemi and Silvio Peroni
In this paper we introduce DiTTO, an online service that allows one to convert an E/R diagram created with the yEd diagram editor into a proper OWL ontology according to three different conversion strategies.
-
Discoverability of SPARQL Endpoints in Linked Open Data
,
Heiko Paulheim and Sven Hertling
Accessing Linked Open Data sources with query languages such as SPARQL provides more flexible possibilities than access based on dereferenceable URIs alone. However, discovering a SPARQL endpoint on the fly, given a URI, is not trivial. This paper provides a quantitative analysis of the automatic discoverability of SPARQL endpoints using different mechanisms.
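The flavor of the discovery mechanisms analyzed can be sketched as follows: given an entity URI, derive candidate locations where an endpoint might be announced, such as a site-wide well-known VoID description or the dereferenced URI itself. The example URI is illustrative, and the paper may analyze additional mechanisms beyond these two.

```python
# Sketch of SPARQL endpoint discovery heuristics: build candidate locations
# to check for an endpoint announcement, given an entity URI.
from urllib.parse import urlparse

def candidate_descriptors(uri: str):
    """Return places where a void:sparqlEndpoint triple might be published."""
    parts = urlparse(uri)
    root = f"{parts.scheme}://{parts.netloc}"
    return [
        f"{root}/.well-known/void",  # site-wide VoID description
        uri,  # dereference the URI itself and look for a void:sparqlEndpoint
    ]

print(candidate_descriptors("http://dbpedia.org/resource/Berlin"))
```

A crawler would fetch each candidate and parse the returned RDF for a `void:sparqlEndpoint` triple; the paper's contribution is measuring how often such mechanisms actually succeed in the wild.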
-
Distributed SPARQL Throughput Increase: On the effectiveness of Workload-driven RDF partitioning
,
Cosmin Basca and Abraham Bernstein
The Web of Data (WoD) continues to grow steadily each year. At over 31 billion triples in 2011, querying this globally distributed data space poses several scalability challenges. One critical aspect when processing distributed SPARQL queries is given by the number and type of distributed joins needed. Traditionally, query optimizers alleviate this issue by attempting to find an optimal query plan assuming a given and fixed data distribution. Discarding this fixed data partitioning assumption offers the opportunity to create a data distribution that minimizes the number of distributed joins. Recent research focused on data- and query-driven partitioning strategies for both RDF and relational data. In this paper we propose a novel and naive workload-driven approach to data partitioning and investigate the impact of various critical factors on the number of resulting distributed joins. In a preliminary experiment we empirically compare our method to traditional partitioning strategies using a DBpedia query log of 400,000 queries and observe that it can produce up to 50% fewer distributed joins than an expert (manual) partitioning scheme, 45% fewer than partitioning based on hashing by subject, and up to 83% fewer distributed joins than random assignment.
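One of the baselines compared against above, hashing by subject, can be sketched in a few lines: all triples sharing a subject land on the same node, so subject-subject joins need no network traffic. The triples below are invented example data; this is the baseline, not the paper's workload-driven method.

```python
# Baseline sketch: hash-partition RDF triples by subject across nodes,
# keeping each subject's triples co-located. Example data is invented.
import zlib

def partition_by_subject(triples, num_nodes):
    """Assign each (s, p, o) triple to a node by a stable hash of s."""
    parts = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        parts[zlib.crc32(s.encode()) % num_nodes].append((s, p, o))
    return parts

triples = [("ex:a", "rdf:type", "ex:Book"),
           ("ex:a", "ex:isbn", '"0123"'),
           ("ex:b", "rdf:type", "ex:Book")]
parts = partition_by_subject(triples, 4)
print(parts)
```

A workload-driven scheme instead inspects the query log to co-locate triples that are frequently joined, which is how the reported reduction in distributed joins over this baseline arises.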
-
Do it yourself (DIY) Jeopardy QA System
,
Andre Freitas and Edward Curry
This work demonstrates Treo, a framework which converges elements from Natural Language Processing, Semantic Web, Information Retrieval and Databases, to create a semantic search engine and question answering (QA) system for heterogeneous data. Jeopardy and Question Answering queries over open domain structured and unstructured data are used to demonstrate the approach. In this work, Treo is extended to cope with unstructured data in addition to structured data. The setup of the framework is done in 3 steps and can be adapted to other datasets by practitioners in a simple DIY process.
-
Editing R2RML Mappings Made Easy
,
Kunal Sengupta,Peter Haase,Michael Schmidt and Pascal Hitzler
The new W3C standard R2RML (see http://www.w3.org/TR/r2rml/) defines a language for expressing mappings from relational databases to RDF, allowing applications built on top of the W3C Semantic Technology stack to seamlessly integrate relational data. A major obstacle in using R2RML, though, is the creation and maintenance of mappings. In this demo, we present a novel R2RML mapping editor, which provides a user interface to create and edit mappings interactively. Hiding the R2RML vocabulary intricacies from the user, the editor enables even non-experts to create R2RML mappings in a guided way, offers immediate feedback by means of integrated preview functionality, and covers all the major language constructs defined in the R2RML standard.
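To give a sense of the vocabulary intricacies such an editor hides, here is a minimal R2RML mapping in Turtle; the table, column, and vocabulary names (`EMP`, `ENAME`, `ex:`) are illustrative, not taken from the demo:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

<#EmpMap>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [
        rr:template "http://example.com/employee/{EMPNO}" ;
        rr:class ex:Employee
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "ENAME" ]
    ] .
```

Each row of `EMP` yields a subject IRI minted from the `EMPNO` column, typed as `ex:Employee`, with its `ENAME` value attached via `ex:name`.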
-
Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs
,
Stephan Seufert,Srikanta J. Bedathur,Johannes Hoffart,Andrey Gubichev and Klaus Berberich
Given two sets of entities – potentially the results of two queries on a knowledge graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we present an intuitive and efficiently computable vertex centrality measure that captures the importance of a node with respect to the explanation of the relationship between the pair of query sets. Using a weighted link graph of entities contained in the English Wikipedia, we demonstrate the usefulness of the proposed measure.
-
Enriching Concept Search across Semantic Web Ontologies
,
Chetana Gavankar,Vishwajeet Kumar,Yuan-Fang Li and Ganesh Ramakrishnan
Semantic Web ontologies are fast-growing knowledge sources on the Web. Searching relevant concepts from this large repository is a challenging problem. The current Semantic Web search engines provide either (1) coarse-grained search over ontologies or (2) very fine-grained search over individuals. We believe searching and ranking concepts across ontologies provides an ideal granularity for certain tasks such as ontology population and web page annotation. Towards this objective, we propose a novel approach of indexing concepts using ontology axioms in an inverted file structure and ranking them using a dynamic ranking algorithm. Our proposed method is generic and domain-independent. A preliminary evaluation indicates that our proposed method is effective, outperforming the search function of BioPortal, a large and widely-used ontology repository.
-
Explaining Clusters with Inductive Logic Programming and Linked Data
,
Ilaria Tiddi,Mathieu D'Aquin and Enrico Motta
Knowledge Discovery consists in discovering hidden regularities in large amounts of data using data mining techniques. The obtained patterns require an interpretation that is usually achieved using background knowledge given by experts from several domains. On the other hand, the rise of Linked Data has increased the amount of connected cross-disciplinary knowledge, in the form of RDF datasets, classes and relationships. Here we show how Linked Data can be used in an Inductive Logic Programming process, where they provide background knowledge for finding hypotheses regarding the unrevealed connections between items of a cluster. Using an example with clusters of books, we show how different Linked Data sources can be used to automatically generate rules giving an underlying explanation to such clusters.
-
Exploring Linked Open Data with Tag Clouds
,
Xingjian Zhang,Dezhao Song,Sambhawa Priya and Jeff Heflin
In this paper we present the contextual tag cloud system: a novel application that helps users explore a large scale RDF dataset. Unlike folksonomy tags used in most traditional tag clouds, the tags in our system are ontological terms (classes and properties), and a user can construct a context with a set of tags that defines a subset of instances. Then in the contextual tag cloud, the font size of each tag depends on the number of instances that are associated with that tag and all tags in the context. Each contextual tag cloud serves as a summary of the distribution of relevant data, and by changing the context, the user can quickly gain an understanding of patterns in the data. Furthermore, the user can choose to include RDFS taxonomic and/or domain/range entailment in the calculations of tag sizes, thereby understanding the impact of semantics on the data. The system runs on the BTC2012 dataset with more than 1.4 billion triples from which we extract over 380,000 tags. Several scalability challenges must be overcome in order to achieve a responsive interface.
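The counting rule behind the tag sizes can be sketched in a few lines: for each candidate tag, count the instances carrying both that tag and every tag in the current context. This is a toy version of the counting step only, not the paper's entailment-aware, billion-triple implementation.

```python
def contextual_tag_sizes(instance_tags, context):
    """Given a mapping instance -> set of ontological tags and a
    context (set of tags), return, for each other tag, the number of
    instances associated with that tag AND all tags in the context.
    The tag-cloud font size is then scaled from these counts."""
    counts = {}
    for tags in instance_tags.values():
        if context <= tags:                 # instance matches the context
            for t in tags - context:        # every remaining tag co-occurs
                counts[t] = counts.get(t, 0) + 1
    return counts
```

Changing the context simply re-runs this count over the matching subset of instances, which is why the cloud acts as a distribution summary for that subset.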
-
Extending R2RML to a source-independent mapping language for RDF
,
Anastasia Dimou,Miel Vander Sande,Pieter Colpaert,Erik Mannens and Rik Van de Walle
Although reaching the fifth star of the Open Data deployment scheme demands that the data be represented in RDF and linked, a generic and standard mapping procedure to deploy raw data in RDF has not been established so far. Only the R2RML mapping language has been standardized, but its applicability is limited to mappings from relational databases to RDF. We propose extending R2RML to also support mappings of data sources in other structured formats. In broadening its scope, the focus is put on the mappings and their optimal reuse. The language becomes source-agnostic, and resources are integrated and interlinked at a primary stage.
-
Finite Models in RDF(S), with datatypes
,
Peter Patel-Schneider and Pat Hayes
The details of reasoning in RDF are generally well known. The model-theoretic characteristics of RDF have been less studied, particularly when datatypes are added. RDF reasoning can be performed by considering only finite models or pre-models, and sometimes only very small models need be considered.
-
GRAPHIUM: Visualizing Performance of Graph and RDF Engines on Linked Data
,
Alejandro Flores,Guillermo Palma,Maria-Esther Vidal,Domingo De Abreu,Valeria Pestana,Jose Pinero,Jonathan Queipo and Jose Sanchez
We present GRAPHIUM, a tool to visualize trends and patterns in the performance of existing graph and RDF engines. We will demonstrate GRAPHIUM, and attendees will be able to observe and analyze the performance exhibited by Neo4j, DEX, HypergraphDB and RDF-3X when core graph-based and mining tasks are run against a variety of benchmark graphs of diverse characteristics.
-
GetThere: A Rural Passenger Information System Utilising Linked Data & Citizen Sensing
,
David Corsar,Peter Edwards,Chris Baillie,Milan Markovic,Konstantinos Papangelis and John Nelson
This demo paper describes a real-time passenger information system based on citizen sensing and linked data.
-
Git2PROV: Exposing Version Control System Content as W3C PROV
,
Tom De Nies,Sara Magliacane,Ruben Verborgh,Sam Coppens,Paul Groth,Erik Mannens and Rik Van de Walle
Data provenance is defined as information about entities, activities and people producing or modifying a piece of data. On the Web, the interchange of standardized provenance of (linked) data is an essential step towards establishing trust. One mechanism to track (part of) the provenance of data is through the use of version control systems (VCS), such as Git. These systems are widely used to facilitate collaboration, primarily for code but also for data. Here, we describe a system to expose the provenance stored in a VCS in a new standard Web-native format: W3C PROV. This enables the easy publication of VCS provenance on the Web and subsequent integration with other systems that make use of PROV. The system is exposed as a RESTful Web service, which allows integration into user-friendly tools, such as browser plugins.
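The core of the VCS-to-PROV idea can be sketched as a simple mapping: each commit becomes a `prov:Activity`, each touched file version a `prov:Entity`, and the author a `prov:Agent`. The sketch below emits PROV-N statements under an assumed `ex:` prefix; the actual Git2PROV mapping is considerably richer (specializations, derivations between versions, timestamps, roles).

```python
def commit_to_prov(commit_id, author, files):
    """Map one commit record to a list of PROV-N statements.
    Toy illustration of the idea, not the Git2PROV implementation."""
    stmts = [
        f"activity(ex:commit-{commit_id})",
        f"agent(ex:{author})",
        f"wasAssociatedWith(ex:commit-{commit_id}, ex:{author}, -)",
    ]
    for path in files:
        # one entity per file version produced by this commit
        entity = f"ex:{path.replace('/', '-')}_{commit_id}"
        stmts.append(f"entity({entity})")
        stmts.append(f"wasGeneratedBy({entity}, ex:commit-{commit_id}, -)")
    return stmts
```

Iterating this over the full commit history yields a provenance graph that PROV-aware consumers can integrate directly.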
-
Hunting for Inconsistencies in Multilingual DBpedia with QAKiS
,
Elena Cabrio,Julien Cojan,Serena Villata and Fabien Gandon
QAKiS, a system for open domain Question Answering over linked data, allows querying DBpedia multilingual chapters with natural language questions. But since such chapters can contain different information from the English version (e.g., more specificity on certain topics, or information that fills gaps), i) different results can be obtained for the same query, and ii) the combination of these query results may lead to inconsistent information about the same topic. To reconcile the information obtained by distributed SPARQL endpoints, an argumentation-based module is integrated into QAKiS to reason over inconsistent information sets, and to provide a unique and motivated answer to the user.
-
IncMap: Pay as you go Matching of Relational Schemata to OWL Ontologies
,
Christoph Pinkel,Carsten Binnig,Evgeny Kharlamov and Peter Haase
Ontology Based Data Access (OBDA) enables access to relational data with a complex structure through ontologies as conceptual domain models. A key component of an OBDA system is the set of mappings between the schematic elements in the ontology and their correspondences in the relational schema. Today, in existing OBDA systems, these mappings typically need to be compiled by hand. In this paper we present IncMap, a system that supports a semi-automatic approach for matching relational schemata and ontologies. Our approach is based on a novel matching technique that represents the schematic elements of an ontology and a relational schema in a unified way. Finally, IncMap can extend user-verified mapping suggestions in a pay-as-you-go fashion.
-
Interlinking Multilingual LOD Resources: A Study on Connecting Chinese, Japanese, and Korean Resources Using the Unihan Database
,
Saemi Jang,Satria Hutomo,Soon Gill Hong and Mun Yi
This study proposes a novel method with which Chinese, Japanese, and Korean (CJK) resources on the Web can be effectively matched and connected. The three countries share Chinese characters even though Japan and Korea have their own languages. Utilizing the Unihan database, which covers more than 45,000 characters commonly used by the three countries, we show that the proposed method outperforms the traditional method based on string matching in finding similar characters and words used in these countries. The results represent a first step towards overcoming the multilingual barrier in semantically interlinking Asian LOD resources.
-
KbQAS: A Knowledge-based QA System
,
Dat Quoc Nguyen,Dai Quoc Nguyen and Son Bao Pham
In this demo paper, we present the first ontology-based Vietnamese question answering system, KbQAS, in which a knowledge acquisition approach for question analysis is integrated.
-
Modeling and Reasoning Upon Facebook Privacy Settings
,
Mathieu D'Aquin and Keerthi Thomas
Understanding the way information is propagated and made visible on Facebook is a difficult task. The privacy settings and the rules that apply to individual items are reasonably straightforward. However, for the user to track all of the information that needs to be integrated and the inferences that can be made on their posts is complex, to the extent that it is almost impossible for any individual to achieve. In this demonstration, we investigate the use of knowledge modeling and reasoning techniques (including basic ontological representation, rules and epistemic logics) to make these inferences explicit to the user.
-
Monitoring SPARQL Endpoint Status
,
Pierre-Yves Vandenbussche,Carlos Buil Aranda,Aidan Hogan and Jürgen Umbrich
We demo an online system that tracks the availability of over four-hundred public SPARQL endpoints and makes up-to-date results available to the public. Our demo currently focuses on how often an endpoint is online/offline, but we plan to extend the system to collect metrics about available meta-data descriptions, SPARQL features supported, and performance for generic queries.
-
Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing
,
Lorenz Fischer,Thomas Scharrenbach and Abraham Bernstein
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy---a uniform distribution of computation load among available machines---typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication. We propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluate its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time rather than as graph fragments (containing multiple triples) may decrease throughput, indicating the usefulness of semantics.
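The quantity such a scheduler minimizes is easy to state: given an operator-to-machine assignment, sum the weights of dataflow edges whose endpoints land on different machines. A toy sketch of that objective, not the paper's system:

```python
def inter_machine_traffic(edges, assignment):
    """Count the communication volume crossing machine boundaries.
    `edges` is a list of (src, dst, weight) tuples of the dataflow
    graph; `assignment` maps each operator to a machine. A
    graph-partitioning-based scheduler tries to minimize this sum
    while keeping the per-machine load roughly balanced."""
    return sum(w for s, d, w in edges if assignment[s] != assignment[d])
```

Comparing this value for a uniform assignment versus a partitioning-derived one is exactly the kind of measurement behind the reported 40%-99% reductions.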
-
NoHR: Querying EL with Non-monotonic rules
,
Vadim Ivanov,Matthias Knorr and Joao Leite
We present NoHR, a Protege plug-in that allows the user to take an EL ontology, add a set of non-monotonic (logic programming) rules - suitable e.g. to express defaults and exceptions - and query the combined knowledge base. Provided the given ontology alone is consistent, the system is capable of dealing with potential inconsistencies between the ontology and the rules, and, after an initial brief pre-processing period utilizing OWL 2 EL reasoner ELK, returns answers to queries at an interactive response time by means of XSB Prolog.
-
ONTOMS2: an Efficient and Scalable ONTOlogy Management System with an Incremental Reasoning
,
Min-Joong Lee,Jong-Ryul Lee,Sangyeon Kim,Myung-Jae Park and Chin-Wan Chung
We present ONTOMS2, an efficient and scalable ONTOlogy Management System with an incremental reasoning. ONTOMS2 stores an OWL document and processes OWL-QL and SPARQL queries. Especially, ONTOMS2 supports SPARQL Update queries with an incremental instance reasoning of inverseOf, symmetric and transitive properties.
-
OU Social: Reaching Students in Social Media
,
Miriam Fernandez,Harith Alani and Stuart Brown
This work describes OU Social, an application that collects and analyses data from public Facebook groups set up by students to discuss particular Open University courses. This application exploits semantic technologies to monitor the behaviour of users over time as well as the topics that emerge from Facebook group discussions. The paper describes the architecture of OU Social and provides a brief overview of the analysis results obtained from 44 different Facebook groups examined over a six-year period (2007-2013).
-
On the Semantics of R2RML and its Relationship with the Direct Mapping
,
Juan F. Sequeda
The W3C Relational Database to RDF (RDB2RDF) standards are positioned to bridge the gap between Relational Databases and the Semantic Web. The standards consist of two interrelated and complementary specifications: “Direct Mapping of Relational Data to RDF” and “R2RML: RDB to RDF Mapping Language”. In this paper we present initial results on the formal study of the R2RML mapping language by defining its semantics using Datalog. We prove that there are a total of 57 distinct Datalog rules which can be used to generate RDF triples from a relational table. Additionally, we provide insights on the relationship between R2RML and Direct Mapping.
-
Optique 1.0: Semantic Access to Big Data; The Case of Norwegian Petroleum Directorate’s FactPages
,
Evgeny Kharlamov,Martin Giese,Ernesto Jiménez-Ruiz,Martin G. Skjæveland,Ahmet Soylu,Dmitriy Zheleznyakov,Timea Bagosi,Marco Console,Peter Haase,Ian Horrocks,Sarunas Marciuska,Christoph Pinkel,Mariano Rodriguez-Muro,Marco Ruzzi,Kunal Sengupta,Michael Schmidt,Evgenij Thorstensen,Johannes Trame and Arild Waaler
The Optique project aims at developing an end-to-end system for semantic data access to Big Data in industries such as Statoil ASA and Siemens AG. In our demonstration we present the first version of the Optique system customised for the Norwegian Petroleum Directorate's FactPages, a public dataset available to engineers at Statoil ASA. The system provides different options, including visual ones, to formulate queries over ontologies and to display query answers. Optique 1.0 offers two installation wizards that allow users to extract ontologies from relational schemas, extract and define mappings connecting ontologies and schemas, and align and approximate ontologies. Moreover, the system offers tools to edit these components and highly optimised techniques for query answering.
-
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
,
Alexander Schätzle,Martin Przyjaciel-Zablocki,Thomas Hornung and Georg Lausen
In this paper, we discuss PigSPARQL, a competitive, yet easy to use, SPARQL query processing system based on MapReduce and thus intended for Big Data applications. Instead of a direct mapping, PigSPARQL uses the query language of Pig, a data analysis platform on top of Hadoop, as an intermediate layer between SPARQL and MapReduce. The additional level of abstraction makes our approach independent of the actual Hadoop version. Thus, it is automatically compatible with future changes of the Hadoop framework, as they will be neutralized by the Pig layer, and it allows ad-hoc SPARQL query processing on large RDF graphs out of the box. In the paper we first revisit PigSPARQL and demonstrate its gain in efficiency obtained simply by switching from Pig 0.5.0 to Pig 0.11.0. Because of this sustainability, PigSPARQL is an attractive long-term baseline for comparing various MapReduce-based SPARQL implementations. This is underlined by PigSPARQL's competitiveness with existing systems, e.g. HadoopRDF.
-
Profiling of Linked Datasets using Structured Descriptions
,
Besnik Fetahu,Stefan Dietze,Bernardo Pereira Nunes,Davide Taibi and Marco Antonio Casanova
While an increasingly large number of Linked Data datasets exists, metadata about the content covered by individual datasets is sparse. In this paper, we introduce a processing pipeline to automatically assess, annotate and index available linked datasets. Given a minimal description of a dataset from the DataHub, the process produces a structured RDF-based description that includes information about its main topics. Additionally, the generated descriptions embed datasets into an interlinked graph of datasets based on shared topic vocabularies. We adopt and integrate techniques for Named Entity Recognition and automated data validation, providing a consistent workflow for dataset profiling and annotation. Finally, we validate the results obtained with our tool.
-
Publishing Data from the Smithsonian American Art Museum as Linked Open Data
,
Craig Knoblock,Pedro Szekely,Shubham Gupta,Animesh Manglik,Ruben Verborgh and Fengyu Yang
Museums around the world have built databases with meta-data about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the linked data cloud is difficult: the databases are large and complex, the information is richly structured and varies from museum to museum, and it is difficult to link the data to other datasets. We have been collaborating with the Smithsonian American Art Museum to create a set of tools that allow museums and other cultural heritage institutions to publish their data as Linked Open Data. In this demonstration we will show the end-to-end process of starting with the original source data, modeling the data with respect to an ontology of cultural heritage data, linking the data to DBpedia, and then publishing the information as Linked Open Data. Video: http://youtu.be/1Vaytr09H1w
-
Query Suggestion by Concept Instantiation
,
Jack Sun,Franky,Kenny Zhu and Haixun Wang
A class of search queries that contain abstract concepts is studied in this paper. These queries cannot be correctly interpreted by traditional keyword-based search engines. This paper presents a simple framework that detects and instantiates the abstract concepts with their concrete entities or meanings to produce alternate queries that yield better search results.
-
RDFChain: Chain Centric Storage for Scalable Join Processing of RDF Graphs using MapReduce and HBase
,
Pilsik Choi,Jooik Jung and Kyong-Ho Lee
As massive amounts of linked open data are available in RDF, scalable storage and efficient retrieval using MapReduce have been actively studied. Most previous research focuses on reducing the number of MapReduce jobs for processing join operations in SPARQL queries. However, the cost of the shuffle phase still occurs due to their reduce-side joins. In this paper, we propose RDFChain, which supports scalable storage and efficient retrieval of a large volume of RDF data using a combination of MapReduce and HBase, a NoSQL storage system. Since the proposed storage schema of RDFChain reflects all the possible join patterns of queries, it provides a reduced number of storage accesses depending on the join pattern of a query. In addition, the proposed cost-based map-side join of RDFChain reduces the number of map jobs, since it processes as many joins as possible in a single map job using statistics.
-
RelClus: Clustering-based Relationship Search
,
Yanan Zhang,Gong Cheng and Yuzhong Qu
Searching and browsing relationships between entities is an important task in many domains. To support users in interactively exploring a large set of relationships, we present a novel relationship search engine called RelClus, which automatically groups search results into a dynamically generated hierarchy with meaningful labels. This hierarchical clustering of relationships exploits their schematic patterns and a similarity measure based on information theory.
-
SEJP: Designing Interactive Scientometrics with Linked Data and Semantic Web Reasoning
,
Grant McKenzie,Krzysztof Janowicz,Yingjie Hu,Kunal Sengupta and Pascal Hitzler
In this demo paper we introduce a Linked Data-driven, Semantically-enabled Journal Portal (SEJP) that offers a variety of interactive scientometrics modules. SEJP allows editors, reviewers, authors, and readers to explore and analyze (meta)data published by a journal. Besides Linked Data created from the journal's internal data, SEJP also links out to other sources and includes them to develop more powerful modules. These modules range from simple descriptive statistics, over the spatial analysis of visitors and authors, to trending-topics modules. While SEJP will be available for multiple journals, this paper shows its deployment to the Semantic Web journal by IOS Press. Due to its open and transparent review process, the SWJ offers a wide variety of additional information, e.g., about reviewers, editors, paper decisions, and so forth.
-
SILURIAN: a Sparql vIsuaLizer for UndeRstanding querIes And federatioNs
,
Simon Castillo,Guillermo Palma and Maria-Esther Vidal
SPARQL federated queries can be affected by characteristics of both the query and the datasets in the federation. We present SILURIAN, a Sparql vIsuaLizer for UndeRstanding querIes And federatioNs. SILURIAN visualizes SPARQL queries and thus allows the analysis and understanding of a query's complexity with respect to relevant endpoints and the shapes of the possible plans.
-
SPACE: SParql index for efficient Auto ComplEtion
,
Kasjen Kraemer,Renata Dividino and Gerd Gröner
Querying Linked Data means posing queries over various data sources without information about the data or its schema. This demo shows SPACE, a tool to support autocompletion for SPARQL queries. It takes SPARQL query logs as input and builds an index structure for the efficient and fast computation of query suggestions. To demonstrate SPACE, we use available query logs from the USEWOD Data Challenge 2013.
-
SemantEco Annotator
,
Patrice Seyed,Timothy Lebo,Evan Patton,Katherine Chastain,Brendan Ashby and Deborah McGuinness
Generating useful RDF linked data is not a straightforward process for scientists using today’s tools. In this paper we introduce the SemantEco Annotator, a semantic web application that leverages community-based vocabularies and ontologies during the translation process itself to ease the process of drawing out implicit relationships in tabular data so that they may be immediately available for use within the LOD cloud. Our goal for the SemantEco Annotator is to make advanced RDF translation techniques available to the layperson.
-
Semantic Enrichment of Mobile Phone Data Records Using Linked Open Data
,
Zolzaya Dashdorj and Luciano Serafini
The pervasiveness of mobile phones opens an unprecedented opportunity for deepening into human dynamics through the analysis of the data they generate. This enables a novel human-driven approach to service creation in a wide set of domains such as health-care, transportation and urban safety. Telecom operators own and manage billions of mobile network events (like Call Detail Records, CDRs) per day: the interpretation of such a big stream of data needs a deep understanding of the context where the events have occurred. The exploitation of available background knowledge is a key element in this scenario. In this paper we introduce a novel method for the semantic interpretation of human behavior in mobility, based on merging the mobile network data stream with the available geo-referenced background knowledge. We modeled human behavior making use of the geo- and time-referenced knowledge available on the web (e.g., geo-tagged resources, info on weather forecasts, social events, etc.), matching it with the mobile network coverage map. The model is intended to characterize the contexts where the mobile network events occur, in order to help interpret the behavioral traits that generated them. This will allow us to achieve a set of predictive tasks, such as the prediction of human activities in certain contextual conditions (e.g., when an accident occurs on a highway before working time starts), or the characterization of exceptional events detected from anomalies in mobile network data. We created an ontological and stochastic high-level representation behavioral model (HRBModel) that maps human activities to the different contexts. Given the mobile phone network and the geo-tagged resource OpenStreetMap, the model is used to rank the activities associated with a particular network event (e.g., a sudden peak in call volume) according to their probability. We also describe the design of an experimental evaluation and the preliminary evaluation results to measure the performance of the model and to improve the activity prediction task.
-
Semantic tools for improving software development in open source communities
,
Gregor Leban
Software development communities use different communication channels such as mailing lists, forums and bug tracking systems. These channels are not integrated, which makes finding information difficult and inefficient. As a result of the ALERT project we developed a system that is able to collect and annotate information from various communication channels and store it in a single knowledge base. Using the stored knowledge, the system can provide users with valuable functionality such as semantic search, finding potential bug duplicates, custom notifications and issue recommendations.
-
SexTant: Visualizing Time-Evolving Linked Geospatial Data
,
Konstantina Bereta,Charalampos Nikolaou,Manos Karpathiotakis,Kostis Kyzirakos and Manolis Koubarakis
The linked open data cloud is constantly evolving as datasets are continuously updated with newer versions. As a result, representing, querying, and visualizing the temporal dimension of linked data is crucial. This is especially important for geospatial datasets that form the backbone of large scale open data publication efforts in many sectors of the economy (the public sector, the Earth observation sector). Although there has been some work on the representation and querying of linked geospatial data that change over time, to the best of our knowledge, there is currently no tool that offers spatio-temporal visualization of such data. In this demo paper we present the system SexTant that addresses this issue. SexTant is a web-based tool that enables the exploration of time-evolving linked geospatial data as well as the creation, sharing, and collaborative editing of "temporally-enriched" thematic maps by combining different sources of geospatial and temporal information.
-
TRT - A Tripleset Recommendation Tool
,
Alexander Arturo Mera Caraballo,Bernardo Pereira Nunes,Giseli Rabello Lopes,Luiz André P. Paes Leme,Marco Antonio Casanova and Stefan Dietze
According to the Linked Data principles, a tripleset should be interlinked with others to take advantage of existing knowledge. However, interlinking is a laborious task. Thus, users interlink their triplesets mostly with data hubs, such as DBpedia and Freebase, ignoring the more specific yet often even more promising triplesets. To alleviate this problem, this paper describes a tripleset interlinking recommendation tool based on link prediction techniques and evaluates the tool on a real-world tripleset repository.
-
The Benefits of Incremental Reasoning in OWL EL
,
Yevgeny Kazakov and Pavel Klinov
This demo will present the advantages of the new, bookkeeping-free method for incremental reasoning in OWL EL on incremental classification of large ontologies. In particular, we will show how a typical experience of a user editing a large ontology can be improved if the reasoner (or ontology IDE) provides the capability of instantaneously re-classifying the ontology in the background mode when a change is made. In addition, we intend to demonstrate how incremental reasoning helps in other tasks such as answering DL queries and computing explanations of entailments. We will use our OWL EL reasoner ELK and its Protege plug-in as the main tools to highlight these benefits.
-
The Empirical Robustness of Description Logic Classification
,
Rafael S. Gonçalves,Nicolas Matentzoglu,Bijan Parsia and Uli Sattler
In spite of the recent renaissance in lightweight description logics (DLs), many prominent DLs, such as that underlying the Web Ontology Language (OWL), have high worst-case complexity for their key inference services. Modern reasoners have a large array of optimizations, tuned calculi, and implementation tricks that allow them to perform very well in a variety of application scenarios, even though the complexity results ensure that they will perform poorly for some inputs. For users, the key question is how often they will encounter those pathological inputs in practice, that is, how robust reasoners are. We attempt to answer this question for the classification of existing ontologies as they are found on the Web. It is a fairly common user task to examine ontologies published on the Web as part of a development process. Thus, the robustness of reasoners in this scenario is both directly interesting and provides some hints toward answering the broader question. From our experiments, we show that the current crop of OWL reasoners, in collaboration, is very robust against the Web.
-
Towards Semantic Annotations of Web Tables
,
Stefan Zwicklbauer,Christoph Einsiedler,Michael Granitzer and Christin Seifert
Web tables comprise a rich source of factual information. However, without semantic annotation of the tables' content, the information is not usable for automatic integration and search. We propose a methodology to annotate table headers with semantic type information based on the content of the columns' cells. In our experiments on 50 tables we achieved an F1 value of 0.55, where the accuracy varies greatly depending on the ontology used. Regarding computational complexity, we found that to reach 94% of the maximal F1 score, on average only 20 cells (37%) need to be considered. The results suggest that the choice of ontology plays a more crucial role for type inference than the number of cells used.
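The core intuition — inferring a header's type from a sample of its column's cells — can be sketched as a majority vote. This is a minimal illustration, not the paper's algorithm; the entity-type index standing in for an ontology lookup is entirely made up.

```python
# Minimal sketch: infer a column's semantic type by looking up each
# sampled cell in a (toy, hypothetical) entity-type index and letting
# the cells vote. Sampling fewer cells trades accuracy for fewer lookups.
from collections import Counter

ENTITY_TYPES = {  # toy stand-in for an ontology-backed entity index
    "Berlin": "City", "Paris": "City", "Vienna": "City",
    "Danube": "River", "Elbe": "River",
}

def annotate_column(cells, max_cells=20):
    """Majority vote over the types of at most `max_cells` cell values."""
    votes = Counter()
    for cell in cells[:max_cells]:
        cell_type = ENTITY_TYPES.get(cell)
        if cell_type:
            votes[cell_type] += 1
    return votes.most_common(1)[0][0] if votes else None

print(annotate_column(["Berlin", "Paris", "Danube", "Vienna"]))  # City
```

The `max_cells` cap mirrors the paper's finding that a modest sample of cells already recovers most of the achievable F1 score.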
-
Towards the Natural Ontology of Wikipedia
,
Andrea Giovanni Nuzzolese,Aldo Gangemi,Valentina Presutti and Paolo Ciancarini
In this paper we present preliminary results on the extraction of ORA: the Natural Ontology of Wikipedia. ORA is obtained through an automatic process that analyses the natural language definitions of DBpedia entities provided by their Wikipedia pages. Hence, this ontology reflects the richness of terms used and agreed upon by the crowd, and can be updated periodically according to the evolution of Wikipedia.
-
TripleRush
,
Philip Stutz,Mihaela Verman,Lorenz Fischer and Abraham Bernstein
TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that answer queries over large-scale graph data quickly. To that end it leverages a novel, graph-based architecture. Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph in which each index vertex corresponds to a triple pattern. Partially matched queries are routed in parallel along different paths of this index structure. We show experimentally that TripleRush takes about a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, measuring time as the geometric mean over all queries for two common benchmarks.
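To see what "routing partially matched queries" means, the sketch below gives a much-simplified, sequential analogue of pattern-at-a-time matching: each triple pattern of the query extends the current set of variable bindings. TripleRush does this in parallel over an index graph of pattern vertices; this toy version (with made-up triples) only illustrates the binding propagation.

```python
# Illustrative sketch, not TripleRush itself: a query is a sequence of
# triple patterns; partial matches (variable bindings) are propagated
# from pattern to pattern, which TripleRush parallelises over its
# graph-shaped index. Terms starting with "?" are variables.
triples = {("elk", "reasons_over", "owl_el"),
           ("triplerush", "built_on", "signal_collect"),
           ("signal_collect", "is_a", "graph_framework")}

def match(pattern, binding):
    """Yield bindings extending `binding` that match one pattern against the store."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.get(term, value) != value:
                    ok = False          # variable already bound to something else
                    break
                new[term] = value
            elif term != value:
                ok = False              # constant term does not match
                break
        if ok:
            yield new

def answer(query):
    bindings = [{}]
    for pattern in query:               # route partial matches through each pattern
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

q = [("triplerush", "built_on", "?f"), ("?f", "is_a", "?t")]
print(answer(q))  # [{'?f': 'signal_collect', '?t': 'graph_framework'}]
```

In the real system each index vertex holds the triples matching one pattern, so a partial match only visits the vertices that can extend it, rather than scanning the whole store as this sketch does.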
-
Using Ontologies to Identify Patients with Diabetes in Electronic Health Records
,
Hairong Yu,Siaw-Teng Liaw,Jane Taggart and Alireza Rahimi
This paper describes work in progress that explores the applicability of ontologies for providing solutions in the medical domain. We investigate whether it is feasible to use ontologies and ontology-based data access to automate one of the common clinical tasks that general practitioners constantly face but that is labor intensive and error prone in terms of retrieving relevant information from electronic health records. The focus of our study is on improving the selection of diabetes patients for clinical trials or medical research. The biggest impediment to automating such clinical tasks is the essential requirement of bridging the semantic gaps between existing patient data from electronic health records, such as reasons for visit, chronic conditions, and diagnoses from practice notes, pathology tests, and prescriptions stored in general practice information systems, and the ways in which researchers or general practitioners interpret those records. Our current comprehension is that identifying diabetes patients for clinical or research purposes can be automated systematically as a solution supported by semantic retrieval. We detail the challenges of building a realistic case study, which consists of solving issues related to the conceptualization of data and domain context, the integration of different datasets, ontology creation based on the SNOMED CT-AU® standard, mapping between existing data and the ontology, and the dilemma of data fitness for research use. Our prototype is based on thirteen years of data from approximately 100,000 anonymous patient records from four general practices in south western Sydney.
-
XLore: A Large-scale English-Chinese Bilingual Knowledge Graph
,
Zhigang Wang,Juanzi Li,Zhichun Wang,Shuangjie Li,Mingyang Li,Dongsheng Zhang,Yao Shi,Yongbin Liu and Jie Tang
Current Wikipedia-based multilingual knowledge bases still suffer from the following problems: (i) the scarcity of non-English knowledge, (ii) noise in the semantic relations, and (iii) the limited coverage of equivalent cross-lingual entities. In this demo, we present a large-scale bilingual knowledge graph named XLore, which adequately addresses the above problems.