-
A Formal Semantics for Weighted Ontology Mappings
Manuel Atencia, Alexander Borgida, Jérôme Euzenat, Chiara Ghidini and Luciano Serafini
Pages 17-33
[OpenAccess] [Publisher]
Ontology mappings are often assigned a weight or confidence factor by matchers. Nonetheless, few semantic accounts have been given so far for such weights. This paper presents a formal semantics for weighted mappings between different ontologies. It is based on a classificational interpretation of mappings: if O1 and O2 are two ontologies used to classify a common set X, then mappings between O1 and O2 are interpreted to encode how elements of X classified under the concepts of O1 are re-classified under the concepts of O2, and weights are interpreted to measure how precise and complete these re-classifications are. This semantics is justified by the extensional practice of ontology matching. It is a conservative extension of a semantics of crisp mappings. The paper also includes properties that relate mapping entailment with description logic constructors.
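To make the classificational reading concrete, one natural formulation (our notation and an assumption here, not the paper's exact definitions) writes C_1 for the elements of the common set X classified under concept C of O1, and D_2 for those classified under D of O2; the weight of a mapping from C to D can then be read through the precision and completeness of the re-classification:

$$\mathrm{prec}(C \to D) = \frac{|C_1 \cap D_2|}{|C_1|}, \qquad \mathrm{compl}(C \to D) = \frac{|C_1 \cap D_2|}{|D_2|}.$$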
-
A Machine Learning Approach for Instance Matching Based on Similarity Metrics
Shu Rong, Xing Niu, Evan Wei Xiang, Haofen Wang, Qiang Yang and Yong Yu
Pages 460-475
[OpenAccess] [Publisher]
The Linking Open Data (LOD) project is an ongoing effort to construct a global data space, i.e. the Web of Data. One important part of this project is to establish owl:sameAs links among structured data sources. Such links indicate equivalent instances that refer to the same real-world object. The problem of discovering owl:sameAs links between pairwise data sources is called instance matching. Most of the existing approaches addressing this problem rely on the quality of prior schema matching, which is not always good enough in the LOD scenario. In this paper, we propose a schema-independent instance-pair similarity metric based on several general descriptive features. We transform the instance matching problem into a binary classification problem and solve it with machine learning algorithms. Furthermore, we employ transfer learning methods to utilize the existing owl:sameAs links in LOD to reduce the demand for labeled data. We carry out experiments on datasets from OAEI 2010. The results show that our method performs well on real-world LOD data and outperforms the participants of OAEI 2010.
-
An Efficient Bit Vector Approach to Semantics-based Machine Perception in Resource-Constrained Devices
Cory A. Henson, Krishnaprasad Thirunarayan and Amit P. Sheth
Pages 149-164
[OpenAccess] [Publisher]
The primary challenge of machine perception is to define efficient computational methods to derive high-level knowledge from low-level sensor observation data. Emerging solutions are using ontologies for expressive representation of concepts in the domain of sensing and perception, which enable advanced integration and interpretation of heterogeneous sensor data. The computational complexity of OWL, however, seriously limits its applicability and use within resource-constrained environments, such as mobile devices. To overcome this issue, we employ OWL to formally define the inference tasks needed for machine perception - explanation and discrimination - and then provide efficient algorithms for these tasks, using bit-vector encodings and operations. The applicability of our approach to machine perception is evaluated on a smart-phone mobile device, demonstrating dramatic improvements in both efficiency and scale.
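To make the bit-vector idea concrete, here is a minimal Python sketch of the general encoding (an illustration over invented data, not the authors' algorithm): observed properties are encoded as set bits, explanation reduces to checking which candidates' property vectors cover the observation vector, and discrimination picks properties that split the remaining candidates.

```python
# Toy bit-vector "explanation" and "discrimination". Each observable
# property is one bit; a candidate explains an observation if its
# property vector covers all observed bits. Vocabulary and data invented.
PROPERTIES = ["fever", "cough", "headache"]
BIT = {p: 1 << i for i, p in enumerate(PROPERTIES)}   # property -> bit mask

def encode(props):
    """Fold a set of observed properties into one integer bit vector."""
    vec = 0
    for p in props:
        vec |= BIT[p]
    return vec

CANDIDATES = {
    "flu":  encode({"fever", "cough", "headache"}),
    "cold": encode({"cough", "headache"}),
}

def explanations(observed):
    """Candidates whose property vector covers every observed bit."""
    obs = encode(observed)
    return [c for c, vec in CANDIDATES.items() if obs & vec == obs]

def discriminating(observed):
    """Properties whose observation would narrow down the candidates."""
    remaining = [CANDIDATES[c] for c in explanations(observed)]
    return [p for p, b in BIT.items()
            if 0 < sum(1 for v in remaining if v & b) < len(remaining)]

print(explanations({"cough"}))     # ['flu', 'cold']
print(discriminating({"cough"}))   # ['fever'] -- only flu covers it
```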
-
An Evidence-based Verification Approach to Extract Entities and Relations for Knowledge Base Population
Naimdjon Takhirov, Fabien Duchateau and Trond Aalberg
Pages 575-590
[OpenAccess] [Publisher]
This paper presents an approach to automatically extract entities and relationships from textual documents. The main goal is to populate a knowledge base that hosts this structured information about domain entities. The extracted entities and their expected relationships are verified using two evidence-based techniques: classification and linking. The latter process also enables the linking of our knowledge base to other sources which are part of the Linked Open Data cloud. We demonstrate the benefit of our approach through a series of experiments with real-world datasets.
-
Automatic typing of DBpedia entities
Aldo Gangemi, Andrea Giovanni Nuzzolese, Valentina Presutti, Francesco Draicchio, Alberto Musetti and Paolo Ciancarini
Pages 65-81
[OpenAccess] [Publisher]
We present Tipalo, an algorithm and tool for automatically typing DBpedia entities. Tipalo identifies the most appropriate types for an entity by interpreting its natural language definition, which is extracted from the abstract of its corresponding Wikipedia page. Types are identified by means of a set of heuristics based on graph patterns, disambiguated against WordNet, and aligned to two top-level ontologies: WordNet supersenses and a subset of DOLCE+DnS Ultra Lite classes. The algorithm has been tuned against a gold standard built online by a group of selected users, and further evaluated in a user study.
-
Blank Node Matching and RDF/S Comparison Functions
Yannis Tzitzikas, Christina Lantzaki and Dimitris Zeginis
Pages 591-607
[OpenAccess] [Publisher]
In RDF, a blank node (or anonymous resource, or bnode) is a node in an RDF graph that is not identified by a URI and is not a literal. Several RDF/S Knowledge Bases (KBs) rely heavily on blank nodes, as they are convenient for representing complex attributes or resources whose identity is unknown but whose attributes (either literals or associations with other resources) are known. In this paper we show how the anonymity of blank nodes can be exploited to reduce the delta (diff) size when comparing such KBs. The main idea of the proposed method is to build a mapping between the bnodes of the compared KBs that reduces the delta size. We prove that finding the optimal mapping is NP-hard in the general case, and polynomial when there are no directly connected bnodes. Subsequently we present various polynomial algorithms returning approximate solutions for the general case. To make our method applicable to large KBs as well, we present a signature-based mapping algorithm with O(n log n) complexity. Finally, we report experimental results over real and synthetic datasets that demonstrate significant reductions in the sizes of the computed deltas. For the proposed algorithms we also provide comparative results regarding delta reduction, equivalence detection and time efficiency.
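A hedged sketch of the signature idea (our simplification, not the paper's exact algorithm): give each bnode a signature derived from its ground triples with bnode names blanked out, sort the signatures of both KBs, and pair bnodes in signature order, which keeps the mapping step near O(n log n).

```python
# Toy signature-based bnode mapping. A KB is a list of (s, p, o) triples;
# bnode identifiers start with "_:". Signatures ignore bnode names so that
# structurally similar bnodes in two KBs compare equal.
def signature(kb, bnode):
    """Sorted outgoing neighbourhood of a bnode, bnode names blanked."""
    blank = lambda t: "_" if str(t).startswith("_:") else str(t)
    return tuple(sorted((p, blank(o)) for s, p, o in kb if s == bnode))

def bnodes(kb):
    return {s for s, _, _ in kb if s.startswith("_:")}

def match(kb1, kb2):
    """Pair bnodes of kb1 and kb2 by sorted signature order (greedy)."""
    by_sig1 = sorted(bnodes(kb1), key=lambda b: signature(kb1, b))
    by_sig2 = sorted(bnodes(kb2), key=lambda b: signature(kb2, b))
    return list(zip(by_sig1, by_sig2))

kb_a = [("_:x", "type", "Address"), ("_:x", "city", "Heraklion")]
kb_b = [("_:y", "type", "Address"), ("_:y", "city", "Heraklion"),
        ("_:z", "type", "Person")]
print(match(kb_a, kb_b))   # [('_:x', '_:y')]; '_:z' stays unmatched
```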
-
Collaborative Filtering by Analyzing Dynamic User Interests Modeled by Taxonomy
Makoto Nakatsuji, Yasuhiro Fujiwara, Toshio Uchiyama and Hiroyuki Toda
Pages 361-377
[OpenAccess] [Publisher]
Tracking user interests over time is important for making accurate recommendations. However, the widely-used time-decay-based approach worsens the sparsity problem because it deemphasizes old item transactions. We introduce two ideas to solve the sparsity problem. First, we divide the users' transactions into epochs, i.e. time periods, and identify epochs that are dominated by interests similar to the current interests of the active user. This eliminates dissimilar transactions while still making use of similar transactions from prior epochs. Second, we use a taxonomy of items to model user item transactions in each epoch. This captures the interests of users in each epoch well, even if there are few transactions. It suits situations in which the items users transact change dynamically over time: the semantics behind classes do not change often, while individual items appear and disappear. Fortunately, many taxonomies are now available on the web because of the spread of the Linked Open Data vision, and we can use them to understand dynamic user interests semantically. We evaluate our method on two datasets: a music listening history extracted from users' tweets, and a restaurant visit history gathered from a gourmet guide site. The results show that our method predicts user interests much more accurately than the previous time-decay-based method.
-
Concept-Based Semantic Difference in Expressive Description Logics
Rafael S. Gonçalves, Bijan Parsia and Ulrike Sattler
Pages 99-115
[OpenAccess] [Publisher]
Detecting, much less understanding, the difference between two description logic based ontologies is challenging for ontology engineers due, in part, to the possibility of complex, non-local logical effects of axiom changes. First, it is often quite difficult to even determine which concepts have had their meaning altered by a change. Second, once a concept change is pinpointed, the problem of distinguishing whether the concept is directly or indirectly affected by a change has yet to be tackled. To address the first issue, various principled notions of ``semantic diff'' (based on deductive inseparability) have been proposed in the literature and shown to be computationally practical for the expressively restricted case of ELHr-terminologies. However, problems arise even for such limited logics as ALC: first, computation gets more difficult, becoming undecidable for logics such as SROIQ, which underlie the Web Ontology Language (OWL); second, the presence of negation and disjunction makes the standard semantic difference too sensitive to change: essentially, any logically effectual change always affects all terms in the ontology. In order to tackle these issues, we formulate the central notion of finding the minimal change set based on model inseparability, and present a method to differentiate changes which are specific to (and thus directly affect) particular concept names. Subsequently we devise a series of computable approximations, and compare the variously approximated change sets over a series of versions of the NCI Thesaurus (NCIt).
-
Cost based Query Ordering over OWL Ontologies
Ilianna Kollia and Birte Glimm
Pages 231-246
[OpenAccess] [Publisher]
The paper presents an approach for cost-based query planning for SPARQL queries issued over an OWL ontology, using the OWL Direct Semantics entailment regime of SPARQL 1.1. The costs are based on information about the instances of classes and properties that is extracted from a model abstraction built by an OWL reasoner. A static and a dynamic algorithm are presented which use these costs to find optimal or near-optimal execution orders for the atoms of a query. For the dynamic case, we improve performance by exploiting an individual clustering approach that allows for computing the cost functions based on one individual sample from a cluster. Our experimental study shows that the static ordering usually outperforms the dynamic one when accurate statistics are available. This changes, however, when the statistics are less accurate, e.g., due to non-deterministic reasoning decisions.
-
CrowdMAP: Crowdsourcing Ontology Alignment with Microtasks
Cristina Sarasua, Elena Simperl and Natalya Fridman Noy
Pages 525-541
[OpenAccess] [Publisher]
The last decade of research in ontology alignment has brought a variety of computational techniques to discover correspondences between ontologies. While the accuracy of automatic approaches has continuously improved, human contributions remain a key ingredient of the process: this input serves as a valuable source of domain knowledge that is used to train the algorithms and to validate and augment automatically computed alignments. In this paper, we introduce CrowdMAP, a model to acquire such human contributions via microtask crowdsourcing. For a given pair of ontologies, CrowdMAP translates the alignment problem into microtasks that address individual alignment questions, publishes the microtasks on an online labor market, and evaluates the quality of the results obtained from the crowd. We evaluated the current implementation of CrowdMAP in a series of experiments using ontologies and reference alignments from the Ontology Alignment Evaluation Initiative and the crowdsourcing platform CrowdFlower. The experiments clearly demonstrated that the overall approach is feasible, and can improve the accuracy of existing ontology alignment solutions in a fast, scalable, and cost-effective manner.
-
DeFacto - Deep Fact Validation
Jens Lehmann, Daniel Gerber, Mohamed Morsey and Axel-Cyrille Ngonga Ngomo
Pages 312-327
[OpenAccess] [Publisher]
One of the main tasks when creating and maintaining knowledge bases is to validate facts and provide sources for them, in order to ensure correctness and traceability of the provided knowledge. So far, this task has often been addressed by human curators in a three-step process: issuing appropriate keyword queries for the statement to check using standard search engines, retrieving potentially relevant documents, and screening those documents for relevant content. The drawbacks of this process are manifold. Most importantly, it is very time-consuming, as the experts have to carry out several search processes and must often read several documents. In this article, we present DeFacto (Deep Fact Validation), an algorithm for validating facts by finding trustworthy sources for them on the Web. DeFacto aims to provide an effective way of validating facts by supplying the user with relevant excerpts of webpages as well as useful additional information, including a score for DeFacto's confidence in the correctness of the input fact.
-
Discovering Concept Coverings in Ontologies of Linked Data Sources
Rahul Parundekar, Craig A. Knoblock and José Luis Ambite
Pages 427-443
[OpenAccess] [Publisher]
Despite the recent increase in the number of linked instances in the Linked Data Cloud, the absence of links at the concept level has resulted in heterogeneous schemas, challenging the interoperability goal of the Semantic Web. In this paper, we address this problem by finding alignments between concepts from multiple Linked Data sources. Instead of only considering the existing concepts present in each ontology, we hypothesize new composite concepts defined as disjunctions of conjunctions of (RDF) types and value restrictions, which we call restriction classes, and generate alignments between these composite concepts. This extended concept language enables us to find more complete definitions and even to align sources that have rudimentary ontologies, such as those that are simple renderings of relational databases. Our concept alignment approach is based on analyzing the extensions of these concepts and their linked instances. Having explored the alignment of conjunctive concepts in our previous work, in this paper we focus on concept coverings (disjunctions of restriction classes). We present an evaluation of this new algorithm in the Geospatial, Biological Classification, and Genetics domains. The resulting alignments are useful for refining existing ontologies and determining the alignments between concepts in the ontologies, thus increasing the interoperability in the Linked Open Data Cloud.
-
Domain-aware Ontology Matching
Kristian Slabbekoorn, Laura Hollink and Geert-Jan Houben
Pages 542-558
[OpenAccess] [Publisher]
The inherent heterogeneity of datasets on the Semantic Web has created a need to interlink them, and several tools have emerged that automate this task. In this paper we are interested in what happens if we enrich these matching tools with knowledge of the domain of the ontologies. We explore how to express the notion of a domain in terms usable for an ontology matching tool, and we examine various methods to decide what constitutes the domain of a given dataset. We show how we can use this in a matching tool, and study the effect of domain knowledge on the quality of the alignment. We perform evaluations for two scenarios: Last.fm artists and UMLS medical terms. To quantify the added value of domain knowledge, we compare our domain-aware matching approach to a standard approach based on a manually created reference alignment. The results indicate that the proposed domain-aware approach indeed outperforms the standard approach, with a large effect on ambiguous concepts but a much smaller effect on unambiguous concepts.
-
Efficient Execution of Top-K SPARQL Queries
Sara Magliacane, Alessandro Bozzon and Emanuele Della Valle
Pages 344-360
[OpenAccess] [Publisher]
Top-k queries, i.e. queries returning the top k results ordered by a user-defined scoring function, are an important category of queries. Order is an important property of data that can be exploited to speed up query processing. State-of-the-art SPARQL engines underuse order, and top-k queries are mostly managed with a materialize-then-sort processing scheme that computes all the matching solutions (e.g. thousands) even if only a limited number k (e.g. ten) are requested. The SPARQL-RANK algebra is an extended SPARQL algebra that treats order as a first class citizen, enabling efficient split-and-interleave processing schemes that can be adopted to improve the performance of top-k SPARQL queries. In this paper we propose an incremental execution model for SPARQL-RANK queries, we compare the performance of alternative physical operators, and we propose a rank-aware join algorithm optimized for native RDF stores. Experiments conducted with an open source implementation of a SPARQL-RANK query engine based on ARQ show that the evaluation of top-k queries can be sped up by orders of magnitude.
-
Feature LDA: a Supervised Topic Model for Automatic Detection of Web API Documentations from the Web
Chenghua Lin, Yulan He, Carlos Pedrinaci and John Domingue
Pages 328-343
[OpenAccess] [Publisher]
Web APIs have gained increasing popularity in recent Web service technology development, owing to the simplicity of their technology stack and the proliferation of mashups. However, efficiently discovering Web APIs and their documentation is still a challenging task, even with the best resources available on the Web. In this paper we cast the detection of Web API documentation as a text classification problem: classifying a given Web page as Web API associated or not. We propose a supervised generative topic model called feature latent Dirichlet allocation (feaLDA), which offers a generic probabilistic framework for automatic detection of Web APIs. feaLDA not only captures the correspondence between data and the associated class labels, but also provides a mechanism for incorporating side information, such as labeled features automatically learned from data, that can effectively help improve classification performance. Extensive experiments on our Web API documentation dataset show that the feaLDA model outperforms three strong supervised baselines (naive Bayes, support vector machines, and the maximum entropy model) by over 3% in classification accuracy. In addition, feaLDA also gives superior performance when compared against other existing supervised topic models.
-
Formal Verification of Data Provenance Records
Szymon Klarman, Stefan Schlobach and Luciano Serafini
Pages 215-230
[OpenAccess] [Publisher]
Data provenance is the history of derivation of a data artifact from its original sources. As real-life provenance records can cover thousands of data items and derivation steps, one of the pressing challenges is the development of formal frameworks for their automated verification. In this paper, we consider data expressed in standard Semantic Web ontology languages, such as OWL, and define a novel verification formalism called provenance specification logic, building on dynamic logic. We validate our proposal by modeling the test queries presented in The First Provenance Challenge, and conclude that the logical core of such queries can be successfully captured in our formalism.
-
Hitting the Sweetspot: Economic Rewriting of Knowledge Bases
Nadeschda Nikitina and Birte Glimm
Pages 394-409
[OpenAccess] [Publisher]
Three conflicting requirements arise in the context of knowledge base (KB) extraction: the size of the extracted KB, the size of the corresponding signature, and the syntactic similarity of the extracted KB with the original one. Minimal module extraction and uniform interpolation assign an absolute priority to one of these requirements, thereby limiting the possibilities to influence the other two. We propose a novel, tractable rewriting technique for EL that does not require such an extreme prioritization, and we empirically compare it with existing approaches, with encouraging results.
-
Hybrid SPARQL queries: fresh vs. fast results
Jürgen Umbrich, Marcel Karnstedt, Aidan Hogan and Josiane Xavier Parreira
Pages 608-624
[OpenAccess] [Publisher]
For Linked Data query engines, there are inherent trade-offs between centralised approaches that can efficiently answer queries over data cached from parts of the Web, and live decentralised approaches that can provide fresher results over the entire Web at the cost of slower response times. Herein, we propose a hybrid query execution approach that returns fresher results from a broader range of sources vs. the centralised scenario, while speeding up results vs. the live scenario. We first compare results from two public SPARQL stores against current versions of the Linked Data sources they cache; results are often missing or out-of-date. We thus propose using coherence estimates to split a query into a sub-query for which the cached data have good fresh coverage, and a sub-query that should instead be run live. Finally, we evaluate different hybrid query plans and split positions in a real-world setup. Our results show that hybrid query execution can improve freshness vs. fully cached results while reducing the time taken vs. fully live execution.
-
Instance-Based Matching of Large Ontologies Using Locality-Sensitive Hashing
Songyun Duan, Achille Fokoue, Oktie Hassanzadeh, Anastasios Kementsietsidis, Kavitha Srinivas and Michael J. Ward
Pages 49-64
[OpenAccess] [Publisher]
In this paper, we describe a mechanism for ontology alignment using instance-based matching of types (or classes). Instance-based matching is known to be a useful technique for matching ontologies that have different names and different structures. A key problem in instance matching of types, however, is scaling the matching algorithm to (a) handle types with a large number of instances, and (b) efficiently match a large number of type pairs. We propose the use of state-of-the-art locality-sensitive hashing (LSH) techniques to vastly improve the scalability of instance matching across multiple types. We show the feasibility of our approach with DBpedia and Freebase, two different type systems with hundreds and thousands of types, respectively. We describe how these techniques can be used to estimate containment or equivalence relations between two type systems, and we compare two different LSH techniques for computing instance similarity.
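A minimal MinHash sketch of the LSH idea (the generic technique, not the paper's exact pipeline): hash each instance's token set down to a short signature so that the fraction of agreeing signature positions approximates Jaccard similarity, which lets banded hashing shortlist candidate pairs without all-pairs comparison.

```python
import hashlib

NUM_HASHES = 64

def h(i, token):
    """i-th hash function, simulated by salting MD5 with the index."""
    return int(hashlib.md5(f"{i}:{token}".encode()).hexdigest(), 16)

def minhash(tokens):
    """MinHash signature: per hash function, the minimum over all tokens."""
    return tuple(min(h(i, t) for t in tokens) for i in range(NUM_HASHES))

def est_jaccard(sig_a, sig_b):
    """Agreeing positions estimate the Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES

a = {"paris", "france", "city", "capital"}
b = {"paris", "france", "city", "commune"}
print(len(a & b) / len(a | b))               # exact Jaccard: 0.6
print(est_jaccard(minhash(a), minhash(b)))   # estimate, close to 0.6
```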
-
Large-Scale Learning of Relation-Extraction Rules with Distant Supervision from the Web
Sebastian Krause, Hong Li, Hans Uszkoreit and Feiyu Xu
Pages 263-278
[OpenAccess] [Publisher]
We present a large-scale relation extraction (RE) system which learns grammar-based RE rules from the Web by utilizing large numbers of relation instances as seeds. Our goal is to obtain rule sets large enough to cover the actual range of linguistic variation, thus tackling the long-tail problem of real-world applications. A variant of distant supervision learns several relations in parallel, enabling a new method of rule filtering. The system detects both binary and n-ary relations. We target 39 relations from Freebase, for which 3M sentences extracted from 20M web pages serve as the basis for learning an average of 40K distinctive rules per relation. Using an efficient dependency parser, the average run time for each relation is only 19 hours. We compare these rules with ones learned from local corpora of different sizes and demonstrate that the Web is indeed needed for good coverage of linguistic variation.
-
Link Discovery with Guaranteed Reduction Ratio in Affine Spaces with Minkowski Measures
Axel-Cyrille Ngonga Ngomo
Pages 378-393
[OpenAccess] [Publisher]
Time-efficient algorithms are essential to address the complex linking tasks that arise when trying to discover links on the Web of Data. Although several lossless approaches have been developed for this exact purpose, they do not offer theoretical guarantees with respect to their performance. In this paper, we address this drawback by presenting the first Link Discovery approach with theoretical quality guarantees. In particular, we prove that given an achievable reduction ratio r, our Link Discovery approach HR3 can achieve a reduction ratio r' ≤ r in a metric space where distances are measured by means of a Minkowski metric of any order p ≥ 2. We compare HR3 and the HYPPO algorithm implemented in LIMES 0.5 with respect to the number of comparisons they carry out. In addition, we compare our approach with the algorithms implemented in the state-of-the-art frameworks LIMES 0.5 and SILK 2.5 with respect to runtime. We show that HR3 outperforms these previous approaches with respect to runtime in each of our four experimental setups.
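For reference, the Minkowski distance of order p, and the reduction ratio as it is commonly defined in the link discovery literature (a standard formulation, which may differ in detail from the paper's), are:

$$d_p(x, y) = \Big(\sum_{i=1}^{n} |x_i - y_i|^p\Big)^{1/p}, \qquad \mathrm{RR} = 1 - \frac{\text{comparisons actually made}}{|S| \cdot |T|},$$

where S and T are the source and target sets to be linked.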
-
MORe: Modular Combination of OWL Reasoners for Ontology Classification
Ana Armas Romero, Bernardo Cuenca Grau and Ian Horrocks
Pages 1-16
[OpenAccess] [Publisher]
Classification is a fundamental reasoning task in ontology design, and there is currently a wide range of reasoners highly optimised for classification of OWL 2 ontologies. There are also several reasoners that are complete for restricted fragments of OWL 2, such as the OWL 2 EL profile. These reasoners are much more efficient than fully-fledged OWL 2 reasoners, but they are not complete for ontologies containing (even if just a few) axioms outside the relevant fragment. In this paper, we propose a novel classification technique that combines an OWL 2 reasoner and an efficient reasoner for a given fragment in such a way that the bulk of the workload is assigned to the latter. Reasoners are combined in a black-box modular manner, and the specifics of their implementation (and even of their reasoning technique) are irrelevant to our approach.
-
Mining Semantic Relations between Research Areas
Francesco Osborne and Enrico Motta
Pages 410-426
[OpenAccess] [Publisher]
For a number of years now we have seen the emergence of repositories of research data specified using OWL/RDF as representation languages, and conceptualized according to a variety of ontologies. This class of solutions promises both to facilitate the integration of research data with other relevant sources of information and to support more intelligent forms of querying and exploration. However, an issue which has only been partially addressed is that of generating and semantically characterizing the relations that exist between research areas. This problem has traditionally been addressed by manually creating taxonomies, such as the ACM classification of research topics. However, this manual approach is inadequate for a number of reasons: these taxonomies are very coarse-grained and do not cater for the fine-grained research topics which define the level at which researchers (and even more so, PhD students) typically operate. Moreover, they evolve slowly, and therefore tend not to cover the most recent research trends. In addition, as we move towards a semantic characterization of these relations, there is arguably a need for a more sophisticated characterization than a homogeneous taxonomy, to reflect the different ways in which research areas can be related. In this paper we propose Klink, a new approach to i) automatically generating relations between research areas and ii) populating a bibliographic ontology, which combines machine learning methods with external knowledge drawn from a number of resources, including Google Scholar and Wikipedia. We have tested a number of alternative algorithms and our evaluation shows that a method relying on both external knowledge and the ability to detect temporal relations between research areas performs best with respect to a manually constructed standard.
-
On the Diversity and Availability of Temporal Information in Linked Open Data
Anisa Rula, Matteo Palmonari, Andreas Harth, Steffen Stadtmüller and Andrea Maurino
Pages 492-507
[OpenAccess] [Publisher]
An increasing amount of data is published and consumed on the Web according to the Linked Data paradigm. For both publishers and consumers, the temporal dimension of this data is important. In this paper we investigate the characterisation and availability of temporal information in Linked Data at large scale. Based on an abstract definition of temporal information, we conduct experiments to evaluate the availability of such information using the data from the 2011 Billion Triple Challenge (BTC) dataset. Focusing in particular on the representation of temporal meta-information, i.e., temporal information associated with RDF statements and graphs, we investigate the approaches proposed in the literature, performing both a quantitative and a qualitative analysis and proposing guidelines for data consumers and publishers. Our experiments show that the amount of temporal information available in the LOD cloud is still very small; several different models have been used on different datasets, with a prevalence of approaches based on the annotation of RDF documents.
-
Ontology Constraints in Incomplete and Complete Data
Peter F. Patel-Schneider and Enrico Franconi
Pages 444-459
[OpenAccess] [Publisher]
Ontology languages and other logical languages are built around the idea that axioms enable the inference of new facts about the available data. In some circumstances, however, the data is meant to be complete in certain ways, and deducing new facts may be undesirable. Previous approaches to this issue have relied on syntactically specifying certain axioms as constraints, or on adding new constructs for constraints, and on providing a different or extended meaning for constraints that reduces or eliminates their ability to infer new facts without requiring the data to be complete. We propose to instead directly state that the extensions of certain concepts and roles are complete by making them DBox predicates, which eliminates the distinction between regular axioms and constraints for these concepts and roles. This proposal eliminates the need for special semantics and avoids problems of previous proposals.
-
Ontology-Based Access to Probabilistic Data with OWL QL
Jean Christoph Jung and Carsten Lutz
Pages 182-197
[OpenAccess] [Publisher]
We propose a framework for querying probabilistic instance data in the presence of an OWL 2 QL ontology, arguing that the interplay of probabilities and ontologies is fruitful in many applications, such as managing data that was extracted from the web. The prime inference problem is computing answer probabilities, and it can be implemented using standard probabilistic database systems. We establish a PTime vs. #P dichotomy for the data complexity of this problem by lifting a corresponding result from probabilistic databases. We also demonstrate that query rewriting (backwards chaining) is an important tool for our framework, show that non-existence of a rewriting into first-order logic implies #P-hardness, and briefly discuss approximation of answer probabilities.
-
Performance Heterogeneity and Approximate Reasoning in Description Logic Ontologies
Rafael S. Gonçalves, Bijan Parsia and Ulrike Sattler
Pages 82-98
[OpenAccess] [Publisher]
Due to the high worst-case complexity of the core reasoning problem for the expressive profiles of OWL 2, ontology engineers are often surprised and confused by the performance behaviour of reasoners on their ontologies. Even very experienced modellers with a sophisticated grasp of reasoning algorithms do not have a good mental model of reasoner performance behaviour. Seemingly innocuous changes to an OWL ontology can degrade classification time from instantaneous to too long to wait for. Similarly, switching reasoners (e.g., to take advantage of specific features) can result in wildly different classification times. In this paper we investigate performance variability phenomena in OWL ontologies, and present methods to identify subsets of an ontology which are performance-degrading for a given reasoner. When such (ideally small) subsets are removed from an ontology, and the remainder is much easier for the given reasoner to reason over, we designate them "hot spots". The identification of these hot spots allows users to isolate difficult portions of the ontology in a principled and systematic way. Moreover, we devise and compare various methods for approximate reasoning and knowledge compilation based on hot spots. We verify our techniques with a select set of varyingly difficult ontologies from the NCBO BioPortal, and were able to, firstly, successfully identify performance hot spots against the major freely available DL reasoners, and, secondly, significantly improve classification time using approximate reasoning based on hot spots.
-
Personalised Graph-based Selection of Web APIs
Milan Dojchinovski, Jaroslav Kuchar, Tomas Vitvar and Maciej Zaremba
Pages 34-48
[OpenAccess] [Publisher]
Modelling and understanding the various contexts of users is important to enable personalised selection of Web APIs in directories such as ProgrammableWeb. Currently, relationships between users and Web APIs are not clearly understood or utilized by existing selection approaches. In this paper, we present a semantic model of a Web API directory graph that captures entities such as Web APIs, mashups, developers, and categories, together with their relationships. We describe a novel, configurable graph-based method for selection of Web APIs with personalised and temporal aspects. The method gives users more control over their preferences and recommended Web APIs, while exploiting information about their social links and preferences. We evaluate the method on a real-world dataset from ProgrammableWeb.com, and show that it provides more contextualised results than currently available popularity-based rankings.
-
Predicting Reasoning Performance Using Ontology Metrics
Yong-Bin Kang, Yuan-Fang Li and Shonali Krishnaswamy
Pages 198-214
[OpenAccess] [Publisher]
A key issue in semantic reasoning is the computational complexity of inference tasks on expressive ontology languages such as OWL DL and OWL 2 DL. Theoretical work has established worst-case complexity results for reasoning tasks for these languages. However, the hardness of reasoning about individual ontologies has not been adequately characterised. In this paper, we conduct a systematic study to tackle this problem using machine learning techniques, covering over 350 real-world ontologies and four state-of-the-art, widely-used OWL 2 reasoners. Our main contributions are two-fold. Firstly, we learn various classifiers that accurately predict classification time for an ontology based on its metric values. Secondly, we identify a number of metrics that can be used to effectively predict reasoning performance. Our prediction models have been shown to be highly effective, achieving an accuracy of over 80%.
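As a hedged sketch of this setup (the metric names, data, and learner here are invented for illustration; the paper's actual feature set and models differ in detail): treat each ontology as a vector of structural metrics and fit a classifier that maps the vector to a performance bucket.

```python
# Hypothetical example: predict a reasoner-performance bucket from
# ontology metrics. Features and values are illustrative only.
from sklearn.ensemble import RandomForestClassifier

# One row per ontology: [num_axioms, num_classes, max_depth, avg_fanout]
X = [
    [1_200,     300,  6, 2.1],
    [85_000,  9_000, 14, 3.8],
    [430_000, 60_000, 21, 5.2],
    [2_500,     800,  7, 2.4],
]
y = ["fast", "medium", "slow", "fast"]   # observed classification-time bucket

model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
print(model.predict([[50_000, 7_500, 12, 3.5]]))   # e.g. ['medium']

# Feature importances hint at which metrics drive predicted hardness:
print(dict(zip(["axioms", "classes", "depth", "fanout"],
               model.feature_importances_.round(2))))
```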
-
Provenance for SPARQL queries
Carlos Viegas Damásio, Anastasia Analyti and Grigoris Antoniou
Pages 625-640
[OpenAccess] [Publisher]
Determining the trustworthiness of data available in the Semantic Web is fundamental for applications and users, in particular for linked open data obtained from SPARQL endpoints. Several proposals exist in the literature for annotating SPARQL query results with values from abstract models, adapting the seminal works on provenance for annotated relational databases. We present an approach capable of providing provenance information for a large and significant fragment of SPARQL 1.1, including, for the first time, the major non-monotonic constructs under multiset semantics. The approach is based on the translation of SPARQL into relational queries over annotated relations with values from the most general m-semiring, and in this way it also refutes a claim in the literature that the OPTIONAL construct of SPARQL cannot be captured appropriately with the known abstract models.
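For background, the annotated-relations machinery this builds on (the standard K-relation reading from the relational provenance literature, given here as context rather than as the paper's extension): a K-relation maps each tuple to an annotation in a commutative semiring (K, ⊕, ⊗, 0, 1); alternative derivations combine with ⊕ and joint derivations with ⊗:

$$(R_1 \cup R_2)(t) = R_1(t) \oplus R_2(t), \qquad (R_1 \bowtie R_2)(t) = R_1(t_1) \otimes R_2(t_2),$$

where t is the join of the matching tuples t_1 and t_2.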
-
RDFS Reasoning on Massively Parallel Hardware
Norman Heino and Jeff Z. Pan
Pages 133-148
[OpenAccess] [Publisher]
Recent developments in hardware have shown an increase in parallelism as opposed to clock rates. In order to fully exploit these new avenues of performance improvement, computationally expensive workloads have to be expressed in a way that allows for fine-grained parallelism. In this paper, we address the problem of describing RDFS entailment in such a way. Unlike previous work on parallel RDFS reasoning, we assume a shared-memory architecture. We analyze the problem of duplicates that naturally occur in RDFS reasoning and develop strategies for its mitigation, exploiting all levels of our architecture. We implement and evaluate our approach on two real-world datasets and study its performance characteristics at different levels of parallelization. We conclude that RDFS entailment lends itself well to parallelization, but can benefit even more from careful optimizations that take into account the intricacies of modern parallel hardware.
-
Rapidly Integrating Services into the Linked Data Cloud
Mohsen Taheriyan, Craig A. Knoblock, Pedro A. Szekely and José Luis Ambite
Pages 559-574
[OpenAccess] [Publisher]
The amount of data available in the Linked Data cloud continues to grow. Yet, few services consume and produce linked data. There is recent work that allows a user to define a linked service from an online service, which includes the specifications for consuming and producing linked data, but building such models is time-consuming and requires specialized knowledge of RDF and SPARQL. This paper presents a new approach that allows domain experts to rapidly create semantic models of services by demonstration in an interactive web-based interface. First, the user provides examples of the service request URLs. Then, the system automatically proposes a service model which the user can refine interactively. Finally, the system saves a service specification using a new expressive vocabulary that includes lowering and lifting rules. This approach empowers end users to rapidly model existing services and immediately use them to consume and produce linked data.
-
Robust Runtime Optimization and Skew-Resistant Execution of Analytical SPARQL Queries on Pig
Spyros Kotoulas, Jacopo Urbani, Peter A. Boncz and Peter Mika
Pages 247-262
[OpenAccess] [Publisher]
We describe a system that incrementally translates SPARQL queries to Pig Latin and executes them on a Hadoop cluster. This system is designed to work efficiently on complex queries with many self-joins over huge datasets, avoiding job failures even in the case of joins with unexpected high-value skew. To be robust against cost estimation errors, our system interleaves query optimization with query execution, determining the next steps to take based on data samples and statistics gathered during the previous step. Furthermore, we have developed a novel skew-resistant join algorithm that replicates tuples corresponding to popular keys. We evaluate the effectiveness of our approach both on a synthetic benchmark known to generate complex queries (BSBM-BI) as well as on a Yahoo! case of data analysis using RDF data crawled from the web. Our results indicate that our system is indeed capable of processing huge datasets without pre-computed statistics while exhibiting good load-balancing properties.
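The skew-resistant idea can be sketched generically (a toy model of the technique, not the system's actual Pig implementation): tuples of a known-hot join key are scattered across several partitions on one side and replicated to all of those partitions on the other side, so no single reducer receives a hot key's full workload.

```python
import random

NUM_PARTITIONS = 8
HOT_KEYS = {"rdf:type"}    # keys found to be skewed, e.g. by sampling

def partitions_for(key, side):
    """Scatter left tuples of a hot key over random partitions and
    replicate right tuples to all partitions, so all pairs still meet."""
    if key not in HOT_KEYS:
        return [hash(key) % NUM_PARTITIONS]        # normal hash partitioning
    if side == "left":
        return [random.randrange(NUM_PARTITIONS)]  # scatter
    return list(range(NUM_PARTITIONS))             # replicate

print(partitions_for("foaf:name", "left"))   # one partition, e.g. [3]
print(partitions_for("rdf:type", "left"))    # one random partition
print(partitions_for("rdf:type", "right"))   # all partitions [0..7]
```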
-
SPLODGE: Systematic Generation of SPARQL Benchmark Queries for Linked Open Data
Olaf Görlitz, Matthias Thimm and Steffen Staab
Pages 116-132
[OpenAccess] [Publisher]
The distributed and heterogeneous nature of Linked Open Data requires flexible and federated techniques for query evaluation. In order to evaluate current federated querying approaches, a general methodology for conducting benchmarks is mandatory. In this paper, we present a classification methodology for federated SPARQL queries. This methodology can be used by developers of federated querying approaches to compose a set of test benchmarks that covers diverse characteristics of different queries and allows for comparability. We further develop a heuristic called SPLODGE for the automatic generation of benchmark queries based on this methodology, which takes into account the number of sources to be queried and several complexity parameters. We evaluate the adequacy of our methodology and the query generation strategy by applying them to the 2011 Billion Triple Challenge data set.
-
SRBench: A Streaming RDF/SPARQL Benchmark
Ying Zhang, Minh-Duc Pham, Óscar Corcho and Jean-Paul Calbimonte
Pages 641-657
[OpenAccess] [Publisher]
We introduce SRBench, a general-purpose benchmark primarily designed for streaming RDF/SPARQL engines, based entirely on real-world data sets from the Linked Open Data cloud. With the growing problem of too much streaming data but not enough tools to gain knowledge from it, researchers have set out to find solutions in which Semantic Web technologies are adapted and extended for publishing, sharing, analysing and understanding streaming data. To help researchers and users compare streaming RDF/SPARQL (strRS) engines in a standardised application scenario, we have designed SRBench, with which one can assess the ability of a strRS engine to cope with a broad range of use cases typically encountered in real-world scenarios. The data sets used in the benchmark have been carefully chosen such that they represent realistic and relevant usage of streaming data. The benchmark defines a concise, yet comprehensive, set of queries that cover the major aspects of strRS processing. Finally, our work is complemented with a functional evaluation of three representative strRS engines: SPARQLStream, C-SPARQL and CQELS. The presented results are meant to give a first baseline and illustrate the state of the art.
-
Scalable Geo-thematic Query Answering
Özgür Lütfü Özçep and Ralf Möller
Pages 658-673
[OpenAccess] [Publisher]
First-order logic (FOL) rewritability is a desirable feature for query answering over geo-thematic ontologies, because in most geoprocessing scenarios one has to cope with large data volumes. Hence, there is a need for combined geo-thematic logics that provide a sufficiently expressive query language allowing for FOL rewritability. The DL-Lite family of description logics is tailored towards FOL rewritability of query answering for unions of conjunctive queries, hence it is a suitable candidate for the thematic component of a combined geo-thematic logic. We show that a weak coupling of DL-Lite with the expressive region connection calculus RCC8 allows for FOL rewritability under a spatial completeness condition for the ABox. Stronger couplings allowing for FOL rewritability are possible only for spatial calculi as weak as the low-resolution calculus RCC2: already a strong combination of DL-Lite with the low-resolution calculus RCC3 does not allow for FOL rewritability.
-
Semantic Enrichment by Non-Experts: Usability of Manual Annotation Tools
Annika Hinze, Ralf Heese, Markus Luczak-Rösch and Adrian Paschke
Pages 165-181
[OpenAccess] [Publisher]
Most of the semantic content available has been generated automatically by using annotation services for existing content. Automatic annotation is not of sufficient quality to enable focused search and retrieval: either too many or too few terms are semantically annotated. User-defined semantic enrichment allows for a more targeted approach. We developed a tool for semantic annotation of digital documents and conducted an end-user study to evaluate its acceptance by and usability for non-expert users. This paper presents the results of this user study and discusses the lessons learned about both the semantic enrichment process and our methodology of exposing non-experts to semantic enrichment.
-
Semantic Sentiment Analysis of Twitter
Hassan Saif, Yulan He and Harith Alani
Pages 508-524
[OpenAccess] [Publisher]
Sentiment analysis over Twitter offers organisations a fast and effective way to monitor the public's feelings towards their brand, business, directors, etc. A wide range of features and methods for training sentiment classifiers for Twitter datasets have been researched in recent years, with varying results. In this paper, we introduce a novel approach of adding semantics as additional features into the training set for sentiment analysis. For each entity extracted from tweets (e.g. iPhone), we add its semantic concept (e.g. "Apple product") as an additional feature, and measure the correlation of the representative concept with negative/positive sentiment. We apply this approach to predict sentiment for three different Twitter datasets. Our results show an average increase in the F score for identifying both negative and positive sentiment of around 6.5% and 4.8% over the unigram and part-of-speech baselines, respectively. We also compare against an approach based on sentiment-bearing topic analysis, and find that semantic features produce better recall and F score when classifying negative sentiment, and better precision with lower recall and F score in positive sentiment classification.
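A hedged sketch of the feature-augmentation step (the entity-to-concept map below is invented; the paper obtains concepts from an entity extraction service): every recognised entity contributes its semantic concept as an extra token before the classifier is trained.

```python
# Toy semantic augmentation: append the concept of each recognised entity
# to a tweet's token features. The CONCEPT map is illustrative only.
CONCEPT = {
    "iphone": "apple_product",
    "ipad":   "apple_product",
    "obama":  "politician",
}

def featurize(tweet):
    tokens = tweet.lower().split()
    concepts = [CONCEPT[t] for t in tokens if t in CONCEPT]
    return tokens + concepts   # unigrams plus semantic concept features

print(featurize("Loving my new iPhone"))
# ['loving', 'my', 'new', 'iphone', 'apple_product']
```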
-
Strabon: A Semantic Geospatial DBMS
Kostis Kyzirakos, Manos Karpathiotakis and Manolis Koubarakis
Pages 295-311
[OpenAccess] [Publisher]
We present Strabon, a new RDF store that supports the state-of-the-art semantic geospatial query languages stSPARQL and GeoSPARQL. To illustrate the expressive power offered by these query languages and their implementation in Strabon, we concentrate on the new versions of the data model stRDF and the query language stSPARQL that we have developed ourselves. Like GeoSPARQL, these new versions use OGC standards to represent geometries, where the original versions used linear constraints. We study the performance of Strabon experimentally and show that it scales to very large data volumes and performs, in most cases, better than all other geospatial RDF stores it has been compared with.
-
The Not-So-Easy Task of Computing Class Subsumptions in OWL RL
Markus Krötzsch
Pages 279-294
[OpenAccess] [Publisher]
The lightweight ontology language OWL RL is used for reasoning with large amounts of data. To this end, the W3C standard provides a simple system of deduction rules, which operate directly on the RDF syntax of OWL. Several similar systems have been studied. However, these approaches are usually complete for instance retrieval only. This paper asks if and how such methods could also be used for computing entailed subclass relationships. Checking entailment for arbitrary OWL RL class subsumptions is co-NP-hard, but tractable rule-based reasoning is possible when restricting to subsumptions between atomic classes. Surprisingly, however, this cannot be achieved in any RDF-based rule system, i.e., the W3C calculus cannot be extended to compute all atomic class subsumptions. We identify syntactic restrictions to mitigate this problem, and propose a rule system that is sound and complete for many OWL RL ontologies.
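For intuition, this is the flavor of the W3C rule system in question (the cax-sco rule from the OWL 2 RL/RDF rules, applied here naively in one toy forward-chaining step over a set of triples):

```python
# cax-sco: from (c1, rdfs:subClassOf, c2) and (x, rdf:type, c1),
# infer (x, rdf:type, c2). One naive forward-chaining application.
def cax_sco(triples):
    new = {(x, "rdf:type", c2)
           for c1, p1, c2 in triples if p1 == "rdfs:subClassOf"
           for x, p2, c in triples if p2 == "rdf:type" and c == c1}
    return new - triples

kb = {("Dog", "rdfs:subClassOf", "Animal"), ("rex", "rdf:type", "Dog")}
print(cax_sco(kb))   # {('rex', 'rdf:type', 'Animal')}
```

The paper's point is that no extension of such triple-pattern rules suffices to derive all entailed atomic class subsumptions, however the calculus is enlarged.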
-
Who will follow whom? Exploiting Semantics for Link Prediction in Attention-Information Networks
Matthew Rowe, Milan Stankovic and Harith Alani
Pages 476-491
[OpenAccess] [Publisher]
Existing approaches for link prediction, in the domain of network science, exploit a network's topology to predict future connections by assessing existing edges and connections, and inducing links given the presence of mutual nodes. Despite the rise in popularity of Attention-Information Networks (i.e. microblogging platforms) and the production of content within such platforms, no existing work has attempted to exploit the semantics of published content when predicting network links. In this paper we present an approach that fills this gap by a) predicting follower edges within a directed social network by exploiting concept graphs and thereby significantly outperforming a random baseline and models that rely solely on network topology information, and b) assessing the different behavior that users exhibit when making followee-addition decisions. This latter contribution exposes latent factors within social networks and the existence of a clear need for topical affinity between users for a follow link to be created.
-
A Multi-Domain Framework for Community Building Based on Data Tagging
Bojan Bozic
Pages 441-444
[OpenAccess] [Publisher]
This paper presents a doctoral thesis which introduces a new approach to enriching time series with semantics. The paper describes the problem of assigning time series data to the right party of interest, and why this problem has not been solved so far. We demonstrate a new way of processing semantic time series and the resulting ability to address users. The combination of time series processing and Semantic Web technologies leads to a powerful new method of data processing and data generation, which offers completely new opportunities to the expert user.
-
Burst the Filter Bubble: Using Semantic Web to Enable Serendipity
Valentina Maccatrozzo
Pages 391-398
[OpenAccess] [Publisher]
Personalization techniques aim to help people deal with the ever-growing amount of information by filtering it according to their interests. However, in avoiding information overload, such techniques often create an over-personalization effect: users are exposed only to the content that systems assume they would like. To break this "personalization bubble" we introduce the notion of serendipity as a performance measure for recommendation algorithms. For this, we first identify aspects from the user perspective which can determine the level and type of serendipity desired by users. Then, we propose a user model that can accommodate such user requirements and enable serendipitous recommendations. The use case for this work focuses on TV recommender systems; however, the ultimate goal is to explore the transferability of this method to different domains. This paper covers the work done in the first eight months of research and describes the plan for the entire PhD trajectory.
-
Composition of Linked Data-based RESTful Services
Steffen Stadtmüller
Pages 461-464
[OpenAccess] [Publisher]
We address the problem of developing a scalable composition framework for Linked Data-based services, which retains the advantages of the loose coupling fostered by REST.
-
Cross Lingual Semantic Search by Improving Semantic Similarity and Relatedness Measures
Nitish Aggarwal
Pages 375-382
[OpenAccess] [Publisher]
Since 2001, the semantic web community has been working hard towards creating standards which will increase the accessibility of available information on the web. Yahoo! research recently reported that 30% of all HTML pages contain structured data such as microdata, RDFa, or microformats. Although the multilinguality of the web is a hurdle to information access, the rapid growth of the semantic web enables us to retrieve fine-grained information across the language barrier. In this thesis, we focus first on developing a methodology to perform cross-lingual semantic search over structured data (knowledge bases), by transforming natural language queries into SPARQL. Second, we focus on improving semantic similarity and relatedness measures, to overcome the semantic gap between the vocabulary in the knowledge base and the terms appearing in the query. The preliminary results are evaluated against the QALD-2 test dataset, achieving an F1 score of 0.46, an average precision of 0.44, and an average recall of 0.48.
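As a flavor of the intended transformation (an illustrative example with DBpedia-style vocabulary, not taken from the thesis): a natural language question is mapped to a SPARQL query over the knowledge base, with similarity and relatedness measures bridging the vocabulary gap between question terms and KB predicates.

```python
# Hypothetical NL-to-SPARQL target; the property choice (dbo:leader) is
# an assumption for illustration.
QUESTION = "Who is the mayor of Berlin?"

SPARQL = """
SELECT ?mayor WHERE {
  <http://dbpedia.org/resource/Berlin>
      <http://dbpedia.org/ontology/leader> ?mayor .
}
"""
```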
-
Distributed Reasoning on Semantic Data Streams
Rehab Albeladi
Pages 433-436
[OpenAccess] [Publisher]
Data streams are being continually generated in diverse application domains such as traffic monitoring, smart buildings, and so on. Stream Reasoning is the area that aims to combine reasoning techniques with data streams. In this paper, we present our approach to enable rule-based reasoning on semantic data streams using data flow networks in a distributed manner.
-
Knowledge Pattern Extraction and their usage in Exploratory Search
Andrea Giovanni Nuzzolese
Pages 449-452
[OpenAccess] [Publisher]
Knowledge interaction in the Web context is a challenging problem. For instance, it requires dealing with complex structures able to filter knowledge by drawing a meaningful context boundary around data. We assume that these complex structures can be formalized as Knowledge Patterns (KPs), aka frames. This Ph.D. work aims at developing methods for extracting KPs from the Web and at applying KPs to exploratory search tasks. We want to extract KPs by analyzing the structure of Web links from rich resources, such as Wikipedia.
-
Online Unsupervised Coreference Resolution for Semi-Structured Heterogeneous Data
Jennifer Sleeman
Pages 457-460
[OpenAccess] [Publisher]
A pair of RDF instances are said to corefer when they are intended to denote the same thing in the world, for example, when two nodes of type foaf:Person describe the same individual. This problem is central to integrating and inter-linking semi-structured datasets. We are developing an online, unsupervised coreference resolution framework for heterogeneous, semi-structured data. The online aspect requires us to process new instances as they appear and not as a batch. The instances are heterogeneous in that they may contain terms from different ontologies whose alignments are not known in advance. Our framework encompasses a two-phased clustering algorithm that is both flexible and distributable, a probabilistic multidimensional attribute model that will support robust schema mappings, and a consolidation algorithm that will be used to perform instance consolidation in order to improve accuracy rates over time by addressing data sparseness.
-
Quality Reasoning in the Semantic Web
Chris Colin Baillie, Peter Edwards and Edoardo Pignotti
Pages 383-390
[OpenAccess] [Publisher]
Assessing the quality of data published on the Web has been identified as an essential step in selecting reliable information for use in tasks such as decision making. This paper discusses a quality assessment framework based on semantic web technologies and outlines a role for provenance in supporting and documenting such assessments.
-
Reconstructing Provenance
Sara Magliacane
Pages 399-406
[OpenAccess] [Publisher]
Provenance is an increasingly important aspect of data management that is often underestimated and neglected by practitioners. In our work, we target the problem of reconstructing the provenance of files in a shared folder setting, assuming that only standard filesystem metadata are available. We propose a content-based approach that is able to reconstruct provenance automatically, leveraging several similarity measures and edit distance algorithms, and adapting and integrating them into a multi-signal pipeline. We discuss our research methodology and show some promising preliminary results.
-
Replication for Linked Data
Laurens Rietveld
Pages 415-423
[OpenAccess] [Publisher]
With the Semantic Web scaling up, and more triple-stores with update facilities becoming available, the need for multiple simultaneous triple-stores holding identical information becomes more and more urgent. However, while such data replication approaches are common in the database community, there is no comprehensive approach to data replication for the Semantic Web. In this research proposal, we discuss the problem space and scenarios of data replication in the Semantic Web, and explain how we plan to deal with this issue.
-
Reusing XML Schemas' Information as a Foundation for Designing Domain Ontologies
Thomas Bosch
Pages 437-440
[OpenAccess] [Publisher]
Designing domain ontologies from scratch is a time-consuming endeavor requiring a lot of close collaboration with domain experts. However, domain descriptions such as XML Schemas are often available in early stages of the ontology development process. For my dissertation, I propose a method to convert XML Schemas to OWL ontologies automatically. The approach can transform any XML Schema document by using the XML Schema metamodel, which is completely represented by the XML Schema Metamodel Ontology. All schema declarations and definitions are automatically converted to class axioms, which are intended to be enriched with additional domain-specific semantic information in the form of domain ontologies.
-
SPARQL Update for Complex Event Processing
Mikko Rinne
Pages 453-456
[OpenAccess] [Publisher]
Complex event processing is currently done primarily with proprietary definition languages. Future smart environments will require collaboration among multi-platform sensors operated by multiple parties. The goal of my research is to verify the applicability of standards-compliant SPARQL to complex event processing tasks. If successful, the semantic web standards RDF, SPARQL and OWL, with their established base of tools, offer many further benefits for event processing, including support for interconnecting disjoint vocabularies, enriching event information with linked open data, and reasoning over semantically annotated content. A software platform capable of continuous incremental evaluation of multiple parallel SPARQL queries is a key enabler of the approach.
-
Scalable and Domain-Independent Entity Coreference: Establishing High Quality Data Linkages Across Heterogeneous Data Sources
,
Dezhao Song
,
424-432
,
[OpenAccess]
,
[Publisher]
Due to the decentralized nature of the Semantic Web, the same real-world entity may be described in various data sources and assigned syntactically distinct identifiers. In order to facilitate data utilization in the Semantic Web, without compromising the freedom of people to publish their data, one critical problem is to appropriately interlink such heterogeneous data. This interlinking process is also referred to as entity coreference, i.e., finding which identifiers refer to the same real-world entity. This proposal investigates algorithms for solving the entity coreference problem in the Semantic Web along several dimensions. The essence of entity coreference is to compute the similarity of instance pairs. Given the diversity of domains of existing datasets, it is important that an entity coreference algorithm achieve good precision and recall across domains represented in various ways. Furthermore, in order to scale to large datasets, an algorithm should be able to intelligently select what information to utilize for comparison and determine whether to compare a pair of instances at all, so as to reduce the overall complexity. Finally, appropriate evaluation strategies need to be chosen to verify the effectiveness of the algorithms.
-
Towards a theoretical foundation for the harmonization of linked data
,
Enrico Daga
,
445-448
,
[OpenAccess]
,
[Publisher]
In real-world cases, building reliable, problem-centric views over Linked Data is a challenging task. An ideal method should include a formal representation of the requirements of the needed dataset and a controlled process moving from the original sources to the outcome. We believe that a goal-oriented approach, similar to the AI planning problem, could be successful both in controlling the process of linked data fusion and in formalizing the relations between requirements, process, and result.
-
Very Large Scale OWL Reasoning through Distributed Computation
,
Raghava Mutharaju
,
407-414
,
[OpenAccess]
,
[Publisher]
Due to recent developments in reasoning algorithms for the various OWL profiles, the classification time for an ontology has come down drastically. All of the popular reasoners, however, implicitly assume that an ontology fits in primary memory. The memory requirements of a reasoner are already quite high, and considering the ever-increasing size of the data to be processed and the goal of making reasoning Web scale, this assumption becomes overly restrictive. In our work, we study several distributed classification approaches for the description logic EL+ (a fragment of the OWL 2 EL profile). We present the lessons learned from each approach, our current results, and plans for future work.
-
A Comparison of Hard Filters and Soft Evidence for Answer Typing in Watson
,
Chris Welty,J. William Murdock,Aditya Kalyanpur and James Fan
,
243-256
,
[OpenAccess]
,
[Publisher]
Questions often explicitly request a particular type of answer. One popular approach to answering natural language questions involves filtering candidate answers based on precompiled lists of instances of common answer types (e.g., countries, animals, foods, etc.). Such a strategy is poorly suited to an open domain in which there is an extremely broad range of answer types, and the most frequently occurring types cover only a small fraction of all answers. In this paper we present an alternative approach, called TyCor, that employs soft filtering of candidates using multiple strategies and sources. We find that TyCor significantly outperforms a single-source, single-strategy hard filtering approach, demonstrating both that multiple sources and strategies outperform a single source and strategy, and that its fault tolerance yields significantly better performance than a hard filter.
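For illustration, a toy sketch of the difference between the two filtering styles, with invented candidate data and a single hypothetical type scorer rather than the actual TyCor strategies:

```python
# Toy contrast: hard filtering discards candidates missing from an
# (inevitably incomplete) instance list; soft scoring keeps every
# candidate and combines weighted type-match evidence instead.
def hard_filter(candidates, answer_type, instance_lists):
    """Discard any candidate whose text is not on the precompiled list."""
    allowed = instance_lists.get(answer_type, set())
    return [c for c in candidates if c["text"] in allowed]

def soft_score(candidates, scorers, weights):
    """Keep every candidate; combine per-strategy type-match scores."""
    for c in candidates:
        c["type_score"] = sum(w * s(c) for s, w in zip(scorers, weights)) / sum(weights)
    return sorted(candidates, key=lambda c: c["type_score"], reverse=True)

candidates = [{"text": "Tasmania"}, {"text": "Vegemite"}]
instance_lists = {"REGION": {"Queensland"}}              # incomplete list
print(hard_filter(candidates, "REGION", instance_lists))  # [] - everything dropped
print(soft_score(candidates,                              # degrades gracefully
                 [lambda c: 0.6 if c["text"] == "Tasmania" else 0.1], [1.0]))
```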
-
Achieving Interoperability through Semantics-based Technologies: The Instant Messaging Case
,
Amel Bennaceur,Valérie Issarny,Romina Spalazzese and Shashank Tyagi
,
17-33
,
[OpenAccess]
,
[Publisher]
The success of pervasive computing depends on the ability to compose a multitude of networked applications dynamically in order to achieve user goals. However, applications from different providers are not able to interoperate due to incompatible interaction protocols or disparate data models. Instant messaging is a representative example of the current situation, where various competing applications keep emerging. To enforce interoperability at runtime and in a non-intrusive manner, mediators are used to perform the necessary translations and coordination between the heterogeneous applications. Nevertheless, the design of mediators requires considerable knowledge about each application as well as a substantial development effort. In this paper we present an approach based on ontology reasoning and model checking in order to generate correct-by-construction mediators automatically. We demonstrate the feasibility of our approach through a prototype tool and show that it synthesises mediators that achieve efficient interoperation of instant messaging applications.
-
Applying Semantic Web Technologies for Diagnosing Road Traffic Congestions
,
Freddy Lécué,Anika Schumann and Marco Luca Sbodio
,
114-130
,
[OpenAccess]
,
[Publisher]
Diagnosis, the method of connecting causes to their effects, is an important reasoning task for obtaining insight into cities and realizing the currently envisioned concept of sustainable, smarter cities. This paper, focusing on transportation and road traffic, presents how road traffic congestions can be detected and diagnosed in quasi real-time. We adapt pure Artificial Intelligence diagnosis techniques to fully exploit knowledge captured through relevant semantics-augmented stream and static data from various domains. Our prototype of semantics-aware diagnosis of road traffic congestions, tested in Dublin, Ireland, works efficiently with large, heterogeneous information sources and delivers value-added services to citizens and city managers in quasi real-time.
-
DEQA: Deep Web Extraction for Question Answering
,
Jens Lehmann,Tim Furche,Giovanni Grasso,Axel-Cyrille Ngonga Ngomo,Christian Schallhart,Andrew Jon Sellers,Christina Unger,Lorenz Bühmann,Daniel Gerber,Konrad Höffner,David Liu and Sören Auer
,
131-147
,
[OpenAccess]
,
[Publisher]
Despite decades of effort, intelligent object search remains elusive. Neither search engines nor semantic web technologies alone have managed to provide usable systems for simple questions such as "Find me a flat with a garden and more than two bedrooms near a supermarket." We introduce DEQA, a conceptual framework that achieves this elusive goal by combining state-of-the-art semantic technologies with effective data extraction. To that end, we apply DEQA to the UK real estate domain and show that it can answer a significant percentage of such questions correctly. DEQA achieves this by mapping natural language questions to SPARQL patterns. These patterns are then evaluated on an RDF database of current real estate offers. The offers are obtained using OXPATH, a state-of-the-art data extraction system, on the major agencies in the Oxford area and linked through LIMES to background knowledge such as the locations of supermarkets.
-
Embedded EL+ Reasoning on Programmable Logic Controllers
,
Stephan Grimm,Michael Watzke,Thomas Hubauer and Falco Cescolini
,
66-81
,
[OpenAccess]
,
[Publisher]
Many industrial use cases, such as machine diagnostics, can benefit from embedded reasoning, the task of running knowledge-based reasoning techniques on embedded controllers as widely used in industrial automation. However, due to the memory and CPU restrictions of embedded devices like programmable logic controllers (PLCs), state-of-the-art reasoning tools and methods cannot be easily migrated to industrial automation environments. In this paper, we describe an approach to porting lightweight OWL 2 EL reasoning to a PLC platform to run in an industrial automation environment. We report on initial runtime experiments carried out on a prototypical implementation of a PLC-based EL+ reasoner in the context of a use case about turbine diagnostics.
-
Experiences with modeling composite phenotypes in the SKELETOME project
,
Tudor Groza,Andreas Zankl and Jane Hunter
,
82-97
,
[OpenAccess]
,
[Publisher]
Semantic annotation of patient data in the skeletal dysplasia domain (e.g., clinical summaries) is a challenging process due to the structural and lexical differences existing between the terms used to describe radiographic findings. In this paper we propose an ontology aimed at representing the intrinsic structure of such radiographic findings in a standard manner, in order to bridge the different lexical variations of the actual terms. Furthermore, we describe and evaluate an algorithm capable of mapping concepts of this ontology to exact or broader terms in the main phenotype ontology used in the bone dysplasia domain.
-
Incorporating Semantic Knowledge into Dynamic Data Processing for Smart Power Grids
,
Qunzhi Zhou,Yogesh Simmhan and Viktor K. Prasanna
,
257-273
,
[OpenAccess]
,
[Publisher]
The Semantic Web allows us to model and query time-invariant or slowly evolving knowledge using ontologies. Emerging applications in Cyber-Physical Systems such as Smart Power Grids, which require continuous information monitoring and integration, present novel opportunities and challenges for Semantic Web technologies. The Semantic Web is promising for modeling diverse Smart Grid domain knowledge to enhance situation awareness and response by multi-disciplinary participants. However, current technology does pose a performance overhead for dynamic analysis of sensor measurements. In this paper, we combine semantic web and complex event processing for stream-based semantic querying. We illustrate its adoption in the USC Campus Micro-Grid for detecting and enacting dynamic response strategies to peak power situations by diverse user roles. We also describe the semantic ontology and event query model that support this. Further, we introduce and evaluate caching techniques to improve the response time of semantic event queries to meet our application needs and enable sustainable energy management.
-
Linking Smart Cities Datasets with Human Computation - the case of UrbanMatch
,
Irene Celino,Simone Contessa,Marta Corubolo,Daniele Dell'Aglio,Emanuele Della Valle,Stefano Fumeo and Thorsten Krüger
,
34-49
,
[OpenAccess]
,
[Publisher]
To realize the Smart Cities vision, applications can leverage the wide availability of open datasets related to urban environments. Those datasets need to be integrated, but it is often hard to achieve a high-quality interlinkage automatically. Human Computation approaches can be employed to solve such a task where machines are ineffective. We argue that in this case not only people's background knowledge is useful for solving the task, but also their physical presence and direct experience can be successfully exploited. In this paper we present UrbanMatch, a Game with a Purpose for mobile players aimed at validating links between points of interest and their photos; we discuss the design choices and show the high throughput and accuracy achieved in the interlinking task.
-
Managing the life-cycle of Linked Data with the LOD2 Stack
,
Sören Auer,Lorenz Bühmann,Christian Dirschl,Orri Erling,Michael Hausenblas,Robert Isele,Jens Lehmann,Michael Martin,Pablo N. Mendes,Bert Van Nuffelen,Claus Stadler,Sebastian Tramp and Hugh Williams
,
1-16
,
[OpenAccess]
,
[Publisher]
The LOD2 Stack is an integrated distribution of aligned tools that supports the whole life cycle of Linked Data, from extraction and authoring/creation via enrichment, interlinking and fusing to maintenance. The LOD2 Stack comprises new and substantially extended existing tools from the LOD2 project partners and third parties. The stack is designed to be versatile; for all functionality we define clear interfaces, which enable the plugging in of alternative third-party implementations. The architecture of the LOD2 Stack is based on three pillars: (1) software integration and deployment using the Debian packaging system; (2) use of a central SPARQL endpoint and standardized vocabularies for knowledge base access and integration between the different tools of the LOD2 Stack; (3) integration of the LOD2 Stack user interfaces based on REST-enabled Web applications. These three pillars comprise the methodological and technological framework for integrating the very heterogeneous LOD2 Stack components into a consistent whole. In this article we describe these pillars in more detail and give an overview of the individual LOD2 Stack components. The article also includes a description of a real-world usage scenario in the publishing domain.
-
QuerioCity: A Linked Data Platform for Urban Information Management
,
Vanessa Lopez,Spyros Kotoulas,Marco Luca Sbodio,Martin Stephenson,Aris Gkoulalas-Divanis and Pol Mac Aonghusa
,
148-163
,
[OpenAccess]
,
[Publisher]
In this paper, we present QuerioCity, a platform to catalog, index and query highly heterogeneous information coming from complex systems, such as cities. A series of challenges are identified: namely, the heterogeneity of the domain and the lack of a common model, the volume of information and the number of data sets, the requirement for a low entry threshold to the system, the diversity of the input data in terms of format, syntax and update frequency (streams vs. static data), and the sensitivity of the information. We propose an approach for incremental and continuous integration of static and streaming data, based on Semantic Web technologies. The proposed system is unique in the literature in its handling of multiple integrations of available data sets in combination with flexible provenance tracking, privacy protection and continuous integration of streams. We report on lessons learnt from building the first prototype for Dublin.
-
Query Driven Hypothesis Generation for Answering Queries over NLP Graphs
,
Chris Welty,Ken Barker,Lora Aroyo and Shilpa Arora
,
228-242
,
[OpenAccess]
,
[Publisher]
It has become common to use RDF to store the results of Natural Language Processing (NLP) as a graph of the entities mentioned in the text, with the relationships mentioned in the text as links between them. These NLP graphs can be measured with Precision and Recall against a ground-truth graph representing what the documents actually say. When asking conjunctive queries on NLP graphs, the Recall of the query is expected to be roughly the product of the Recall of the relations in each conjunct. Since Recall is typically less than one, conjunctive query Recall on NLP graphs degrades geometrically with the number of conjuncts. We present an approach that addresses this Recall problem by hypothesizing links in the graph that would improve query Recall, and then attempting to find more evidence to support them. Using this approach, we confirm that, in the context of answering queries over NLP graphs, we can use lower-confidence results from NLP components if they complete a query result.
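The observation about Recall can be written out explicitly; the numbers below are a worked example, not results from the paper:

```latex
% Recall of a conjunctive query with n conjuncts, where r_i is the
% recall of the relation appearing in conjunct i:
\[
  R_{\text{query}} \approx \prod_{i=1}^{n} r_i ,
  \qquad \text{e.g. } r_i = 0.7,\ n = 3
  \;\Rightarrow\; R_{\text{query}} \approx 0.7^{3} = 0.343 .
\]
```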
-
Semantic Reasoning in Context-Aware Assistive Environments to Support Ageing with Dementia
,
Thibaut Tiberghien,Mounir Mokhtari,Hamdi Aloulou and Jit Biswas
,
212-227
,
[OpenAccess]
,
[Publisher]
Robust solutions for ambient assisted living are numerous, yet predominantly specific in their scope of usability. In this paper, we describe the potential contribution of semantic web technologies to building more versatile solutions - a step towards adaptable context-aware engines and simplified deployments. Looking back on our conception and deployment work, we highlight some implementation challenges and requirements for semantic web tools that would help ease the development of context-aware services and thus generalize real-life deployment of semantically driven assistive technologies. We also compare available tools with regard to these requirements and validate our choices by providing some results from a real-life deployment.
-
Semantic similarity-driven decision support in the skeletal dysplasia domain
,
Razan Paul,Tudor Groza,Andreas Zankl and Jane Hunter
,
164-179
,
[OpenAccess]
,
[Publisher]
Biomedical ontologies have become a mainstream topic in medical research. They represent important sources of evolved knowledge that may be automatically integrated in decision support methods. Grounding clinical and radiographic findings in concepts defined by a biomedical ontology, e.g., the Human Phenotype Ontology, enables us to compute semantic similarity between them. In this paper, we focus on using such similarity measures to predict disorders on undiagnosed patient cases in the bone dysplasia domain. Different methods for computing the semantic similarity have been implemented. All methods have been evaluated based on their support in achieving a higher prediction accuracy. The outcome of this research enables us to understand the feasibility of developing decision support methods based on ontology-driven semantic similarity in the skeletal dysplasia domain.
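As an illustration of the general idea, not the specific measures evaluated in the paper, here is a minimal sketch of one classical ontology-based similarity (Wu-Palmer) over a toy is-a hierarchy with invented phenotype names:

```python
# Wu-Palmer similarity over a toy is-a hierarchy: twice the depth of
# the lowest common subsumer, divided by the sum of the concept depths.
parents = {  # child -> parent; names are invented, not HPO terms
    "short_femur": "abnormal_femur",
    "bowed_femur": "abnormal_femur",
    "abnormal_femur": "abnormal_long_bone",
    "abnormal_long_bone": "phenotype_root",
    "phenotype_root": None,
}

def ancestors(c):
    path = []
    while c is not None:
        path.append(c)
        c = parents[c]
    return path  # from c up to the root, inclusive

def wu_palmer(c1, c2):
    a1, a2 = ancestors(c1), ancestors(c2)
    lcs = next(a for a in a1 if a in a2)   # lowest common subsumer
    depth = lambda c: len(ancestors(c))    # depth counted from the root
    return 2 * depth(lcs) / (depth(c1) + depth(c2))

print(wu_palmer("short_femur", "bowed_femur"))  # 0.75
```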
-
Toward an ecosystem of LOD in the field: LOD content generation and its consuming service
,
Takahiro Kawamura and Akihiko Ohsuga
,
98-113
,
[OpenAccess]
,
[Publisher]
This paper proposes to apply semantic technologies to a new domain: field research. It is said that if "raw data" is openly available on the Web, it will be used by other people to do wonderful things. But it is better to show a use case together with that data, especially at the dawn of LOD. Therefore, we are proceeding with both LOD content generation and its application for a specific domain. The application addresses an issue of information retrieval in the field, and the mechanism of LOD generation from the Web might be applied to other domains. First, we demonstrate the use of our mobile application, which searches for a plant fitting the environmental conditions obtained from the smartphone's sensors. Then, we introduce our approach to LOD generation and present an evaluation showing its practical effectiveness.
-
Trentino government linked open geo-data: a case study
,
Pavel Shvaiko,Feroz Farazi,Vincenzo Maltese,Alexander Ivanyukovich,Veronica Rizzi,Daniela Ferrari and Giuliana Ucelli
,
196-211
,
[OpenAccess]
,
[Publisher]
Our work is situated in the public administration domain, where data can come from different entities; can be produced, stored and delivered in different formats; and can have different levels of quality. Such heterogeneity has to be addressed while performing various data integration tasks. We report our experimental work on publishing some government linked open geo-metadata and geo-data of the Italian Trentino region. Specifically, we illustrate how 161 core geographic datasets were released by leveraging the geo-catalogue application within the existing geo-portal. We discuss the lessons we learned from deploying and using the application as well as from the released datasets.
-
Using SPARQL to Query BioPortal Ontologies and Metadata
,
Manuel Salvadores,Matthew Horridge,Paul R. Alexander,Ray W. Fergerson,Mark A. Musen and Natalya Fridman Noy
,
180-195
,
[OpenAccess]
,
[Publisher]
BioPortal is a repository of biomedical ontologies - the largest such repository, with more than 300 ontologies to date. This set includes ontologies that were developed in OWL, OBO and other languages, as well as a large number of medical terminologies that the US National Library of Medicine distributes in its own proprietary format. We have published the RDF-based serializations of all these ontologies and their metadata at sparql.bioontology.org. This dataset contains 203M triples, representing both content and metadata for the 300+ ontologies, and 9M mappings between terms. The endpoint can be queried with SPARQL, which opens new usage scenarios for the biomedical domain. This paper presents lessons learned from having redesigned several applications that today use this SPARQL endpoint to consume ontological data.
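A minimal sketch of querying such an endpoint from Python with SPARQLWrapper; the exact endpoint path, any required API key, and the property queried are assumptions to be checked against the current BioPortal documentation:

```python
# Hedged example: fetch a few labelled terms from the BioPortal
# SPARQL endpoint. Endpoint URL and graph layout may have changed.
from SPARQLWrapper import SPARQLWrapper, JSON

sparql = SPARQLWrapper("http://sparql.bioontology.org/sparql")
sparql.setReturnFormat(JSON)
sparql.setQuery("""
    SELECT ?term ?label WHERE {
      ?term <http://www.w3.org/2004/02/skos/core#prefLabel> ?label .
    } LIMIT 10
""")

for row in sparql.query().convert()["results"]["bindings"]:
    print(row["term"]["value"], row["label"]["value"])
```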
-
ourSpaces - Design and Deployment of a Semantic Virtual Research Environment
,
Peter Edwards,Edoardo Pignotti,Alan Eckhardt,Kapila Ponnamperuma,Chris Mellish and Thomas Bouttaz
,
50-65
,
[OpenAccess]
,
[Publisher]
In this paper we discuss our experience with the design, development and deployment of the ourSpaces Virtual Research Environment. ourSpaces makes use of Semantic Web technologies to create a platform to support multidisciplinary research groups. This paper introduces the main semantic components of the system: a framework to capture the provenance of the research process, a collection of services to create and visualise metadata and a policy reasoning service. We also describe different approaches to support interaction between users and metadata within the VRE. We discuss the lessons learnt during the deployment process with three case study groups. Finally, we present our conclusions and future directions for exploration in terms of developing ourSpaces further.
-
A Prototype for Semantic based Diagnosis of Road Traffic Congestions
,
Marco Luca Sbodio,Freddy Lecue and Anika Schumann
,
[OpenAccess]
,
[Publisher]
Retrieving the causes of road traffic congestions in quasi real-time is an important task that will enable city managers to get better insight into traffic issues and thus take appropriate corrective actions in a timely way. Our work, accepted at ISWC 2012 In-Use track, tackles this problem by integrating and reasoning over a variety of heterogeneous data sources including data streams. In this paper we present an initial prototype of our work for the city of Dublin, Ireland.
-
Adding Realtime Coverage to the Google Knowledge Graph
,
Thomas Steiner,Ruben Verborgh,Raphaël Troncy,Joaquim Gabarro and Rik Van de Walle
,
[OpenAccess]
,
[Publisher]
In May 2012, the Web search engine Google introduced the so-called Knowledge Graph, a graph that understands real-world entities and their relationships to one another. Entities covered by the Knowledge Graph include landmarks, celebrities, cities, sports teams, buildings, movies, celestial objects, works of art, and more. The graph enhances Google search in three main ways: by disambiguating search queries, by summarizing key facts based on search logs, and by offering explorative search suggestions. With this paper, we suggest a fourth way of enhancing Web search: the addition of realtime coverage of what people say about real-world entities on social networks. We report on a browser extension that seamlessly adds relevant microposts from the social networking sites Google+, Facebook, and Twitter in the form of a panel to Knowledge Graph entities. In a true Linked Data fashion, we interlink detected concepts in microposts with Freebase entities, and evaluate our approach for both relevancy and usefulness. The extension is freely available; we invite the reader to reconstruct the examples in this paper to see how realtime opinions may have changed since the time of writing.
-
Applying Multidimensional Navigation and Explanation in Semantic Dataset Summarization
,
James Michaelis,Deborah L. McGuinness,Cynthia Chang,Joanne Luciano and Jim Hendler
,
[OpenAccess]
,
[Publisher]
A key objective of multidimensional dataset analysis is to reveal patterns of interest to analysts. However, multidimensional analysis has been observed to be difficult for analysts, due to the challenges of both presenting and navigating large datasets. This work explores how initial summarizations of multidimensional datasets can be generated for consuming parties (designed to reduce the number of data points that need to be displayed), driven by summarization policies based on provided dataset values. Additionally, functionality for explaining the derivation of summarizations is being developed, in line with prior work on aiding analyst interactions with data processing systems. To help drive development of this work, as well as provide illustrative use cases, we are presently developing a dataset summarization generator as part of greater work being done in the Foresight and Understanding from Scientific Exposition (FUSE) program.
-
Browsing Causal Chains in a Disease Ontology
,
Kouji Kozaki,Hiroko Kou,Yuki Yamagata,Takeshi Imai,Kazuhiko Ohe and Riichiro Mizoguchi
,
[OpenAccess]
,
[Publisher]
In order to realize sophisticated medical information systems, many medical ontologies have been developed. We proposed a definition of disease based on the River Flow Model, which captures a disease as a causal chain of clinical disorders. We also developed a disease ontology based on the model. It includes definitions of more than 6,000 diseases with 17,000 causal relationships. This demonstration summarizes the disease ontology and a browsing system for the causal chains defined in it.
-
Building Large Scale Relation KB from Text
,
Junfeng Pan,Haofen Wang and Yong Yu
,
[OpenAccess]
,
[Publisher]
Recently, more and more structured data in the form of RDF triples have been published and integrated into Linked Open Data (LOD). While the current LOD contains hundreds of data sources with billions of triples, it has a small number of distinct relations compared with its large number of entities. On the other hand, Web pages are growing rapidly, resulting in a much larger amount of textual content to be exploited. With the popularity and wide adoption of open information extraction technology, extracting entities and relations among them from text at Web scale is possible. In this paper, we present an approach to extract the subject individuals and object counterparts of relations from text, and to determine the most appropriate domain and range, as well as the most confident dependency path patterns, for a given relation based on the EM algorithm. As a preliminary result, we built a knowledge base of relations extracted from Chinese encyclopedias. The experimental results show the effectiveness of our approach in extracting relations with reasonable domain, range and path pattern restrictions as well as high-quality triples.
-
Creating Enriched YouTube Media Fragments With NERD Using Timed-Text
,
Yunjia Li,Giuseppe Rizzo and Raphaël Troncy
,
[OpenAccess]
,
[Publisher]
This demo enables the automatic creation of semantically annotated YouTube media fragments. A video is first ingested into the Synote system, and a new method retrieves its associated subtitles or closed captions. Next, NERD is used to extract named entities from the transcripts, which are then temporally aligned with the video. The entities are disambiguated in the LOD cloud, and a user interface enables browsing through the entities detected in a video or getting more information. We evaluated our application with 60 videos from 3 YouTube channels.
-
Demo: Efficient Human Attention Detection in Museums based on Semantics and Complex Event Processing
,
Yongchun Xu,Nenad Stojanovic,Ljiljana Stojanovic and Tobias Schuchert
,
[OpenAccess]
,
[Publisher]
In this paper we present a demo for efficiently detecting visitors' attention in a museum environment, based on the application of intelligent complex event processing and semantic technologies. The detection takes advantage of semantics (i) at design time, for the correlation of sensor data via modeling of the interesting situations and annotation of artworks and their parts, and (ii) at run-time, for more accurate and precise detection of the interesting situations. The results of the proposed approach have been applied in the EU project ARtSENSE.
-
Demonstrating Blank Node Matching and RDF/S Comparison Functions
,
Christina Lantzaki,Yannis Tzitzikas and Dimitris Zeginis
,
[OpenAccess]
,
[Publisher]
The ability to compute the differences that exist between two RDF/S Knowledge Bases (KBs) is important for aiding humans to understand the evolution of knowledge, and for reducing the amount of data that need to be exchanged and managed over the network in order to build synchronization, versioning and replication services. We will show how we can exploit blank node anonymity in order to reduce the delta size when comparing RDF/S KBs. We will show experimental results over real and synthetic data sets that demonstrate significant reductions of the sizes of the computed deltas, and how the reduced deltas can be visualized. (This demo paper accompanies a research paper accepted for ISWC'2012)
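The effect of blank node anonymity on deltas can be illustrated with rdflib's compare module (a generic library facility, not the authors' own tool): two graphs that differ only in blank node labels produce an empty delta once canonicalized.

```python
# Blank-node-aware comparison: canonicalize both graphs, then diff.
# Renaming _:a/_:b to _:x/_:y contributes nothing to the delta.
from rdflib import Graph
from rdflib.compare import to_isomorphic, graph_diff

g1 = Graph().parse(format="turtle", data="""
@prefix ex: <http://example.org/> .
_:a ex:knows _:b . _:b ex:name "Ann" .
""")

g2 = Graph().parse(format="turtle", data="""
@prefix ex: <http://example.org/> .
_:x ex:knows _:y . _:y ex:name "Ann" .
""")

in_both, only_in_g1, only_in_g2 = graph_diff(to_isomorphic(g1), to_isomorphic(g2))
print(len(only_in_g1), len(only_in_g2))  # 0 0: the delta is empty
```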
-
DiscOU: A Flexible Discovery Engine for Open Educational Resources Using Semantic Indexing and Relationship Summaries
,
Mathieu d'Aquin,Carlo Allocca and Trevor Collins
,
[OpenAccess]
,
[Publisher]
We demonstrate the DiscOU engine, which implements a resource discovery approach where the textual components of open educational resources are automatically annotated with relevant entities (using a named entity recognition system), so that these rich annotations can be searched by similarity based on existing resources of interest.
-
Everything is Connected: Using Linked Data for Multimedia Narration of Connections between Concepts
,
Miel Vander Sande,Ruben Verborgh,Sam Coppens,Tom De Nies,Pedro Debevere,Laurens De Vocht,Pieterjan De Potter,Davy Van Deursen,Erik Mannens and Rik Van de Walle
,
[OpenAccess]
,
[Publisher]
This paper introduces a Linked Data application for automatically generating a story between two concepts in the Web of Data, based on formally described links. A path between two concepts is obtained by querying multiple linked open datasets; the path is then enriched with multimedia presentation material for each node in order to obtain a full multimedia presentation of the found path.
-
Extracting Relevant Subgraphs from Graph Navigation
,
Valeria Fionda,Claudio Gutierrez and Giuseppe Pirró
,
[OpenAccess]
,
[Publisher]
The main goal of current Web navigation languages is to retrieve the set of nodes reachable from a given node. No information is provided about the fragments of the Web navigated to reach these nodes; in other words, information about their connections is lost. This paper presents an efficient algorithm to extract relevant parts of these Web fragments and shows the importance of producing subgraphs in addition to sets of nodes. We discuss examples with real data using an implementation of the algorithm in the EXpRESs tool.
-
INSTANS: High-Performance Event Processing with Standard RDF and SPARQL
,
Mikko Rinne,Esko Nuutila and Seppo Törmä
,
[OpenAccess]
,
[Publisher]
Smart environments require collaboration among multi-platform sensors operated by multiple parties. Proprietary event processing solutions lack interoperation flexibility, easily leading to overlapping functions that waste hardware, software and data communication resources. Our goal is to verify the applicability of standards-compliant SPARQL to any complex event processing task. If this proves feasible, the semantic web methods RDF, SPARQL and OWL bring built-in support for interconnecting disjoint vocabularies, enriching event information with linked open data, and reasoning over semantically annotated content, yielding a very flexible event processing environment. Our INSTANS platform, based on continuous execution of interconnected SPARQL queries using the Rete algorithm, is a new approach showing improved performance on event processing tasks over current SPARQL-based solutions.
-
Jena-HBase: A Distributed, Scalable and Efficient RDF Triple Store
,
Vaibhav Khadilkar,Murat Kantarcioglu,Bhavani Thuraisingham and Paolo Castagna
,
[OpenAccess]
,
[Publisher]
Lack of scalability is one of the most significant problems faced by single-machine RDF data stores. The advent of Cloud Computing and related tools and technologies has paved the way for a distributed ecosystem of RDF triple stores that can potentially allow planet-scale storage along with distributed query processing capabilities. Towards this end, we present Jena-HBase, an HBase-backed triple store that can be used with the Jena framework. Jena-HBase provides end-users with a scalable storage and querying solution that supports all features of the RDF specification.
-
Linked Data Fusion in ODCleanStore
,
Jan Michelfeit and Tomáš Knap
,
[OpenAccess]
,
[Publisher]
As part of the LOD2 project and the OpenData.cz initiative, we are developing the ODCleanStore framework for managing Linked Data. In this paper, we focus on query-time data fusion in ODCleanStore, which provides data consumers with integrated views on Linked Data; the fused data (1) has conflicts resolved according to the preferred conflict resolution policies and (2) is accompanied by provenance and quality scores, so that consumers can judge the usefulness and trustworthiness of the data for the task at hand.
-
Making Sense of Research with Rexplore
,
Enrico Motta and Francesco Osborne
,
[OpenAccess]
,
[Publisher]
While there are many tools and services that support the exploration of research data, by and large these tend to provide a limited set of functionalities, covering primarily ranking measures and mechanisms for relating authors, typically on the basis of simple co-authorship relations. To improve on the current state of affairs, we are developing Rexplore, a novel tool for exploring research data. Rexplore builds on an intelligent algorithm for automatically identifying hierarchical and equivalence relations between research areas to provide a variety of functionalities and visualizations that help users make sense of a research area. These include visualizations to detect trends in research; ways to cluster authors according to several dynamic similarity measures; and fine-grained mechanisms for ranking authors, taking into account parameters such as ranking criterion, career stage, calendar years, publication venues, etc.
-
MeDetect: Domain Entity Annotation in Biomedical References Using Linked Open Data
,
Li Tian,Weinan Zhang,Haofen Wang,Chenyang Wu,Yuan Ni,Feng Cao and Yong Yu
,
[OpenAccess]
,
[Publisher]
Recently, with the ever-growing volume of textual medical records, annotating domain entities has been regarded as an important task in the biomedical field. At the same time, the process of interlinking open data sources is actively pursued within the Linking Open Data (LOD) project. The number of entities, and the number of properties describing semantic relationships between entities, within the linked data cloud are very large. In this paper, we propose a knowledge-intensive approach based on LOD for entity annotation in the biomedical field. With this approach, we implement MeDetect, a prototype system that solves the problems mentioned above. The experimental results verify the effectiveness and efficiency of our approach.
-
Mining Patterns from Clinical Trial Annotated Datasets by Exploiting the NCI Thesaurus
,
Joseph Benik,Guillermo Palma,Louiqa Raschid,Andreas Thor and Maria-Esther Vidal
,
[OpenAccess]
,
[Publisher]
Annotations of clinical trials with controlled vocabularies of drugs and diseases encode scientific knowledge that can be mined to discover relationships between scientific concepts. We present PAnG (Patterns in Annotation Graphs), a tool that relies on dense subgraphs, graph summarization and taxonomic distance metrics, computed using the NCI Thesaurus, to identify patterns.
-
On Direct Debugging of Aligned Ontologies
,
Kostyantyn Shchekotykhin,Patrick Rodler,Philipp Fleiss and Gerhard Friedrich
,
[OpenAccess]
,
[Publisher]
Modern ontology debugging methods allow efficient identification and localization of faulty axioms defined by a user while developing an ontology. However, in many use cases such as ontology alignment the ontologies might include many conflict sets, i.e. sets of axioms preserving the faults, thus making ontology diagnosis infeasible. In this paper we present a debugging approach based on a direct computation of diagnoses that omits calculation of conflict sets. Embedded in an ontology debugger, the proposed algorithm is able to identify diagnoses for an ontology which includes a large number of faults and for which application of standard diagnosis methods fails. The evaluation results show that the approach is practicable and is able to identify a fault in adequate time.
-
QAKiS: an Open Domain QA System based on Relational Patterns
,
Elena Cabrio,Julien Cojan,Alessio Palmero Aprosio,Bernardo Magnini,Alberto Lavelli and Fabien Gandon
,
[OpenAccess]
,
[Publisher]
We present QAKiS, a system for open domain Question Answering over linked data. It addresses the problem of question interpretation as a relation-based match, where fragments of the question are matched to binary relations of the triple store using automatically collected relational textual patterns. For the demo, the relational patterns are automatically extracted from Wikipedia, while DBpedia is the RDF dataset to be queried using a natural language interface.
-
Queries, the Missing Link in Automatic Data Integration
,
Aibo Tian,Juan F. Sequeda and Daniel Miranker
,
[OpenAccess]
,
[Publisher]
This paper introduces the ontology mapping approach of a system that automatically integrates data sources into an ontology-based data integration (OBDI) system. In addition to the domain and source ontologies, the mapping algorithm requires a SPARQL query to determine the ontology mapping. Further, the mapping algorithm is dynamic, running each time a query is processed and producing only a partial mapping sufficient to reformulate the query. This approach enables the mapping algorithm to exploit query semantics to correctly choose among ontology mappings that are indistinguishable when only the ontologies are considered. Also, the mapping associates paths with paths, instead of entities with entities, which simplifies query reformulation. The system achieves favorable results when compared to the algorithms developed for Clio, the best automated relational data integration system.
-
Quest: Efficient SPARQL-to-SQL for RDF and OWL
,
Mariano Rodriguez-Muro,Josef Hardi and Diego Calvanese
,
[OpenAccess]
,
[Publisher]
In this demo we introduce Quest, a new system that provides SPARQL query answering with support for OWL 2 QL and RDFS entailments. Quest allows the vocabulary of an ontology to be linked to the content of a relational database through mapping axioms. These are then used together with the ontology to answer a SPARQL query by means of a single SQL query that is executed over the database. Quest uses highly optimised query rewriting techniques to generate the SQL query, which not only takes into account the entailments of the ontology and data, but is also 'lean' and simple so that it can be executed efficiently by any SQL engine. Quest supports commercial and open source databases, including database federation tools like Teiid, to allow for Ontology Based Data Integration of relational and other sources (e.g., CSV, Excel, XML). Here we briefly describe the Quest mapping language, the query answering process and the most relevant optimisation techniques used by the system. We conclude with a brief description of the content of this demo.
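A toy sketch of the general unfolding idea behind mapping-based SPARQL-to-SQL rewriting; the table and column names are invented, and Quest's actual mapping language and optimisations are far richer:

```python
# Toy unfolding: a triple pattern over an ontology class is rewritten
# into SQL via a mapping axiom (class IRI -> table producing it).
MAPPINGS = {
    # class IRI -> (table, id_column) whose rows are its instances
    "ex:Student": ("student", "id"),
}

def rewrite_class_pattern(class_iri: str) -> str:
    """Unfold the pattern `?x rdf:type <class_iri>` into a SQL query."""
    table, id_col = MAPPINGS[class_iri]
    return f"SELECT {id_col} AS x FROM {table}"

print(rewrite_class_pattern("ex:Student"))  # SELECT id AS x FROM student
```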
-
RIO: Minimizing User Interaction in Ontology Debugging
,
Patrick Rodler,Kostyantyn Shchekotykhin,Philipp Fleiss and Gerhard Friedrich
,
[OpenAccess]
,
[Publisher]
Interactive ontology debugging incorporates a user who answers queries about entailments of their intended ontology. In order to minimize the amount of user interaction in a debugging session, a user must choose an appropriate query selection strategy. However, the choice of an unsuitable strategy may result in tremendous overhead in terms of time and cost. We present a learning method for query selection which unites the advantages of existing approaches while overcoming their flaws. Our tests on a large set of real-world ontologies show the utility of our approach, its scalability, and a reaction time adequate for continuous interactivity.
-
Real Time Fire Monitoring Using Semantic Web and Linked Data Technologies
,
Kostis Kyzirakos,Manos Karpathiotakis,George Garbis,Charalampos Nikolaou,Konstantina Bereta,Michael Sioutis,Ioannis Papoutsis,Themistoklis Herekakis,Dimitrios Michail,Manolis Koubarakis and Charis Kontoes
,
[OpenAccess]
,
[Publisher]
TELEIOS is a recent European project that addresses the need for scalable access to petabytes of Earth Observation data and the discovery and exploitation of the knowledge hidden in them. In this demo paper we demonstrate a fire monitoring service that we have implemented in the context of TELEIOS, and explain how Semantic Web and Linked Data technologies allow the service to go beyond similar services currently deployed in various Earth Observation data centers.
-
Reasoning in RDFS is Inherently Serial, at least in the worst case
,
Peter Patel-Schneider
,
[OpenAccess]
,
[Publisher]
Although it appears that reasoning in RDFS is embarrassingly parallel, this is not the case. Because all vocabulary is treated the same way in RDF, it is possible to extend the RDFS ontology vocabulary itself. This ability permits the creation of useful constructs that are not amenable to parallelism and that, in the end, require serial processing.
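An illustration of the kind of vocabulary extension at issue (a generic example, not necessarily one from the paper): nothing stops a dataset from declaring its own property to be a subproperty of rdf:type, so type triples can only be derived after other inferences complete, defeating naive type-based partitioning.

```python
# Two triples that extend the RDFS vocabulary itself. An RDFS rule
# engine must first derive `ex:fido rdf:type ex:Dog` from the ex:isA
# triple before any processing partitioned on rdf:type can proceed.
from rdflib import Graph

g = Graph().parse(format="turtle", data="""
@prefix rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ex:   <http://example.org/> .

ex:isA rdfs:subPropertyOf rdf:type .   # extends the RDFS vocabulary
ex:fido ex:isA ex:Dog .                # entails: ex:fido rdf:type ex:Dog
""")
print(len(g))  # 2 asserted triples; a reasoner would add the entailment
```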
-
Semantic Vernacular System: an Observation-based, Community-powered, and Semantics-enabled Naming System for Organisms
,
Han Wang,Nathan Wilson,Kathryn Dunn and Deborah L. McGuinness
,
[OpenAccess]
,
[Publisher]
The Semantic Vernacular System is a novel naming system for creating named, machine-interpretable descriptions for groups of organisms. Unlike the traditional scientific naming system, which is based on evolutionary relationships, it emphasizes the observable features of organisms. By independently naming the descriptions composed of sets of observational features, as well as maintaining connections to scientific names, it preserves the observational data used to identify organisms. The system is designed to support a peer-review mechanism for creating new names, and uses a controlled vocabulary encoded in the Web Ontology Language to represent the observational features. A prototype of the system is currently under development in collaboration with the Mushroom Observer website. It allows users to propose new names and descriptions, provide feedback on those proposals, and ultimately have them formally approved. This effort aims at offering the mycology community a knowledge base of fungal observational features and a tool for identifying fungal observations.
-
Simplifying MIREOT: a MIREOT Protege Plugin
,
Josh Hanna,Chen Cheng,Alex Crow,Roger Hall,Jie Liu,Tejaswini Pendurthi,Trent Schmidt,Steven Jennings,Mathias Brochhausen and William Hogan
,
[OpenAccess]
,
[Publisher]
The Web Ontology Language (OWL) is a commonly used standard for creating ontology artifacts. However, its capabilities for reusing existing OWL artifacts in the creation of new artifacts are limited to the import of whole ontologies, even when only a small handful of classes, object properties, and so on (which we refer to generically as OWL components) are relevant. This situation can result in extremely large and unwieldy, or even broken, ontologies. To address this problem while still promoting ontology reuse, the OBI Consortium has elucidated the Minimum Information to Reference an External Ontology Term (MIREOT). We provide a suite of plugins for the Protege editor that greatly simplifies the use of MIREOT principles during ontology creation and editing.
-
The Linked Data Visualization Model
,
Josep Maria Brunetti Fernández,Sören Auer and Roberto Garcia
,
[OpenAccess]
,
[Publisher]
The potential of the semantic data available on the Web is enormous, but in most cases it is very difficult for users to explore and use this data. Applying information visualization techniques to the Semantic Web helps users easily explore large amounts of data and interact with them. We devise a formal Linked Data Visualization Model (LDVM), which allows data to be connected dynamically with visualizations.
-
Towards Licenses Compatibility and Composition in the Web of Data
,
Serena Villata and Fabien Gandon
,
[OpenAccess]
,
[Publisher]
We propose a general framework for attaching licensing terms to data, in which the compatibility of the licensing terms covering the data affected by a query is verified and, if they are compatible, the licenses are combined into a composite license. The framework returns the composite license as the licensing term for the data resulting from the query.
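One way to picture the check, as a deliberately simplified toy model (licenses as sets of permissions and prohibitions; this is not the authors' actual formalization):

```python
# Toy compatibility and composition: two licenses conflict when one
# permits what the other prohibits; the composite keeps the common
# permissions and the union of prohibitions. Illustrative only.
from dataclasses import dataclass

@dataclass(frozen=True)
class License:
    permissions: frozenset
    prohibitions: frozenset

def compatible(a: License, b: License) -> bool:
    return not (a.permissions & b.prohibitions or b.permissions & a.prohibitions)

def compose(a: License, b: License) -> License:
    return License(a.permissions & b.permissions, a.prohibitions | b.prohibitions)

permissive = License(frozenset({"reuse", "commercial"}), frozenset())
noncommercial = License(frozenset({"reuse"}), frozenset({"commercial"}))
print(compatible(permissive, noncommercial))  # False: "commercial" conflicts

reuse_only = License(frozenset({"reuse"}), frozenset())
print(compose(reuse_only, noncommercial))     # reuse allowed, commercial prohibited
```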
-
TwikiMe! - User profiles that make sense
,
Patrick Siehndel and Ricardo Kawase
,
[OpenAccess]
,
[Publisher]
The use of social media has been increasing rapidly in recent years. Social media platforms such as Twitter have become an important source of information for a variety of people. The public availability of data describing some of these social networks has led to a lot of research in this area. Link prediction, user classification and community detection are some of the main research areas related to social networks. In this paper we present a user modeling framework that uses Wikipedia as a frame to model user interests inside a social network. Our fine-grained model of user interests reflects both the areas a user is interested in and the level of expertise the user has in a certain field.
-
ourSpaces - A Semantic Virtual Research Environment
,
Peter Edwards,Edoardo Pignotti,Alan Eckhardt,Kapila Ponnamperuma,Chris Mellish and Thomas Bouttaz
,
[OpenAccess]
,
[Publisher]
In this demo we present ourSpaces, a semantic Virtual Research Environment designed to support inter-disciplinary research teams. The system utilizes technologies such as OWL, RDF and a rule-based reasoner to support the management of provenance information, social networks, online communication and policy enforcement within the VRE.