-
A Linked-Data-driven and Semantically-enabled Journal Portal for Scientometrics
Yingjie Hu, Krzysztof Janowicz, Grant McKenzie, Kunal Sengupta and Pascal Hitzler
113-128
[OpenAccess] [Publisher]
The Semantic Web journal by IOS Press follows a unique open and transparent process during which each submitted manuscript is available online together with the full history of its successive decision statuses, assigned editors, solicited and voluntary reviewers, their full text reviews, and in many cases also the authors’ response letters. Combined with a highly-customized, Drupal-based journal management system, this provides the journal with semantically rich manuscript timelines and networked data about authors, reviewers, and editors. These data are now exposed using a SPARQL endpoint, an extended Bibo ontology, and a modular Linked Data portal that provides interactive scientometrics based on established and new analysis methods. The portal can be customized for other journals as well.
-
Cross-language Semantic Retrieval and Linking of E-gov Services
Fedelucio Narducci, Matteo Palmonari and Giovanni Semeraro
129-144
[OpenAccess] [Publisher]
Public administrations are aware of the advantages of sharing Open Government Data in terms of transparency, development of improved services, collaboration between stakeholders, and spurring new economic activities. Initiatives for the publication and interlinking of government service catalogs as Linked Open Data (LOD) support the interoperability among European administrations and improve the capability of foreign citizens to access services across Europe. However, linking service catalogs to reference LOD catalogs requires a significant effort from local administrations, preventing the uptake of interoperable solutions at a large scale. The web application presented in this paper is named CroSeR (Cross-language Service Retriever) and supports public bodies in the process of linking their own service catalogs to the LOD cloud. CroSeR supports different European languages and adopts a semantic representation of e-gov services based on Wikipedia. CroSeR tries to overcome problems related to the short textual descriptions associated with a service by embodying a semantic annotation algorithm that enriches service labels with emerging Wikipedia concepts related to the service. An experimental evaluation carried out on e-gov service catalogs in five different languages shows the effectiveness of our model.
-
Deployment of RDFa, Microdata, and Microformats on the Web – A Quantitative Analysis
Christian Bizer, Kai Eckert, Robert Meusel, Hannes Mühleisen, Michael Schuhmacher and Johanna Völker
17-32
[OpenAccess] [Publisher]
More and more websites embed structured data describing, for instance, products, reviews, blog posts, people, organizations, events, and cooking recipes into their HTML pages using markup standards such as Microformats, Microdata and RDFa. This development has accelerated in the last two years as major Web companies, such as Google, Facebook, Yahoo!, and Microsoft, have started to use the embedded data within their applications. In this paper, we analyze the adoption of RDFa, Microdata, and Microformats across the Web. Our study is based on a large public Web crawl dating from early 2012 and consisting of 3 billion HTML pages which originate from over 40 million websites. The analysis reveals the deployment of the different markup standards, the main topical areas of the published data as well as the different vocabularies that are used within each topical area to represent data. What distinguishes our work from earlier studies, published by the large Web companies, is that the analyzed crawl as well as the extracted data are publicly available. This allows our findings to be verified and to be used as starting points for further domain-specific investigations as well as for focused information extraction endeavors.
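The kind of per-page detection such a study performs can be sketched as follows. This is only a toy string-matching illustration of the idea; real extraction pipelines (such as those used on Web crawls) parse the full DOM, and the attribute lists below are a small, non-exhaustive selection.

```python
import re

# Toy detector for the three embedding syntaxes the study measures.
# The attribute sets are illustrative, not complete.
MARKUP_PATTERNS = {
    "RDFa":         re.compile(r'\b(?:typeof|property|vocab)\s*='),
    "Microdata":    re.compile(r'\b(?:itemscope|itemtype|itemprop)\b'),
    "Microformats": re.compile(r'class\s*=\s*"[^"]*\b(?:vcard|hrecipe|hreview)\b'),
}

def detect_markup(html: str) -> set:
    """Return the set of markup syntaxes apparently present in a page."""
    return {name for name, pat in MARKUP_PATTERNS.items() if pat.search(html)}

page = ('<div itemscope itemtype="http://schema.org/Product">'
        '<span itemprop="name">X</span></div>')
```

Aggregating `detect_markup` over all pages of a crawl, grouped by website, yields deployment counts per standard of the kind the paper reports.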
-
Entity Recommendations in Web Search
Roi Blanco, Berkant Barla Cambazoglu, Peter Mika and Nicolas Torzec
33-48
[OpenAccess] [Publisher]
While some web search users know exactly what they are looking for, others are willing to explore topics related to an initial interest. Often, the user’s initial interest can be uniquely linked to an entity in a knowledge base. In this case, it is natural to recommend the explicitly linked entities for further exploration. In real-world knowledge bases, however, the number of linked entities may be very large and not all related entities may be equally relevant. Thus, there is a need for ranking related entities. In this paper, we describe Spark, a recommendation engine that links a user’s initial query to an entity within a knowledge base and provides a ranking of the related entities. Spark extracts several signals from a variety of data sources, including Yahoo! Web Search, Twitter, and Flickr, using a large cluster of computers running Hadoop. These signals are combined with a machine-learned ranking model in order to produce a final recommendation of entities to user queries. This system is currently powering Yahoo! Web Search result pages.
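The final scoring step can be illustrated with a minimal sketch. The feature names and weights below are invented for illustration, and a simple linear model stands in for Spark's actual machine-learned ranker:

```python
# Hypothetical sketch: each candidate entity carries a vector of
# signals, and a weight vector (learned in the real system, fixed
# here) scores and ranks them.
def rank_entities(candidates, weights):
    """Return candidates sorted by a linear model score, best first."""
    def score(features):
        return sum(weights.get(name, 0.0) * value
                   for name, value in features.items())
    return sorted(candidates, key=lambda c: score(c["features"]), reverse=True)

weights = {"co_click": 0.6, "twitter_mentions": 0.3, "flickr_tags": 0.1}
candidates = [
    {"id": "Brad_Pitt",  "features": {"co_click": 0.9, "twitter_mentions": 0.4}},
    {"id": "Fight_Club", "features": {"co_click": 0.5, "twitter_mentions": 0.9}},
]
ranking = [c["id"] for c in rank_entities(candidates, weights)]
```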
-
Incorporating Commercial and Private Data into an Open Linked Data Platform for Drug Discovery
Carole Goble, Alasdair J. G. Gray, Lee Harland, Karen Karapetyan, Antonis Loizou, Ivan Mikhailov, Yrjana Rankka, Stefan Senger, Valery Tkachenko, Antony Williams and Egon Willighagen
65-80
[OpenAccess] [Publisher]
The Open PHACTS Discovery Platform aims to provide an integrated information space to advance pharmacological research in the area of drug discovery. Effective drug discovery requires comprehensive data coverage, i.e., integrating all available sources of pharmacology data. While many relevant data sources are available on the linked open data cloud, their content needs to be combined with that of commercial datasets and the licensing of these commercial datasets respected when providing access to the data. Additionally, pharmaceutical companies have built up their own extensive private data collections that they require to be included in their pharmacological dataspace. In this paper we discuss the challenges of incorporating private and commercial data into a linked dataspace, focusing on the modelling of these datasets and their interlinking. We also present the graph-based access control mechanism that ensures commercial and private datasets are only available to authorized users.
-
Integrating NLP using Linked Data
Sebastian Hellmann, Jens Lehmann, Sören Auer and Martin Brümmer
97-112
[OpenAccess] [Publisher]
We are currently observing a plethora of Natural Language Processing tools and services being made available. Each of the tools and services has its particular strengths and weaknesses, but exploiting the strengths and synergistically combining different tools is currently an extremely cumbersome and time-consuming task. Also, once a particular set of tools is integrated, this integration is not reusable by others. We argue that simplifying the interoperability of different NLP tools performing similar but also complementary tasks will facilitate the comparability of results and the creation of sophisticated NLP applications. In this paper, we present the NLP Interchange Format (NIF). NIF is based on a Linked Data enabled URI scheme for identifying elements in (hyper-)texts and an ontology for describing common NLP terms and concepts. In contrast to more centralized solutions such as UIMA and GATE, NIF enables the creation of heterogeneous, distributed and loosely coupled NLP applications, which use the Web as an integration platform. We present several use cases of the second version of the NIF specification (NIF 2.0) and the result of a developer study.
-
Publishing the Norwegian Petroleum Directorate’s FactPages as Semantic Web Data
Martin G. Skjæveland, Espen H. Lian and Ian Horrocks
161-176
[OpenAccess] [Publisher]
This paper motivates, documents and evaluates the process and results of converting the Norwegian Petroleum Directorate’s FactPages, a well-known and diverse set of tabular data, but with little and incomplete schema information, stepwise into other representations where in each step more semantics is added to the dataset. The different representations we consider are a regular relational database, a linked open data dataset, and an ontology. For each conversion step we explain and discuss necessary design choices which are due to the specific shape of the dataset, but also those due to the characteristics and idiosyncrasies of the representation formats. We additionally evaluate the output, performance and cost of querying the different formats using questions provided by users of the FactPages.
-
Real-time Urban Monitoring in Dublin using Semantic and Stream Technologies
Simone Tallevi-Diotallevi, Spyros Kotoulas, Luca Foschini, Freddy Lecue and Antonio Corradi
177-192
[OpenAccess] [Publisher]
Several sources of information, from people, systems, and things, are already available in most modern cities. Processing these continuous flows of information and capturing insight poses unique technical challenges that span from response time constraints to data heterogeneity, in terms of format and throughput. To tackle these problems, we focus on a novel prototype to ease real-time monitoring and decision-making processes for the City of Dublin with three main original technical aspects: (i) an extension to SPARQL to support efficient querying of heterogeneous streams; (ii) a query execution framework and runtime environment based on IBM InfoSphere Streams, a high-performance, industrial strength, stream processing engine; (iii) a hybrid RDFS reasoner, optimized for our stream processing execution framework. Our approach has been validated with real data collected in the field, as shown in our Dublin City video demonstration. Results indicate that real-time processing of city information streams based on semantic technologies is indeed not only possible, but also efficient, scalable and low-latency.
-
Reasoning on crowd-sourced semantic annotations to facilitate cataloguing of 3D artefacts in the cultural heritage domain
Chih-Hao Yu, Tudor Groza and Jane Hunter
225-240
[OpenAccess] [Publisher]
The 3D Semantic Annotation (3DSA) system expedites the classification of 3D digital surrogates from the cultural heritage domain, by leveraging crowd-sourced semantic annotations. More specifically, the 3DSA system generates high-level classifications of 3D objects by applying rule-based reasoning across community-generated annotations and low-level shape and size attributes. This paper describes a particular use of the 3DSA system – cataloguing Greek pottery. It also describes our novel approach to rule-based reasoning that is modelled on concepts inspired by Markov logic networks. Our evaluation of this approach demonstrates its efficiency, accuracy and versatility, compared to classical rule-based reasoning.
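The Markov-logic-inspired idea of weighted rules can be sketched as follows. The rules, weights, and feature names are invented for illustration; the paper's actual rule language and scoring are richer:

```python
# Hedged sketch: each rule carries a weight, and the classification
# accumulating the highest total weight of satisfied rules wins.
def classify(annotations, weighted_rules):
    """annotations: set of low-level facts;
    weighted_rules: list of (required_facts, label, weight)."""
    scores = {}
    for required, label, weight in weighted_rules:
        if required <= annotations:  # rule fires if all its facts hold
            scores[label] = scores.get(label, 0.0) + weight
    return max(scores, key=scores.get) if scores else None

rules = [
    ({"two_handles", "narrow_neck"}, "amphora", 1.5),
    ({"two_handles"},                "kylix",   0.5),
]
label = classify({"two_handles", "narrow_neck", "red_figure"}, rules)
```

Unlike classical hard rules, conflicting rules may both fire here; the weights arbitrate between them.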
-
Semantic Data and Models Sharing in Systems Biology: The Just Enough Results Model and the SEEK Platform
Katherine Wolstencroft, Stuart Owen, Olga Krebs, Quyen Nguyen, Jacky L. Snoep, Wolfgang Mueller and Carole Goble
209-224
[OpenAccess] [Publisher]
Research in Systems Biology involves integrating data and knowledge about the dynamic processes in biological systems in order to understand and model them. Semantic web technologies should be ideal for exploring the complex networks of genes, proteins and metabolites that interact, but much of this data is not natively available to the semantic web. Data is typically collected and stored with free-text annotations in spreadsheets, many of which do not conform to existing metadata standards and are often not publicly released. Along with initiatives to promote more data sharing, one of the main challenges is therefore to semantically annotate and extract this data so that it is available to the research community. Data annotation and curation are expensive and undervalued tasks that have enormous benefits to the discipline as a whole, but fewer benefits to the individual data producers. By embedding semantic annotation into spreadsheets, however, and automatically extracting this data into RDF at the time of repository submission, the process of producing standards-compliant data that is available for semantic web querying can be achieved without adding additional overheads to laboratory data management. This paper describes these strategies in the context of semantic data management in the SEEK. The SEEK is a web-based resource for sharing and exchanging Systems Biology data and models that is underpinned by the JERM ontology (Just Enough Results Model), which describes the relationships between data, models, protocols and experiments. The SEEK was originally developed for SysMO, a large European Systems Biology consortium studying micro-organisms, but it has since had widespread adoption across European Systems Biology.
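The extraction step described above can be sketched minimally. The column-to-term mapping and all URIs below are made up for illustration; the real SEEK pipeline maps spreadsheet templates to JERM ontology terms:

```python
# Minimal sketch: rows whose columns carry embedded semantic
# annotations (here, a column -> ontology-property map) are turned
# into RDF triples at submission time. All URIs are invented.
def rows_to_rdf(rows, column_annotations, base="http://example.org/sample/"):
    triples = []
    for i, row in enumerate(rows):
        subject = f"<{base}{i}>"
        for column, value in row.items():
            predicate = column_annotations.get(column)
            if predicate:  # only semantically annotated columns are extracted
                triples.append((subject, f"<{predicate}>", f'"{value}"'))
    return triples

annotations = {"organism": "http://purl.org/jerm#hasOrganism"}
rows = [{"organism": "Lactococcus lactis", "notes": "free text"}]
triples = rows_to_rdf(rows, annotations)
```

Unannotated free-text columns simply fall through, which is how standards compliance is achieved without changing how scientists fill in their spreadsheets.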
-
Social listening of City Scale Events using the Streaming Linked Data Framework
Marco Balduini, Emanuele Della Valle, Daniele Dell'Aglio, Themis Palpanas, Mikalai Tsytsarau and Cristian Confalonieri
1-16
[OpenAccess] [Publisher]
City-scale events may easily attract half a million visitors in hundreds of venues over just a few days. Which are the most attended venues? What do visitors think about them? How do they feel before, during and after the event? These are a few of the questions a city-scale event manager would like to see answered in real-time. In this paper, we report on our experience in social listening of two city-scale events (London Olympic Games 2012, and Milano Design Week 2013) using the Streaming Linked Data Framework.
-
The Energy Management Adviser at EDF
Pierre Chaussecourte, Birte Glimm, Ian Horrocks, Boris Motik and Laurent Pierre
49-64
[OpenAccess] [Publisher]
The EMA (Energy Management Adviser) aims to produce personalised energy saving advice for EDF’s customers. The advice takes the form of one or more ‘tips’, and personalisation is achieved using semantic technologies: customers are described using RDF, an OWL ontology provides a conceptual model of the relevant domain (housing, environment, and so on) and the different kinds of tips, and SPARQL query answering is used to identify relevant tips. The current prototype provides tips to more than 300,000 EDF customers in France at least twice a year. The main challenges for our future work include providing a timely service for all of the 35 million EDF customers in France, simplifying the system’s maintenance, and providing new ways for interacting with customers such as via a Web site.
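The selection step can be sketched as follows. In the real system customers are RDF descriptions and tips are identified by SPARQL query answering over an OWL ontology; the dict-based matching and all tip names below are illustrative stand-ins:

```python
# Toy version of tip selection: customers are property maps, and each
# tip declares the conditions under which it applies. All names are
# invented for illustration.
def applicable_tips(customer, tips):
    """Return the names of tips whose conditions the customer satisfies."""
    return [name for name, conditions in tips
            if all(customer.get(k) == v for k, v in conditions.items())]

tips = [
    ("insulate_attic",  {"housing": "house", "heating": "electric"}),
    ("defrost_freezer", {"has_freezer": True}),
]
customer = {"housing": "house", "heating": "electric", "has_freezer": True}
```

Expressing the conditions as queries over a conceptual model, rather than hard-coding them, is what lets EDF maintain the tip catalogue independently of the matching engine.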
-
Using Linked Data to evaluate the impact of Research and Development in Europe: a Structural Equation Model
Amrapali Zaveri, Joao Ricardo Nickenig Vissoci, Cinzia Daraio and Ricardo Pietrobon
241-256
[OpenAccess] [Publisher]
Europe has a high impact on the global biomedical literature, contributing a growing number of research articles and a significant citation impact. However, the impact of research and development generated by European countries on economic, educational and healthcare performance is poorly understood. The recent Linking Open Data (LOD) project has made a lot of data sources publicly available and in human-readable formats. In this paper, we demonstrate the utility of LOD in assessing the impact of Research and Development (R&D) on the economic, education and healthcare performance in Europe. We extract relevant variables from two LOD datasets, namely World Bank and Eurostat. We analyze the data for 20 out of the 27 European countries over a span of 10 years (1999 to 2009). We use a Structural Equation Modeling (SEM) approach to quantify the impact of R&D on the different measures. We perform different exploratory and confirmatory factorial analysis evaluations which give rise to four latent variables that are included in the model: (i) Research and Development (R&D), (ii) Economic Performance (EcoP), (iii) Educational Performance (EduP), (iv) Healthcare performance (HcareP) of the European countries. Our results indicate the importance of R&D to the overall development of the European educational and healthcare performance (directly) and economic performance (indirectly). The results also show the practical applicability of LOD to estimate this impact.
-
Using Semantic Web in ICD-11: Three Years Down the Road
Tania Tudorache, Csongor I. Nyulas, Natasha F. Noy and Mark Musen
193-208
[OpenAccess] [Publisher]
The World Health Organization is using Semantic Web technologies in the development of the 11th revision of the International Classification of Diseases (ICD-11). Health officials use ICD in all United Nations member countries to compile basic health statistics, to monitor health-related spending, and to inform policy makers. In 2010, we published a paper in the ISWC In Use track reporting on our experience in the first six months with building and deploying iCAT, a Semantic Web platform to support the collaborative authoring of ICD-11. Three years since our original publication, 270 domain experts around the world have used iCAT to author more than 45,000 classes, to perform more than 260,000 changes, and to create more than 17,000 links to external medical terminologies. During the last three years, the collaboration processes, modeling and tooling have evolved significantly, and we have learned important lessons, which we will report in this paper. We describe the benefits of using semantic technologies as an infrastructure, which proved to be critical in making support for this rapid evolution possible. To our knowledge, this effort is the only real-world project supporting the collaborative authoring of ontologies at this scale, and which, at the same time, has a high visibility and impact for health care around the world. We believe that the insights that we gained and the lessons that we learned after four years into this large-scale project will be useful to others who need to support similar collaborative projects.
-
Using the past to explain the present: interlinking current affairs with archives via the Semantic Web
Yves Raimond, Michael Smethurst, Andrew McParland and Christopher Lowis
145-160
[OpenAccess] [Publisher]
The BBC has a very large archive of programmes, covering a wide range of topics. This archive holds a significant part of the BBC’s institutional memory and is an important part of the cultural history of the United Kingdom and the rest of the world. These programmes, or parts of them, can help provide valuable context and background for current news events. However, the BBC’s archive catalogue is not a complete record of everything that was ever broadcast. For example, it excludes the BBC World Service, which has been broadcasting since 1932. This makes the discovery of content within these parts of the archive very difficult. In this paper we describe a system based on Semantic Web technologies which helps us to quickly locate content related to current news events within those parts of the BBC’s archive with little or no pre-existing metadata. This system is driven by automated interlinking of archive content with the Semantic Web, user validations of the resulting data and topic extraction from live BBC News subtitles. The resulting inter-links between live news subtitles and the BBC’s archive are used in a dynamic visualisation enabling users to quickly locate relevant content. This content can then be used by journalists and editors to provide historical context, background information and supporting content around current affairs.
-
When History Matters - Assessing Reliability for the Reuse of Scientific Workflows
José Manuel Gómez-Pérez, Esteban García-Cuesta, Aleix Garrido and José Enrique Ruiz
81-96
[OpenAccess] [Publisher]
Scientific workflows play an important role in computational research as essential artifacts for communicating the methods used to produce research findings. We are witnessing a growing number of efforts that treat workflows as first-class artifacts for sharing and exchanging scientific knowledge, either as part of scholarly articles or as standalone objects. However, workflows are not born to be reliable, which can seriously damage their reusability and trustworthiness as knowledge exchange instruments. Scientific workflows are commonly subject to decay, which consequently undermines their reliability over their lifetime. The reliability of workflows can be notably improved by encouraging scientists to preserve a minimal set of information that is essential to assist the interpretations of these workflows and hence improve their potential for reproducibility and reusability. In this paper we show how, by measuring and monitoring the completeness and stability of scientific workflows over time, we are able to provide scientists with a measure of their reliability, supporting the reuse of trustworthy scientific knowledge.
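The combination of completeness and stability into a single reliability measure can be sketched like this. The formula is a stand-in invented for illustration, not the authors' actual metric:

```python
# Illustrative only: derive a reliability score from a workflow's
# recorded history of completeness measurements (values in [0, 1]).
def reliability(completeness_history):
    """Combine the latest completeness score with its stability,
    taken here as 1 minus the largest observed drop between
    consecutive measurements."""
    latest = completeness_history[-1]
    drops = [max(0.0, a - b)
             for a, b in zip(completeness_history, completeness_history[1:])]
    stability = 1.0 - (max(drops) if drops else 0.0)
    return latest * stability
```

A workflow that is currently complete but has decayed sharply in the past is thus penalised relative to one that has stayed stable, which captures the paper's "history matters" intuition.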
-
A Confidentiality Model for Ontologies
Piero Bonatti and Luigi Sauro
17-32
[OpenAccess] [Publisher]
We illustrate several novel attacks to the confidentiality of knowledge bases (KB). Then we introduce a new confidentiality model, sensitive enough to detect those attacks, and a method for constructing secure KB views. We identify safe approximations of the background knowledge exploited in the attacks; they can be used to reduce the complexity of constructing secure KB views.
-
A Graph-Based Approach to Learn Semantic Descriptions of Data Sources
Mohsen Taheriyan, Craig Knoblock, Pedro Szekely and José Luis Ambite
593-608
[OpenAccess] [Publisher]
Semantic models of data sources and services provide support to automate many tasks such as source discovery, data integration, and service composition, but writing these semantic descriptions by hand is a tedious and time-consuming task. Most of the related work focuses on automatic annotation with classes or properties of source attributes or input and output parameters. However, constructing a source model that includes the relationships between the attributes in addition to their semantic types remains a largely unsolved problem. In this paper, we present a graph-based approach to hypothesize a rich semantic description of a new target source from a set of known sources that have been modeled over the same domain ontology. We exploit the domain ontology and the known source models to build a graph that represents the space of plausible source descriptions. Then, we compute the top k candidates and suggest to the user a ranked list of the semantic models for the new source. The approach takes into account user corrections to learn more accurate semantic descriptions of future data sources. Our evaluation shows that our method produces models that are twice as accurate as the models produced using a state-of-the-art system that does not learn from prior models.
-
A Query Tool for EL with Non-monotonic Rules
Vadim Ivanov, Matthias Knorr and Joao Leite
209-224
[OpenAccess] [Publisher]
We present the Protégé plug-in NoHR that allows the user to take an EL+⊥ ontology, add a set of non-monotonic (logic programming) rules – suitable, e.g., for expressing defaults and exceptions – and query the combined knowledge base. Our approach uses the well-founded semantics for MKNF knowledge bases as underlying formalism, so no restriction other than DL-safety is imposed on the rules that can be written. The tool itself builds on the procedure SLG(O) and, with the help of the OWL 2 EL reasoner ELK, pre-processes the ontology into rules, whose result together with the non-monotonic rules serves as input for the top-down querying engine XSB Prolog. With the resulting plug-in, even queries to very large ontologies, such as SNOMED CT, augmented with a large number of rules, can be processed at an interactive response time after one initial brief pre-processing period. At the same time, our system is able to deal with possible inconsistencies between the rules and an ontology that alone is consistent.
-
A decision procedure for SHOIQ with transitive closure of roles
Chan Le Duc, Myriam Lamolle and Olivier Curé
257-272
[OpenAccess] [Publisher]
The Semantic Web makes extensive use of the OWL DL ontology language, underpinned by the SHOIQ description logic, to formalize its resources. In this paper, we propose a decision procedure for this logic extended with the transitive closure of roles in concept axioms, a feature needed in several application domains. The most challenging issue we have to deal with when designing such a decision procedure is to represent infinitely non-tree-shaped models, which are different from those of SHOIQ ontologies. To address this issue, we introduce a new blocking condition for characterizing models which may have an infinite non-tree-shaped part.
-
A snapshot of the OWL Web
Nicolas Matentzoglu, Samantha Bail and Bijan Parsia
321-336
[OpenAccess] [Publisher]
Tool development for and empirical experimentation in OWL ontology engineering require a wide variety of suitable ontologies as input for testing and evaluation purposes and detailed characterisations of real ontologies. Empirical activities often resort to (somewhat arbitrarily) hand-curated corpora available on the web, such as the NCBO BioPortal and the TONES Repository, or manually selected sets of well-known ontologies. Findings of surveys and results of benchmarking activities may be biased, even heavily, towards these datasets. Sampling from a large corpus of ontologies, on the other hand, may lead to more representative results. Current large scale repositories and web crawls are mostly uncurated, suffer from duplication and from small and (for many purposes) uninteresting ontology files, and contain large numbers of ontology versions, variants, and facets, and therefore do not lend themselves to random sampling. In this paper, we survey ontologies as they exist on the web and describe the creation of a corpus of OWL DL ontologies using strategies such as web crawling, various forms of de-duplication and manual cleaning, which allows random sampling of ontologies for a variety of empirical applications.
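One basic cleaning step mentioned above, collapsing byte-identical files collected by a crawl, can be sketched as follows. This only covers exact duplicates; the paper's pipeline additionally handles versions, variants, and facets of the same ontology:

```python
import hashlib

# Sketch of one de-duplication step: byte-identical ontology files are
# collapsed to a single representative before sampling.
def deduplicate(files):
    """files: mapping of filename -> raw bytes;
    returns one filename per distinct content, sorted."""
    seen = {}
    for name, content in sorted(files.items()):
        digest = hashlib.sha256(content).hexdigest()
        seen.setdefault(digest, name)  # keep the first file with this content
    return sorted(seen.values())

corpus = {"a.owl": b"<Ontology/>", "copy_of_a.owl": b"<Ontology/>",
          "b.owl": b"<Other/>"}
```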
-
Bringing Math to LOD: A Semantic Publishing Platform Prototype for Scientific Collections in Mathematics
Olga Nevzorova, Nikita Zhiltsov, Danila Zaikin, Olga Zhibrik, Alexander Kirillovich, Vladimir Nevzorov and Evgeniy Birialtsev
369-384
[OpenAccess] [Publisher]
We present our work on developing a software platform for mining mathematical scholarly papers to obtain a Linked Data representation. Currently, the Linking Open Data (LOD) cloud lacks up-to-date and detailed information on professional level mathematics. In our view, the main reason is the absence of appropriate tools that could analyze the underlying semantics in mathematical papers and effectively build their consolidated representation. We have developed a holistic approach to analysis of mathematical documents, including ontology based extraction, conversion of the article body as well as its metadata into RDF, integration with some existing LOD data sets, and semantic search. We argue that the platform may be helpful for enriching user experience on modern online scientific collections.
-
Complete Query Answering Over Horn Ontologies Using a Triple Store
Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau and Ian Horrocks
703-718
[OpenAccess] [Publisher]
In our previous work, we showed how a scalable OWL 2 RL reasoner can be used to compute both lower and upper bound query answers over very large datasets and arbitrary OWL 2 ontologies. However, when these bounds do not coincide, there still remain a number of possible answer tuples whose status is not determined. In this paper, we show how in the case of Horn ontologies one can exploit the lower and upper bounds computed by the RL reasoner to efficiently identify a subset of the data and ontology that is large enough to resolve the status of these tuples, yet small enough so that the status can be computed using a fully-fledged OWL 2 reasoner. The resulting hybrid approach has enabled us to compute exact answers to queries over datasets and ontologies where previously only approximate query answering was possible.
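The hybrid control flow can be sketched abstractly. Here the bounds are simply given as sets of tuples and the fully-fledged reasoner is a stand-in callback; the paper's contribution is computing a small relevant fragment to make that expensive call feasible:

```python
# Toy illustration: given lower/upper bounds on the certain answers
# (lower <= answers <= upper), only the undetermined tuples are handed
# to the expensive full reasoner.
def resolve_answers(lower, upper, full_reasoner):
    """lower/upper: sets of candidate answer tuples; full_reasoner:
    callable deciding whether a tuple is a certain answer."""
    undetermined = upper - lower
    confirmed = {t for t in undetermined if full_reasoner(t)}
    return lower | confirmed

lower = {("alice",)}
upper = {("alice",), ("bob",), ("carol",)}
answers = resolve_answers(lower, upper, lambda t: t == ("bob",))
```

When the bounds coincide, `undetermined` is empty and the full reasoner is never invoked, which is the cheap common case.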
-
Completeness Statements about RDF Data Sources and Their Use for Query Answering
Fariz Darari, Werner Nutt, Giuseppe Pirrò and Simon Razniewski
65-80
[OpenAccess] [Publisher]
With thousands of RDF data sources available on the Web covering disparate and possibly overlapping knowledge domains, the problem of providing high-level descriptions (in the form of metadata) of their content becomes crucial. In this paper we introduce a theoretical framework for describing data sources in terms of their completeness. We show how existing data sources can be described with completeness statements expressed in RDF. We then focus on the problem of the completeness of query answering over plain and RDFS data sources augmented with completeness statements. Finally, we present an extension of the completeness framework for federated data sources.
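A much-simplified version of the reasoning task can be sketched as follows, assuming completeness statements and query patterns are both plain triple patterns with `None` as a wildcard; the paper's framework is considerably more general (RDFS entailment, federation):

```python
# Simplified sketch: a completeness statement declares the source
# complete for all triples matching a pattern (None = wildcard). A
# query whose triple patterns are each covered by some statement can
# be answered completely from the source.
def covers(statement, pattern):
    """A statement position covers a pattern position if it is a
    wildcard or matches it exactly."""
    return all(s is None or s == p for s, p in zip(statement, pattern))

def query_is_complete(query_patterns, statements):
    return all(any(covers(st, qp) for st in statements)
               for qp in query_patterns)

statements = [(None, "population", None)]   # complete for all population facts
query = [("Bolzano", "population", None)]
```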
-
Controlled Query Evaluation over OWL 2 RL Ontologies
Bernardo Cuenca Grau, Evgeny Kharlamov, Egor V. Kostylev and Dmitriy Zheleznyakov
49-64
[OpenAccess] [Publisher]
We study confidentiality enforcement in ontology-based information systems where ontologies are expressed in OWL 2 RL, a profile of OWL 2 that is becoming increasingly popular in Semantic Web applications. We formalise a natural adaptation of the Controlled Query Evaluation (CQE) framework to ontologies. Our goal is to provide CQE algorithms that (i) ensure confidentiality of sensitive information; (ii) are efficiently implementable by means of RDF triple store technologies; and (iii) ensure maximality of the answers returned by the system to user queries (thus restricting access to information as little as possible). We formally show that these requirements are in conflict and cannot be satisfied without imposing restrictions on ontologies. We propose a fragment of OWL 2 RL for which all three requirements can be satisfied. For the identified fragment, we design a CQE algorithm that has the same computational complexity as standard query answering and can be implemented by relying on state-of-the-art triple stores.
-
DAW: Duplicate-AWare Federated Query Processing over the Web of Data
Muhammad Saleem, Axel-Cyrille Ngonga Ngomo, Josiane Xavier Parreira, Helena Deus and Manfred Hauswirth
561-576
[OpenAccess] [Publisher]
Over the last few years the Web of Data has developed into a large compendium of interlinked data sets from multiple domains. Due to the decentralised architecture of this compendium, several of these datasets contain duplicated data. Yet, so far, little attention has been paid to the effect of duplicated data on federated querying. This work presents DAW, a novel duplicate-aware approach to federated querying over the Web of Data. DAW is based on a combination of min-wise independent permutations and compact data summaries. It can be directly combined with existing federated query engines in order to achieve the same query recall values while querying fewer data sources. We extend three well-known federated query processing engines – DARQ, SPLENDID, and FedX – with DAW and compare our extensions with the original approaches. The comparison shows that DAW can greatly reduce the number of queries sent to the endpoints, while keeping high query recall values. Therefore, it can significantly improve the performance of federated query processing engines. Moreover, DAW provides a source selection mechanism that maximises the query recall, when the query processing is limited to a subset of the sources.
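The data-summary side of this idea can be sketched with a MinHash signature; salted cryptographic hashes below approximate true min-wise independent permutations, and the triple identifiers are invented:

```python
import hashlib

# Sketch: each source's set of triples is summarised by a k-slot
# MinHash signature, so the overlap between two sources can be
# estimated without comparing the triples themselves.
def minhash(items, k=32):
    """Return a k-value MinHash signature for a set of items."""
    return [min(hashlib.md5(f"{salt}:{item}".encode()).hexdigest()
                for item in items)
            for salt in range(k)]

def estimated_overlap(sig_a, sig_b):
    """Fraction of matching signature slots ~ Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)

a = minhash({"t1", "t2", "t3", "t4"})
b = minhash({"t1", "t2", "t3", "t4"})   # a source duplicating the first
c = minhash({"u1", "u2", "u3", "u4"})   # a disjoint source
```

A federated engine can then skip querying a source whose signature shows it would contribute (almost) only triples already covered by sources it has selected.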
-
Discovering Missing Semantic Relations between Entities in Wikipedia
Mengling Xu, Zhichun Wang, Rongfang Bie, Juanzi Li, Chen Zheng, Wantian Ke and Mingquan Zhou
657-670
[OpenAccess] [Publisher]
Wikipedia’s infoboxes contain rich structured information about various entities, which the DBpedia project has exploited to generate large-scale Linked Data sets. Among all the infobox attributes, those with hyperlinks in their values identify semantic relations between entities, which are important for creating RDF links between DBpedia’s instances. However, quite a few hyperlinks have not been annotated by editors in infoboxes, which leaves many relations between entities missing in Wikipedia. In this paper, we propose an approach for automatically discovering the missing entity links in Wikipedia’s infoboxes, so that the missing semantic relations between entities can be established. Our approach first identifies entity mentions in the given infoboxes, and then computes several features to estimate the likelihood that a given attribute value links to a candidate entity. A learning model is used to obtain the weights of the different features and to predict the destination entity for each attribute value. We evaluated our approach on English Wikipedia data; the experimental results show that our approach can effectively find the missing relations between entities, and that it significantly outperforms the baseline methods in terms of both precision and recall.
-
DynamiTE: Parallel Materialization of Dynamic RDF Data
,
Jacopo Urbani,Alessandro Margara,Ceriel Jacobs,Frank Van Harmelen and Henri Bal
,
641-656
,
[OpenAccess]
,
[Publisher]
One of the main advantages of using semantically annotated data is that machines can reason on it, deriving implicit knowledge from explicit information. In this context, materializing every possible implicit derivation from a given input can be computationally expensive, especially when considering large data volumes. Most of the solutions that address this problem rely on the assumption that the information is static, i.e., that it does not change, or changes very infrequently. However, the Web is extremely dynamic: online newspapers, blogs, social networks, etc., change frequently, with outdated information removed and replaced with fresh data. This demands a materialization that is not only scalable but also reactive to changes. In this paper, we consider the problem of incremental materialization, that is, how to update the materialized derivations when data is added or removed. To this purpose, we consider the ρdf RDFS fragment [12], and present a parallel system that implements a number of algorithms to quickly recalculate the derivation. When new data is added, our system uses a parallel version of the well-known semi-naive evaluation of Datalog. For removals, we have implemented two algorithms: one based on previous theoretical work, and another that is more efficient since it does not require a complete scan of the input. We have evaluated the performance using a prototype system called DynamiTE, which organizes the knowledge bases with a number of indices to facilitate the query process and exploits parallelism to improve performance. The results show that our methods are indeed capable of recalculating the derivation in a short time, opening the door to reasoning on much more dynamic data than is currently possible.
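The addition case can be pictured with a minimal, sequential sketch of semi-naive evaluation for one ρdf-style rule, subclass transitivity (a generic Datalog illustration, not DynamiTE's parallel implementation): each round joins only the facts newly derived in the previous round with the existing closure, so an update never recomputes the full materialization from scratch.

```python
def seminaive_closure(facts, delta):
    """Semi-naive maintenance of the transitive closure of a
    subClassOf-like relation. facts: already-materialized (sub, super)
    pairs; delta: newly added pairs. Each round joins only last
    round's new facts, avoiding re-derivation of known facts."""
    closure = set(facts)
    delta = set(delta) - closure
    while delta:
        closure |= delta
        new = set()
        # rule: (a, b) and (b, c) entail (a, c); join delta on both sides
        for (a, b) in delta:
            for (x, y) in closure:
                if b == x:
                    new.add((a, y))
                if y == a:
                    new.add((x, b))
        delta = new - closure
    return closure

base = {("A", "B"), ("B", "C")}
closed = seminaive_closure(set(), base)            # initial materialization
updated = seminaive_closure(closed, {("C", "D")})  # incremental addition
# ("A", "D") is derived without recomputing the closure from scratch
```

Removals are harder, as the abstract notes: a retracted fact may invalidate derivations that are still supported by other premises, which is why the paper's deletion algorithms are a separate contribution.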
-
Elastic and scalable processing of Linked Stream Data in the Cloud
,
Danh Le Phuoc,Hoan Nguyen Mau Quoc,Chan Le Van and Manfred Hauswirth
,
273-288
,
[OpenAccess]
,
[Publisher]
Linked Stream Data extends the Linked Data paradigm to dynamic data sources. It enables the integration and joint processing of heterogeneous stream data with quasi-static data from the Linked Data Cloud in near-real-time. Several Linked Stream Data processing engines exist, but their scalability still needs to be improved in terms of (static and dynamic) data sizes, number of concurrent queries, stream update frequencies, etc. So far, none of them supports parallel processing in the Cloud, i.e., elastic load profiles in a hosted environment. To remedy these limitations, this paper presents an approach for elastically parallelizing the continuous execution of queries over Linked Stream Data. For this, we have developed novel, highly efficient, and scalable parallel algorithms for continuous query operators. Our approach and algorithms are implemented in our CQELS Cloud system and we present extensive evaluations of their superior performance on Amazon EC2 demonstrating their high scalability and excellent elasticity in a real deployment.
-
Empirical Study of Logic-Based Modules: Cheap Is Cheerful
,
Chiara Del Vescovo,Pavel Klinov,Bijan Parsia,Ulrike Sattler,Thomas Schneider and Dmitry Tsarkov
,
81-96
,
[OpenAccess]
,
[Publisher]
For ontology reuse and integration, a number of approaches have been devised that aim at identifying modules, i.e., suitably small sets of “relevant” axioms from ontologies. Here we consider three logically sound notions of modules: MEX modules, only applicable to inexpressive ontologies; modules based on semantic locality, a sound approximation of the first; and modules based on syntactic locality, a sound approximation of the second (and thus the first), widely used since these modules can be extracted from OWL DL ontologies in time polynomial in the size of the ontology. In this paper we investigate the quality of both approximations over a large corpus of ontologies, using our own implementation of semantic locality, which is the first to our knowledge. In particular, we show with statistical significance that, in most cases, there is no difference between the two module notions based on locality; where they differ, the additional axioms can either be easily ruled out or their number is relatively small. We classify the axioms that explain the rare differences into four kinds of “culprits” and discuss which of those can be avoided by extending the definition of syntactic locality. Finally, we show that differences between MEX and locality-based modules occur for a minority of ontologies from our corpus and largely affect (approximations of) expressive ontologies – this conclusion relies on a much larger and more diverse sample than existing comparisons between MEX and syntactic locality-based modules.
-
Exploring Scholarly Data with Rexplore
,
Francesco Osborne,Enrico Motta and Paul Mulholland
,
449-464
,
[OpenAccess]
,
[Publisher]
Despite the large number and variety of tools and services available today for exploring scholarly data, current support is still very limited in the context of sensemaking tasks, which go beyond standard search and ranking of authors and publications, and focus instead on i) understanding the dynamics of research areas, ii) relating authors ‘semantically’ (e.g., in terms of common interests or shared academic trajectories), or iii) performing fine-grained academic expert search along multiple dimensions. To address this gap we have developed a novel tool, Rexplore, which integrates statistical analysis, semantic technologies, and visual analytics to provide effective support for exploring and making sense of scholarly data. Here, we describe the main innovative elements of the tool and we present the results from a task-centric empirical evaluation, which shows that Rexplore is highly effective at providing support for the aforementioned sensemaking tasks. In addition, these results are robust both with respect to the background of the users (i.e., expert analysts vs. ‘ordinary’ users) and also with respect to whether the tasks are selected by the evaluators or proposed by the users themselves.
-
FedSearch: efficiently combining structured queries and full-text search in a SPARQL federation
,
Andriy Nikolov,Andreas Schwarte and Christian Hütter
,
417-432
,
[OpenAccess]
,
[Publisher]
Combining structured queries with full-text search provides a powerful means to access distributed linked data. However, executing hybrid search queries in a federation of multiple data sources presents a number of challenges due to data source heterogeneity and lack of statistical data about keyword selectivity. To address these challenges, we present FedSearch – a novel hybrid query engine based on the SPARQL federation framework FedX. We extend the SPARQL algebra to incorporate keyword search clauses as first-class citizens and apply novel optimization techniques to improve query processing efficiency while maintaining a meaningful ranking of results. By performing on-the-fly adaptation of the query execution plan and intelligent grouping of query clauses, we are able to significantly reduce the communication costs, making our approach suitable for top-k hybrid search across multiple data sources. In experiments we demonstrate that our optimization techniques can lead to a substantial performance improvement, reducing the execution time of hybrid queries by more than an order of magnitude.
-
Federated Entity Search using On-The-Fly Consolidation
,
Daniel M. Herzig,Roi Blanco,Peter Mika and Thanh Tran
,
161-176
,
[OpenAccess]
,
[Publisher]
Nowadays, search on the Web goes beyond the retrieval of textual Web sites and increasingly takes advantage of the growing amount of structured data. Of particular interest is entity search, where the units of retrieval are structured entities instead of textual documents. These entities reside in different sources, which may provide only limited information about their content and are therefore called “uncooperative”. Further, these sources capture complementary but also redundant information about entities. In this environment of uncooperative data sources, we study the problem of federated entity search, where redundant information about entities is reduced on-the-fly through entity consolidation performed at query time. We propose a novel method for entity consolidation that is based on language models and is completely unsupervised, and hence more suitable for this on-the-fly, uncooperative setting than state-of-the-art methods that require training data. Further, we apply the same language model technique to deal with the federated search problem of ranking results returned from different sources. Particularly novel are the mechanisms we propose to incorporate consolidation results into this ranking. We perform experiments using real Web queries and data sources. Our experiments show that our approach for federated entity search with on-the-fly consolidation improves upon the performance of a state-of-the-art preference aggregation baseline and also benefits from consolidation.
-
Getting Lucky in Ontology Search: A Data-Driven Evaluation Framework for Ontology Ranking
,
Natasha F. Noy,Paul Alexander,Rave Harpaz,Trish Whetzel,Raymond Fergerson and Mark Musen
,
433-448
,
[OpenAccess]
,
[Publisher]
With hundreds, if not thousands, of ontologies available today in many different domains, ontology search and ranking has become an important and timely problem. When a user searches a collection of ontologies for her terms of interest, there are often dozens of ontologies that contain these terms. How does she know which ontology is the most relevant to her search? Our research group hosts BioPortal, a public repository of more than 330 ontologies in the biomedical domain. When a term that a user searches for is available in multiple ontologies, how do we rank the results and how do we measure how well our ranking works? In this paper, we develop an evaluation framework that enables developers to compare and analyze the performance of different ontology-ranking methods. Our framework is based on processing search logs and determining how often users select the top link that the search engine offers. We evaluate our framework by analyzing the data on BioPortal searches. We explore several different ranking algorithms and measure the effectiveness of each ranking by measuring how often users click on the highest ranked ontology. We collected log data from more than 4,800 BioPortal searches. Our results show that regardless of the ranking, in more than half the searches, users select the first link. Thus, it is even more critical to ensure that the ranking is appropriate if we want to have satisfied users. Our further analysis demonstrates that ranking ontologies based on page view data significantly improves the user experience, with an approximately 26% increase in the number of users who select the highest ranked ontology for the search.
-
Incremental Reasoning in OWL EL without Bookkeeping
,
Yevgeny Kazakov and Pavel Klinov
,
225-240
,
[OpenAccess]
,
[Publisher]
We describe a method for updating the classification of ontologies expressed in the EL family of Description Logics after some axioms have been added or deleted. While incremental classification modulo additions is relatively straightforward, handling deletions is more problematic since it requires retracting logical consequences that are no longer valid. Known algorithms address this problem using various forms of bookkeeping to trace the consequences back to premises. But such additional data can consume memory and place an extra burden on the reasoner during application of inferences. In this paper, we present a technique that avoids this extra cost while being very efficient for small incremental changes in ontologies. The technique is freely available as part of the open-source EL reasoner ELK and its efficiency is demonstrated on naturally occurring and synthetic data.
-
Indented Tree or Graph? A Usability Study of Ontology Visualization Techniques in the Context of Class Mapping Evaluation
,
Bo Fu,Natalya F. Noy and Margaret-Anne Storey
,
113-128
,
[OpenAccess]
,
[Publisher]
Research effort in ontology visualization has largely focused on developing new visualization techniques. At the same time, researchers have paid less attention to investigating the usability of common visualization techniques that many practitioners regularly use to visualize ontological data. In this paper, we focus on two popular ontology visualization techniques: indented tree and graph. We conduct a controlled usability study with an emphasis on the effectiveness, efficiency, workload and satisfaction of these visualization techniques in the context of assisting users during evaluation of ontology mappings. Findings from this study have revealed both strengths and weaknesses of each visualization technique. In particular, while the indented tree visualization is more organized and familiar to novice users, subjects found the graph visualization to be more controllable and intuitive without visual redundancy, particularly for ontologies with multiple inheritance.
-
Infrastructure for Efficient Exploration of Large Scale Linked Data via Contextual Tag Clouds
,
Xingjian Zhang,Dezhao Song,Sambhawa Priya and Jeff Heflin
,
671-686
,
[OpenAccess]
,
[Publisher]
In this paper we present the infrastructure of the contextual tag cloud system which can execute large volumes of queries about the number of instances that use particular ontological terms. The contextual tag cloud system is a novel application that helps users explore a large scale RDF dataset: the tags are ontological terms (classes and properties), the context is a set of tags that defines a subset of instances, and the font sizes reflect the number of instances that use each tag. It visualizes the patterns of instances specified by the context a user constructs. Given a request with a specific context, the system needs to quickly find what other tags the instances in the context use, and how many instances in the context use each tag. The key question we answer in this paper is how to scale to Linked Data; in particular we use a dataset with 1.4 billion triples and over 380,000 tags. This is complicated by the fact that the calculation should, when directed by the user, consider the entailment of taxonomic and/or domain/range axioms in the ontology. We combine a scalable preprocessing approach with a specially-constructed inverted index and use three approaches to prune unnecessary counts for faster intersection computations. We compare our system with a state-of-the-art triple store, examine how pruning rules interact with inference and analyze our design choices.
-
Knowledge Graph Identification
,
Jay Pujara,Hui Miao,Lise Getoor and William Cohen
,
529-544
,
[OpenAccess]
,
[Publisher]
Large-scale information processing systems are able to extract massive collections of interrelated facts, but unfortunately transforming these candidate facts into useful knowledge is a formidable challenge. In this paper, we show how uncertain extractions about entities and their relations can be transformed into a knowledge graph. The extractions form an extraction graph and we refer to the task of removing noise, inferring missing information, and determining which candidate facts should be included into a knowledge graph as knowledge graph identification. In order to perform this task, we must reason jointly about candidate facts and their associated extraction confidences, identify coreferent entities, and incorporate ontological constraints. Our proposed approach uses probabilistic soft logic (PSL), a recently introduced probabilistic modeling framework which easily scales to millions of facts. We demonstrate the power of our method on a synthetic Linked Data corpus derived from the MusicBrainz music community and a real-world set of extractions from the NELL project containing over 1M extractions and 70K ontological relations. We show that compared to existing methods, our approach is able to achieve improved AUC and F1 with significantly lower running time.
-
ORCHID – Reduction-Ratio-Optimal Computation of Geo-Spatial Distances for Link Discovery
,
Axel-Cyrille Ngonga Ngomo
,
385-400
,
[OpenAccess]
,
[Publisher]
The discovery of links between resources within knowledge bases is of crucial importance to realize the vision of the Semantic Web. Addressing this task is especially challenging when dealing with geo-spatial datasets due to their sheer size and the potential complexity of single geo-spatial objects. Yet, so far, little attention has been paid to the characteristics of geo-spatial data within the context of link discovery. In this paper, we address this gap by presenting Orchid, a reduction-ratio-optimal link discovery approach designed especially for geo-spatial data. Orchid relies on a combination of the Hausdorff and orthodromic metrics to compute the distance between geo-spatial objects. We first present two novel approaches for the efficient computation of Hausdorff distances. Then, we present the space tiling approach implemented by Orchid and prove that it is optimal with respect to the reduction ratio that it can achieve. The evaluation of our approaches is carried out on three real datasets of different size and complexity. Our results suggest that our approaches to the computation of Hausdorff distances require two orders of magnitude fewer orthodromic distance computations to compare geographical data. Moreover, they require two orders of magnitude less time than a naive approach to achieve this goal. Finally, our results indicate that Orchid scales to large datasets while outperforming the state of the art significantly.
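The two metrics that Orchid combines can be sketched with a naive quadratic baseline (the coordinates and function names below are illustrative); Orchid's space tiling exists precisely to avoid most of these pairwise orthodromic computations.

```python
from math import radians, sin, cos, asin, sqrt

def orthodromic(p, q):
    """Great-circle distance in km between (lat, lon) points (haversine)."""
    lat1, lon1, lat2, lon2 = map(radians, (*p, *q))
    h = (sin((lat2 - lat1) / 2) ** 2
         + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371 * asin(sqrt(h))

def hausdorff(a, b):
    """Naive symmetric Hausdorff distance between two geo objects given
    as point lists: O(|a| * |b|) orthodromic distance computations."""
    directed = lambda xs, ys: max(min(orthodromic(x, y) for y in ys) for x in xs)
    return max(directed(a, b), directed(b, a))

# Two small hypothetical polygons (vertex lists) near the same city:
poly1 = [(52.52, 13.40), (52.53, 13.41)]
poly2 = [(52.52, 13.40), (52.51, 13.39)]
d = hausdorff(poly1, poly2)  # km; 0.0 only if each set covers the other
```

With real geo-spatial objects holding thousands of vertices, this quadratic inner loop is exactly the cost the paper's reduction-ratio-optimal tiling prunes.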
-
On the Status of Experimental Research on the Semantic Web
,
Heiner Stuckenschmidt,Michael Schuhmacher,Johannes Knopp,Christian Meilicke and Ansgar Scherp
,
577-592
,
[OpenAccess]
,
[Publisher]
Experimentation is an important way to validate results of Semantic Web and Computer Science research in general. In this paper, we investigate the development and the current status of experimental work on the Semantic Web. Based on a corpus of 500 papers collected from the International Semantic Web Conferences (ISWC) over the past decade, we analyse the importance and the quality of experimental research conducted and compare it to general Computer Science. We observe that the amount and quality of experiments are steadily increasing over time. Contrary to what we hypothesised, we cannot confirm a statistically significant correlation between a paper’s citations and the amount of experimental work reported. Our analysis, however, shows that papers comparing themselves to other systems are more often cited than other papers.
-
One License to Compose Them All: a deontic logic approach to data licensing on the Web of Data
,
Guido Governatori,Antonino Rotolo,Serena Villata and Fabien Gandon
,
145-160
,
[OpenAccess]
,
[Publisher]
In the domain of Linked Open Data, a need is emerging for automated frameworks able to generate the licensing terms associated with data coming from heterogeneous distributed sources. This paper proposes and evaluates a deontic logic semantics which allows us to define the deontic components of licenses, i.e., permissions, obligations, and prohibitions, and to generate a composite license compliant with the licensing terms of the different licenses being composed. Some heuristics are proposed to support the data publisher in choosing the license composition strategy that best suits her needs w.r.t. the data she is publishing.
-
Ontology-Based Data Access: Ontop of Databases
,
Mariano Rodriguez-Muro,Roman Kontchakov and Michael Zakharyaschev
,
545-560
,
[OpenAccess]
,
[Publisher]
We present the architecture and technologies underpinning the OBDA system Ontop, which takes full advantage of storing data in relational databases. We discuss the theoretical foundations of Ontop: the tree-witness query rewriting, T-mappings, and optimisations based on database integrity constraints and SQL features. We analyse the performance of Ontop in a series of experiments and demonstrate that, for standard ontologies, queries and data stored in relational databases, Ontop is fast, efficient and produces SQL rewritings of high quality.
-
Pattern Based Knowledge Base Enrichment
,
Lorenz Bühmann and Jens Lehmann
,
33-48
,
[OpenAccess]
,
[Publisher]
Although an increasing number of RDF knowledge bases are published, many of them consist primarily of instance data and lack sophisticated schemata. Having such schemata allows more powerful querying, consistency checking and debugging as well as improved inference. One of the reasons why schemata are still rare is the effort required to create them. In this article, we propose a semi-automatic schema construction approach addressing this problem: first, the frequency of axiom patterns in existing knowledge bases is determined; afterwards, those patterns are converted into SPARQL-based pattern detection algorithms, which can be used to enrich knowledge base schemata. We argue that ours is the first scalable knowledge base enrichment approach based on real schema usage patterns. The approach is evaluated on a large set of knowledge bases with a quantitative and qualitative analysis of the results.
-
Personalized Best Answer Computation in Graph Databases
,
Michael Ovelgönne,Noseong Park,V.S. Subrahmanian,Elizabeth K. Bowman and Kirk A. Ogaard
,
465-480
,
[OpenAccess]
,
[Publisher]
Though subgraph matching has been extensively studied as a query paradigm in Semantic Web and social network data environments, a user can get a large number of answers in response to a query. Just as Google does for web search, these answers can be shown to the user in accordance with an importance ranking. In this paper, we present scalable algorithms to find the top-k answers to a practically important subset of SPARQL queries, denoted importance queries, via a suite of pruning techniques. We test our algorithms on multiple real-world graph data sets, showing that our algorithms are efficient even on networks with up to 6M vertices and 15M edges and far more efficient than popular triple stores.
-
ProSWIP: Property-based Data Access for Semantic Web Interactive Programming
,
Silviu Homoceanu,Philipp Wille and Wolf-Tilo Balke
,
177-192
,
[OpenAccess]
,
[Publisher]
The Semantic Web has matured from a mere theoretical vision into a variety of ready-to-use linked open data sources currently available on the Web. Still, with respect to application development, the Web community is only beginning to develop new paradigms in which data, as the main driver of applications, is promoted to first-class status. Property-based typing, which relies on the properties of resources as an indicator of their type, is such a paradigm. In this paper, we inspect the feasibility of property-based typing for accessing data from the linked open data cloud. Problems with the transparency and quality of the selected data were noticeable. To alleviate these problems, we developed an iterative approach that builds on human feedback.
-
QODI: Query as Context in Automatic Data Integration
,
Aibo Tian,Juan F. Sequeda and Daniel Miranker
,
609-624
,
[OpenAccess]
,
[Publisher]
QODI is an automatic ontology-based data integration (OBDI) system. QODI is distinguished in that its ontology mapping algorithm dynamically determines a partial mapping specific to the reformulation of each query. The query provides application context not available in the ontologies alone; thereby, the system is able to disambiguate mappings for different queries. The mapping algorithm decomposes the query into a set of paths and compares this set with a similar decomposition of a source ontology. Using test sets from three real-world applications, QODI achieves favorable results compared with AgreementMaker, a leading ontology matcher, and an ontology-based implementation of the mapping methods detailed for Clio, the state-of-the-art relational data integration and data exchange system.
-
Real-time RDF extraction from unstructured data streams
,
Daniel Gerber,Sebastian Hellmann,Lorenz Bühmann,Tommaso Soru,Axel-Cyrille Ngonga Ngomo and Ricardo Usbeck
,
129-144
,
[OpenAccess]
,
[Publisher]
The vision behind the Web of Data is to extend the current document-oriented Web with machine-readable facts and structured data, thus creating a representation of general knowledge. However, most of the Web of Data is limited to being a large compendium of encyclopedic knowledge describing entities. A huge challenge, the timely and massive extraction of RDF facts from unstructured data, has remained open so far. The availability of such knowledge on the Web of Data would provide significant benefits to manifold applications including news retrieval, sentiment analysis and business intelligence. In this paper, we address the problem of the actuality of the Web of Data by presenting an approach that allows extracting RDF triples from unstructured data streams. We employ statistical methods in combination with deduplication, disambiguation and unsupervised as well as supervised machine learning techniques to create a knowledge base that reflects the content of the input streams. We evaluate a sample of the RDF we generate against a large corpus of news streams and show that we achieve a precision of more than 85%.
-
Secure Manipulation of Linked Data
,
Sabrina Kirrane,Ahmed Abdelrahman,Alessandra Mileo and Stefan Decker
,
241-256
,
[OpenAccess]
,
[Publisher]
When it comes to publishing data on the web, the level of access control required (if any) is highly dependent on the type of content exposed. Up until now, RDF data publishers have focused on exposing and linking public data. With the advent of SPARQL 1.1, the linked data infrastructure can be used not only as a means of publishing open data but also as a general mechanism for managing distributed graph data. However, such a decentralised architecture brings with it a number of additional challenges with respect to both data security and integrity. In this paper, we propose a general authorisation framework that can be used to deliver dynamic query results based on user credentials and to cater for the secure manipulation of linked data. Specifically, we describe how graph patterns, propagation rules, conflict resolution policies and integrity constraints can together be used to specify and enforce consistent access control policies.
-
Semantic Message Passing for Generating Linked Data from Tables
,
Varish Mulwad,Tim Finin and Anupam Joshi
,
353-368
,
[OpenAccess]
,
[Publisher]
We describe work on automatically inferring the intended meaning of tables and representing it as RDF linked data, making it available for improving search, interoperability and integration. We present implementation details of a joint inference module that uses knowledge from the linked open data (LOD) cloud to jointly infer the semantics of column headers, table cell values (e.g., strings and numbers) and relations between columns. We also implement a novel Semantic Message Passing algorithm which uses LOD knowledge to improve existing message passing schemes. We evaluate our implemented techniques on tables from the Web and Wikipedia.
-
Semantic Rule Filtering for Web-Scale Relation Extraction
,
Andrea Moro,Hong Li,Sebastian Krause,Feiyu Xu,Roberto Navigli and Hans Uszkoreit
,
337-352
,
[OpenAccess]
,
[Publisher]
Web-scale relation extraction is a means for building and extending large repositories of formalized knowledge. This type of automated knowledge building requires a decent level of precision, which is hard to achieve with automatically acquired rule sets learned from unlabeled data by means of distant or minimal supervision. This paper shows how precision of relation extraction can be considerably improved by employing a wide-coverage, general-purpose lexical semantic network, i.e., BabelNet, for effective semantic rule filtering. We apply Word Sense Disambiguation to the content words of the automatically extracted rules. As a result a set of relation-specific relevant concepts is obtained, and each of these concepts is then used to represent the structured semantics of the corresponding relation. The resulting relation-specific subgraphs of BabelNet are used as semantic filters for estimating the adequacy of the extracted rules. For the seven semantic relations tested here, the semantic filter consistently yields a higher precision at any relative recall value in the high-recall range.
-
Simplified OWL Ontology Editing for the Web: Is WebProtégé Enough?
,
Matthew Horridge,Tania Tudorache,Jennifer Vendetti,Csongor Nyulas,Mark Musen and Natasha F. Noy
,
193-208
,
[OpenAccess]
,
[Publisher]
Ontology engineering is a task that is notorious for its difficulty. As the group that developed Protégé, the most widely used ontology editor, we are keenly aware of how difficult the users perceive this task to be. In this paper, we present the new version of WebProtégé that we designed with two main goals in mind: (1) create a tool that will be easy to use while still accounting for commonly used OWL constructs; (2) support collaboration and social interaction around distributed ontology editing as part of the core tool design. We designed this new version of the WebProtégé user interface empirically, by analysing the use of OWL constructs in a large corpus of publicly available ontologies. Since the beta release of this new WebProtégé interface in January 2013, our users from around the world have created and uploaded 519 ontologies on our server. In this paper, we describe the key features of the new tool and our empirical design approach. We evaluate language coverage in WebProtégé by assessing how well it covers the OWL constructs that are present in ontologies that users have uploaded to WebProtégé. We evaluate the usability of WebProtégé through a usability survey. Our analysis validates our empirical design, suggests additional language constructs to explore, and demonstrates that an easy-to-use web-based tool that covers most of the frequently used OWL constructs is sufficient for many users to start editing their ontologies.
-
Simplifying Description Logic Ontologies
,
Nadeschda Nikitina and Sven Schewe
,
401-416
,
[OpenAccess]
,
[Publisher]
We discuss the problem of minimizing TBoxes expressed in the light-weight description logic EL, which forms the basis of some large ontologies like SNOMED, the Gene Ontology, NCI and Galen. We show that the minimization of TBoxes is intractable (NP-complete). While this looks like a negative result, we also provide a heuristic technique for minimizing TBoxes. We prove the correctness of the heuristic and show that it provides optimal results for a class of ontologies, which we define through an acyclicity constraint over a reference relation between equivalence classes of concepts. To establish the feasibility of our approach, we have implemented the algorithm and evaluated its effectiveness on a small suite of benchmarks.
-
Statistical Knowledge Patterns: Identifying Synonymous Relations in Large Linked Datasets
,
Ziqi Zhang,Anna Lisa Gentile,Eva Blomqvist,Isabelle Augenstein and Fabio Ciravegna
,
687-702
,
[OpenAccess]
,
[Publisher]
The Web of Data is a rich common resource with billions of triples available in thousands of datasets and individual Web documents created by both expert and non-expert ontologists. A common problem is imprecision in the use of vocabularies: annotators can misunderstand the semantics of a class or property, or may not be able to find the right objects to annotate with. This decreases the quality of data and may eventually hamper its usability at large scale. This paper describes Statistical Knowledge Patterns (SKPs) as a means to address this issue. SKPs encapsulate key information about ontology classes, including synonymous properties in (and across) datasets, and are automatically generated based on statistical data analysis. SKPs can be effectively used to automatically normalise data, and hence increase recall in querying. Both pattern extraction and pattern usage are completely automated. The main benefits of SKPs are that: (1) their structure allows for both accurate query expansion and restriction; (2) they are context dependent, hence they describe the usage and meaning of properties in the context of a particular class; and (3) they can be generated offline, hence the equivalence among relations can be used efficiently at run time.
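The query-expansion benefit described in the abstract can be sketched very simply: once synonymous properties are known for a class, a triple pattern can be rewritten into a UNION over the synonyms. The properties and synonym table below are illustrative, not the paper's actual data.

```python
# Hypothetical SKP-style query expansion: expand one triple pattern into a
# SPARQL UNION over synonymous properties so recall increases without
# changing the query's intent. Property names here are invented examples.

SYNONYMS = {  # property -> synonymous properties discovered for a class
    "dbo:birthPlace": ["dbp:placeOfBirth", "dbp:birthplace"],
}

def expand_pattern(subj: str, pred: str, obj: str) -> str:
    """Rewrite a single triple pattern into a UNION over synonyms."""
    alternatives = [pred] + SYNONYMS.get(pred, [])
    branches = [f"{{ {subj} {p} {obj} . }}" for p in alternatives]
    return " UNION ".join(branches)

query_body = expand_pattern("?person", "dbo:birthPlace", "?city")
print(query_body)
```

Because the synonym tables are generated offline, this rewriting adds essentially no cost at query time, which matches benefit (3) in the abstract.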
-
TRM – Learning Dependencies between Text and Structure with Topical Relational Models
,
Veli Bicer,Thanh Tran and Yongtao Ma
,
1-16
,
[OpenAccess]
,
[Publisher]
Text-rich structured data are becoming increasingly ubiquitous on the Web and in enterprise databases, encoding heterogeneous structural information between entities such as people, locations, or organizations together with associated textual information. For analyzing this type of data, existing topic modeling approaches, which are highly tailored toward document collections, require manually defined regularization terms to exploit and bias topic learning towards structure information. We propose the Topical Relational Model as a principled approach for automatically learning topics from both textual and structure information. We show that our approach is effective in exploiting heterogeneous structure information, outperforming a state-of-the-art approach that requires manually tuned regularization.
-
TRank: Ranking Entity Types Using the Web of Data
,
Alberto Tonon,Michele Catasta,Gianluca Demartini,Philippe Cudré-Mauroux and Karl Aberer
,
625-640
,
[OpenAccess]
,
[Publisher]
Much of Web search and browsing activity is today centered around entities. For this reason, Search Engine Result Pages (SERPs) increasingly contain information about the searched entities such as pictures, short summaries, related entities, and factual information. A key facet that is often displayed on the SERPs and that is instrumental for many applications is the entity type. However, an entity is usually not associated with a single generic type in the background knowledge bases but rather with a set of more specific types, which may be relevant or not given the document context. For example, one can find on the Linked Open Data cloud the fact that Tom Hanks is a person, an actor, and a person from Concord, California. All those types are correct, but some may be too general to be interesting (e.g., person), while others may be interesting but already known to the user (e.g., actor), or may be irrelevant given the current browsing context (e.g., person from Concord, California). In this paper, we define the new task of ranking entity types given an entity and its context. We propose and evaluate new methods to find the most relevant entity type based on collection statistics and on the graph structure interconnecting entities and types. An extensive experimental evaluation over several document collections at different levels of granularity (e.g., sentences, paragraphs, etc.) and different type hierarchies (including DBpedia, Freebase, and schema.org) shows that hierarchy-based approaches provide more accurate results when picking entity types to be displayed to the end-user while still being highly scalable.
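One hierarchy-based signal that such methods can use is type specificity, which can be approximated by depth in the type hierarchy. The toy hierarchy below is invented for illustration and is not the paper's actual ranking model, which also uses collection statistics and graph structure.

```python
# Illustrative sketch of one hierarchy-based ranking signal: prefer more
# specific entity types, approximated by depth in a toy type hierarchy.

HIERARCHY = {  # child type -> parent type (invented example)
    "Actor": "Person",
    "Person": "Thing",
    "PersonFromConcord": "Person",
}

def depth(t: str) -> int:
    """Number of steps from type t up to the hierarchy root."""
    d = 0
    while t in HIERARCHY:
        t = HIERARCHY[t]
        d += 1
    return d

def rank_types(types):
    """Sort candidate types from most to least specific."""
    return sorted(types, key=depth, reverse=True)

print(rank_types(["Person", "Actor", "Thing"]))  # → ['Actor', 'Person', 'Thing']
```

In the Tom Hanks example from the abstract, such a signal would push "person" down the ranking; the context-dependence (actor vs. person from Concord) requires the additional contextual features the paper evaluates.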
-
The Combined Approach to OBDA: Taming Role Hierarchies using Filters
,
Carsten Lutz,Inanc Seylan,David Toman and Frank Wolter
,
305-320
,
[OpenAccess]
,
[Publisher]
The basic idea of the combined approach to query answering in the presence of ontologies is to materialize the consequences of the ontology in the data and then use a limited form of query rewriting to deal with infinite materializations. While this approach is efficient and scalable for ontologies that are formulated in the basic version of the description logic DL-Lite, it incurs an exponential blowup during query rewriting when DL-Lite is extended with the popular role hierarchies. In this paper, we show how to replace the query rewriting with a filtering technique. This is natural from an implementation perspective and allows us to handle role hierarchies without an exponential blowup. We also carry out an experimental evaluation that demonstrates the scalability of this approach.
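A much-simplified illustration of the filtering idea: materialization introduces auxiliary ("anonymous") individuals, and one necessary condition for a genuine answer is that answer variables bind only to named individuals. The data and the naming convention below are invented; the paper's actual filter handles further conditions.

```python
# Simplified sketch of post-materialization answer filtering: drop answer
# tuples that bind an answer variable to an auxiliary individual introduced
# during materialization. Prefixes and data are invented for the example.

AUX_PREFIX = "_aux:"  # marker for individuals introduced by materialization

def filter_answers(answers):
    """Keep only tuples whose answer-variable bindings are named individuals."""
    return [row for row in answers
            if not any(v.startswith(AUX_PREFIX) for v in row.values())]

raw = [{"x": "ex:Alice"}, {"x": "_aux:r1"}]
print(filter_answers(raw))  # the spurious binding to _aux:r1 is dropped
```

The point of the paper is that such a filter can replace query rewriting, avoiding the exponential blowup that role hierarchies cause for rewriting-based variants of the combined approach.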
-
The Logic of Extensional RDFS
,
Enrico Franconi,Claudio Gutierrez,Alessandro Mosca,Giuseppe Pirrò and Riccardo Rosati
,
97-112
,
[OpenAccess]
,
[Publisher]
The normative version of RDF Schema (RDFS) gives non-standard (intensional) interpretations to some standard notions such as classes and properties, thus departing from standard set-based semantics. In this paper we develop a standard set-based (extensional) semantics for the RDFS vocabulary while preserving the simplicity and computational complexity of deduction of the intensional version. This result can positively impact current implementations, as reasoning in RDFS can be implemented following common set-based intuitions and be compatible with OWL extensions.
-
Towards Constructive Evidence of Data Flow-oriented Web Service Composition
,
Freddy Lecue
,
289-304
,
[OpenAccess]
,
[Publisher]
Automation of service composition is one of the most interesting challenges facing the Semantic Web and the Web of services today. While existing approaches are able to infer a partial order of services, the data flow remains implicit and difficult to generate automatically. Enhanced with formal representations, the semantic links between output and input parameters of services can then be exploited to infer their data flow. This work addresses the problem of effectively inferring the data flow between services based on their representations. To this end, we introduce the non-standard Description Logic reasoning task join, aiming to provide "constructive evidence" of why services can be connected and how non-trivial links (many-to-many parameters) can be inferred in the data flow. A preliminary evaluation provides evidence in favor of our approach regarding the completeness of the data flow.
-
Towards an automatic creation of localized versions of DBpedia
,
Alessio Palmero Aprosio,Claudio Giuliano and Alberto Lavelli
,
481-496
,
[OpenAccess]
,
[Publisher]
DBpedia is a large-scale knowledge base that exploits Wikipedia as its primary data source. The extraction procedure requires manually mapping Wikipedia infoboxes onto the DBpedia ontology. Thanks to crowdsourcing, a large number of infoboxes have been mapped in the English DBpedia. Consequently, the same procedure has been applied to other languages to create localized versions of DBpedia. However, the number of accomplished mappings is still small and limited to the most frequent infoboxes. Furthermore, mappings need maintenance due to the constant and quick changes of Wikipedia articles. In this paper, we focus on the problem of automatically mapping infobox attributes to properties of the DBpedia ontology, either to extend the coverage of the existing localized versions or to build versions from scratch for languages not yet covered. The evaluation has been performed on the Italian mappings. We compared our results with the current mappings on a random sample re-annotated by the authors. We report results comparable to those obtained by a human annotator in terms of precision, while our approach leads to a significant improvement in recall and speed. Specifically, we mapped 45,978 Wikipedia infobox attributes to DBpedia properties in 14 different languages for which mappings were not yet available. The resource is made available in an open format.
-
Type Inference on Noisy RDF Data
,
Heiko Paulheim and Christian Bizer
,
497-512
,
[OpenAccess]
,
[Publisher]
Type information is very valuable in knowledge bases. However, most large open knowledge bases are incomplete with respect to type information and, at the same time, contain noisy and incorrect data. That makes classic type inference by reasoning difficult. In this paper, we propose SDType, a heuristic link-based type inference mechanism which can handle noisy and incorrect data. Instead of leveraging T-Box information from the schema, SDType takes the actual use of a schema into account and is thus also robust to misused schema elements.
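The link-based idea can be sketched as a weighted vote: each property pointing at an entity suggests types according to the type distribution observed for that property across the dataset. All properties, types, and probabilities below are invented for illustration; SDType's actual scoring also weights properties by their discriminative power.

```python
# Sketch of link-based type inference: incoming properties "vote" for an
# entity's types with the type distribution observed for each property.
# All numbers and names are invented example data.
from collections import defaultdict

# P(type | entity appears as object of property), from corpus statistics
PROP_TYPE_DIST = {
    "dbo:birthPlace": {"dbo:Place": 0.9, "dbo:City": 0.7},
    "dbo:employer":   {"dbo:Organisation": 0.8, "dbo:Place": 0.1},
}

def infer_types(incoming_properties):
    """Average the per-property type votes into a confidence per type."""
    scores = defaultdict(float)
    for prop in incoming_properties:
        for t, p in PROP_TYPE_DIST.get(prop, {}).items():
            scores[t] += p
    n = len(incoming_properties)
    return {t: s / n for t, s in scores.items()}

print(infer_types(["dbo:birthPlace", "dbo:employer"]))
```

Because the votes come from how a schema is actually used rather than from T-Box axioms, a misused property only contributes a noisy vote instead of producing a hard (and wrong) logical entailment.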
-
What's in a 'nym'? Synonyms in Biomedical Ontology Matching
,
Catia Pesquita,Cosmin Stroe,Daniel Faria,Emanuel Santos,Isabel Cruz and Francisco Couto
,
513-528
,
[OpenAccess]
,
[Publisher]
To bring the Life Sciences domain closer to a Semantic Web realization it is fundamental to establish meaningful relations between biomedical ontologies. The successful application of ontology matching techniques is strongly tied to an effective exploration of the complex and diverse biomedical terminology contained in biomedical ontologies. In this paper, we present an overview of the lexical components of several biomedical ontologies and investigate how different approaches for their use can impact the performance of ontology matching techniques. We propose novel approaches for exploring the different types of synonyms encoded by the ontologies and for extending them based both on internal synonym derivation and on external ontologies. We evaluate our approaches using AgreementMaker, a successful ontology matching platform that implements several lexical matchers, and apply them to a set of four benchmark biomedical ontology matching tasks. Our results demonstrate the impact that an adequate consideration of ontology synonyms can have on matching performance, and validate our novel approach for combining internal and external synonym sources as a competitive and in many cases improved solution for biomedical ontology matching.
-
A Distributed Reasoning Platform to Preserve Energy in Wireless Sensor Networks
,
Femke Ongenae,Stijn Verstichel,Maarten Wijnants and Filip De Turck
A distributed reasoning platform is presented to reduce the energy consumption of Wireless Sensor Networks (WSNs) offering geospatial services by minimizing the amount of wireless communication. It combines local, rule-based reasoning on the sensors and gateways with global, ontology-based reasoning on the back-end servers. The Semantic Sensor Network (SSN) ontology was extended to model the WSN energy consumption. Two prototypes are presented: the Personal Parking Assistant (PPA) and the Garbage Bin Tampering Monitor (GBTM).
-
A Hybrid Natural Language Approach to Manage Semantic Interoperability for Public Health Analytics
,
Maxime Lavigne,Arash Shaban-Nejad,Anya Okhmatovskaia,Luke Mondor and David L. Buckeridge
This paper discusses the integration of an ontology with a natural language query engine to calculate and interpret epidemiological indicators for population health assessments. In this paper, we discuss the application of this approach to one type of possible query, which retrieves health determinants causally associated with diabetes mellitus.
-
A Machine Reader for the Semantic Web
,
Aldo Gangemi,Valentina Presutti,Francesco Draicchio and Andrea Giovanni Nuzzolese
FRED is a machine reading tool for converting text into internally well-connected and quality linked-data-ready ontologies in web-service-acceptable time. It implements a novel approach for ontology design from natural language sentences, combining Discourse Representation Theory (DRT), linguistic frame semantics, and Ontology Design Patterns (ODP). The tool is based on Boxer which implements a DRT-compliant deep parser. The logical output of Boxer is enriched with semantic data from VerbNet (or FrameNet) frames and transformed into RDF/OWL by means of a mapping model and a set of heuristics following best practices of OWL ontologies and RDF data design. The current version of the tool includes Earmark-based markup, and enrichment with WSD and NER off-the-shelf components.
-
A Protein Annotation Framework Empowered with Semantic Reasoning
,
Jemma Wu,Edmond Breen,Xiaomin Song,Brett Cooke and Mark Molloy
This paper presents an association discovery framework for proteins based on semantic annotations from biomedical literature. An automatic ontology-based annotation method is used to create a semantic protein annotation knowledge base. A semantic reasoning service enables realisation reasoning on original annotations to infer more accurate associations and executes semantic query transformation. A case study on protein-disease association discovery on a real-world colorectal cancer dataset is presented.
-
A Restful Interface for RDF Stream Processors
,
Marco Balduini and Emanuele Della Valle
This poster proposes a minimal, backward-compatible, and combinable RESTful interface for RDF stream engines.
-
A Search Interface for Researchers to Explore Affinities in a Linked Data Knowledge Base
,
Laurens De Vocht,Selver Softic,Erik Mannens,Rik Van de Walle and Martin Ebner
Research information is widely available on the Web, both as peer-reviewed research publications and as resources shared via (micro)blogging platforms or other social media. Usually the platforms supporting this information exchange have an API that allows access to the structured content. This opens a new way to search and explore research information. In this paper, we present an approach that interactively visualizes an aligned knowledge base of these resources. We show that visualizing resources, such as conferences, publications and proceedings, exposes affinities between researchers and those resources. We characterize each affinity between researchers and resources by the amount of shared interests and other commonalities.
-
A Study on the Correspondence between FCA and $\mathcal{ELI}$ Ontologies
,
Melisachew Wudage Chekol and Amedeo Napoli
The description logic $\mathcal{EL}$ has been used to support ontology design in various domains, especially in biology and medicine. $\mathcal{EL}$ is known for its efficient reasoning and query answering capabilities. By contrast, ontology design and query answering can be supported and guided within an FCA framework. Accordingly, in this paper, we propose a formal transformation of $\mathcal{ELI}$ (an extension of $\mathcal{EL}$ with \textit{inverse roles}) ontologies into an FCA framework, i.e. $K_\mathrm{\mathcal{ELI}}$, and we provide a formal characterization of this transformation. Then we show that SPARQL query answering over $\mathcal{ELI}$ ontologies can be reduced to lattice query answering over $K_\mathrm{\mathcal{ELI}}$ concept lattices. This simplifies the query answering task and shows that some basic semantic web tasks can be improved when considered from an FCA perspective.
-
A user interface to build interactive visualizations for the semantic web
,
Miguel Ceriani,Paolo Bottoni and Simona Valentini
While the web of linked data gets increasingly richer in size and complexity, its use is still constrained by the lack of applications consuming this data. We propose a Web-based tool to build and execute complex applications to transform, integrate and visualize Semantic Web data. Applications are composed as pipelines of a few basic components and completely based on Semantic Web standards, including SPARQL Construct for data transformation and SPARQL Update for state transition. The main novelty of the approach lies in the support for interaction, through the availability of user interface event streams as pipeline inputs.
-
ActiveRaUL: A Web form-based User Interface to create and maintain RDF data
,
Anila Sahar Butt,Armin Haller,Shepherd Liu and Lexing Xie
With the advent of Linked Data, the amount of automatically generated machine-readable data on the Web, often obtained by means of mapping relational data to RDF, has risen significantly. However, manually created, quality-assured and crowd-sourced data based on ontologies is not available in the quantities that would realise the full potential of the Semantic Web. One of the barriers for Semantic Web novices to create machine-readable data is the lack of easy-to-use Web publishing tools that separate the schema modelling from the data creation. In this demonstration we present ActiveRaUL, a Web service that supports the automatic generation of Web form-based user interfaces from any input ontology. The resulting Web forms are unique in supporting users inexperienced in Semantic Web technologies to create and maintain RDF data modelled according to an ontology. We report on a use case based on the Sensor Network Ontology that supports the viability of our approach.
-
Adding Time to Linked Data: A Generic Memento proxy through PROV
,
Miel Vander Sande,Sam Coppens,Ruben Verborgh,Erik Mannens and Rik Van de Walle
Linked Data resources change rapidly over time, making it difficult to maintain a valid consistent state. As a solution, the Memento framework offers content negotiation in the datetime dimension. However, due to a lack of formally described versioning, every server needs a costly custom implementation. In this poster paper, we exploit published provenance of Linked Data resources to implement a generic Memento service. Based on the W3C PROV standard, we propose a loosely coupled architecture that offers a Memento interface to any Linked Data service publishing provenance.
-
An FCA Framework for Knowledge Discovery in SPARQL Query Answers
,
Melisachew Wudage Chekol and Amedeo Napoli
Formal concept analysis (FCA) is used for knowledge discovery within data. In FCA, concept lattices are very good tools for the classification and organization of data; they enable users to visualize the answers of their SPARQL queries as concept lattices instead of the usual answer formats such as RDF/XML, JSON, CSV, and HTML. Consequently, in this work, we apply FCA to reveal hidden relations within SPARQL query answers by means of concept lattices.
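The construction can be sketched on a toy formal context: SPARQL answer rows become objects, their values become attributes, and a formal concept is a maximal (extent, intent) pair closed under derivation. The context below is invented; real FCA tools use far more efficient algorithms (e.g. Next Closure) than this brute-force enumeration.

```python
# Toy FCA sketch: enumerate all formal concepts of a small binary context
# by closing every object subset. Example data is invented.
from itertools import combinations

CONTEXT = {  # object -> set of attributes
    "paper1": {"topic:FCA", "lang:EL"},
    "paper2": {"topic:FCA"},
    "paper3": {"lang:EL"},
}

def intent(objs):
    """Attributes shared by all given objects (all attributes if none given)."""
    sets = [CONTEXT[o] for o in objs]
    return set.intersection(*sets) if sets else {a for s in CONTEXT.values() for a in s}

def extent(attrs):
    """Objects having all given attributes."""
    return {o for o, s in CONTEXT.items() if attrs <= s}

def concepts():
    """All formal concepts as (extent, intent) pairs of frozensets."""
    found = set()
    for r in range(len(CONTEXT) + 1):
        for combo in combinations(CONTEXT, r):
            a = intent(set(combo))          # closed attribute set
            found.add((frozenset(extent(a)), frozenset(a)))
    return found

for ext, inte in sorted(concepts(), key=lambda c: len(c[0])):
    print(sorted(ext), sorted(inte))
```

On this context, four concepts emerge; the lattice over them is what the user browses instead of a flat CSV of bindings.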
-
Assisted Policy Management for SPARQL Endpoints Access Control
,
Luca Costabello,Serena Villata,Iacopo Vagliano and Fabien Gandon
Shi3ld is a context-aware authorization framework for protecting SPARQL endpoints. It assumes the definition of access policies using RDF and SPARQL, and the specification of named graphs to identify the protected resources. These assumptions make the framework hard to use for users who are not familiar with such languages and technologies. In this paper, we present a graphical user interface that supports dataset administrators in defining access policies and the target elements protected by such policies.
-
Best-effort Linked Data Query Processing with time constraints using ADERIS-Hybrid
,
Steven Lynden,Isao Kojima,Akiyoshi Matono and Akihito Nakamura
Answering SPARQL queries over the Web of Linked Data is a challenging problem. Approaches based on distributed query processing provide up-to-date results but can suffer from delayed response times; indexing-based approaches provide fast response times, but results can be out of date and the costs of indexing the growing Web of Linked Data are potentially huge. Hybrid approaches try to offer the best of both. In this demo paper we describe a system for answering SPARQL queries within fixed time constraints by accessing SPARQL endpoints and the Web of Linked Data directly.
-
CEDAR: a Fast Taxonomic Reasoner Based on Lattice Operations
,
Samir Amir and Hassan Aït-Kaci
Taxonomy classification and query answering are the core reasoning services provided by most Semantic Web (SW) reasoners. However, the algorithms used by those reasoners are based on the tableau method or on rules. These well-known methods have already shown their limitations for large-scale reasoning. In this demonstration, we present the CEDAR system for classifying and reasoning over very large taxonomies using a technique based on lattice operations. This technique makes the CEDAR reasoner perform on a par with the best systems for concept classification and several orders of magnitude more efficiently in terms of response time for query answering. The experiments were carried out using very large taxonomies (Wikipedia: 111,599 sorts; MeSH: 286,381 sorts; NCBI: 903,617 sorts; Biomodels: 182,651 sorts). The results achieved by CEDAR were compared to those obtained by well-known Semantic Web reasoners, namely FaCT++, Pellet, HermiT, TrOWL, SnoRocket and RacerPro.
-
Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications
,
Bernardo Pereira Nunes,Besnik Fetahu,Stefan Dietze and Marco Antonio Casanova
Cite4Me is a Web application that leverages Semantic Web technologies to provide a new perspective on search and retrieval of bibliographical data. The Web application presented in this work focuses on: (i) semantic recommendation of papers; (ii) novel semantic search & retrieval of papers; (iii) data interlinking of bibliographical data with related data sources from LOD; (iv) innovative user interface design; and (v) sentiment analysis of extracted paper citations. Finally, as this work also targets some educational aspects, our application provides an in-depth analysis of the data that guides users in their research field.
-
Comparing ontologies with ecco
,
Rafael S. Gonçalves,Bijan Parsia and Uli Sattler
The detection and presentation of changes between OWL ontologies (in the form of a diff) is an important service for ontology engineering, being an active research topic. We present here a diff tool that incorporates structural and semantic techniques in order to, firstly, distinguish effectual and ineffectual changes between ontologies and, secondly, align and categorise those changes according to their impact. Such a categorisation of changes is shown to facilitate the navigation through, and analysis of change sets. The tool is made available as a web-based application, as well as a standalone command-line tool. Both of these output an XML change set file and a transformation into HTML, which allows users to browse through and focus on those changes of utmost interest using any web-browser.
-
Context Aware Sensor Configuration Model for Internet of Things
,
Charith Perera,Arkady Zaslavsky,Michael Compton,Peter Christen and Dimitrios Georgakopoulos
We propose a Context Aware Sensor Configuration Model (CASCoM) to address the challenge of automated context-aware configuration of filtering, fusion, and reasoning mechanisms in IoT middleware according to the problems at hand. We incorporate semantic technologies in solving the above challenges.
-
Coordinating Social Care and Healthcare using Semantic Web Technologies
,
Spyros Kotoulas,Vanessa Lopez,Martin Stephenson,Pierpaolo Tommasi,Wei Jia Shen,Gang Hu,Marco Luca Sbodio,Veli Bicer,Anastasios Kementsietsidis,M. Mustafa Rafique,Jason Ellis,Thomas Erickson,Kavitha Srinivas,Kevin McAuliffe,Guo Tong Xie and Pol Mac Aonghusa
Social care and Healthcare are unique in terms of cultural importance, economic size and domain complexity. Combining information systems from both domains poses unique scientific and technical challenges with regard to information representation, access, integration and retrieval granularity. We present a semantics-based approach that is uniquely positioned to access information across domains using a combination of business rules and contextual exploration. A proof of concept illustrates that semantic technologies can cope in a scenario where traditional data integration approaches are too costly and reduce the addressable information space.
-
Curating Semantic Linked Open Datasets for Software Engineering
,
Kavi Mahesh,Aparna Nagarajan,Apoorva Rao Balevalachilu and Karthik Rajendra Prasad
A typical software engineer spends a significant amount of time and effort reading technical manuals to find answers to questions, especially those related to features, versions, compatibilities and dependencies of software and hardware components, languages, standards, modules, libraries and products. It is currently not possible to provide a semantic solution to this problem, primarily due to the non-availability of comprehensive semantic datasets in the domain of information technology. In this work, we have extracted, integrated and curated a linked open dataset (LOD) called LOaD-IT exclusively on this domain from a variety of sources, including other LODs such as Freebase and DBpedia, and technical documentation such as JavaDocs. Further, we have built a technical helpdesk system using a semantic query engine that derives answers from LOaD-IT. Our system demonstrates how the productivity of the software engineer can be improved by eliminating the need to read through lengthy technical manuals. We expect LOaD-IT to become more comprehensive in the future and to find other related practical applications.
-
D-SPARQ: Distributed, Scalable and Efficient RDF Query Engine
,
Raghava Mutharaju,Sherif Sakr,Alessandra Sala and Pascal Hitzler
We present D-SPARQ, a distributed RDF query engine that combines the MapReduce processing framework with a NoSQL distributed data store, MongoDB. The performance of processing SPARQL queries mainly depends on the efficiency of handling the join operations between the RDF triple patterns. Our system features two unique characteristics that enable efficiently tackling this challenge: 1) Identifying specific patterns of the input queries that enable improving the performance by running different parts of the query in a parallel mode. 2) Using the triple selectivity information for reordering the individual triples of the input query within the identified query patterns. The preliminary results demonstrate the scalability and efficiency of our distributed RDF query engine.
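The second optimization described above, selectivity-based reordering, can be sketched as a simple sort: triple patterns expected to match fewer triples run first so intermediate join results stay small. The statistics and patterns below are invented for illustration; D-SPARQ applies this within the query patterns it identifies, not globally.

```python
# Minimal sketch of selectivity-based triple-pattern reordering: execute
# the most selective patterns (fewest estimated matches) first.
# Selectivity estimates here are invented example statistics.

SELECTIVITY = {  # predicate -> estimated number of matching triples
    "rdf:type": 1_000_000,
    "foaf:name": 50_000,
    "ex:isbn": 120,
}

def reorder(patterns):
    """Order (s, p, o) triple patterns from most to least selective."""
    return sorted(patterns, key=lambda p: SELECTIVITY.get(p[1], float("inf")))

query = [("?b", "rdf:type", "ex:Book"),
         ("?b", "ex:isbn", '"0123456789"'),
         ("?b", "foaf:name", "?n")]
print(reorder(query))
```

Running the `ex:isbn` pattern first shrinks the candidate set to a handful of bindings before the expensive `rdf:type` scan is joined in.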
-
DRETa: Extracting RDF from Wikitables
,
Emir Muñoz,Aidan Hogan and Alessandra Mileo
Tables are widely used in Wikipedia articles to display relational information – they are inherently concise and information rich. However, aside from info-boxes, there are no automatic methods to exploit the integrated content of these tables. We thus present DRETa: a tool that uses DBpedia as a reference knowledge-base to extract RDF triples from generic Wikipedia tables.
-
Demo: Swip, a Semantic Web Interface using Patterns
,
Camille Pradel,Ollivier Haemmerlé and Nathalie Hernandez
Our purpose is to provide end-users with a means to query ontology-based knowledge bases using natural language queries, thus hiding the complexity of formulating a query expressed in a graph query language such as SPARQL. The main originality of our approach lies in the use of query patterns. Our contribution is materialized in a system named SWIP, standing for Semantic Web Interface using Patterns. The demo will present use cases of this system.
-
Demonstrating The Entity Registry System: Implementing 5-Star Linked Data Without the Web
,
Marat Charlaganov,Philippe Cudré-Mauroux,Christian Dinu,Christophe Guéret,Martin Grund and Teodor Macicas
Linked Data applications often assume that connectivity to data repositories and entity resolution services are always available. This may not be a valid assumption in many cases. Indeed, there are about 4.5 billion people in the world who have no or limited Web access. Many data-driven applications may have a critical impact on the life of those people, but are inaccessible to those populations due to the architecture of today's data registries. In this demonstration, we show a new open-source system that can be used as a general-purpose entity registry suitable for deployment in poorly-connected or ad-hoc environments.
-
Demonstration: Semantic Web Enabled Smart Farm with GSN
,
Raj Gaire,Laurent Lefort,Michael Compton,Gregory Falzon,David Lamb and Kerry Taylor
GSN is an open source middleware for managing data produced by sensors deployed in a sensor network. We have extended and enhanced GSN to enable (i) semantically aware preparation, exchange and processing of the data, (ii) user-specified event processing for alerts, and (iii) association of sensor data with 'things'. Here, we demonstrate our smart farm as a use case of a semantically aware sensor network for better integration of sensor data.
-
Denoting Data in the Grounded Annotation Framework
,
Marieke Van Erp,Antske Fokkens,Piek Vossen,Sara Tonelli,Willem Robert Van Hage,Luciano Serafini,Rachele Sprugnoli and Jesper Hoeksema
Semantic Web applications are integrating data about events from more and more different types of sources. However, most data annotation frameworks do not translate well to the Semantic Web. We present the Grounded Annotation Framework (GAF), a two-layered framework that aims to build a bridge between mentions of events in a data source, such as a text document, and their formal representation as instances. By choosing a two-layered approach, neither the mention layer nor the semantic layer needs to compromise on what can be represented. We demonstrate the strengths of GAF in flexibility and reasoning through a use case on earthquakes in Southeast Asia.
-
DiTTO: Diagrams Transformation inTo OWL
,
Aldo Gangemi and Silvio Peroni
In this paper we introduce DiTTO, an online service that allows one to convert an E/R diagram created with the yEd diagram editor into a proper OWL ontology according to three different conversion strategies.
-
Discoverability of SPARQL Endpoints in Linked Open Data
,
Heiko Paulheim and Sven Hertling
Accessing Linked Open Data sources with query languages such as SPARQL provides more flexible possibilities than access based on dereferenceable URIs alone. However, discovering a SPARQL endpoint on the fly, given a URI, is not trivial. This paper provides a quantitative analysis of the automatic discoverability of SPARQL endpoints using different mechanisms.
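The flavor of the discovery mechanisms analyzed can be sketched as follows: given an entity URI, derive candidate locations where an endpoint might be announced, such as a site-wide well-known VoID description or the dereferenced URI itself. The example URI is illustrative, and the paper may analyze additional mechanisms beyond these two.

```python
# Sketch of SPARQL endpoint discovery heuristics: build candidate locations
# to check for an endpoint announcement, given an entity URI.
from urllib.parse import urlparse

def candidate_descriptors(uri: str):
    """Return places where a void:sparqlEndpoint triple might be published."""
    parts = urlparse(uri)
    root = f"{parts.scheme}://{parts.netloc}"
    return [
        f"{root}/.well-known/void",  # site-wide VoID description
        uri,  # dereference the URI itself and look for a void:sparqlEndpoint
    ]

print(candidate_descriptors("http://dbpedia.org/resource/Berlin"))
```

A crawler would fetch each candidate and parse the returned RDF for a `void:sparqlEndpoint` triple; the paper's contribution is measuring how often such mechanisms actually succeed in the wild.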
-
Distributed SPARQL Throughput Increase: On the effectiveness of Workload-driven RDF partitioning
,
Cosmin Basca and Abraham Bernstein
The Web of Data (WoD) continues to grow steadily each year. At over 31 billion triples in 2011, querying this globally distributed data space poses several scalability challenges. One critical aspect when processing distributed SPARQL queries is given by the number and type of distributed joins needed. Traditionally, query optimizers alleviate this issue by attempting to find an optimal query plan assuming a given and fixed data distribution. Discarding this fixed data partitioning assumption offers the opportunity to create a data distribution that minimizes the number of distributed joins. Recent research focused on data- and query-driven partitioning strategies for both RDF and relational data. In this paper we propose a novel and naive workload-driven approach to data partitioning and investigate the impact of various critical factors on the number of resulting distributed joins. In a preliminary experiment we empirically compare our method to traditional partitioning strategies using a DBpedia query log of 400,000 queries and observe that it can produce up to 50% fewer distributed joins than an expert (manual) partitioning scheme, 45% fewer than partitioning based on hashing by subject, and up to 83% fewer distributed joins than random assignment.
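One of the baselines compared against above, hashing by subject, can be sketched in a few lines: all triples sharing a subject land on the same node, so subject-subject joins need no network traffic. The triples below are invented example data; this is the baseline, not the paper's workload-driven method.

```python
# Baseline sketch: hash-partition RDF triples by subject across nodes,
# keeping each subject's triples co-located. Example data is invented.
import zlib

def partition_by_subject(triples, num_nodes):
    """Assign each (s, p, o) triple to a node by a stable hash of s."""
    parts = [[] for _ in range(num_nodes)]
    for s, p, o in triples:
        parts[zlib.crc32(s.encode()) % num_nodes].append((s, p, o))
    return parts

triples = [("ex:a", "rdf:type", "ex:Book"),
           ("ex:a", "ex:isbn", '"0123"'),
           ("ex:b", "rdf:type", "ex:Book")]
parts = partition_by_subject(triples, 4)
print(parts)
```

A workload-driven scheme instead inspects the query log to co-locate triples that are frequently joined, which is how the reported reduction in distributed joins over this baseline arises.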
-
Do it yourself (DIY) Jeopardy QA System
,
Andre Freitas and Edward Curry
This work demonstrates Treo, a framework which converges elements from Natural Language Processing, Semantic Web, Information Retrieval and Databases, to create a semantic search engine and question answering (QA) system for heterogeneous data. Jeopardy and Question Answering queries over open domain structured and unstructured data are used to demonstrate the approach. In this work, Treo is extended to cope with unstructured data in addition to structured data. The setup of the framework is done in 3 steps and can be adapted to other datasets by practitioners in a simple DIY process.
-
Editing R2RML Mappings Made Easy
,
Kunal Sengupta,Peter Haase,Michael Schmidt and Pascal Hitzler
The new W3C standard R2RML (see http://www.w3.org/TR/r2rml/) defines a language for expressing mappings from relational databases to RDF, allowing applications built on top of the W3C Semantic Technology stack to seamlessly integrate relational data. A major obstacle in using R2RML, though, is the creation and maintenance of mappings. In this demo, we present a novel R2RML mapping editor, which provides a user interface to create and edit mappings interactively. Hiding the R2RML vocabulary intricacies from the user, the editor enables even non-experts to create R2RML mappings in a guided way, offers immediate feedback by means of integrated preview functionality, and covers all the major language constructs defined in the R2RML standard.
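To give a sense of the vocabulary intricacies such an editor hides, here is a minimal R2RML mapping in Turtle; the table, column, and vocabulary names (`EMP`, `ENAME`, `ex:`) are illustrative, not taken from the demo:

```turtle
@prefix rr: <http://www.w3.org/ns/r2rml#> .
@prefix ex: <http://example.com/ns#> .

<#EmpMap>
    rr:logicalTable [ rr:tableName "EMP" ] ;
    rr:subjectMap [
        rr:template "http://example.com/employee/{EMPNO}" ;
        rr:class ex:Employee
    ] ;
    rr:predicateObjectMap [
        rr:predicate ex:name ;
        rr:objectMap [ rr:column "ENAME" ]
    ] .
```

Each row of `EMP` yields a subject IRI minted from the `EMPNO` column, typed as `ex:Employee`, with its `ENAME` value attached via `ex:name`.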
-
Efficient Computation of Relationship-Centrality in Large Entity-Relationship Graphs
,
Stephan Seufert,Srikanta J. Bedathur,Johannes Hoffart,Andrey Gubichev and Klaus Berberich
Given two sets of entities – potentially the results of two queries on a knowledge graph like YAGO or DBpedia – characterizing the relationship between these sets in the form of important people, events and organizations is an analytics task useful in many domains. In this paper, we present an intuitive and efficiently computable vertex centrality measure that captures the importance of a node with respect to the explanation of the relationship between the pair of query sets. Using a weighted link graph of entities contained in the English Wikipedia, we demonstrate the usefulness of the proposed measure.
-
Enriching Concept Search across Semantic Web Ontologies
,
Chetana Gavankar,Vishwajeet Kumar,Yuan-Fang Li and Ganesh Ramakrishnan
Semantic Web ontologies are fast-growing knowledge sources on the Web. Searching relevant concepts from this large repository is a challenging problem. The current Semantic Web search engines provide either (1) coarse-grained search over ontologies or (2) very fine-grained search over individuals. We believe searching and ranking concepts across ontologies provides an ideal granularity for certain tasks such as ontology population and web page annotation. Towards this objective, we propose a novel approach of indexing concepts using ontology axioms in an inverted file structure and ranking them using a dynamic ranking algorithm. Our proposed method is generic and domain-independent. A preliminary evaluation indicates that our proposed method is effective, outperforming the search function of BioPortal, a large and widely-used ontology repository.
-
Explaining Clusters with Inductive Logic Programming and Linked Data
,
Ilaria Tiddi,Mathieu D'Aquin and Enrico Motta
Knowledge Discovery consists in discovering hidden regularities in large amounts of data using data mining techniques. The obtained patterns require an interpretation that is usually achieved using background knowledge given by experts from several domains. On the other hand, the rise of Linked Data has increased the amount of connected cross-disciplinary knowledge, in the form of RDF datasets, classes and relationships. Here we show how Linked Data can be used in an Inductive Logic Programming process, where they provide background knowledge for finding hypotheses regarding the unrevealed connections between items of a cluster. Using an example with clusters of books, we show how different Linked Data sources can be used to automatically generate rules giving an underlying explanation to such clusters.
-
Exploring Linked Open Data with Tag Clouds
,
Xingjian Zhang,Dezhao Song,Sambhawa Priya and Jeff Heflin
In this paper we present the contextual tag cloud system: a novel application that helps users explore a large scale RDF dataset. Unlike folksonomy tags used in most traditional tag clouds, the tags in our system are ontological terms (classes and properties), and a user can construct a context with a set of tags that defines a subset of instances. Then in the contextual tag cloud, the font size of each tag depends on the number of instances that are associated with that tag and all tags in the context. Each contextual tag cloud serves as a summary of the distribution of relevant data, and by changing the context, the user can quickly gain an understanding of patterns in the data. Furthermore, the user can choose to include RDFS taxonomic and/or domain/range entailment in the calculations of tag sizes, thereby understanding the impact of semantics on the data. The system runs on the BTC2012 dataset with more than 1.4 billion triples from which we extract over 380,000 tags. Several scalability challenges must be overcome in order to achieve a responsive interface.
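The counting rule behind the tag sizes can be sketched in a few lines: for each candidate tag, count the instances carrying both that tag and every tag in the current context. This is a toy version of the counting step only, not the paper's entailment-aware, billion-triple implementation.

```python
def contextual_tag_sizes(instance_tags, context):
    """Given a mapping instance -> set of ontological tags and a
    context (set of tags), return, for each other tag, the number of
    instances associated with that tag AND all tags in the context.
    The tag-cloud font size is then scaled from these counts."""
    counts = {}
    for tags in instance_tags.values():
        if context <= tags:                 # instance matches the context
            for t in tags - context:        # every remaining tag co-occurs
                counts[t] = counts.get(t, 0) + 1
    return counts
```

Changing the context simply re-runs this count over the matching subset of instances, which is why the cloud acts as a distribution summary for that subset.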
-
Extending R2RML to a source-independent mapping language for RDF
,
Anastasia Dimou,Miel Vander Sande,Pieter Colpaert,Erik Mannens and Rik Van de Walle
Although reaching the fifth star of the Open Data deployment scheme demands that the data be represented in RDF and linked, a generic and standard mapping procedure to deploy raw data in RDF has not been established so far. Only the R2RML mapping language has been standardized, but its applicability is limited to mappings from relational databases to RDF. We propose extending R2RML to also support mappings of data sources in other structured formats. In broadening its scope, the focus is put on the mappings and their optimal reuse. The language becomes source-agnostic, and resources are integrated and interlinked at a primary stage.
-
Finite Models in RDF(S), with datatypes
,
Peter Patel-Schneider and Pat Hayes
The details of reasoning in RDF are generally well known. The model-theoretic characteristics of RDF have been less studied, particularly when datatypes are added. RDF reasoning can be performed by considering only finite models or pre-models, and sometimes only very small models need be considered.
-
GRAPHIUM: Visualizing Performance of Graph and RDF Engines on Linked Data
,
Alejandro Flores,Guillermo Palma,Maria-Esther Vidal,Domingo De Abreu,Valeria Pestana,Jose Pinero,Jonathan Queipo and Jose Sanchez
We present GRAPHIUM, a tool to visualize trends and patterns in the performance of existing graph and RDF engines. We will demonstrate GRAPHIUM, and attendees will be able to observe and analyze the performance exhibited by Neo4j, DEX, HypergraphDB and RDF-3X when core graph-based and mining tasks are run against a variety of benchmark graphs of diverse characteristics.
-
GetThere: A Rural Passenger Information System Utilising Linked Data & Citizen Sensing
,
David Corsar,Peter Edwards,Chris Baillie,Milan Markovic,Konstantinos Papangelis and John Nelson
This demo paper describes a real-time passenger information system based on citizen sensing and linked data.
-
Git2PROV: Exposing Version Control System Content as W3C PROV
,
Tom De Nies,Sara Magliacane,Ruben Verborgh,Sam Coppens,Paul Groth,Erik Mannens and Rik Van de Walle
Data provenance is defined as information about entities, activities and people producing or modifying a piece of data. On the Web, the interchange of standardized provenance of (linked) data is an essential step towards establishing trust. One mechanism to track (part of) the provenance of data is through the use of version control systems (VCS), such as Git. These systems are widely used to facilitate collaboration, primarily for code but also for data. Here, we describe a system to expose the provenance stored in a VCS in a new standard Web-native format: W3C PROV. This enables the easy publication of VCS provenance on the Web and subsequent integration with other systems that make use of PROV. The system is exposed as a RESTful Web service, which allows integration into user-friendly tools, such as browser plugins.
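The core of the VCS-to-PROV idea can be sketched as a simple mapping: each commit becomes a `prov:Activity`, each touched file version a `prov:Entity`, and the author a `prov:Agent`. The sketch below emits PROV-N statements under an assumed `ex:` prefix; the actual Git2PROV mapping is considerably richer (specializations, derivations between versions, timestamps, roles).

```python
def commit_to_prov(commit_id, author, files):
    """Map one commit record to a list of PROV-N statements.
    Toy illustration of the idea, not the Git2PROV implementation."""
    stmts = [
        f"activity(ex:commit-{commit_id})",
        f"agent(ex:{author})",
        f"wasAssociatedWith(ex:commit-{commit_id}, ex:{author}, -)",
    ]
    for path in files:
        # one entity per file version produced by this commit
        entity = f"ex:{path.replace('/', '-')}_{commit_id}"
        stmts.append(f"entity({entity})")
        stmts.append(f"wasGeneratedBy({entity}, ex:commit-{commit_id}, -)")
    return stmts
```

Iterating this over the full commit history yields a provenance graph that PROV-aware consumers can integrate directly.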
-
Hunting for Inconsistencies in Multilingual DBpedia with QAKiS
,
Elena Cabrio,Julien Cojan,Serena Villata and Fabien Gandon
QAKiS, a system for open domain Question Answering over linked data, allows querying DBpedia multilingual chapters with natural language questions. But since such chapters can contain different information from the English version (e.g., more specificity on certain topics, or information that fills gaps), i) different results can be obtained for the same query, and ii) the combination of these query results may lead to inconsistent information about the same topic. To reconcile the information obtained by distributed SPARQL endpoints, an argumentation-based module is integrated into QAKiS to reason over inconsistent information sets, and to provide a unique and motivated answer to the user.
-
IncMap: Pay as you go Matching of Relational Schemata to OWL Ontologies
,
Christoph Pinkel,Carsten Binnig,Evgeny Kharlamov and Peter Haase
Ontology Based Data Access (OBDA) enables access to relational data with a complex structure through ontologies as conceptual domain models. A key component of an OBDA system is the set of mappings between the schematic elements in the ontology and their correspondences in the relational schema. Today, in existing OBDA systems, these mappings typically need to be compiled by hand. In this paper we present IncMap, a system that supports a semi-automatic approach for matching relational schemata and ontologies. Our approach is based on a novel matching technique that represents the schematic elements of an ontology and a relational schema in a unified way. Finally, IncMap can extend user-verified mapping suggestions in a pay-as-you-go fashion.
-
Interlinking Multilingual LOD Resources: A Study on Connecting Chinese, Japanese, and Korean Resources Using the Unihan Database
,
Saemi Jang,Satria Hutomo,Soon Gill Hong and Mun Yi
This study proposes a novel method with which Chinese, Japanese, and Korean (CJK) resources on the Web can be effectively matched and connected. The three countries share Chinese characters even though Japan and Korea have their own languages. Utilizing the Unihan database, which covers more than 45,000 characters commonly used by the three countries, we show that the proposed method outperforms the traditional method based on string matching in finding similar characters and words used in these countries. The results represent a first step towards overcoming the multilingual barrier in semantically interlinking Asian LOD resources.
-
KbQAS: A Knowledge-based QA System
,
Dat Quoc Nguyen,Dai Quoc Nguyen and Son Bao Pham
In this demo paper, we present the first ontology-based Vietnamese question answering system, KbQAS, in which a knowledge acquisition approach for question analysis is integrated.
-
Modeling and Reasoning Upon Facebook Privacy Settings
,
Mathieu D'Aquin and Keerthi Thomas
Understanding the way information is propagated and made visible on Facebook is a difficult task. The privacy settings and the rules that apply to individual items are reasonably straightforward. However, for the user to track all of the information that needs to be integrated and the inferences that can be made on their posts is complex, to the extent that it is almost impossible for any individual to achieve. In this demonstration, we investigate the use of knowledge modeling and reasoning techniques (including basic ontological representation, rules and epistemic logics) to make these inferences explicit to the user.
-
Monitoring SPARQL Endpoint Status
,
Pierre-Yves Vandenbussche,Carlos Buil Aranda,Aidan Hogan and Jürgen Umbrich
We demo an online system that tracks the availability of over four-hundred public SPARQL endpoints and makes up-to-date results available to the public. Our demo currently focuses on how often an endpoint is online/offline, but we plan to extend the system to collect metrics about available meta-data descriptions, SPARQL features supported, and performance for generic queries.
-
Network-Aware Workload Scheduling for Scalable Linked Data Stream Processing
,
Lorenz Fischer,Thomas Scharrenbach and Abraham Bernstein
In order to cope with the ever-increasing data volume, distributed stream processing systems have been proposed. To ensure scalability, most distributed systems partition the data and distribute the workload among multiple machines. This approach does, however, raise the question of how the data and the workload should be partitioned and distributed. A uniform scheduling strategy---a uniform distribution of computation load among available machines---typically used by stream processing systems, disregards network load as one of the major bottlenecks for throughput, resulting in an immense load in terms of inter-machine communication. We propose a graph-partitioning-based approach for workload scheduling within stream processing systems. We implemented a distributed triple-stream processing engine on top of the Storm realtime computation framework and evaluate its communication behavior using two real-world datasets. We show that the application of graph partitioning algorithms can decrease inter-machine communication substantially (by 40% to 99%) whilst maintaining an even workload distribution, even using very limited data statistics. We also find that processing RDF data as single triples at a time rather than as graph fragments (containing multiple triples) may decrease throughput, indicating the usefulness of semantics.
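The quantity such a scheduler minimizes is easy to state: given an operator-to-machine assignment, sum the weights of dataflow edges whose endpoints land on different machines. A toy sketch of that objective, not the paper's system:

```python
def inter_machine_traffic(edges, assignment):
    """Count the communication volume crossing machine boundaries.
    `edges` is a list of (src, dst, weight) tuples of the dataflow
    graph; `assignment` maps each operator to a machine. A
    graph-partitioning-based scheduler tries to minimize this sum
    while keeping the per-machine load roughly balanced."""
    return sum(w for s, d, w in edges if assignment[s] != assignment[d])
```

Comparing this value for a uniform assignment versus a partitioning-derived one is exactly the kind of measurement behind the reported 40%-99% reductions.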
-
NoHR: Querying EL with Non-monotonic rules
,
Vadim Ivanov,Matthias Knorr and Joao Leite
We present NoHR, a Protege plug-in that allows the user to take an EL ontology, add a set of non-monotonic (logic programming) rules - suitable e.g. to express defaults and exceptions - and query the combined knowledge base. Provided the given ontology alone is consistent, the system is capable of dealing with potential inconsistencies between the ontology and the rules, and, after an initial brief pre-processing period utilizing OWL 2 EL reasoner ELK, returns answers to queries at an interactive response time by means of XSB Prolog.
-
ONTOMS2: an Efficient and Scalable ONTOlogy Management System with an Incremental Reasoning
,
Min-Joong Lee,Jong-Ryul Lee,Sangyeon Kim,Myung-Jae Park and Chin-Wan Chung
We present ONTOMS2, an efficient and scalable ONTOlogy Management System with an incremental reasoning. ONTOMS2 stores an OWL document and processes OWL-QL and SPARQL queries. Especially, ONTOMS2 supports SPARQL Update queries with an incremental instance reasoning of inverseOf, symmetric and transitive properties.
-
OU Social: Reaching Students in Social Media
,
Miriam Fernandez,Harith Alani and Stuart Brown
This work describes OU Social, an application that collects and analyses data from public Facebook groups set up by students to discuss particular Open University courses. This application exploits semantic technologies to monitor the behaviour of users over time as well as the topics that emerge from Facebook group discussions. The paper describes the architecture of OU Social and provides a brief overview of the analysis results obtained from 44 different Facebook groups examined over a six-year period (2007-2013).
-
On the Semantics of R2RML and its Relationship with the Direct Mapping
,
Juan F. Sequeda
The W3C Relational Database to RDF (RDB2RDF) standards are positioned to bridge the gap between Relational Databases and the Semantic Web. The standards consist of two interrelated and complementary specifications: “Direct Mapping of Relational Data to RDF” and “R2RML: RDB to RDF Mapping Language”. In this paper we present initial results on the formal study of the R2RML mapping language by defining its semantics using Datalog. We prove that there are a total of 57 distinct Datalog rules which can be used to generate RDF triples from a relational table. Additionally, we provide insights on the relationship between R2RML and Direct Mapping.
-
Optique 1.0: Semantic Access to Big Data; The Case of Norwegian Petroleum Directorate’s FactPages
,
Evgeny Kharlamov,Martin Giese,Ernesto Jiménez-Ruiz,Martin G. Skjæveland,Ahmet Soylu,Dmitriy Zheleznyakov,Timea Bagosi,Marco Console,Peter Haase,Ian Horrocks,Sarunas Marciuska,Christoph Pinkel,Mariano Rodriguez-Muro,Marco Ruzzi,Kunal Sengupta,Michael Schmidt,Evgenij Thorstensen,Johannes Trame and Arild Waaler
The Optique project aims at developing an end-to-end system for semantic data access to Big Data in industries such as Statoil ASA and Siemens AG. In our demonstration we present the first version of the Optique system customised for the Norwegian Petroleum Directorate's FactPages, a public dataset available to engineers at Statoil ASA. The system provides different options, including visual ones, to formulate queries over ontologies and to display query answers. Optique 1.0 offers two installation wizards that allow users to extract ontologies from relational schemas, extract and define mappings connecting ontologies and schemas, and align and approximate ontologies. Moreover, the system offers tools to edit these components and highly optimised techniques for query answering.
-
PigSPARQL: A SPARQL Query Processing Baseline for Big Data
,
Alexander Schätzle,Martin Przyjaciel-Zablocki,Thomas Hornung and Georg Lausen
In this paper, we discuss PigSPARQL, a competitive, yet easy to use, SPARQL query processing system based on MapReduce and thus intended for Big Data applications. Instead of a direct mapping, PigSPARQL uses the query language of Pig, a data analysis platform on top of Hadoop, as an intermediate layer between SPARQL and MapReduce. The additional level of abstraction makes our approach independent of the actual Hadoop version. Thus, it is automatically compatible with future changes of the Hadoop framework, as they will be neutralized by the Pig layer, and it allows ad-hoc SPARQL query processing on large RDF graphs out of the box. In the paper we first revisit PigSPARQL and demonstrate its gain in efficiency obtained simply by switching from Pig 0.5.0 to Pig 0.11.0. Because of this sustainability, PigSPARQL is an attractive long-term baseline for comparing various MapReduce-based SPARQL implementations. This is underlined by PigSPARQL's competitiveness with existing systems, e.g. HadoopRDF.
-
Profiling of Linked Datasets using Structured Descriptions
,
Besnik Fetahu,Stefan Dietze,Bernardo Pereira Nunes,Davide Taibi and Marco Antonio Casanova
While an increasingly large number of Linked Data datasets exists, metadata about the content covered by individual datasets is sparse. In this paper, we introduce a processing pipeline to automatically assess, annotate and index available linked datasets. Given a minimal description of a dataset from the DataHub, the process produces a structured RDF-based description that includes information about its main topics. Additionally, the generated descriptions embed datasets into an interlinked graph of datasets based on shared topic vocabularies. We adopt and integrate techniques for Named Entity Recognition and automated data validation, providing a consistent workflow for dataset profiling and annotation. Finally, we validate the results obtained with our tool.
-
Publishing Data from the Smithsonian American Art Museum as Linked Open Data
,
Craig Knoblock,Pedro Szekely,Shubham Gupta,Animesh Manglik,Ruben Verborgh and Fengyu Yang
Museums around the world have built databases with meta-data about millions of objects, their history, the people who created them, and the entities they represent. This data is stored in proprietary databases and is not readily available for use. Recently, museums embraced the Semantic Web as a means to make this data available to the world, but the experience so far shows that publishing museum data to the linked data cloud is difficult: the databases are large and complex, the information is richly structured and varies from museum to museum, and it is difficult to link the data to other datasets. We have been collaborating with the Smithsonian American Art Museum to create a set of tools that allow museums and other cultural heritage institutions to publish their data as Linked Open Data. In this demonstration we will show the end-to-end process of starting with the original source data, modeling the data with respect to an ontology of cultural heritage data, linking the data to DBpedia, and then publishing the information as Linked Open Data. Video: http://youtu.be/1Vaytr09H1w
-
Query Suggestion by Concept Instantiation
,
Jack Sun,Franky,Kenny Zhu and Haixun Wang
A class of search queries that contain abstract concepts is studied in this paper. These queries cannot be correctly interpreted by traditional keyword-based search engines. This paper presents a simple framework that detects and instantiates the abstract concepts with their concrete entities or meanings to produce alternate queries that yield better search results.
-
RDFChain: Chain Centric Storage for Scalable Join Processing of RDF Graphs using MapReduce and HBase
,
Pilsik Choi,Jooik Jung and Kyong-Ho Lee
As massive amounts of linked open data are available in RDF, scalable storage and efficient retrieval using MapReduce have been actively studied. Most previous research focuses on reducing the number of MapReduce jobs for processing join operations in SPARQL queries. However, the cost of the shuffle phase still occurs due to their reduce-side joins. In this paper, we propose RDFChain, which supports scalable storage and efficient retrieval of a large volume of RDF data using a combination of MapReduce and HBase, a NoSQL storage system. Since the proposed storage schema of RDFChain reflects all the possible join patterns of queries, it provides a reduced number of storage accesses depending on the join pattern of a query. In addition, the proposed cost-based map-side join of RDFChain reduces the number of map jobs, since it processes as many joins as possible in a single map job using statistics.
-
RelClus: Clustering-based Relationship Search
,
Yanan Zhang,Gong Cheng and Yuzhong Qu
Searching and browsing relationships between entities is an important task in many domains. To support users in interactively exploring a large set of relationships, we present a novel relationship search engine called RelClus, which automatically groups search results into a dynamically generated hierarchy with meaningful labels. This hierarchical clustering of relationships exploits their schematic patterns and a similarity measure based on information theory.
-
SEJP: Designing Interactive Scientometrics with Linked Data and Semantic Web Reasoning
,
Grant McKenzie,Krzysztof Janowicz,Yingjie Hu,Kunal Sengupta and Pascal Hitzler
In this demo paper we introduce a Linked Data-driven, Semantically-enabled Journal Portal (SEJP) that offers a variety of interactive scientometrics modules. SEJP allows editors, reviewers, authors, and readers to explore and analyze (meta)data published by a journal. Besides Linked Data created from the journal's internal data, SEJP also links out to other sources and includes them to develop more powerful modules. These modules range from simple descriptive statistics, over the spatial analysis of visitors and authors, to trending-topics modules. While SEJP will be available for multiple journals, this paper shows its deployment to the Semantic Web journal by IOS Press. Due to its open and transparent review process, the SWJ offers a wide variety of additional information, e.g., about reviewers, editors, paper decisions, and so forth.
-
SILURIAN: a Sparql vIsuaLizer for UndeRstanding querIes And federatioNs
,
Simon Castillo,Guillermo Palma and Maria-Esther Vidal
SPARQL federated queries can be affected by characteristics of both the query and the datasets in the federation. We present SILURIAN, a Sparql vIsuaLizer for UndeRstanding querIes And federatioNs. SILURIAN visualizes SPARQL queries and thus allows the analysis and understanding of a query's complexity with respect to relevant endpoints and the shapes of the possible plans.
-
SPACE: SParql index for efficient Auto ComplEtion
,
Kasjen Kraemer,Renata Dividino and Gerd Gröner
Querying Linked Data means posing queries over various data sources without information about the data or its schema. This demo shows SPACE, a tool to support autocompletion for SPARQL queries. It takes SPARQL query logs as input and builds an index structure for the efficient and fast computation of query suggestions. To demonstrate SPACE, we use available query logs from the USEWOD Data Challenge 2013.
-
SemantEco Annotator
,
Patrice Seyed,Timothy Lebo,Evan Patton,Katherine Chastain,Brendan Ashby and Deborah McGuinness
Generating useful RDF linked data is not a straightforward process for scientists using today’s tools. In this paper we introduce the SemantEco Annotator, a semantic web application that leverages community-based vocabularies and ontologies during the translation process itself to ease the process of drawing out implicit relationships in tabular data so that they may be immediately available for use within the LOD cloud. Our goal for the SemantEco Annotator is to make advanced RDF translation techniques available to the layperson.
-
Semantic Enrichment of Mobile Phone Data Records Using Linked Open Data
,
Zolzaya Dashdorj and Luciano Serafini
The pervasiveness of mobile phones opens an unprecedented opportunity for deepening into human dynamics through the analysis of the data they generate. This enables a novel human-driven approach to service creation in a wide set of domains such as health-care, transportation and urban safety. Telecom operators own and manage billions of mobile network events (like Call Detail Records, CDRs) per day: the interpretation of such a big stream of data needs a deep understanding of the context where the events have occurred. The exploitation of available background knowledge is a key element in this scenario. In this paper we introduce a novel method for the semantic interpretation of human behavior in mobility, based on merging the mobile network data stream with the available geo-referenced background knowledge. We modeled human behavior making use of the geo- and time-referenced knowledge available on the web (e.g., geo-tagged resources, info on weather forecasts, social events, etc.), matching it with the mobile network coverage map. The model is intended to characterize the contexts where the mobile network events occur, in order to help interpret the behavioral traits that generated them. This will allow us to achieve a set of predictive tasks, such as the prediction of human activities in certain contextual conditions (e.g., when an accident occurs on a highway before working time starts), or the characterization of exceptional events detected from anomalies in mobile network data. We created an ontological and stochastic high-level representation behavioral model (HRBModel) that maps human activities to the different contexts. Given the mobile phone network and the geo-tagged resource OpenStreetMap, the model is used to rank the activities associated with a particular network event (e.g., a sudden peak in call volume) according to their probability. We also describe the design of an experimental evaluation and the preliminary evaluation results to measure the performance of the model and to improve the activity prediction task.
-
Semantic tools for improving software development in open source communities
,
Gregor Leban
Software development communities use different communication channels such as mailing lists, forums and bug tracking systems. These channels are not integrated, which makes finding information difficult and inefficient. As a result of the ALERT project we developed a system that is able to collect and annotate information from various communication channels and store it in a single knowledge base. Using the stored knowledge, the system can provide users with valuable functionality such as semantic search, finding potential bug duplicates, custom notifications and issue recommendations.
-
SexTant: Visualizing Time-Evolving Linked Geospatial Data
,
Konstantina Bereta,Charalampos Nikolaou,Manos Karpathiotakis,Kostis Kyzirakos and Manolis Koubarakis
The linked open data cloud is constantly evolving as datasets are continuously updated with newer versions. As a result, representing, querying, and visualizing the temporal dimension of linked data is crucial. This is especially important for geospatial datasets that form the backbone of large scale open data publication efforts in many sectors of the economy (the public sector, the Earth observation sector). Although there has been some work on the representation and querying of linked geospatial data that change over time, to the best of our knowledge, there is currently no tool that offers spatio-temporal visualization of such data. In this demo paper we present the system SexTant that addresses this issue. SexTant is a web-based tool that enables the exploration of time-evolving linked geospatial data as well as the creation, sharing, and collaborative editing of "temporally-enriched" thematic maps by combining different sources of geospatial and temporal information.
-
TRT - A Tripleset Recommendation Tool
,
Alexander Arturo Mera Caraballo,Bernardo Pereira Nunes,Giseli Rabello Lopes,Luiz André P. Paes Leme,Marco Antonio Casanova and Stefan Dietze
According to the Linked Data principles, a tripleset should be interlinked with others to take advantage of existing knowledge. However, interlinking is a laborious task. Thus, users interlink their triplesets mostly with data hubs, such as DBpedia and Freebase, ignoring the more specific yet often even more promising triplesets. To alleviate this problem, this paper describes a tripleset interlinking recommendation tool based on link prediction techniques and evaluates the tool on a real-world tripleset repository.
-
The Benefits of Incremental Reasoning in OWL EL
,
Yevgeny Kazakov and Pavel Klinov
This demo will present the advantages of the new, bookkeeping-free method for incremental reasoning in OWL EL on incremental classification of large ontologies. In particular, we will show how a typical experience of a user editing a large ontology can be improved if the reasoner (or ontology IDE) provides the capability of instantaneously re-classifying the ontology in the background mode when a change is made. In addition, we intend to demonstrate how incremental reasoning helps in other tasks such as answering DL queries and computing explanations of entailments. We will use our OWL EL reasoner ELK and its Protege plug-in as the main tools to highlight these benefits.
-
The Empirical Robustness of Description Logic Classification
,
Rafael S. Gonçalves,Nicolas Matentzoglu,Bijan Parsia and Uli Sattler
In spite of the recent renaissance in lightweight description logics (DLs), many prominent DLs, such as that underlying the Web Ontology Language (OWL), have high worst-case complexity for their key inference services. Modern reasoners have a large array of optimizations, tuned calculi, and implementation tricks that allow them to perform very well in a variety of application scenarios, even though the complexity results ensure that they will perform poorly for some inputs. For users, the key question is how often they will encounter those pathological inputs in practice, that is, how robust reasoners are. We attempt to answer this question for the classification of existing ontologies as they are found on the Web. It is a fairly common user task to examine ontologies published on the Web as part of a development process. Thus, the robustness of reasoners in this scenario is both directly interesting and provides some hints toward answering the broader question. From our experiments, we show that the current crop of OWL reasoners, in collaboration, is very robust against the Web.
-
Towards Semantic Annotations of Web Tables
,
Stefan Zwicklbauer,Christoph Einsiedler,Michael Granitzer and Christin Seifert
Web tables comprise a rich source of factual information. However, without semantic annotation of the tables' content, the information is not usable for automatic integration and search. We propose a methodology to annotate table headers with semantic type information based on the content of the columns' cells. In our experiments on 50 tables we achieved an F1 value of 0.55, where the accuracy varies greatly depending on the ontology used. Regarding computational complexity, we found that to reach 94% of the maximal F1 score, on average only 20 cells (37%) need to be considered. The results suggest that the choice of ontology plays a more crucial role for type inference than the number of cells used.
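The core intuition — inferring a header's type from a sample of its column's cells — can be sketched as a majority vote. This is a minimal illustration, not the paper's algorithm; the entity-type index standing in for an ontology lookup is entirely made up.

```python
# Minimal sketch: infer a column's semantic type by looking up each
# sampled cell in a (toy, hypothetical) entity-type index and letting
# the cells vote. Sampling fewer cells trades accuracy for fewer lookups.
from collections import Counter

ENTITY_TYPES = {  # toy stand-in for an ontology-backed entity index
    "Berlin": "City", "Paris": "City", "Vienna": "City",
    "Danube": "River", "Elbe": "River",
}

def annotate_column(cells, max_cells=20):
    """Majority vote over the types of at most `max_cells` cell values."""
    votes = Counter()
    for cell in cells[:max_cells]:
        cell_type = ENTITY_TYPES.get(cell)
        if cell_type:
            votes[cell_type] += 1
    return votes.most_common(1)[0][0] if votes else None

print(annotate_column(["Berlin", "Paris", "Danube", "Vienna"]))  # City
```

The `max_cells` cap mirrors the paper's finding that a modest sample of cells already recovers most of the achievable F1 score.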
-
Towards the Natural Ontology of Wikipedia
,
Andrea Giovanni Nuzzolese,Aldo Gangemi,Valentina Presutti and Paolo Ciancarini
In this paper we present preliminary results on the extraction of ORA: the Natural Ontology of Wikipedia. ORA is obtained through an automatic process that analyses the natural language definitions of DBpedia entities provided by their Wikipedia pages. Hence, this ontology reflects the richness of terms used and agreed upon by the crowd, and can be updated periodically according to the evolution of Wikipedia.
-
TripleRush
,
Philip Stutz,Mihaela Verman,Lorenz Fischer and Abraham Bernstein
TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that answer queries over large-scale graph data quickly. To that end it leverages a novel, graph-based architecture. Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph in which each index vertex corresponds to a triple pattern. Partially matched queries are routed in parallel along different paths of this index structure. We show experimentally that TripleRush takes about a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, measuring time as the geometric mean over all queries for two common benchmarks.
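To see what "routing partially matched queries" means, the sketch below gives a much-simplified, sequential analogue of pattern-at-a-time matching: each triple pattern of the query extends the current set of variable bindings. TripleRush does this in parallel over an index graph of pattern vertices; this toy version (with made-up triples) only illustrates the binding propagation.

```python
# Illustrative sketch, not TripleRush itself: a query is a sequence of
# triple patterns; partial matches (variable bindings) are propagated
# from pattern to pattern, which TripleRush parallelises over its
# graph-shaped index. Terms starting with "?" are variables.
triples = {("elk", "reasons_over", "owl_el"),
           ("triplerush", "built_on", "signal_collect"),
           ("signal_collect", "is_a", "graph_framework")}

def match(pattern, binding):
    """Yield bindings extending `binding` that match one pattern against the store."""
    for triple in triples:
        new = dict(binding)
        ok = True
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                if new.get(term, value) != value:
                    ok = False          # variable already bound to something else
                    break
                new[term] = value
            elif term != value:
                ok = False              # constant term does not match
                break
        if ok:
            yield new

def answer(query):
    bindings = [{}]
    for pattern in query:               # route partial matches through each pattern
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

q = [("triplerush", "built_on", "?f"), ("?f", "is_a", "?t")]
print(answer(q))  # [{'?f': 'signal_collect', '?t': 'graph_framework'}]
```

In the real system each index vertex holds the triples matching one pattern, so a partial match only visits the vertices that can extend it, rather than scanning the whole store as this sketch does.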
-
Using Ontologies to Identify Patients with Diabetes in Electronic Health Records
,
Hairong Yu,Siaw-Teng Liaw,Jane Taggart and Alireza Rahimi
This paper describes work in progress that explores the applicability of ontologies for providing solutions in the medical domain. We investigate whether it is feasible to use ontologies and ontology-based data access to automate one of the common clinical tasks that general practitioners constantly face but that is labor intensive and error prone in terms of retrieving relevant information from electronic health records. The focus of our study is on improving the selection of diabetes patients for clinical trials or medical research. The biggest impediment to automating such clinical tasks is the essential requirement of bridging the semantic gaps between existing patient data from electronic health records, such as reasons for visit, chronic conditions, and diagnoses from practice notes, pathology tests, and prescriptions stored in general practice information systems, and the ways in which researchers or general practitioners interpret those records. Our current comprehension is that identifying diabetes patients for clinical or research purposes can be automated systematically as a solution supported by semantic retrieval. We detail the challenges of building a realistic case study, which consists of solving issues related to the conceptualization of data and domain context, the integration of different datasets, ontology creation based on the SNOMED CT-AU® standard, mapping between existing data and the ontology, and the dilemma of data fitness for research use. Our prototype is based on thirteen years of data from approximately 100,000 anonymous patient records from four general practices in south western Sydney.
-
XLore: A Large-scale English-Chinese Bilingual Knowledge Graph
,
Zhigang Wang,Juanzi Li,Zhichun Wang,Shuangjie Li,Mingyang Li,Dongsheng Zhang,Yao Shi,Yongbin Liu and Jie Tang
Current Wikipedia-based multilingual knowledge bases still suffer from the following problems: (i) the scarcity of non-English knowledge, (ii) noise in the semantic relations, and (iii) the limited coverage of equivalent cross-lingual entities. In this demo, we present a large-scale bilingual knowledge graph named XLore, which adequately addresses the above problems.