-
DC Proposal: Automatically transforming keyword queries to SPARQL on large-scale knowledge bases
,
Saeedeh Shekarpour
,
357-364
,
[OpenAccess]
,
[Publisher]
-
DC Proposal: Automation of Service lifecycle on the Cloud by using Semantic technologies
,
Karuna P. Joshi
,
285-292
,
[OpenAccess]
,
[Publisher]
Managing virtualized services efficiently over the cloud is an open challenge. We propose a semantically rich, policy-based framework to automate the lifecycle of cloud services. We have divided the IT service lifecycle into the five phases of requirements, discovery, negotiation, composition, and consumption. We detail each phase and describe the high level ontologies that we have developed to describe them. Our research complements previous work on ontologies for service descriptions in that it goes beyond simple matchmaking and is focused on supporting negotiation for the particulars of IT services.
-
DC Proposal: Capturing Knowledge Evolution and Expertise in Community-driven Knowledge Curation Platforms
,
Hasti Ziaimatin
,
381-388
,
[OpenAccess]
,
[Publisher]
Expertise modeling has been the subject of extensive research in two main disciplines - Information Retrieval (IR) and Social Network Analysis (SNA). Both IR and SNA techniques build the expertise model through a document-centric approach providing a macro-perspective on the knowledge emerging from large corpora of static documents. With the emergence of the Web of Data, there has been a significant shift from static to evolving documents, characterized by micro-contributions. Thus, the existing macro-perspective is no longer sufficient to track the evolution of both knowledge and expertise. The aim of this research is to provide an all-encompassing, domain-agnostic model for expertise profiling in the context of dynamic, living documents and evolving knowledge bases. Our approach combines: (i) fine-grained provenance, (ii) weighted mappings of Linked Data concepts to expertise profiles, via the application of IR-inspired techniques on micro-contributions, and (iii) collaboration networks - to create and enrich expertise profiles in community-centered environments.
-
DC Proposal: Decision support methods in community-driven knowledge curation platforms
,
Razan Paul
,
333-340
,
[OpenAccess]
,
[Publisher]
Skeletal dysplasias comprise a group of genetic diseases characterized by highly complex, heterogeneous and sparse data. Performing efficient and automated knowledge discovery in this domain poses serious challenges, one of the main issues being the lack of a proper formalization. Semantic Web technologies can, however, provide the appropriate means for encoding the knowledge and hence enabling complex forms of reasoning. We aim to develop decision support methods in the skeletal dysplasia domain by applying uncertainty reasoning over Semantic Web data. More specifically, we devise techniques for semi-automated diagnosis and key disease feature inferencing from an existing pool of patient cases that are shared and discussed in the SKELETOME community-driven knowledge curation platform. The outcome of our research will enable clinicians and researchers to acquire a critical mass of structured knowledge that will sustain a better understanding of these genetic diseases and foster advances in the field.
-
DC Proposal: Enriching Unstructured Media Content About Events to Enable Semi-Automated Summaries, Compilations, and Improved Search by Leveraging Social Networks
,
Thomas Steiner
,
365-372
,
[OpenAccess]
,
[Publisher]
Mobile devices like smartphones together with social networks enable people to generate, share, and consume enormous amounts of media content. Common search operations, for example searching for a music clip based on artist name and song title on video platforms such as YouTube, can be achieved both based on potentially shallow human-generated metadata, and based on more profound content analysis driven by Optical Character Recognition (OCR) or Automatic Speech Recognition (ASR). However, more advanced use cases, such as summaries or compilations of several pieces of media content covering a certain event, are hard, if not impossible, to fulfill at large scale. One example of such an event is a keynote speech held at a conference, where, given a stable network connection, media content is published on social networks while the event is still going on. In our thesis, we develop a framework for media content processing, leveraging social networks, utilizing the Web of Data and fine-grained media content addressing schemes like Media Fragments URIs to provide a scalable and sophisticated solution to realize the above use cases: media content summaries and compilations. We evaluate our approach on the entity level against social media platform APIs in conjunction with Linked (Open) Data sources, comparing the current manual approaches against our semi-automated approach. Our proposed framework can be used as an extension for existing video platforms.
-
DC Proposal: Evaluating Trustworthiness of Web Content using Semantic Web Technologies
,
Jarutas Pattanaphanchai
,
325-332
,
[OpenAccess]
,
[Publisher]
-
DC Proposal: Graphical Models and Probabilistic Reasoning for Generating Linked Data from Tables
,
Varish Mulwad
,
317-324
,
[OpenAccess]
,
[Publisher]
Vast amounts of information are encoded in tables found in documents, on the Web, and in spreadsheets or databases. Integrating or searching over this information benefits from understanding its intended meaning and making it explicit in a semantic representation language like RDF. Most current approaches to generating Semantic Web representations from tables require human input to create schemas and often result in graphs that do not follow best practices for linked data. Evidence for a table's meaning can be found in its column headers, cell values, implicit relations between columns, caption and surrounding text, but interpreting it also requires general and domain-specific background knowledge. We describe techniques grounded in graphical models and probabilistic reasoning to infer the meaning associated with a table. Using background knowledge from the Linked Open Data cloud, we jointly infer the semantics of column headers, table cell values (e.g., strings and numbers) and relations between columns, and represent the inferred meaning as a graph of RDF triples. A table's meaning is thus captured by mapping columns to classes in an appropriate ontology, linking cell values to literal constants, implied measurements, or entities in the linked data cloud (existing or new), and discovering and identifying relations between columns.
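The column-to-class step lends itself to a compact illustration. Below is a minimal, hypothetical sketch (not the authors' joint-inference model, which uses graphical models): it merely shows the naive baseline of voting for the ontology class that best explains a column's cell values, with a stub dictionary standing in for a real Linked Open Data index.

```python
# Naive column-to-class mapping by majority vote over cell-value lookups.
# KB is a stub standing in for a Linked Open Data lookup service; the paper
# instead performs joint probabilistic inference over headers, cells and
# relations, so treat this only as the baseline it improves upon.
from collections import Counter

KB = {
    "Boston": ["dbo:City", "dbo:Place"],
    "Seattle": ["dbo:City", "dbo:Place"],
    "WA": ["dbo:AdministrativeRegion"],
}

def kb_types(cell):
    return KB.get(cell, [])

def map_column_to_class(cells):
    """Vote for the class that explains the most cell values in a column."""
    votes = Counter(t for cell in cells for t in kb_types(cell))
    if not votes:
        return None
    best, count = votes.most_common(1)[0]
    # require the winning class to cover at least half of the cells
    return best if count >= len(cells) / 2 else None

print(map_column_to_class(["Boston", "Seattle", "WA"]))  # -> dbo:City
```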
-
DC Proposal: Knowledge Based Access Control Policy Specification and Enforcement
,
Sabrina Kirrane
,
293-300
,
[OpenAccess]
,
[Publisher]
The explosion of digital content and the heterogeneity of enterprise content sources have resulted in a pressing need for advanced tools and technologies to support enterprise content search and analysis. Semantic technology and linked data may be the long-term solution to this growing problem. Our research explores the application of access control to a knowledge discovery platform. In order to ensure integrated information is only accessible to authorised individuals, existing access control policies need to be associated with the data. Through in-depth analysis we aim to propose an access control model and enforcement framework which can be used to represent and enforce various access models both inside and outside the enterprise. Furthermore, through experimentation we plan to develop a methodology which can be used as a guideline for the lifting of distributed access control policies from the individual data sources to a linked data network.
-
DC Proposal: Model for News Filtering with Named Entities
,
Ivo Lasek
,
309-316
,
[OpenAccess]
,
[Publisher]
-
DC Proposal: Online Analytical Processing of Statistical Linked Data
,
Benedikt Kämpgen
,
301-308
,
[OpenAccess]
,
[Publisher]
The amount of Linked Data containing statistics is increasing; and so is the need for concepts of analysing these statistics. Yet, there are challenges, e.g., discovering datasets, integrating data of different granularities, or selecting mathematical functions. To automatically, flexibly, and scalably integrate statistical Linked Data for expressive and reliable analysis, we propose to use expressive Semantic Web ontologies to build and evolve a well-interlinked conceptual model of statistical data for Online Analytical Processing.
-
DC Proposal: Ontology Learning from Noisy Linked Data
,
Man Zhu
,
373-380
,
[OpenAccess]
,
[Publisher]
Ontology learning - loosely, the process of knowledge extraction from diverse data sources - provides (semi-)automatic support for ontology construction. As the ‘Web of Linked Data’ vision of the Semantic Web is coming true, the ‘explosion’ of Linked Data provides more than sufficient data for ontology learning algorithms in terms of quantity. With respect to quality, however, the notable issue of noise (e.g., partial or erroneous data) arises from Linked Data construction. Our doctoral research will make theoretical and engineering contributions to ontology learning approaches for noisy Linked Data. More exactly, we will use the approach of Statistical Relational Learning (SRL) to develop learning algorithms for the underlying tasks. In particular, we will learn OWL axioms inductively from Linked Data under a probabilistic setting, and analyze the noise in the Linked Data on the basis of the learned axioms. Finally, we will evaluate the proposed approaches with various experiments.
-
DC Proposal: PRISSMA, Towards Mobile Adaptive Presentation of the Web of Data
,
Luca Costabello
,
269-276
,
[OpenAccess]
,
[Publisher]
-
DC Proposal: Towards Linked Data Assessment and Linking Temporal Facts
,
Anisa Rula
,
341-348
,
[OpenAccess]
,
[Publisher]
Since Linked Data is continuously growing on the Web, the quality of the overall data can rapidly degrade over time. The research proposed here deals with quality assessment in Linked Data and with temporal linking techniques. First, we conduct an in-depth study of appropriate dimensions and their respective metrics by defining a data quality framework that evaluates, along these dimensions, linked data published on the Web. Second, since the assessment and improvement of Linked Data quality, such as accuracy or the resolution of heterogeneities, is performed through record linkage techniques, we propose an extended technique that applies time in similarity computation and can improve over traditional linkage techniques. This paper describes the core problem, presents the proposed approach, reports on initial results, and lists planned future tasks.
-
DC Proposal: Towards a Framework for Efficient Query Answering and Integration of Geospatial Data
,
Patrik Schneider
,
349-356
,
[OpenAccess]
,
[Publisher]
Semantic Web technologies are becoming more interleaved with geospatial databases, which should lead to an easier integration and querying of spatial data. This is fostered by a growing amount of publicly available geospatial data like OpenStreetMap. However, the integration can lead to geographic inconsistencies when combining multiple knowledge bases. Having the integration in place, users might not just issue a points-of-interest search, but rather might be interested in regions with specific attributes assigned to them. Yet, even with large amounts of spatial data available, standard databases and reasoners do not provide the means for (quantitative) spatial queries, or struggle to answer them efficiently. We seek to combine spatial reasoning, (nonmonotonic) logic programming, and ontologies for integrating geospatial databases with Semantic Web technologies. The focus of our investigation will be on a modular design, on efficient processing of large amounts of spatial data, and on enabling default reasoning. We propose a two-tier design related to HEX-programs, which should lead to a plausible trade-off between modularity and efficiency. Furthermore, we consider suitable geo-ontologies to semantically annotate and link different sources. Finally, the findings should lead to a proof-of-concept implementation, which will be tested for efficiency and modularity in artificial and real-world use cases.
-
DC Proposal: Towards an ODP Quality Model
,
Karl Hammar
,
277-284
,
[OpenAccess]
,
[Publisher]
The study of ontology design patterns (ODPs) is a fairly recent development. Such patterns simplify ontology development by codifying and reusing known best practices, thus lowering the barrier to entry of ontology engineering. However, while ODPs appear to be a promising addition to research and while such patterns are being presented and used, work on patterns as artifacts of their own, i.e., methods of developing, identifying and evaluating them, is still uncommon. Consequently, little is known about what ODP features or characteristics are beneficial or harmful in different ontology engineering situations. The presented PhD project aims to remedy this by studying ODP quality characteristics and the impact these characteristics have on the usability of ODPs themselves and on the suitability of the resulting ontologies.
-
Cyber Scientific Test Language
,
Peter Haglich, Robert Grimshaw, Steven Wilder, Marian Nodine and J. Bryan Lyles
,
97-111
,
[OpenAccess]
,
[Publisher]
The Cyber Scientific Method (CSM) is a formalism for experimentation on computer systems, hardware, software, and networks on the National Cyber Range. This formalism provides rigor to cyber tests to ensure that knowledge can be shared and experiment results can be viewed with confidence, knowing exactly what was tested under what conditions. The Cyber Scientific Test Language (CSTL) is an ontology-based language to describe CSM experiments. CSTL descriptions encompass test objectives, statistical experiment design, test network composition, sensor placement, and data analysis and visualization. CSTL expresses information about CSM experiments throughout their lifecycle, from initial test design through detailed test network description, instrumentation and control network augmentation, testbed buildout, data collection, and final analysis. The detailed representation of this information in a formal ontology has several benefits. It supports the use of general-purpose reasoners to query and recombine test specifications to support rapidly building an experiment network and testbed on the range. Additionally, it facilitates knowledge management and retrieval of test procedures and results. Test specifications can be viewed either with custom tools or with general-purpose applications. This paper provides an overview of CSTL and how it is used throughout the cyber test design, execution, analysis, and review process.
-
How to "Make a Bridge to the new Town" using OntoAccess
,
Matthias Hert, Giacomo Ghezzi, Michael Würsch and Harald Gall
,
112-127
,
[OpenAccess]
,
[Publisher]
Business-critical legacy applications often rely on relational databases to sustain daily operations. Introducing Semantic Web technology in newly developed systems is often difficult, as these systems need to run in tandem with their predecessors and cooperatively read and update existing data. A common pattern is to incrementally migrate data from a legacy system to its successor by running the new system in parallel, with a data bridge in between. Existing approaches that can in theory be deployed as a data bridge restrict, in practice, Semantic Web-enabled applications to reading legacy data, disallowing update operations completely. This paper explains how our RDB-to-RDF tool OntoAccess can be used to transition legacy systems into Semantic Web-enabled applications. By means of a case study, we exemplify how we successfully made a bridge between one of our own large-scale legacy applications and its long-term replacement. We elaborate on challenges we faced during the migration process and how we were able to overcome them.
-
KOIOS: Utilizing Semantic Search for Easy-Access and Visualization of Structured Environmental Data
,
Veli Bicer, Thanh Tran, Andreas Abecker and Radoslav Nedkov
,
1-16
,
[OpenAccess]
,
[Publisher]
With the increasing interest in environmental issues, the amount of publicly available environmental data on the Web is continuously growing. Despite its importance, the uptake of environmental information by ordinary Web users is still very limited due to non-transparent access to complex and distributed databases. As a remedy to this problem, we propose the use of recently developed semantic search technologies as an intuitive way to easily access structured data and to lower the barriers to obtaining information that satisfies users' information needs. Our proposed system, KOIOS, enables simple, keyword-based search on structured environmental data and is built on top of a commercial Environmental Information System (EIS). A prototype implementation successfully demonstrates how semantic search techniques facilitate search and access to complex environmental information.
-
Leveraging Community-built Knowledge for Type Coercion in Question Answering
,
Aditya Kalyanpur, J. William Murdock, James Fan and Christopher A. Welty
,
144-156
,
[OpenAccess]
,
[Publisher]
-
Linking Data Across Universities: an integrated video lectures dataset
,
Miriam Fernández, Mathieu d'Aquin and Enrico Motta
,
49-64
,
[OpenAccess]
,
[Publisher]
This paper presents our work and experience interlinking educational information across universities through the use of Linked Data principles and technologies. More specifically, this paper focuses on selecting, extracting, structuring and interlinking information about video lectures produced by 27 different educational institutions. For this purpose, selected information from several websites and YouTube channels has been scraped and structured according to well-known vocabularies, like FOAF or the W3C Ontology for Media Resources. To integrate this information, the extracted videos have been categorized under a common classification space, the taxonomy defined by the Open Directory Project. An evaluation of this categorization process has been conducted, obtaining a 98% degree of coverage and an 89% degree of correctness. As a result of this process, a new Linked Data dataset has been released containing more than 14,000 video lectures from 27 different institutions, categorized under a common classification scheme.
-
Linking Semantic Desktop Data to the Web of Data
,
Laura Dragan, Renaud Delbru, Tudor Groza, Siegfried Handschuh and Stefan Decker
,
33-48
,
[OpenAccess]
,
[Publisher]
The goal of the Semantic Desktop is to enable better organization of the personal information on our computers, by applying semantic technologies on the desktop. However, information on our desktop is often incomplete, as it is based on our subjective view, or limited knowledge about an entity. On the other hand, the Web of Data contains information about virtually everything, generally from multiple sources. Connecting the desktop to the Web of Data would thus enrich and complement desktop information. Bringing in information from the Web of Data automatically would take the burden of searching for information off the user. In addition, connecting the two networks of data opens up the possibility of advanced personal services on the desktop. Our solution tackles the problems raised above by using a semantic search engine for the Web of Data, such as Sindice, to find and retrieve a relevant subset of entities from the web. We present a matching framework, using a combination of configurable heuristics and rules to compare data graphs, that achieves a high degree of precision in the linking decision. We evaluate our methodology with real-world data: we create a gold standard from relevance judgements by experts and measure the performance of our system against it. We show that it is possible to automatically link desktop data with web data in an effective way.
-
Mind Your Metadata: Exploiting Semantics for Configuration, Adaptation, and Provenance in Scientific Workflows
,
Yolanda Gil, Pedro A. Szekely, Sandra Villamizar, Thomas C. Harmon, Varun Ratnakar, Shubham Gupta, Maria Muslea, Fabio Silva and Craig A. Knoblock
,
65-80
,
[OpenAccess]
,
[Publisher]
Scientific metadata containing semantic descriptions of scientific data is expensive to capture and is typically not used across entire data analytic processes. We present an approach where semantic metadata is generated as scientific data is being prepared, and then subsequently used to configure models and to customize them to the data. The metadata captured includes sensor descriptions, data characteristics, data types, and process documentation. This metadata is then used in a workflow system to select analytic models dynamically and to set up model parameters automatically. In addition, all aspects of data processing are documented, and the system is able to generate extensive provenance records for new data products based on the metadata. As a result, the system can dynamically select analytic models based on the metadata properties of the data it is processing, generating more accurate results. We show results in analyzing stream metabolism for watershed ecosystem management.
-
Privacy-Aware and Scalable Content Dissemination in Distributed Social Networks
,
Pavan Kapanipathi, Julia Anaya, Amit P. Sheth, Brett Slatkin and Alexandre Passant
,
157-172
,
[OpenAccess]
,
[Publisher]
-
Rule-based OWL Reasoning for specific Embedded Devices
,
Christian Seitz and René Schönfelder
,
237-252
,
[OpenAccess]
,
[Publisher]
Ontologies have been used for the formal representation of knowledge for many years now. One possible knowledge representation language for ontologies is the OWL 2 Web Ontology Language, informally OWL 2. The OWL specification includes the definition of variants of OWL, with different levels of expressiveness. OWL DL and OWL Lite are based on Description Logics, for which sound, complete, and terminating reasoners exist. Unfortunately, all these reasoners are too complex for embedded systems. But since the evaluation of ontologies on these resource-constrained devices is becoming more and more necessary (e.g., for diagnostics), we developed an OWL reasoner for embedded devices. We use the OWL 2 sublanguage OWL 2 RL, which can be completely implemented using rule-based reasoning engines. In this paper we present the embedded hardware used, the implemented reasoning component, and results regarding performance and memory consumption.
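To make the rule-based strategy concrete, the sketch below saturates a triple set with just two of the standard OWL 2 RL rules (cax-sco and scm-sco); an embedded reasoner of the kind described implements the full rule set, but its core loop has this fixpoint shape. The triples and prefixes are made up for illustration.

```python
# Minimal fixpoint saturation with two OWL 2 RL rules over (s, p, o) tuples.
TYPE, SUBCLASS = "rdf:type", "rdfs:subClassOf"

def saturate(triples):
    changed = True
    while changed:
        changed = False
        new = set()
        for (s, p, o) in triples:
            if p == SUBCLASS:
                for (s2, p2, o2) in triples:
                    if p2 == TYPE and o2 == s:       # cax-sco: instance typing
                        new.add((s2, TYPE, o))
                    if p2 == SUBCLASS and o2 == s:   # scm-sco: subclass transitivity
                        new.add((s2, SUBCLASS, o))
        if not new <= triples:
            triples |= new
            changed = True
    return triples

kb = {("ex:Felix", TYPE, "ex:Cat"),
      ("ex:Cat", SUBCLASS, "ex:Mammal"),
      ("ex:Mammal", SUBCLASS, "ex:Animal")}
print(("ex:Felix", TYPE, "ex:Animal") in saturate(kb))  # True
```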
-
SCMS - Semantifying Content Management Systems
,
Axel-Cyrille Ngonga Ngomo, Norman Heino, Klaus Lyko, René Speck and Martin Kaltenböck
,
189-204
,
[OpenAccess]
,
[Publisher]
The migration to the Semantic Web requires content management systems (CMS) to integrate human- and machine-readable data to support their seamless integration into the Semantic Web. Yet, there is still a blatant need for frameworks that can be easily integrated into a CMS and that can transform its content into machine-readable knowledge with high accuracy. In this paper, we describe the SCMS (Semantic Content Management Systems) framework, whose main goals are the extraction of knowledge from unstructured data in any CMS and the integration of the extracted knowledge into the same CMS. Our framework integrates a highly accurate knowledge extraction pipeline. In addition, it relies on the RDF and HTTP standards for communication and can thus be integrated into virtually any CMS. We present how our framework is being used in the energy sector. We also evaluate our approach and show that our framework outperforms even commercial software by reaching up to 96% F-score.
-
The MetaLex Document Server - Legal Documents as Versioned Linked Data
,
Rinke Hoekstra
,
128-143
,
[OpenAccess]
,
[Publisher]
This paper introduces the MetaLex Document Server (MDS), an ongoing project to improve access to legal sources (regulations, court rulings) by means of a generic legal XML syntax (CEN MetaLex) and Linked Data. The MDS defines a generic conversion mechanism from legacy legal XML syntaxes to CEN MetaLex, RDF and Pajek network files, and discloses content by means of HTTP-based content negotiation, a SPARQL endpoint and a basic search interface. MDS combines a transparent (versioned) and an opaque (content-based) naming scheme for URIs of parts of legal texts, allowing for tracking of version information at the URI level, as well as reverse engineering of versioned metadata from sources that provide only partial information, such as many web-based legal content services. The MDS hosts all 27,000 current national regulations of the Netherlands, comprising some 87.9M triples.
-
Using Semantic Web Technologies to Build a Community-driven Knowledge Curation Platform for the Skeletal Dysplasia Domain
,
Tudor Groza, Andreas Zankl, Yuan-Fang Li and Jane Hunter
,
81-96
,
[OpenAccess]
,
[Publisher]
In this paper we report on our ongoing efforts in building SKELETOME – a community-driven knowledge curation platform for the skeletal dysplasia domain. SKELETOME introduces an ontology-driven knowledge engineering cycle that supports the continuous evolution of the domain knowledge. Newly submitted, undiagnosed patient cases undergo a collaborative diagnosis process that transforms them into well-structured case studies, classified, linked and discoverable based on their likely diagnosis(es). The paper presents the community requirements driving the design of the platform, the underlying implementation details and the results of a preliminary usability study. Because SKELETOME is built on Drupal 7, we discuss the limitations of some of its embedded Semantic Web components and describe a set of new modules developed to handle these limitations (which will soon be released as open source to the community).
-
Wiki-based conceptual modeling: an experience with the Public Administration
,
Cristiano Casagni, Chiara Di Francescomarino, Mauro Dragoni, Licia Fiorentini, Luca Franci, Matteo Gerosa, Chiara Ghidini, Federica Rizzoli, Marco Rospocher, Anna Rovella, Luciano Serafini, Stefania Sparaco and Alessandro Tabarroni
,
17-32
,
[OpenAccess]
,
[Publisher]
The dematerialization of documents produced within the Public Administration (PA) represents a key contribution that Information and Communication Technology can provide towards the modernization of services within the PA. The availability of proper and precise models of the administrative procedures, and of the specific “entities” related to these procedures, such as the documents involved in the procedures or the organizational roles performing the activities, is an important step towards both (1) the replacement of paper-based procedures with electronic-based ones, and (2) the definition of guidelines and functions needed to safely store, catalogue, manage and retrieve in an appropriate archival system the electronic documents produced within the PA. In this paper we report the experience of customizing a semantic-wiki-based tool (MoKi) for the modeling of administrative procedures (processes) and their related “entities” (ontologies). The tool has been used and evaluated by several domain experts from different Italian regions in the context of a national project. This experience, and the reported evaluation, highlight the potential and the critical aspects of using semantic-wiki-based tools for the modeling of complex domains composed of processes and ontologies in a real setting.
-
Zhishi.me - Weaving Chinese Linking Open Data
,
Xing Niu, Xinruo Sun, Haofen Wang, Shu Rong, Guilin Qi and Yong Yu
,
205-220
,
[OpenAccess]
,
[Publisher]
Linking Open Data (LOD) has become one of the most important community efforts to publish high-quality interconnected semantic data. Such data has been widely used in many applications to provide intelligent services like entity search, personalized recommendation and so on. While DBpedia, one of the LOD core data sources, contains resources described in multilingual versions and semantic data in English is proliferating, there has been very little work on publishing Chinese semantic data. In this paper, we present Zhishi.me, the first effort to publish large-scale Chinese semantic data and link it together as a Chinese LOD (CLOD). More precisely, we identify important structural features in the three largest Chinese encyclopedia sites (i.e., Baidu Baike, Hudong Baike, and Chinese Wikipedia) for extraction and propose several data-level mapping strategies for automatic link discovery. As a result, the CLOD has more than 5 million distinct entities, and we simply link CLOD with the existing LOD based on the multilingual characteristics of Wikipedia. Finally, we also introduce three Web access entry points, namely a SPARQL endpoint, a lookup interface and a detailed data view, which conform to the principles of publishing data sources to LOD.
-
A Clustering-based Approach to Ontology Alignment
,
Songyun Duan, Achille Fokoue, Kavitha Srinivas and Brian Byrne
,
146-161
,
[OpenAccess]
,
[Publisher]
Ontology alignment is an important problem for the linked data web, as more and more ontologies and ontology instances get published for specific domains such as government and healthcare. A number of (semi-)automated alignment systems have been proposed in recent years. Most combine a set of similarity functions on lexical, semantic and structural features to align ontologies. Although these functions work well in many cases of ontology alignment, they fail to capture alignments when terms or structure vary vastly across ontologies. In this case, one is forced to rely on manual alignment. In this paper, we study whether it is feasible to re-use such expert-provided ontology alignments for new alignment tasks. We focus in particular on many-to-one alignments, where the opportunity for re-use is feasible if alignments are stable. Specifically, we define the notion of a cluster as being made of multiple entities in the source ontology S that are mapped to the same entity in the target ontology T. We test the stability hypothesis that the formed clusters of the source ontology are stable across alignments to different target ontologies. If this hypothesis is valid, the clusters of an ontology S, built from an existing alignment with an ontology T, can be effectively exploited to align S with a new ontology T′. Evaluation on both manual and automated high-quality alignments shows remarkable stability of clusters across ontology alignments in the financial domain and the healthcare and life sciences domain. Experimental evaluation also demonstrates the effectiveness of utilizing the stability of clusters in improving the alignment process in terms of precision and recall.
-
A Machine Learning Approach to Multilingual and Cross-lingual Ontology Matching
,
Dennis Spohr, Laura Hollink and Philipp Cimiano
,
665-680
,
[OpenAccess]
,
[Publisher]
Ontology matching is a task that has attracted considerable attention in recent years. With very few exceptions, however, research in ontology matching has focused primarily on the development of monolingual matching algorithms. As more and more resources become available in more than one language, novel algorithms are required which are capable of matching ontologies which share more than one language, or ontologies which are multilingual but do not share any languages. In this paper, we discuss several approaches to learning a matching function between two ontologies using a small set of manually aligned concepts, and evaluate them on different pairs of financial accounting standards, showing that multilingual information can indeed improve the matching quality, even in cross-lingual scenarios. In addition to this, as current research on ontology matching does not make a satisfactory distinction between multilingual and cross-lingual ontology matching, we provide precise definitions of these terms in relation to monolingual ontology matching, and quantify their effects on different matching algorithms.
-
A Native and Adaptive Approach for Unified Processing of Linked Streams and Linked Data
,
Danh Le Phuoc, Minh Dao-Tran, Josiane Xavier Parreira and Manfred Hauswirth
,
370-388
,
[OpenAccess]
,
[Publisher]
In this paper we address the problem of scalable, native and adaptive query processing over Linked Stream Data integrated with Linked Data. Linked Stream Data consists of data generated by stream sources, e.g., sensors, enriched with semantic descriptions, following the standards proposed for Linked Data. This enables the integration of stream data with Linked Data collections and facilitates a wide range of novel applications. Currently available systems use a “black box” approach which delegates the processing to other engines such as stream/event processing engines and SPARQL query processors by translating to their provided languages. As the experimental results described in this paper show, the need for query translation and data transformation, as well as the lack of full control over the query execution, pose major drawbacks in terms of efficiency. To remedy these drawbacks, we present CQELS (Continuous Query Evaluation over Linked Streams), a native and adaptive query processor for unified query processing over Linked Stream Data and Linked Data. In contrast to the existing systems, CQELS uses a “white box” approach and implements the required query operators natively to avoid the overhead and limitations of closed system regimes. CQELS provides a flexible query execution framework with the query processor dynamically adapting to the changes in the input data. During query execution, it continuously reorders operators according to some heuristics to achieve improved query execution in terms of delay and complexity. Moreover, external disk access on large Linked Data collections is reduced with the use of data encoding and caching of intermediate query results. To demonstrate the efficiency of our approach, we present extensive experimental performance evaluations in terms of query execution time, under varied query types, dataset sizes, and number of parallel queries. These results show that CQELS outperforms related approaches by orders of magnitude.
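The flavour of a native stream-static join can be pictured with a toy example. The sketch below is only a conceptual illustration, not CQELS itself: it assumes a time-based sliding window over (timestamp, sensor, reading) tuples and a small in-memory static mapping, and omits everything that makes the engine adaptive (operator reordering, dictionary encoding, caching).

```python
# Toy stream-static join with a time-based sliding window.
from collections import deque

STATIC = {"sensor1": "Galway", "sensor2": "Dublin"}  # assumed static mapping

def window_join(stream, width):
    """stream yields (timestamp, sensor, reading); enrich with STATIC."""
    window = deque()
    for ts, sensor, value in stream:
        window.append((ts, sensor, value))
        while window and window[0][0] <= ts - width:  # expire old tuples
            window.popleft()
        if sensor in STATIC:                          # probe the static side
            yield ts, sensor, value, STATIC[sensor], len(window)

readings = [(1, "sensor1", 20.5), (2, "sensorX", 7.0), (3, "sensor2", 18.2)]
for row in window_join(iter(readings), width=10):
    print(row)  # joined tuples plus current window size
```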
-
A Novel Approach to Visualizing and Navigating Ontologies
,
Enrico Motta, Paul Mulholland, Silvio Peroni, Mathieu d'Aquin, José Manuél Gómez-Pérez, Victor Mendez and Fouad Zablith
,
470-486
,
[OpenAccess]
,
[Publisher]
Observational studies in the literature have highlighted low levels of user satisfaction in relation to the support for ontology visualization and exploration provided by current ontology engineering tools. These issues are particularly problematic for non-expert users, who rely on effective tool support to abstract from representational details and to be able to make sense of the contents and the structure of ontologies. To address these issues, we have developed a novel solution for visualizing and navigating ontologies, KC-Viz, which exploits an empirically-validated ontology summarization method, both to provide concise views of large ontologies, and also to support a ‘middle-out’ ontology navigation approach, starting from the most information-rich nodes (key concepts). In this paper we present the main features of KC-Viz and also discuss the encouraging results derived from a preliminary empirical evaluation, which suggest that the use of KC-Viz provides performance advantages to users tackling realistic browsing and visualization tasks. Supplementary data gathered through questionnaires also convey additional interesting findings, including evidence that prior experience in ontology engineering affects not just objective performance in ontology engineering tasks but also subjective views on the usability of ontology engineering tools.
-
ANAPSID: An Adaptive Query Processing Engine for SPARQL Endpoints
,
Maribel Acosta, Maria-Esther Vidal, Tomas Lampo, Julio Castillo and Edna Ruckhaus
,
18-34
,
[OpenAccess]
,
[Publisher]
Following the design rules of Linked Data, the number of available SPARQL endpoints that support remote query processing is quickly growing; however, because of the lack of adaptivity, query executions may frequently be unsuccessful. First, fixed plans identified following the traditional optimize-then-execute paradigm may time out as a consequence of endpoint availability. Second, because blocking operators are usually implemented, endpoint query engines are not able to incrementally produce results, and may become blocked if data sources stop sending data. We present ANAPSID, an adaptive query engine for SPARQL endpoints that adapts query execution schedulers to data availability and run-time conditions. ANAPSID provides physical SPARQL operators that detect when a source becomes blocked or data traffic is bursty, and opportunistically, the operators produce results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory avoiding duplicates. We compared ANAPSID performance with respect to RDF stores and endpoints, and observed that ANAPSID speeds up execution time, in some cases, by more than one order of magnitude.
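The non-blocking behaviour described here is in the spirit of a symmetric hash join, which emits a result as soon as a tuple from either side finds a match instead of waiting for one input to complete. The sketch below is an illustrative stand-in, not ANAPSID's actual operators (which additionally spill matches to secondary memory and react to blocked or bursty sources).

```python
# Symmetric hash join over an interleaved arrival stream of tagged tuples.
from collections import defaultdict

def symmetric_hash_join(tagged_tuples, key_left, key_right):
    """tagged_tuples yields ('L', tuple) / ('R', tuple) in arrival order;
    join results are emitted incrementally, never blocking on either side."""
    left, right = defaultdict(list), defaultdict(list)
    for side, t in tagged_tuples:
        if side == "L":
            left[t[key_left]].append(t)
            for match in right[t[key_left]]:
                yield t + match
        else:
            right[t[key_right]].append(t)
            for match in left[t[key_right]]:
                yield match + t

arrivals = [("L", ("x1", "a")), ("R", ("x1", "b")),
            ("R", ("x2", "c")), ("L", ("x2", "d"))]
print(list(symmetric_hash_join(iter(arrivals), 0, 0)))
```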
-
Alignment-based trust for resource finding in semantic P2P networks
,
Manuel Atencia, Jérôme Euzenat, Giuseppe Pirrò and Marie-Christine Rousset
,
51-66
,
[OpenAccess]
,
[Publisher]
-
An Empirical Study of Vocabulary Relatedness and Its Application to Recommender Systems
,
Gong Cheng, Saisai Gong and Yuzhong Qu
,
98-113
,
[OpenAccess]
,
[Publisher]
-
An Ontology Design Pattern for Referential Qualities
,
Jens Ortmann and Desiree Daniel
,
537-552
,
[OpenAccess]
,
[Publisher]
Referential qualities are qualities of an entity taken with reference to another entity, for example the vulnerability of a coast to sea level rise. In contrast to most non-relational qualities, which only depend on their host, referential qualities require a referent in addition to their host, i.e. a quality Q of an entity X taken with reference to another entity R. These qualities occur frequently in ecological systems, which makes concepts from these systems challenging to model in formal ontology. In this paper, we discuss resilience, vulnerability and affordance as exemplary qualities of an entity taken with reference to an external factor. We suggest an ontology design pattern for referential qualities. The design pattern is anchored in the foundational ontology DOLCE and evaluated using implementations for the notions of affordance, resilience and vulnerability.
-
Automatically Generating Data Linkages Using a Domain-Independent Candidate Selection Approach
,
Dezhao Song and Jeff Heflin
,
649-664
,
[OpenAccess]
,
[Publisher]
One challenge for Linked Data is scalably establishing high-quality owl:sameAs links between instances (e.g., people, geographical locations, publications, etc.) in different data sources. Traditional approaches to this entity coreference problem do not scale because they exhaustively compare every pair of instances. In this paper, we propose a candidate selection algorithm for pruning the search space for entity coreference. We select candidate instance pairs by computing a character-level similarity on discriminating literal values that are chosen using domain-independent unsupervised learning. We index the instances on the chosen predicates' literal values to efficiently look up similar instances. We evaluate our approach on two RDF and three structured datasets. We show that the traditional metrics do not always accurately reflect the relative benefits of candidate selection, and propose additional metrics. We show that our algorithm frequently outperforms alternatives and is able to process 1 million instances in under one hour on a single Sun workstation. Furthermore, on the RDF datasets, we show that the entire entity coreference process scales well by applying our technique. Surprisingly, this high-recall, low-precision filtering mechanism frequently leads to higher F-scores in the overall system.
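The pruning idea is easy to illustrate: index every instance by character n-grams of one discriminating literal and only compare instances that share enough n-grams. The sketch below fixes the discriminating predicate by hand and uses bigram Jaccard similarity; in the paper, the predicate is chosen by domain-independent unsupervised learning.

```python
# Candidate selection via an inverted index over character bigrams.
from collections import defaultdict

def bigrams(s):
    s = s.lower()
    return {s[i:i + 2] for i in range(len(s) - 1)}

def candidate_pairs(instances, threshold=0.5):
    index = defaultdict(set)                  # bigram -> instance ids
    for iid, name in instances.items():
        for g in bigrams(name):
            index[g].add(iid)
    seen = set()
    for iid, name in instances.items():
        grams = bigrams(name)
        for other in {o for g in grams for o in index[g]} - {iid}:
            pair = tuple(sorted((iid, other)))
            if pair in seen:
                continue
            seen.add(pair)
            union = grams | bigrams(instances[other])
            if union and len(grams & bigrams(instances[other])) / len(union) >= threshold:
                yield pair                    # survives to full comparison

people = {"p1": "Jeff Heflin", "p2": "J. Heflin", "p3": "Dezhao Song"}
print(list(candidate_pairs(people)))          # [('p1', 'p2')]
```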
-
Capturing Instance Level Ontology Evolution for DL-Lite
,
Evgeny Kharlamov and Dmitriy Zheleznyakov
,
321-337
,
[OpenAccess]
,
[Publisher]
-
Concurrent Classification of EL Ontologies
,
Yevgeny Kazakov, Markus Krötzsch and Frantisek Simancik
,
305-320
,
[OpenAccess]
,
[Publisher]
We describe an optimised consequence-based procedure for classification of ontologies expressed in a polynomial fragment ELHR+ of the OWL 2 EL profile. A distinguishing property of our procedure is that it can take advantage of multiple processors/cores, which increasingly prevail in computer systems. Our solution is based on a variant of the ‘given clause’ saturation algorithm for first-order theorem proving, where we assign derived axioms to ‘contexts’ within which they can be used and which can be processed independently. We describe an implementation of our procedure within the Java-based reasoner ELK. Our implementation is light-weight in the sense that the overhead of managing concurrent computations is minimal. This is achieved by employing lock-free data structures and operations such as ‘compare-and-swap’. We report on preliminary experimental results demonstrating a substantial speedup of ontology classification on multi-core systems. In particular, one of the largest and most widely used medical ontologies, SNOMED CT, can be classified in as little as 5 seconds.
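The 'contexts' idea can be conveyed with a single-rule toy: every class keeps a context of derived subsumers, and each derived axiom is routed to exactly one context, which is what makes independent processing possible. The sketch below is a sequential Python rendition of that bookkeeping only; ELK itself is a Java implementation with the full EL rule set and concurrent, lock-free worklists.

```python
# Context-based saturation for plain subclass axioms (C ⊑ D).
from collections import defaultdict, deque

def classify(told):
    """told: list of (C, D) axioms; returns every class's derived subsumers."""
    sup = defaultdict(set)              # context: class -> subsumers
    by_sub = defaultdict(set)
    for c, d in told:
        by_sub[c].add(d)
    todo = deque(told)                  # the 'given clause' worklist
    while todo:
        c, d = todo.popleft()
        if d in sup[c]:
            continue
        sup[c].add(d)                   # conclusion lands in C's context
        for e in by_sub[d]:             # rule: C ⊑ D, D ⊑ E  =>  C ⊑ E
            todo.append((c, e))
    return sup

axioms = [("Cat", "Mammal"), ("Mammal", "Animal"), ("Animal", "Thing")]
print(sorted(classify(axioms)["Cat"]))  # ['Animal', 'Mammal', 'Thing']
```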
-
Connecting the Dots: A Multi-pivot Approach to Data Exploration
,
Igor O. Popov, Monica M. C. Schraefel, Wendy Hall and Nigel Shadbolt
,
553-568
,
[OpenAccess]
,
[Publisher]
The purpose of data browsers is to help users identify and query data effectively without being overwhelmed by large complex graphs of data. A proposed solution to identify and query data in graph-based datasets is pivoting (or set-oriented browsing), a many-to-many graph browsing technique that allows users to navigate the graph by starting from a set of instances followed by navigation through common links. Relying solely on navigation, however, makes it difficult for users to find paths or even to see whether the element of interest is in the graph when the points of interest may be many vertices apart. Further challenges include finding paths which require combinations of forward and backward links in order to make the necessary connections, which further adds to the complexity of pivoting. In order to mitigate the effects of these problems and enhance the strengths of pivoting, we present a multi-pivot approach which we embodied in a tool called Visor. Visor allows users to explore from multiple points in the graph, helping users connect key points of interest in the graph on the conceptual level, visually occluding the remaining parts of the graph, thus helping create a road map for navigation. We carried out a user study to demonstrate the viability of our approach.
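Pivoting itself is a small operation: from a set of instances, hop to the set of values reachable over one property, forwards or backwards. Here is a minimal sketch under assumed data (the property names and triples are invented; Visor's contribution is composing such hops from multiple starting points into a visual road map).

```python
# Set-oriented pivoting over a toy triple set.
GRAPH = {
    ("ISWC2011", "heldIn", "Bonn"),
    ("ESWC2011", "heldIn", "Heraklion"),
    ("Bonn", "locatedIn", "Germany"),
    ("Heraklion", "locatedIn", "Greece"),
}

def pivot(instances, prop, forward=True):
    """Follow prop from a set of instances to the next set of nodes."""
    if forward:
        return {o for s, p, o in GRAPH if p == prop and s in instances}
    return {s for s, p, o in GRAPH if p == prop and o in instances}

cities = pivot({"ISWC2011", "ESWC2011"}, "heldIn")      # forward hop
print(pivot(cities, "locatedIn"))                        # {'Germany', 'Greece'}
print(pivot({"Germany"}, "locatedIn", forward=False))    # {'Bonn'} (backward)
```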
-
DBpedia SPARQL Benchmark – Performance Assessment with Real Queries on Real Data
,
Mohamed Morsey, Jens Lehmann, Sören Auer and Axel-Cyrille Ngonga Ngomo
,
454-469
,
[OpenAccess]
,
[Publisher]
Triple stores are the backbone of increasingly many Data Web applications. It is thus evident that the performance of those stores is mission critical for individual projects as well as for data integration on the Data Web in general. Consequently, it is of central importance during the implementation of any of these applications to have a clear picture of the weaknesses and strengths of current triple store implementations. In this paper, we propose a generic SPARQL benchmark creation procedure, which we apply to the DBpedia knowledge base. Previous approaches often compared relational and triple stores and, thus, settled on measuring performance against a relational database which had been converted to RDF by using SQL-like queries. In contrast to those approaches, our benchmark is based on queries that were actually issued by humans and applications against existing RDF data not resembling a relational schema. Our generic procedure for benchmark creation is based on query-log mining, clustering and SPARQL feature analysis. We argue that a pure SPARQL benchmark is more useful to compare existing triple stores and provide results for the popular triple store implementations Virtuoso, Sesame, Jena-TDB, and BigOWLIM. The subsequent comparison of our results with other benchmark results indicates that the performance of triple stores is by far less homogeneous than suggested by previous benchmarks.
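One building block of such a log-driven benchmark, the SPARQL feature analysis, can be sketched in a few lines: tag each logged query with the features it uses, so that queries can then be grouped or clustered by feature signature. The regex-based tagger below is a deliberately rough illustration; a real pipeline would parse the queries properly.

```python
# Rough SPARQL feature tagging for query-log mining.
import re

FEATURES = ["OPTIONAL", "UNION", "FILTER", "DISTINCT", "LIMIT", "REGEX"]

def feature_signature(query):
    """Return the set of SPARQL features a query (apparently) uses."""
    q = query.upper()
    return frozenset(f for f in FEATURES if re.search(rf"\b{f}\b", q))

log = [
    "SELECT DISTINCT ?x WHERE { ?x a ?t } LIMIT 10",
    "SELECT ?x WHERE { ?x ?p ?o OPTIONAL { ?o ?q ?z } }",
]
for q in log:
    print(sorted(feature_signature(q)))
# ['DISTINCT', 'LIMIT']
# ['OPTIONAL']
```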
-
Decomposition and Modular Structure of BioPortal Ontologies
,
Chiara Del Vescovo, Damian Gessler, Pavel Klinov, Bijan Parsia, Ulrike Sattler, Thomas Schneider and Andrew Winget
,
130-145
,
[OpenAccess]
,
[Publisher]
We present the first large-scale investigation into the modular structure of a substantial collection of state-of-the-art biomedical ontologies, namely those maintained in the NCBO BioPortal repository. Using the notion of Atomic Decomposition, we partition BioPortal ontologies into logically coherent subsets (atoms), which are related to each other by a notion of dependency. We analyze various aspects of the resulting structures and discuss their implications on applications of ontologies. In particular, we describe and investigate the usage of these ontology decompositions to extract modules, for instance, to facilitate matchmaking of semantic Web services in SSWAP (Simple Semantic Web Architecture and Protocol). Descriptions of those services use terms from BioPortal, so service discovery requires reasoning with respect to relevant fragments of ontologies (i.e., modules). We present a novel algorithm for extracting modules from decomposed BioPortal ontologies which is able to quickly identify atoms that need to be included in a module to ensure logically complete reasoning. Compared to existing module extraction algorithms, it has a number of benefits, including improved performance and the possibility to avoid loading the entire ontology into memory. The algorithm is also evaluated on BioPortal ontologies and the results are presented and discussed.
-
Effective and Efficient Entity Search in RDF data
,
Roi Blanco, Peter Mika and Sebastiano Vigna
,
83-97
,
[OpenAccess]
,
[Publisher]
Triple stores have long provided RDF storage as well as data access using expressive, formal query languages such as SPARQL. The new end users of the Semantic Web, however, are mostly unaware of SPARQL and overwhelmingly prefer imprecise, informal keyword queries for searching over data. At the same time, the amount of data on the Semantic Web is approaching the limits of the architectures that provide support for the full expressivity of SPARQL. These factors combined have led to an increased interest in semantic search, i.e. access to RDF data using Information Retrieval methods. In this work, we propose a method for effective and efficient entity search over RDF data. We describe an adaptation of the BM25F ranking function for RDF data, and demonstrate that it outperforms other state-of-the-art methods in ranking RDF resources. We also propose a set of new index structures for efficient retrieval and ranking of results. We implement these results using the open-source MG4J framework.
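The shape of a BM25F-style ranking function for fielded entity descriptions is compact enough to sketch. All constants below (field weights, length-normalization parameters, average field lengths, idf values) are invented for illustration; they are exactly the knobs such an adaptation tunes for RDF data.

```python
# BM25F-style scoring of a fielded entity document.
K1 = 1.2
WEIGHTS = {"label": 3.0, "attribute": 1.0}   # boost name-like fields
B = {"label": 0.5, "attribute": 0.75}        # per-field length normalization
AVG_LEN = {"label": 3.0, "attribute": 20.0}  # assumed corpus averages

def bm25f(entity, query_terms, idf):
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for field, tokens in entity.items():
            tf = tokens.count(term)
            norm = 1 + B[field] * (len(tokens) / AVG_LEN[field] - 1)
            pseudo_tf += WEIGHTS[field] * tf / norm
        score += idf.get(term, 0.0) * pseudo_tf / (K1 + pseudo_tf)
    return score

doc = {"label": ["tim", "berners-lee"], "attribute": ["w3c", "director"]}
print(bm25f(doc, ["tim", "w3c"], {"tim": 2.1, "w3c": 1.4}))
```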
-
Effectively Interpreting Keyword Queries on RDF Databases with a Rear View
,
Haizhou Fu and Kemafor Anyanwu
,
193-208
,
[OpenAccess]
,
[Publisher]
Effective techniques for keyword search over RDF databases incorporate an explicit interpretation phase that maps keywords in a keyword query to structured query constructs. Because of the ambiguity of keyword queries, it is often not possible to generate a unique interpretation for a keyword query. Consequently, heuristics geared toward generating the top-K likeliest user-intended interpretations have been proposed. However, currently proposed heuristics fail to capture any user-dependent characteristics, but rather depend on database-dependent properties such as the occurrence frequency of subgraph patterns connecting keywords. This leads to the problem of generating top-K interpretations that are not aligned with user intentions. In this paper, we propose a context-aware approach for keyword query interpretation that personalizes the interpretation process based on a user's query context. Our approach addresses the novel problem of using a sequence of structured queries corresponding to interpretations of keyword queries in the query history as contextual information for biasing the interpretation of a new query. Experimental results over the DBpedia dataset show that our approach outperforms the state-of-the-art technique in both efficiency and effectiveness, particularly for ambiguous queries.
-
Enabling fine-grained HTTP caching of SPARQL query results
,
Gregory Todd Williams and Jesse Weaver
,
762-777
,
[OpenAccess]
,
[Publisher]
As SPARQL endpoints are increasingly used to serve linked data, their ability to scale becomes crucial. Although much work has been done to improve query evaluation, little has been done to take advantage of caching. Effective solutions for caching query results can improve scalability by reducing latency, network IO, and CPU overhead. We show that simple augmentation of the database indexes found in common SPARQL implementations can directly lead to effective caching at the HTTP protocol level. Using tests from the Berlin SPARQL benchmark, we evaluate the potential of such caching to improve overall efficiency of SPARQL query evaluation.
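The mechanism can be pictured as deriving an HTTP validator from the state of the indexes a query touches: if none of them changed, a client's cached result is still valid and the server can answer 304 without re-evaluating. The sketch below invents the version bookkeeping (INDEX_VERSIONS, run_query) to show the protocol-level flow only; the paper's contribution is obtaining such validators from augmented database indexes.

```python
# ETag revalidation for SPARQL results, driven by index versions.
import hashlib

INDEX_VERSIONS = {"spo": 17, "pos": 17, "osp": 17}  # bumped on each update

def etag_for(query, touched_indexes):
    basis = query + "|" + "|".join(
        f"{i}:{INDEX_VERSIONS[i]}" for i in touched_indexes)
    return '"%s"' % hashlib.sha1(basis.encode()).hexdigest()[:16]

def run_query(query):
    return [("s", "p", "o")]                # placeholder evaluation

def handle_request(query, if_none_match=None):
    tag = etag_for(query, ["spo"])
    if if_none_match == tag:
        return 304, tag, None               # client cache still valid
    return 200, tag, run_query(query)       # evaluate and send results

status, tag, body = handle_request("SELECT * WHERE { ?s ?p ?o }")
print(status)                               # 200
print(handle_request("SELECT * WHERE { ?s ?p ?o }", tag)[0])  # 304
```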
-
Encyclopedic Knowledge Patterns from Wikipedia Links
,
Andrea Giovanni Nuzzolese, Aldo Gangemi, Valentina Presutti and Paolo Ciancarini
,
520-536
,
[OpenAccess]
,
[Publisher]
What is the most intuitive way of organizing concepts for describing things? What are the most relevant types of things that people use for describing other things? Wikipedia and Linked Data offer knowledge engineering researchers a chance to empirically identify invariances in the conceptual organization of knowledge, i.e. knowledge patterns. In this paper, we present a resource of Encyclopedic Knowledge Patterns that have been discovered by analyzing the Wikipedia page links dataset, describe their evaluation with a user study, and discuss why this resource enables a number of research directions contributing to the realization of a meaningful Semantic Web.
-
Extending Functional Dependency to Detect Abnormal Data in RDF Graphs
,
Yang Yu and Jeff Heflin
,
794-809
,
[OpenAccess]
,
[Publisher]
Data quality issues arise in the Semantic Web because data is created by diverse people and/or automated tools. In particular, erroneous triples may occur due to factual errors in the original data source, the acquisition tools employed, misuse of ontologies, or errors in ontology alignment. We propose that the degree to which a triple deviates from similar triples can be an important heuristic for identifying errors. Inspired by functional dependency, which has shown promise in database data quality research, we introduce value-clustered graph functional dependency to detect abnormal data in RDF graphs. To better deal with Semantic Web data, this extends the concept of functional dependency on several aspects. First, there is the issue of scale, since we must consider the whole data schema instead of being restricted to one database relation. Second, it deals with multi-valued properties without explicit value correlations as specified as tuples in databases. Third, it uses clustering to consider classes of values. Focusing on these characteristics, we propose a number of heuristics and algorithms to efficiently discover the extended dependencies and use them to detect abnormal data. Experiments have shown that the system is efficient on multiple data sets and also detects many quality problems in real-world data.
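A toy version of the underlying intuition, stripped of the paper's value clustering and dependency discovery: group triples by (subject class, property), and when one value dominates the group (a near-functional dependency), flag the stragglers as suspect. All identifiers in the sketch are invented.

```python
# Flag triples that deviate from a near-functional dependency.
from collections import Counter, defaultdict

def abnormal_triples(triples, types, min_support=0.8):
    groups = defaultdict(list)
    for s, p, o in triples:
        groups[(types.get(s), p)].append((s, p, o))
    for _, ts in groups.items():
        counts = Counter(o for _, _, o in ts)
        top, n = counts.most_common(1)[0]
        if n / len(ts) >= min_support:      # dependency (class, p) -> value
            for s, p, o in ts:
                if o != top:
                    yield (s, p, o)         # deviates from its peers

types = {f"ex:{x}": "ex:Person" for x in "abcde"}
data = [(f"ex:{x}", "ex:homeCountry", "ex:US") for x in "abcd"]
data.append(("ex:e", "ex:homeCountry", "ex:Mars"))
print(list(abnormal_triples(data, types)))
# [('ex:e', 'ex:homeCountry', 'ex:Mars')]
```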
-
Extending Logic Programs with Description Logic Expressions for the Semantic Web
,
Yi-Dong Shen and Kewen Wang
,
633-648
,
[OpenAccess]
,
[Publisher]
Recently much attention has been directed to extending logic programming with description logic (DL) expressions, so that logic programs have access to DL knowledge bases and thus are able to reason with ontologies in the Semantic Web. In this paper, we propose a new extension of logic programs with DL expressions, called normal DL logic programs. In a normal DL logic program, arbitrary DL expressions are allowed to appear in rule bodies, and atomic DL expressions (i.e., atomic concepts and atomic roles) are allowed in rule heads. We extend the key condition of well-supportedness for normal logic programs under the standard answer set semantics to normal DL logic programs and define an answer set semantics for DL logic programs which satisfies the extended well-supportedness condition. We show that the answer set semantics for normal DL logic programs is decidable if the underlying description logic is decidable (e.g., SHOIN or SROIQ).
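As a toy illustration (ours, not taken from the paper) of the syntax this extension permits, the following rule has an atomic DL concept in its head, while its body mixes an arbitrary DL expression with default negation:

```latex
% Hypothetical normal DL rule: the head SafeDevice is an atomic concept;
% the body combines a complex DL expression with default negation (not).
\mathit{SafeDevice}(x) \leftarrow
    (\mathit{Device} \sqcap \exists \mathit{certifiedBy}.\mathit{Authority})(x),\;
    \mathrm{not}\ \mathit{recalled}(x)
```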
-
Extracting Semantic User Networks From Informal Communication Exchanges
,
Anna Lisa Gentile,Vitaveska Lanfranchi,Suvodeep Mazumdar and Fabio Ciravegna
,
209-224
,
[OpenAccess]
,
[Publisher]
Nowadays communication exchanges are an integral and time-consuming part of people’s jobs, especially for so-called knowledge workers. Content discussed during meetings, instant messaging exchanges, and email exchanges therefore constitutes a potential source of knowledge within an organisation, which is only shared with those immediately involved in the particular communication act. This poses a knowledge management issue, as this kind of content becomes “buried knowledge”. This work uses semantic technologies to extract buried knowledge, enabling expertise finding and topic trend spotting. Specifically, we claim it is possible to automatically model people’s expertise by monitoring informal communication exchanges (email) and semantically annotating their content to derive dynamic user profiles. Profiles are then used to calculate similarity between people and plot semantic knowledge-based networks. The major contribution and novelty of this work is the exploitation of semantic concepts captured from informal content to build a semantic network which reflects people’s expertise rather than capturing social interactions. We validate the approach using content from a research group’s internal mailing list, covering email exchanges within the group collected over a ten-month period.
-
FedBench: A Benchmark Suite for Federated Semantic Data Query Processing
,
Michael Schmidt,Olaf Görlitz,Peter Haase,Günter Ladwig,Andreas Schwarte and Thanh Tran
,
585-600
,
[OpenAccess]
,
[Publisher]
In this paper we present FedBench, a comprehensive benchmark suite for testing and analyzing the performance of federated query processing strategies on semantic data. The major challenge lies in the heterogeneity of semantic data use cases, where applications may face different settings at both the data and query level, such as varying data access interfaces, incomplete knowledge about data sources, availability of different statistics, and varying degrees of query expressiveness. Accounting for this heterogeneity, we present a highly flexible benchmark suite, which can be customized to accommodate a variety of use cases and compare competing approaches. We discuss design decisions, highlight the flexibility in customization, and elaborate on the choice of data and query sets. The practicability of our benchmark is demonstrated by a rigorous evaluation of various application scenarios, where we indicate both the benefits as well as limitations of the state-of-the-art federated query processing strategies for semantic data.
-
FedX: Optimization Techniques for Federated Query Processing on Linked Data
,
Andreas Schwarte,Peter Haase,Katja Hose,Ralf Schenkel and Michael Schmidt
,
601-616
,
[OpenAccess]
,
[Publisher]
Motivated by the ongoing success of Linked Data and the growing amount of semantic data sources available on the Web, new challenges to query processing are emerging. Especially in distributed settings that require joining data provided by multiple sources, sophisticated optimization techniques are necessary for efficient query processing. We propose novel join processing and grouping techniques to minimize the number of remote requests, and develop an effective solution for source selection in the absence of preprocessed metadata. We present FedX, a practical framework that enables efficient SPARQL query processing on heterogeneous, virtually integrated Linked Data sources. In experiments, we demonstrate the practicability and efficiency of our framework on a set of real-world queries and data sources from the Linked Open Data cloud. With FedX we achieve a significant improvement in query performance over state-of-the-art federated query engines.
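The effect of the grouping techniques can be illustrated with a small sketch (an approximation of the idea, not FedX's actual rewriting, which is UNION-based): rather than issuing one remote request per intermediate binding, many bindings are shipped in a single query, here via a SPARQL 1.1 VALUES clause. The predicate and IRIs are illustrative.

```python
# Sketch of the grouping idea behind bound joins: ship many candidate
# bindings of the join variable in one remote query instead of one query
# per binding.
def bound_join_query(director_iris):
    values = " ".join(f"<{iri}>" for iri in director_iris)
    return f"""
    SELECT ?film ?director WHERE {{
      VALUES ?director {{ {values} }}
      ?film <http://dbpedia.org/ontology/director> ?director .
    }}"""

# One request now covers every join candidate from the left-hand operand:
print(bound_join_query([
    "http://dbpedia.org/resource/Stanley_Kubrick",
    "http://dbpedia.org/resource/Ridley_Scott",
]))
```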
-
Generating Resource Profiles by Exploiting the Context of Social Annotations
,
Ricardo Kawase,George Papadakis and Fabian Abel
,
289-304
,
[OpenAccess]
,
[Publisher]
Typical tagging systems capture only the part of the tagging interactions that enriches the semantics of tag assignments according to the system’s purposes. The common practice is to build tag-based resource or user profiles on the basis of statistics about tags, disregarding the additional evidence that pertains to the resource, the user or the tag assignment itself. Thus, the bulk of this valuable information is ignored when generating user or resource profiles. In this work, we formalize the notion of tag-based and context-based resource profiles and introduce a generic strategy for building such profiles by incorporating available context information from all parts involved in a tag assignment. Our method takes into account not only the contextual information attached to the tag, the user and the resource, but also the metadata attached to the tag assignment itself. We demonstrate and evaluate our approach on two different social tagging systems and analyze the impact of several context-based resource modeling strategies within the scope of tag recommendations. The outcomes of our study suggest a significant improvement over other methods typically employed for this task.
-
Getting the Meaning Right: A Complementary Distributional Layer for the Web Semantics
,
Vít Novácek,Siegfried Handschuh and Stefan Decker
,
504-519
,
[OpenAccess]
,
[Publisher]
We aim at providing a complementary layer for web semantics, catering for bottom-up phenomena that are empirically observable on the Semantic Web rather than merely asserted by it. We focus on meaning that is not associated with particular semantic descriptions, but emerges from the multitude of explicit and implicit links on the web of data. We claim that current approaches are mostly top-down and thus lack proper mechanisms for capturing the emergent aspects of web meaning. To fill this gap, we have proposed a framework based on distributional semantics (a successful bottom-up approach to meaning representation in computational linguistics) that is nevertheless compatible with the top-down Semantic Web principles due to its inherent support for rules. We evaluated our solution in a knowledge consolidation experiment, which confirmed the promising potential of our approach.
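As a toy illustration of the distributional hypothesis underlying the framework (our sketch, not the authors' system): entities observed in similar link contexts receive similar meaning vectors, which can be compared by cosine similarity.

```python
# Toy distributional semantics: meaning as co-occurrence context vectors.
from collections import Counter
from math import sqrt

contexts = {
    "cat": Counter({"pet": 4, "fur": 3, "meow": 2}),
    "dog": Counter({"pet": 5, "fur": 3, "bark": 2}),
    "car": Counter({"road": 4, "engine": 3}),
}

def cosine(a, b):
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm

print(cosine(contexts["cat"], contexts["dog"]))  # high: similar contexts
print(cosine(contexts["cat"], contexts["car"]))  # low: dissimilar contexts
```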
-
Inspecting regularities in ontology design using clustering
,
Eleni Mikroyannidi,Luigi Iannone,Robert Stevens and Alan L. Rector
,
438-453
,
[OpenAccess]
,
[Publisher]
We propose a novel application of clustering analysis to identify regularities in the usage of entities in axioms within an ontology. We argue that such regularities can help to identify parts of the schemas and guidelines upon which ontologies are often built, especially in the absence of explicit documentation. Such analysis can also isolate irregular entities, thus highlighting possible deviations from the initial design. The clusters we obtain can be fully described in terms of generalised axioms that offer a synthetic representation of the detected regularity. In this paper we present the results of applying our analysis to different ontologies and discuss the potential advantages of incorporating it into future authoring tools.
-
Labels in the Web of Data
,
Basil Ell,Denny Vrandecic and Elena Paslaru Bontas Simperl
,
162-176
,
[OpenAccess]
,
[Publisher]
Entities on the Web of Data need to have labels in order to be exposable to humans in a meaningful way. These labels can then be used for exploring the data, i.e., for displaying the entities in a linked data browser or other front-end applications, but also to support keyword-based or natural-language-based search over the Web of Data. Far too many applications fall back to exposing the URIs of the entities to the user in the absence of more easily understandable representations such as human-readable labels. In this work we introduce a number of label-related metrics: completeness of the labeling, the efficient accessibility of the labels, unambiguity of labeling, and the multilinguality of the labeling. We report our findings from measuring the Web of Data using these metrics. We also investigate which properties are used for labeling purposes, since many vocabularies define further labeling properties beyond the standard property from RDFS.
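The completeness metric, for instance, can be approximated with two aggregate SPARQL queries, as in the following sketch (our illustration; the endpoint URL is a placeholder):

```python
# Labeling completeness: fraction of subjects with at least one rdfs:label.
from SPARQLWrapper import SPARQLWrapper, JSON

def count(endpoint, query):
    s = SPARQLWrapper(endpoint)
    s.setQuery(query)
    s.setReturnFormat(JSON)
    return int(s.query().convert()["results"]["bindings"][0]["n"]["value"])

ENDPOINT = "http://example.org/sparql"  # placeholder
PREFIX = "PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>\n"
all_subjects = count(ENDPOINT,
                     "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s ?p ?o }")
labeled = count(ENDPOINT, PREFIX +
                "SELECT (COUNT(DISTINCT ?s) AS ?n) WHERE { ?s rdfs:label ?l }")
print(f"labeling completeness: {labeled / all_subjects:.2%}")
```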
-
Large Scale Fuzzy pD * Reasoning Using MapReduce
,
Chang Liu,Guilin Qi,Haofen Wang and Yong Yu
,
405-420
,
[OpenAccess]
,
[Publisher]
The MapReduce framework has proved to be very efficient for data-intensive tasks. Earlier work has tried to use MapReduce for large scale reasoning under pD* semantics and has shown promising results. In this paper, we move a step forward to consider scalable reasoning on top of semantic data under fuzzy pD* semantics (i.e., an extension of OWL pD* semantics with fuzzy vagueness). To the best of our knowledge, this is the first work to investigate how MapReduce can help to solve the scalability issue of fuzzy OWL reasoning. While most of the optimizations used by the existing MapReduce framework for pD* semantics are also applicable to fuzzy pD* semantics, unique challenges arise when handling the fuzzy information. We identify these key challenges and propose a solution for tackling each of them. Furthermore, we implement a prototype system for evaluation purposes. The experimental results show that the running time of our system is comparable with that of WebPIE, the state-of-the-art inference engine for scalable reasoning under pD* semantics.
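A toy sketch of one such inference step (ours, not the paper's system) illustrates the fuzzy-specific twist: applying rule rdfs7 (p rdfs:subPropertyOf q and s p o derive s q o), a derived triple carries the minimum of the degrees of its premises in the map phase, and duplicate derivations are merged by taking the maximum degree in the reduce phase.

```python
# Toy MapReduce-style fuzzy inference step (rdfs7), min/max degree handling.
from collections import defaultdict

subprop = {("hasBoss", "worksWith"): 0.8}   # (p, q) -> degree of p subPropertyOf q

def map_phase(triples):
    for (s, p, o), d in triples:
        for (p2, q), d2 in subprop.items():
            if p == p2:
                yield (s, q, o), min(d, d2)   # derivation degree = min of premises

def reduce_phase(pairs):
    best = defaultdict(float)
    for triple, d in pairs:
        best[triple] = max(best[triple], d)   # merge duplicates by max degree
    return dict(best)

facts = [(("alice", "hasBoss", "bob"), 0.9)]
print(reduce_phase(map_phase(facts)))  # {('alice', 'worksWith', 'bob'): 0.8}
```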
-
Learning Relational Bayesian Classifiers from RDF Data
,
Harris T. Lin,Neeraj Koul and Vasant Honavar
,
389-404
,
[OpenAccess]
,
[Publisher]
The increasing availability of large RDF datasets offers an exciting opportunity to use such data to build predictive models using machine learning algorithms. However, the massive size and distributed nature of RDF data call for approaches to learning from RDF data in a setting where the data can be accessed only through a query interface, e.g., the SPARQL endpoint of the RDF store. In applications where the data are subject to frequent updates, there is a need for algorithms that allow the predictive model to be incrementally updated in response to changes in the data. Furthermore, in some applications, the attributes that are relevant for specific prediction tasks are not known a priori and hence need to be discovered by the algorithm. We present an approach to learning Relational Bayesian Classifiers (RBCs) from RDF data that addresses such scenarios. Specifically, we show how to build RBCs from RDF data using statistical queries through the SPARQL endpoint of the RDF store. We compare the communication complexity of our algorithm with one that requires direct centralized access to the data and hence has to retrieve the entire RDF dataset from the remote location for processing. We establish the conditions under which the RBC models can be incrementally updated in response to addition or deletion of RDF data. We show how our approach can be extended to the setting where the attributes that are relevant for prediction are not known a priori, by selectively crawling the RDF data for attributes of interest. We provide an open source implementation and evaluate the proposed approach on several large RDF datasets.
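The flavour of such statistical queries can be sketched as follows (our illustration, with placeholder endpoint and IRIs): the sufficient statistics of a Bayesian classifier reduce to counts obtainable with SPARQL aggregates, so the learner never downloads the underlying triples.

```python
# Gathering a sufficient statistic for a Bayesian classifier via SPARQL.
from SPARQLWrapper import SPARQLWrapper, JSON

def stat_count(endpoint, cls, prop, value):
    """Count instances of `cls` whose `prop` equals `value`."""
    wrapper = SPARQLWrapper(endpoint)
    wrapper.setQuery(f"""
        SELECT (COUNT(?x) AS ?n)
        WHERE {{ ?x a <{cls}> ; <{prop}> <{value}> . }}""")
    wrapper.setReturnFormat(JSON)
    bindings = wrapper.query().convert()["results"]["bindings"]
    return int(bindings[0]["n"]["value"])

# A class-conditional probability P(prop = value | cls) is then estimated
# from this count divided by the total instance count of cls, obtained with
# an analogous COUNT query.
```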
-
Leveraging the Semantics of Tweets for Adaptive Faceted Search on Twitter
,
Fabian Abel,Ilknur Celik,Geert-Jan Houben and Patrick Siehndel
,
1-17
,
[OpenAccess]
,
[Publisher]
In the last few years, Twitter has become a powerful tool for publishing and discussing information. Yet, content exploration in Twitter requires substantial effort. Users often have to scan information streams by hand. In this paper, we approach this problem by means of faceted search. We propose strategies for inferring facets and facet values on Twitter by enriching the semantics of individual Twitter messages (tweets) and present different methods, including personalized and context-adaptive methods, for making faceted search on Twitter more effective. We conduct a large-scale evaluation of faceted search strategies, show significant improvements over keyword search, and reveal significant benefits of those strategies that (i) further enrich the semantics of tweets by exploiting links posted in tweets, and that (ii) support users in selecting facet-value pairs by adapting the faceted search interface to the specific needs and preferences of a user.
-
Link Prediction for Annotation Graphs Using Graph Summarization
,
Andreas Thor,Philip Anderson,Louiqa Raschid,Saket Navlakha,Barna Saha,Samir Khuller and Xiao-Ning Zhang
,
714-729
,
[OpenAccess]
,
[Publisher]
Annotation graph datasets are a natural representation of scientific knowledge. They are common in the life sciences, where genes or proteins are annotated with controlled vocabulary terms (CV terms) from ontologies. The W3C Linking Open Data (LOD) initiative and semantic Web technologies are playing a leading role in making such datasets widely available. Scientists can mine these datasets to discover patterns of annotation. While ontology alignment and integration across datasets has been explored in the context of the semantic Web, there is no current approach to mine such patterns in annotation graph datasets. In this paper, we propose a novel approach to link prediction, a preliminary task in discovering more complex patterns. Our prediction is based on a complementary methodology of graph summarization (GS) and dense subgraphs (DSG). GS can exploit and summarize knowledge captured within the ontologies and in the annotation patterns. DSG uses the ontology structure, in particular the distance between CV terms, to filter the graph and to find promising subgraphs. We develop a scoring function based on multiple heuristics to rank the predictions. We perform an extensive evaluation on Arabidopsis thaliana genes.
-
Local Closed World Semantics: Grounded Circumscription for OWL
,
Kunal Sengupta,Adila Alfa Krisnadhi and Pascal Hitzler
,
617-632
,
[OpenAccess]
,
[Publisher]
We present a new approach to adding closed world reasoning to the Web Ontology Language OWL. It transcends previous work on circumscriptive description logics which had the drawback of yielding an undecidable logic unless severe restrictions were imposed. In particular, it was not possible, in general, to apply local closure to roles. In this paper, we provide a new approach, called grounded circumscription, which is applicable to SROIQ and other description logics around OWL without these restrictions. We show that the resulting language is decidable, and we derive an upper complexity bound. We also provide a decision procedure in the form of a tableaux algorithm.
-
LogMap: Logic-based and Scalable Ontology Matching
,
Ernesto Jiménez-Ruiz and Bernardo Cuenca Grau
,
273-288
,
[OpenAccess]
,
[Publisher]
In this paper, we present LogMap—a highly scalable ontology matching system with ‘built-in’ reasoning and diagnosis capabilities. To the best of our knowledge, LogMap is the only matching system that can deal with semantically rich ontologies containing tens (and even hundreds) of thousands of classes. In contrast to most existing tools, LogMap also implements algorithms for ‘on the fly’ unsatisfiability detection and repair. Our experiments with the ontologies NCI, FMA and SNOMED CT confirm that our system can efficiently match even the largest existing bio-medical ontologies. Furthermore, LogMap is able to produce a ‘clean’ set of output mappings in many cases, in the sense that the ontology obtained by integrating LogMap’s output mappings with the input ontologies is consistent and does not contain unsatisfiable classes.
-
Modelling and Analysis of User Behaviour in Online Communities
,
Sofia Angeletou,Matthew Rowe and Harith Alani
,
35-50
,
[OpenAccess]
,
[Publisher]
Understanding and forecasting the health of an online community is of great value to its owners and managers, who have vested interests in its longevity and success. Nevertheless, the association between community evolution and the behavioural patterns and trends of its members is not clearly understood, which hinders our ability to make accurate predictions of whether a community is flourishing or diminishing. In this paper we use statistical analysis, combined with a semantic model and rules, for representing and computing behaviour in online communities. We apply this model to a number of forum communities from Boards.ie to categorise the behaviour of community members over time, and report on how different behaviour compositions correlate with positive and negative community growth in these forums.
-
On Blank Nodes
,
Alejandro Mallea,Marcelo Arenas,Aidan Hogan and Axel Polleres
,
421-437
,
[OpenAccess]
,
[Publisher]
Blank nodes are defined in RDF as ‘existential variables’, in the same way they have long been used in mathematical logic. However, evidence suggests that actual usage of RDF does not follow this definition. In this paper we thoroughly cover the issue of blank nodes, from incomplete information in database theory, through the different treatments of blank nodes across the W3C stack of RDF-related standards, to an empirical analysis of RDF data publicly available on the Web. We then summarize alternative approaches to the problem, weighing up their advantages and disadvantages, and discussing proposals for Skolemization.
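For readers who want to experiment with Skolemization, rdflib ships a skolemize() helper that replaces blank nodes with IRIs under the .well-known/genid/ path, so graphs can be merged or compared without blank-node scoping issues; a minimal sketch follows (the authority IRI is a placeholder).

```python
# Skolemizing the blank nodes of a small RDF graph with rdflib.
from rdflib import Graph

g = Graph()
g.parse(data="""
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
_:someone foaf:name "Alice" .
""", format="turtle")

skolemized = g.skolemize(authority="http://example.org/")
for s, p, o in skolemized:
    print(s, p, o)  # the former blank node is now a .well-known/genid/ IRI
```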
-
Practical RDF Schema reasoning with annotated Semantic Web data
,
Carlos Viegas Damásio and Filipe Ferreira
,
746-761
,
[OpenAccess]
,
[Publisher]
Semantic Web data with annotations is becoming available, the YAGO knowledge base being a prominent example. In this paper we present an approach to computing the closure of large annotated RDF Schema data using standard database technology. In particular, we exploit several alternatives to address the problem of computing the transitive closure with real fuzzy semantic data extracted from YAGO in the PostgreSQL database management system. We benchmark these alternatives and compare them to classical RDF Schema reasoning, providing the first implementation of annotated RDF Schema reasoning in persistent storage.
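One natural way to express such a closure in PostgreSQL is a recursive common table expression; the sketch below (our illustration, with an assumed table layout rather than the paper's schema) combines degrees by minimum along a derivation path and by maximum across alternative paths.

```python
# Fuzzy transitive closure of rdfs:subClassOf via a recursive CTE.
import psycopg2

CLOSURE_SQL = """
WITH RECURSIVE closure(sub, sup, degree) AS (
    SELECT sub, sup, degree FROM subclassof
  UNION
    SELECT c.sub, s.sup, LEAST(c.degree, s.degree)
    FROM closure c JOIN subclassof s ON c.sup = s.sub
)
SELECT sub, sup, MAX(degree) AS degree FROM closure GROUP BY sub, sup;
"""

with psycopg2.connect("dbname=yago") as conn:   # placeholder connection string
    with conn.cursor() as cur:
        cur.execute(CLOSURE_SQL)
        for sub, sup, degree in cur.fetchall():
            print(sub, sup, degree)
```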
-
QueryPIE: Backward reasoning for OWL Horst over very large knowledge bases
,
Jacopo Urbani,Frank van Harmelen,Stefan Schlobach and Henri E. Bal
,
730-745
,
[OpenAccess]
,
[Publisher]
Both materialization and backward-chaining, as different modes of performing inference, have complementary advantages and disadvantages. Materialization enables very efficient responses at query time, but at the cost of an expensive up-front closure computation, which needs to be redone every time the knowledge base changes. Backward-chaining does not need such an expensive and change-sensitive precomputation, and is therefore suitable for more frequently changing knowledge bases, but has to perform more computation at query time. Materialization has been studied extensively in the recent semantic web literature, and is now available in industrial-strength systems. In this work, we focus instead on backward-chaining, and we present a hybrid algorithm to perform efficient backward-chaining reasoning on very large datasets expressed in the OWL Horst (pD*) fragment. As a proof of concept, we have implemented a prototype called QueryPIE (Query Parallel Inference Engine), and we have tested its performance on different datasets of up to 1 billion triples. Our parallel implementation greatly reduces the reasoning complexity of a naive backward-chaining approach and returns results for single query patterns in the order of milliseconds when running on a modest 8-machine cluster. To the best of our knowledge, QueryPIE is the first reported backward-chaining reasoner for OWL Horst that efficiently scales to a billion triples.
-
Querying OWL 2 QL and Non-monotonic Rules
,
Matthias Knorr and José Júlio Alferes
,
338-353
,
[OpenAccess]
,
[Publisher]
Answering (conjunctive) queries is an important reasoning task in Description Logics (DL), hence also in highly expressive ontology languages, such as OWL. Extending such ontology languages with rules, such as those expressible in RIF-Core, and further with non-monotonic rules, integrating default negation as described in the RIF-FLD, yields an even more expressive language that allows for modeling defaults, exceptions, and integrity constraints. Here, we present a top-down procedure for querying knowledge bases (KBs) that combine non-monotonic rules with an ontology in DL-Lite_R, the DL underlying the OWL 2 profile OWL 2 QL. This profile aims particularly at answering queries in an efficient way for KBs with large ABoxes. Our procedure extends the query-answering facility to KBs that also include non-monotonic rules, while maintaining tractability of reasoning (w.r.t. data complexity). We show that the answers are sound and complete w.r.t. the well-founded MKNF model of the hybrid MKNF knowledge base.
-
RELIN: Relatedness and Informativeness-based Centrality for Entity Summarization
,
Gong Cheng,Thanh Tran and Yuzhong Qu
,
114-129
,
[OpenAccess]
,
[Publisher]
Linked Data is developing towards a large, global repository for structured, interlinked descriptions of real-world entities. An emerging problem in many Web applications making use of data like Linked Data is how a lengthy description can be tailored to the task of quickly identifying the underlying entity. As a solution to this novel problem of entity summarization, we propose RELIN, a variant of the random surfer model that leverages the relatedness and informativeness of description elements for ranking. We present an implementation of this conceptual model, which captures the semantics of description elements based on linguistic and information theory concepts. In experiments involving real-world data sets and users, our approach outperforms the baselines, producing summaries that better match handcrafted ones and are further shown to be useful in a concrete task.
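A toy version of a biased random-surfer iteration (our sketch, not RELIN's implementation; the relatedness and informativeness inputs are made up) illustrates how the two signals can be combined into a ranking:

```python
# Random-surfer-style ranking mixing relatedness walks with
# informativeness-biased jumps.
import numpy as np

def surfer_rank(relatedness, informativeness, lam=0.85, iters=50):
    n = relatedness.shape[0]
    T = relatedness / relatedness.sum(axis=0, keepdims=True)  # column-stochastic
    jump = informativeness / informativeness.sum()            # jump distribution
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        r = lam * T @ r + (1 - lam) * jump
    return r

rel = np.array([[0.0, 1, 1], [1, 0, 0], [1, 0, 0]])  # made-up relatedness
info = np.array([0.2, 0.5, 0.3])                     # made-up informativeness
print(surfer_rank(rel, info))  # higher score = better summary candidate
```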
-
Repairing Ontologies for Incomplete Reasoners
,
Giorgos Stoilos,Bernardo Cuenca Grau,Boris Motik and Ian Horrocks
,
681-696
,
[OpenAccess]
,
[Publisher]
The need for scalable query answering often forces Semantic Web applications to use incomplete OWL 2 reasoners, which in some cases fail to derive all answers to a query. This is clearly undesirable, and in some applications may even be unacceptable. To address this problem, we investigate the problem of ‘repairing’ an ontology T, that is, computing an ontology R such that a reasoner that is incomplete for T becomes complete when used with T ∪ R. We identify conditions on T and the reasoner that make this possible, present a practical algorithm for computing R, and present a preliminary evaluation which shows that, in some realistic cases, repairs are feasible to compute, reasonable in size, and do not significantly affect reasoner performance.
-
Semantic Search: Reconciling Expressive Querying and Exploratory Search
,
Sébastien Ferré and Alice Hermann
,
177-192
,
[OpenAccess]
,
[Publisher]
Faceted search and querying are two well-known paradigms for searching the Semantic Web. Query languages, such as SPARQL, offer expressive means for searching RDF datasets, but they are difficult to use. Query assistants help users write well-formed queries, but they do not prevent empty results. Faceted search supports exploratory search, i.e., guided navigation that returns rich feedback to users and prevents them from falling into dead-ends (empty results). However, faceted search systems do not offer the same expressiveness as query languages. We introduce Query-based Faceted Search (QFS), the combination of an expressive query language and faceted search, to reconcile the two paradigms. In this paper, the LISQL query language generalizes existing semantic faceted search systems and covers most features of SPARQL. A prototype, Sewelis (a.k.a. Camelis 2), has been implemented, and a usability evaluation demonstrated that QFS retains the ease of use of faceted search and enables users to build complex queries with little training.
-
ShareAlike Your Data: Self-Referential Usage Policies for the Semantic Web
,
Markus Krötzsch and Sebastian Speiser
,
354-369
,
[OpenAccess]
,
[Publisher]
Numerous forms of policies, licensing terms, and related conditions are associated with Web data and services. A natural goal for facilitating the reuse and re-combination of such content is to model usage policies as part of the data so as to enable their exchange and automated processing. This paper thus proposes a concrete policy modelling language. A particular difficulty is self-referential policies such as Creative Commons ShareAlike, which mandate that derived content is published under some license with the same permissions and requirements. We present a general semantic framework for evaluating such recursive statements, show that it has desirable formal properties, and explain how it can be evaluated using existing tools. We then show that our approach is compatible with both OWL DL and Datalog, and illustrate how one can concretely model self-referential policies in these languages to obtain desired conclusions.
-
The Cognitive Complexity of OWL Justifications
,
Matthew Horridge,Samantha Bail,Bijan Parsia and Ulrike Sattler
,
241-256
,
[OpenAccess]
,
[Publisher]
In this paper, we present an approach to determining the cognitive complexity of justifications for entailments of OWL ontologies. We introduce a simple cognitive complexity model and present the results of validating that model via experiments involving OWL users. The validation is based on test data derived from a large and diverse corpus of naturally occurring justifications. Our contributions include validation for the cognitive complexity model, new insights into justification complexity, a significant corpus with novel analyses of justifications suitable for experimentation, and an experimental protocol suitable for model validation and refinement.
-
The Justificatory Structure of the NCBO BioPortal Ontologies
,
Samantha Bail,Matthew Horridge,Bijan Parsia and Ulrike Sattler
,
67-82
,
[OpenAccess]
,
[Publisher]
Current ontology development tools offer debugging support by presenting justifications for entailments of OWL ontologies. While these minimal subsets have been shown to support debugging and understanding tasks, the occurrence of multiple justifications presents a significant cognitive challenge to users. In many cases even a single entailment may have many distinct justifications, and justifications for distinct entailments may be critically related. However, it is currently unknown how prevalent significant numbers of multiple justifications per entailment are in the field. To address this lack, we examine the justifications from an independently motivated corpus of actively used biomedical ontologies from the NCBO BioPortal. We find that the majority of ontologies contain multiple justifications, while also exhibiting structural features (such as patterns) which can be exploited in order to reduce user effort in the ontology engineering process.
-
Verification of the OWL-Time Ontology
,
Michael Grüninger
,
225-240
,
[OpenAccess]
,
[Publisher]
Ontology verification is concerned with the relationship between the intended structures for an ontology and the models of the axiomatization of the ontology. The verification of a particular ontology requires characterization of the models of the ontology up to isomorphism and a proof that these models are equivalent to the intended structures for the ontology. In this paper we provide the verification of the ontology of time introduced by Hobbs and Pan, which is a first-order axiomatization of OWL-Time. We identify five modules within this ontology and present a complete account of the metatheoretic relationships among the modules and between other time ontologies for points and intervals.
-
Visualizing Ontologies: a case study
,
John Howse,Gem Stapleton,Kerry Taylor and Peter Chapman
,
257-272
,
[OpenAccess]
,
[Publisher]
Concept diagrams were introduced for precisely specifying ontologies in a manner more readily accessible to developers and other stakeholders than symbolic notations. In this paper, we present a case study on the use of concept diagrams in visually specifying the Semantic Sensor Networks (SSN) ontology. The SSN ontology was originally developed by an Incubator Group of the W3C. In the ontology, a sensor is a physical object that implements sensing and an observation is observed by a single sensor. These, and other, roles and concepts are captured visually, but precisely, by concept diagrams. We consider the lessons learnt from developing this visual model and show how to convert description logic axioms into concept diagrams. We also demonstrate how to merge simple concept diagram axioms into more complex axioms, whilst ensuring that diagrams remain relatively uncluttered.
-
Watermarking for Ontologies
,
Fabian M. Suchanek,David Gross-Amblard and Serge Abiteboul
,
697-713
,
[OpenAccess]
,
[Publisher]
In this paper, we study watermarking methods to prove the ownership of an ontology. Different from existing approaches, we propose to watermark not by altering existing statements, but by removing them. Thereby, our approach does not introduce false statements into the ontology. We show how ownership of ontologies can be established with provably tight probability bounds, even if only parts of the ontology are being re-used. We finally demonstrate the viability of our approach on real-world ontologies.
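A hedged sketch of the core idea (ours, not the paper's algorithm with its probability bounds): a secret key deterministically selects a small subset of statements to remove before publication, and ownership of a suspect copy is then argued from how many of those secretly removed statements it also lacks.

```python
# Watermarking by removal: key-dependent selection of triples to omit.
import hashlib

def selected(triple, key, rate=0.05):
    """Deterministic, key-dependent choice of roughly `rate` of all triples."""
    digest = hashlib.sha256((key + "|".join(triple)).encode()).digest()
    return digest[0] < rate * 256

def watermark(triples, key):
    return [t for t in triples if not selected(t, key)]

def ownership_evidence(original, suspect, key):
    suspect_set = set(suspect)
    removed = [t for t in original if selected(t, key)]
    # Fraction of secretly removed triples that the suspect also lacks;
    # a value near 1.0 suggests the suspect derives from our published copy.
    return sum(t not in suspect_set for t in removed) / len(removed)

ontology = [("ex:s%d" % i, "ex:p", "ex:o%d" % i) for i in range(1000)]
published = watermark(ontology, key="s3cret")
print(ownership_evidence(ontology, published, "s3cret"))  # 1.0 for our own copy
```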
-
Wheat and Chaff - Practically Feasible Interactive Ontology Revision
,
Nadeschda Nikitina,Birte Glimm and Sebastian Rudolph
,
487-503
,
[OpenAccess]
,
[Publisher]
When ontological knowledge is acquired automatically, quality control is essential. We consider the tightest possible approach – an exhaustive manual inspection of the acquired data. By using automated reasoning, we partially automate the process: after each expert decision, axioms that are entailed by the already approved statements are automatically approved, whereas axioms that would lead to an inconsistency are declined. Adequate axiom ranking strategies are essential in this setting to minimize the amount of expert decisions. In this paper, we present a generalization of the previously proposed ranking techniques which works well for arbitrary validity ratios – the proportion of valid statements within a dataset – whereas the previously described ranking functions were either tailored towards validity ratios of exactly 100% and 0% or were optimizing the worst case. The validity ratio – generally not known a priori – is continuously estimated over the course of the inspection process. We further employ partitioning techniques to significantly reduce the computational effort. We provide an implementation supporting all these optimizations as well as featuring a user front-end for successive axiom evaluation, thereby making our proposed strategy applicable to practical scenarios. This is witnessed by our evaluation showing that the novel parameterized ranking function almost achieves the maximum possible automation and that the computation time needed for each reasoning-based, automatic decision is reduced to less than one second on average for our test dataset of over 25,000 statements.
-
dipLODocus[RDF] - Short and Long-Tail RDF Analytics for Massive Webs of Data
,
Marcin Wylot,Jigé Pont,Mariusz Wisniewski and Philippe Cudré-Mauroux
,
778-793
,
[OpenAccess]
,
[Publisher]
The proliferation of semantic data on the Web requires RDF database systems to constantly improve their scalability and transactional efficiency. At the same time, users are increasingly interested in investigating or visualizing large collections of online data by performing complex analytic queries. This paper introduces a novel database system for RDF data management called dipLODocus[RDF], which supports both transactional and analytical queries efficiently. dipLODocus[RDF] takes advantage of a new hybrid storage model for RDF data based on recurring graph patterns. In this paper, we describe the general architecture of our system and compare its performance to state-of-the-art solutions for both transactional and analytic workloads.
-
strukt - A Pattern System for Integrating Individual and Organizational Knowledge Work
,
Ansgar Scherp,Daniel Eißing and Steffen Staab
,
569-584
,
[OpenAccess]
,
[Publisher]
Expert-driven business process management is an established means for improving the efficiency of organizational knowledge work. Implicit procedural knowledge in the organization is made explicit by defining processes. This approach is not applicable to individual knowledge work due to its high complexity and variability. However, without explicitly described processes there is no analysis and efficient communication of best practices of individual knowledge work within the organization. In addition, the activities of individual knowledge work cannot be synchronized with the activities of organizational knowledge work. Our solution to this problem is the semantic integration of individual and organizational knowledge work by means of the pattern-based core ontology strukt. The ontology allows for defining and managing the dynamic tasks of individual knowledge work in a formal way and for synchronizing them with organizational business processes. Using the strukt ontology, we have implemented a prototype application for knowledge workers and have evaluated it in the use case of an architectural office conducting construction projects.
-
A Completely Automatic Direct Mapping of Relational Databases to RDF and OWL
,
Juan Sequeda,Marcelo Arenas and Daniel Miranker
,
[OpenAccess]
,
[Publisher]
Integrating relational databases with the Semantic Web can be accomplished by means of two primary approaches: automatic direct mapping, or developers detailing application-specific mappings. Both approaches are the subject of the W3C Relational Database to RDF (RDB2RDF) Working Group. Intuitively, a direct mapping is a default and automatic way to translate a relational database schema and its content to OWL and RDF. In this poster, we present a specification, expressed in Datalog, of a direct mapping inspired by the current Direct Mapping draft of the W3C RDB2RDF Working Group. We are currently studying four fundamental properties: monotonicity, information preservation, query preservation and semantics preservation. In particular, we observe that the combination of these properties needs to be addressed very carefully.
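The intuition behind a direct mapping can be sketched in a few lines (our illustration, not the W3C draft's Datalog rules; the base IRI and naming scheme are assumptions): each row yields a subject IRI built from the table name and primary key, and each column becomes a predicate.

```python
# Toy direct mapping of relational rows to RDF triples.
BASE = "http://example.org/db/"  # assumed base IRI

def direct_map(table, pk, columns, rows):
    """Yield RDF triples for each row of a relational table."""
    for row in rows:
        subject = f"<{BASE}{table}/{pk}={row[pk]}>"
        yield subject, "rdf:type", f"<{BASE}{table}>"                 # row -> instance
        for col in columns:
            yield subject, f"<{BASE}{table}#{col}>", f'"{row[col]}"'  # cell -> literal

for triple in direct_map("person", "id", ["id", "name"],
                         [{"id": 1, "name": "Alice"}]):
    print(*triple)
```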
-
A Demonstration of DNS^3: a Semantic-Aware DNS Service
,
Philippe Cudré-Mauroux,Gianluca Demartini,Djellel Eddine Difallah,Ahmed Elsayed Mostafa,Vincenzo Russo and Matthew Thomas
,
[OpenAccess]
,
[Publisher]
The Domain Name System (DNS) is a hierarchical and distributed database used to resolve domain names into IP addresses. The current Web infrastructure heavily relies on the DNS service to allow end-users to access Web pages and Web data using meaningful names (like “www.verisign.com”) rather than cryptic sequences of numbers (e.g., “69.58.181.89”). The main functionalities of the DNS were specified more than 25 years ago and have not fundamentally evolved since then. In this paper, we propose to demonstrate DNS^3, an extension of the current DNS service based on security mechanisms and semantic metadata. Specifically, we show how one can embed authoritative RDF triples using the current DNS protocol, and how the naming service can take advantage of the embedded semantic metadata to publish authoritative information about the domains, to improve the performance of domain resolution through prefetching, and to alert end-users of probable threats when visiting potentially harmful domains.
-
A Personalized Mashup Using Rule-based Reasoning and Linked Data
,
Aikaterini Kalou,Georgia Solomou,Dimitrios Koutsomitropoulos and Theodore Papatheodorou
,
[OpenAccess]
,
[Publisher]
In this paper we propose an intelligent personalization service built upon the idea of combining Linked Data with Semantic Web rules. The service mashes up information from different bookstores and suggests personalized data to users according to their preferences. This information, as well as the personalization rules, is processed and managed by a scalable knowledge repository. Finally, the results are made available as Linked Data, thus enabling third-party recipients to consume knowledge-enhanced information.
-
A Semantic Knowledge Management Framework for Informal Communication Exchanges
,
Vitaveska Lanfranchi,Rodrigo Fontenele Carvalho,Anna Lisa Gentile,Suvodeep Mazumdar,Sam Chapman and Fabio Ciravegna
,
[OpenAccess]
,
[Publisher]
Whilst formal organizational knowledge is often stored in archives and accessed through a range of IT tools, informal knowledge from face-to-face meetings, email exchanges, and team meetings is usually not stored and remains implicit. This work aims at finding effective ways of capturing and making use of informal communications, making them available through a knowledge management system to support tasks such as expertise finding and topic trend spotting. This paper presents an overview of the knowledge management system for informal communication exchanges, with details of how the knowledge is captured, searched and visualised.
-
A prototypical OWL Full Reasoner based on First-Order Reasoning
,
Michael Schneider
,
[OpenAccess]
,
[Publisher]
We report on our ongoing endeavour to create a prototypical reasoning system for the OWL 2 Full ontology language and several of its sublanguages, including RDFS and the OWL 2 RL/RDF Rules. Among the languages specified by the W3C OWL 2 standard, OWL 2 Full is the most expressive one, and several other languages, such as RIF and SPARQL 1.1, have dependencies on OWL 2 Full, but to date no reasoner has been implemented for this language. The basic idea underlying our system is to translate the semantics specification of OWL 2 Full and the RDF graphs representing the input ontologies into first-order logic formulae, and to apply first-order reasoners to this axiomatisation to perform one of the supported reasoning tasks: ontology consistency checking, entailment checking, or query answering. The paper explains the approach taken, summarizes the results of a recent evaluation of this approach, and gives an overview of the functionality, architecture and implementation of the reasoner.
-
An Ontological Framework for Adaptive Feedback to Support Students while Programming
,
Pedro J. Muñoz Merino,Abelardo Pardo,Maren Scheffel,Katja Niemann,Martin Wolpers,Derick Leony and Carlos Delgado Kloos
,
[OpenAccess]
,
[Publisher]
This paper presents a global framework based on ontologies to generate effective feedback for students who get error messages while learning to program. The proposed framework includes several ontologies: one of the domain, one of possible mistake types, and one of the causes of these mistakes. They are connected by an intermediary ontology. The feedback is adaptive, depends on content or student profiles (derived from interaction data), and is supported by a fifth ontology. Possible compilation errors in an introductory C programming course are presented as an example of this framework.
-
Applying Linked Data to Media Fragments and Annotations
,
Yunjia Li,Mike Wald and Gary Wills
,
[OpenAccess]
,
[Publisher]
Web applications today are enriched with various multimedia resources and annotations. However, there is still a lack of semantic interlinking between media fragments and annotations, which leads to insufficient indexing of the content inside multimedia resources. This paper shows a demo of applying linked data principles to media fragments and annotations to improve the indexing of multimedia resources. Using linked data, a media fragment can be universally identified by a URI and linked to annotations or other media fragments in the linked data cloud. The demo is based on a UK Parliament Debate scenario. The RDF file containing media fragments and annotations of the debate video has been published in the Sindice semantic web index and linked to other resources in the linked data cloud.
-
Assessing Health Effects of Water Pollution Using a Semantic Water Quality Portal
,
Evan Patton,Ping Wang,Jin Zheng,Linyun Fu,Timothy Lebo,Li Ding,Qing Liu,Joanne Luciano and Deborah L. McGuinness
,
[OpenAccess]
,
[Publisher]
We demonstrate a semantically enabled approach to environmental monitoring as embodied in our semantic water quality portal. The portal assesses water quality using two data sources, the United States Environmental Protection Agency (EPA) and the United States Geological Survey (USGS), according to the user’s choice from a number of regulations, e.g., federal-level regulations established by the EPA as well as those of state departments of environmental protection. The portal identifies pollution events using an OWL-based reasoning system and provides browsing facets generated from provenance data encoded using the Proof Markup Language (PML). We show how exposing these measurements and their provenance as semantic data enables them to be combined with additional external data sources to look for correlations between pollution levels and health effects seen in nearby populations. This submission highlights the interactive demonstration aspects of the portal and augments the more detailed technical description of the semantic infrastructure, reasoning, and benefits of the approach that has been accepted for presentation in the Semantic Web In Use track [1].
-
Browsing Linked Data with MyView
,
Gong Cheng,Huiyao Wu,Saisai Gong,Hang Zhang and Yuzhong Qu
,
[OpenAccess]
,
[Publisher]
Compared with the hypertext Web, Linked Data can satisfy more precise information needs, but presently still lacks tool support for citizen users. In this demonstration, we introduce a personalizable Linked Data browser called MyView, which enables users to query Linked Data by navigation (such as link traversal and filtering) from one entity collection to another. With MyView, users can reuse their past queries in various ways, including categorizing favored links as different views, assembling complex links with existing ones, and revisiting past queries via history and bookmark mechanisms. As an intelligent system, MyView evaluates queries in a logic programming fashion which supports reasoning, and its implementation features several strategies for dealing with the distributed, open, large-scale Web environment. It also generates explainable answers that carry provenance information, and supports source control. Finally, it interacts with users to resolve entity coreference for discovering more sources and reducing redundancy.
-
Building SPARQL-Enabled Applications with Android Devices
,
Mathieu D'Aquin,Andriy Nikolov and Enrico Motta
,
[OpenAccess]
,
[Publisher]
In this paper, we show how features can be added to an Android device (a smartphone) to enable mobile applications to expose data through a SPARQL endpoint. Using simple query federation mechanisms, we describe a demonstrator illustrating how SPARQL-enabled Android devices allow us to rapidly develop applications mashing up data from a collaborative network of sensor-based data sources.
-
Castor: Using Constraint Programming to Solve SPARQL Queries
,
Vianney Le Clément De Saint-Marcq,Yves Deville,Christine Solnon and Pierre-Antoine Champin
,
[OpenAccess]
,
[Publisher]
-
Combining N-gram Retrieval with Weights Propagation on Massive RDF Graphs
,
He Hu and Xiaoyong Du
-
Computing Fine-grained Semantic Annotations of Texts
,
Yue Ma and François Lévy
,
[OpenAccess]
,
[Publisher]
-
Conservative Repurposing of RDF Data
,
Audun Stolpe and Martin G. Skjæveland
,
[OpenAccess]
,
[Publisher]
-
DBpedia internationalization - a graphical tool for I18n infobox-to-ontology mappings
,
Charalampos Bratsas,Lazaros Ioannidis,Dimitris Kontokostas,Soren Auer,Christian Bizer,Sebastian Hellmann and Ioannis Antoniou
,
[OpenAccess]
,
[Publisher]
During the past two decades, the use of the Web has spread across multiple countries and cultures. While the Semantic Web is already served in many languages, we are still facing challenges concerning its internationalization. The DBpedia project, a community effort to extract structured information from Wikipedia, is already supporting multiple languages. This paper presents a graphical tool for creating internationalized mappings for DBpedia.
-
DEW BEADS: A Framework for Distributional and Emergent Web Semantics
,
Vit Novacek,Tudor Groza and Siegfried Handschuh
,
[OpenAccess]
,
[Publisher]
This is an extension of an accepted ISWC’11 research track contribution [1], which introduces an alternative, bottom-up and emergent conception of web semantics based on the distributional hypothesis (an approach that is complementary to the top-down Semantic Web standards based on logics and model theory). The promising potential of our proposal has been demonstrated in [1] by a thoroughly evaluated experiment in knowledge consolidation. In this more technically oriented demo paper, we augment [1] with an overview of DEW BEADS, an open source framework we implemented to test our research ideas. We describe the framework’s architecture and features in Section 2. Section 3 then shows an example of deploying DEW BEADS to explore knowledge in life science publications, and also outlines general usage of the framework in custom applications.
-
DeMoSt: a Tool for Exploring the Decomposition and the Modular Structure of OWL Ontologies
,
Chiara Del Vescovo,Pavel Klinov,Bijan Parsia,Uli Sattler and Thomas Schneider
,
[OpenAccess]
,
[Publisher]
We intend to complete our paper "Decomposition and Modular Structure of BioPortal Ontologies", accepted for ISWC 2011, with the presentation of DeMoSt, a tool for exploring the decomposition and the modular structure of an ontology.
-
Entendre: Interactive Semantic Feedback for Ontology Authoring
,
Ronald Denaux,Dhavalkumar Thakker,Vania Dimitrova and Anthony Cohn
,
[OpenAccess]
,
[Publisher]
This demonstration presents Entendre, a framework to analyse ontology authors’ inputs and provide meaningful feedback at a semantic level. The feedback aims to make ontology authors aware of potential issues such as inconsistency, class unsatisfiability, unexpected logical implications, redundancy and isolated entities. The implementation of Entendre that will be demonstrated extends a CNL-based ontology authoring environment, allowing users without prior knowledge engineering experience to build ontologies while becoming aware of the implications of the formal semantics of OWL. An initial evaluation shows that the feedback is helpful to both novice and experienced ontology authors.
-
Evaluating Adaptive Query Processing Techniques for Federations of SPARQL Endpoints
,
Maribel Acosta and Maria Vidal
,
[OpenAccess]
,
[Publisher]
We present ANAPSID and illustrate the benefits of adaptive semantic data management techniques for accessing a federation of SPARQL endpoints. ANAPSID adapts query execution schedulers to data availability and runtime conditions, implements physical SPARQL operators that detect when an endpoint becomes blocked or data traffic is bursty, and opportunistically produces results as quickly as data arrives from the sources. Additionally, ANAPSID operators implement main memory replacement policies to move previously computed matches to secondary memory, avoiding duplicates. We show ANAPSID’s performance with respect to a variety of RDF engines for queries of diverse complexity, and show that ANAPSID may outperform existing engines.
-
EventMedia: Visualizing Events and Associated Media
,
Houda Khrouf and Raphaël Troncy
,
[OpenAccess]
,
[Publisher]
A wide variety of past and upcoming events are announced and described by several social online services. These web sites range from general event directories to local city guides, often with illustrative media. They frequently overlap in terms of coverage, and each provides its own social networking features to support users in sharing events and deciding whether to attend them. The information about the events, the social connections and the representative media is therefore spread across and locked into these services, which individually provide limited event coverage and no interoperability of descriptions. In this paper, we present a web-based environment producing and consuming linked data to provide an explicit interlinking of event-related and up-to-date information. We propose interactive and user-friendly interfaces to visualize events, with the aim of meeting two user needs: reliving experiences based on media, and supporting decision making for attending upcoming events.
-
Facets
,
Giovanni Bartolomeo
,
[OpenAccess]
,
[Publisher]
We present a novel approach to the resource identity problem in RDF documents based on the notion of context-dependent facets.
-
GATE Mímir: Answering Questions Google Can’t
,
Mark Greenwood,Valentin Tablan and Diana Maynard
,
[OpenAccess]
,
[Publisher]
Free text makes up a large proportion of the vast amounts of information generated by modern society, and search engines such as Google are exceptionally good at finding, indexing and searching this. However, the rise of the Semantic Web and the publishing of increasingly large amounts of structured and interlinked data now mean that useful information is distributed across multiple sources and in a variety of formats, which cannot be easily reconciled by these search engines as it is not amenable to free text search. Hence, questions which we may wish to ask of society’s collective knowledge cannot be easily answered. For example, it is difficult to see how traditional search engines could be used to locate documents in which a person born in Sheffield is being quoted. In this paper, we describe GATE Mímir, which indexes not only free text, but also semantic annotations and knowledge base data. The resulting multi-paradigm index allows us to search across multiple information sources in order to answer questions which are either infeasible or impossible to answer using current web search engines.
-
HDT-it: Storing, Sharing and Visualizing Huge RDF Datasets
,
Mario Arias Gallego,Javier D. Fernández,Miguel A. Martinez-Prieto and Claudio Gutierrez
,
[OpenAccess]
,
[Publisher]
Huge RDF datasets are currently being published in the Linked Open Data cloud. Appropriate data structures are required to address scalability and performance issues when storing, sharing and querying these datasets. HDT (Header, Dictionary, Triples) is a binary format that represents RDF data in a compressed manner, saving space whilst providing fast search operations directly on the compressed representation. These facts make it an excellent format for storing or sharing millions of triples. HDT-it is a tool that can generate and consume RDF files in HDT format. It demonstrates the capabilities of HDT by allowing users to search basic triple patterns against HDT files, and also visualizes the 3D adjacency matrix of the underlying RDF graph to provide an overview of the dataset distribution.
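Searching a triple pattern directly on the compressed representation can be sketched, for example, with the pyHDT bindings (one of several HDT implementations; the file name and predicate are placeholders):

```python
# Evaluating a basic triple pattern over an HDT file with pyHDT.
from hdt import HDTDocument

doc = HDTDocument("dataset.hdt")          # placeholder file name
triples, cardinality = doc.search_triples(
    "", "http://xmlns.com/foaf/0.1/name", "")  # empty strings act as wildcards
print(f"{cardinality} matching triples")
for s, p, o in triples:
    print(s, p, o)  # matching runs directly on the compressed representation
```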
-
How to Represent Knowledge Diversity
,
Andreas Thalhammer,Ioan Toma,Rakebul Hasan,Elena Simperl and Denny Vrandecic
,
[OpenAccess]
,
[Publisher]
Information on the Web includes a huge diversity of opinions, viewpoints, sentiments, emotions, and biases. Accordingly, more and more methods, techniques and tools are available to extract these semantics from text. Representation and exchange of diversity-related information can be easily supported by the use of semantic technologies. For this, we introduce the Knowledge Diversity Ontology (KDO).
-
Identifying User Interests in Folksonomies
,
Elias Zavitsanos,George Vouros and Georgios Paliouras
,
[OpenAccess]
,
[Publisher]
This paper proposes a probabilistic method for classifying folksonomy users into specific domains and for identifying their specific interests within these domains. The proposed method uses a hierarchical probabilistic topic modeling approach that exploits tags to induce hierarchies of latent topics. These hierarchies represent conceptualizations of specific domains that are either collective or user-specific. We propose two alternative methods that exploit the induced hierarchies for classifying users and identifying their interests in specific domains, and provide preliminary evaluation results.
-
Interactive Data Integration with MappingAssistant
,
Jan Noessner,Faraz Fallahi,Eva Kiss and Heiner Stuckenschmidt
,
[OpenAccess]
,
[Publisher]
Due to the heterogeneity of distributed systems, data integration is a main success factor in real-life business. Applying Semantic Web technologies for matching data is one successful approach to data integration. Ontoprise uses ontologies as the target schema for integrating different sources such as databases, text files and ontologies. However, the created target ontology and the corresponding mapping rules might be error-prone. Hence, we developed the conflict resolution framework MappingAssistant, which detects wrong rules or facts at the instance level in an interactive way. In this demo we present the MappingAssistant framework and an evaluation which shows that users are accustomed to investigating data at the instance level.
-
LDIF - Linked Data Integration Framework
,
Andreas Schultz,Andrea Matteini,Robert Isele,Christian Bizer and Christian Becker
,
[OpenAccess]
,
[Publisher]
The Linked Data Integration Framework can be used within Linked Data applications to translate heterogeneous data from the Web of Linked Data into a clean local target representation while keeping track of data provenance. LDIF provides an expressive mapping language for translating data from the various vocabularies that are used on the Web into a consistent, local target vocabulary. LDIF includes an identity resolution component which discovers URI aliases in the input data and replaces them with a single target URI based on user-provided matching heuristics. For provenance tracking, the LDIF framework employs the Named Graphs data model. This paper describes the architecture of the LDIF framework and presents a performance evaluation of a life science use case.
-
Latest Developments in KC-Viz
,
Enrico Motta,Silvio Peroni and Mathieu D'Aquin
,
[OpenAccess]
,
[Publisher]
KC-Viz is a novel tool for visualizing and navigating ontologies, realised as a plug-in of the NeOn Toolkit. KC-Viz provides a comprehensive set of features to support ontology visualization and navigation, including an innovative mechanism for generating overviews of very large ontologies, which is based on an empirically validated ontology summarization algorithm. In this paper, we present an overview of the tool, in particular focusing on the features introduced in the latest version (v1.3.1).
-
Linked Data Quality Assessment through Network Analysis
,
Christophe Guéret,Paul Groth,Claus Stadler and Jens Lehmann
,
[OpenAccess]
,
[Publisher]
Linked Data is at its core about the setting of links between resources. Links provide enriched semantics, pointers to extra information and enable the merging of data sets. However, as the amount of Linked Data has grown, there has been the need to automate the creation of links and such automated approaches can create low-quality links or unsuitable network structures. In particular, it is difficult to obtain an overall picture as to whether the links introduced improve or diminish the quality of Linked Data. In this work, we present an extensible framework that allows for the assessment of Linked Data quality from a global perspective. We test the framework on a set of known quality links and show that it effectively detects quality changes.
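To illustrate the kind of global, network-level view such a framework takes, the sketch below compares simple topology metrics before and after adding candidate links (the metrics shown — average degree and clustering — are generic examples, not necessarily the exact measures the framework uses):

```python
# Sketch: assess a linkset from a network perspective by comparing topology
# metrics of the graph with and without the candidate links.
import networkx as nx

def summarize(g):
    degrees = [d for _, d in g.degree()]
    return {
        "nodes": g.number_of_nodes(),
        "avg_degree": sum(degrees) / len(degrees),
        "avg_clustering": nx.average_clustering(g),
    }

base = nx.Graph([("a", "b"), ("b", "c")])
linked = base.copy()
linked.add_edges_from([("a", "c"), ("c", "d")])  # candidate new links

print(summarize(base))
print(summarize(linked))  # did the links improve or diminish the structure?
```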
-
Looking at CAREY: Controlling Climatological Regions in State of Emergency
,
Maribel Acosta,Marlene Goncalves and Maria Vidal
,
[OpenAccess]
,
[Publisher]
We present CAREY, an alert notification system for regions in a state of emergency. CAREY implements the mediator-wrapper architecture on top of the Geospatial Web to visualize risky regions in terms of their weather conditions. We formalize the problem of detecting risky regions as a two-fold problem that relies on annotations of sensor data with the Observations and Measurements (O&M) ontology to enhance state-of-the-art data mining and ranking techniques. First, sensor observations are clustered according to their geospatial information; then, proximate regions are clustered into micro-climate areas in terms of the similarity of their weather conditions. Finally, top-k skyline techniques are used to identify the k areas that best meet the risk criteria among the areas that are incomparable with respect to this condition. We demonstrate the capabilities of CAREY.
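The skyline step rests on Pareto dominance: an area is kept only if no other area is at least as risky on every criterion and strictly riskier on one. A minimal sketch with invented weather data:

```python
# Sketch of the skyline step: keep only areas not dominated on all risk
# criteria (here: higher wind and rainfall = riskier). Data is invented.
areas = {
    "A": (90, 120),   # (wind km/h, rainfall mm)
    "B": (70, 200),
    "C": (60, 100),
}

def dominates(x, y):
    """x dominates y if x is >= on every criterion and > on at least one."""
    return all(a >= b for a, b in zip(x, y)) and any(a > b for a, b in zip(x, y))

skyline = [name for name, v in areas.items()
           if not any(dominates(w, v)
                      for other, w in areas.items() if other != name)]
print(skyline)  # A and B are incomparable; C is dominated by both
```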
-
Managing Linguistic Resources by Enriching Their Metadata with Linked Data
,
Christina Hoppermann,Thorsten Trippel and Claus Zinn
,
[OpenAccess]
,
[Publisher]
The NaLiDa project aims at contributing to an infrastructure for the metadata-based description of and access to linguistic resources and tools. When aggregating heterogeneous metadata sets from various providers to provide a single and uniform point of access to the aggregation, data curation becomes a central issue. In this paper, we describe how we use authority files from the German National Library, available as Linked Data, to tackle this issue for metadata fields about persons, organisations, and subject classifications.
-
MetaMorphosis+ - A social network of educational Web resources based on semantic integration of services and data
,
Stefan Dietze,Eleni Kaldoudi,Nikolas Dovrolis,Hong Qing Yu and Davide Taibi
,
[OpenAccess]
,
[Publisher]
Past research aiming at interoperability within the Technology-Enhanced Learning (TEL) field has led to a fragmented landscape of competing metadata schemas and interface mechanisms. So far, Web-scale integration of resources is not facilitated, mainly due to the lack of take-up of shared principles, datasets and schemas. On the other hand, the Linked Data approach has emerged as the de facto standard for sharing data on the Web. We propose MetaMorphosis+, a social educational application which adopts a general approach to exploiting existing TEL data on the Web by exposing it as Linked Data and by applying automated enrichment and interlinking techniques to provide rich and well-interlinked data for the educational domain.
-
MovieGoer - Semantic Social Recommendations and Personalised Location-Based Offers
,
Andreas Thalhammer,Timofey Ermilov,Katariina Nyberg,Ario Santoso and John Domingue
,
[OpenAccess]
,
[Publisher]
MovieGoer is an application designed for portable devices that provides its users with context-based services on movie schedule information. It uses social networks, such as Facebook, to replace a user’s effort to enter preferences and personal data explicitly. It gives personalised recommendations on new movies and allows the users to interact with friends sharing the same movie taste. In addition, it enables new business models by connecting cinemas and other service providers with customers. The data integration for all the above is done using specific data sets and schemas from the Linked Open Data cloud.
-
NERD: A Framework for Evaluating Named Entity Recognition Tools in the Web of Data
,
Giuseppe Rizzo and Raphaël Troncy
,
[OpenAccess]
,
[Publisher]
In this paper, we present NERD, an evaluation framework we have developed that records and analyzes human ratings of Named Entity (NE) extraction and disambiguation tools working on English plain-text articles. NERD enables the comparison of popular Linked Data entity extractors that expose APIs, such as AlchemyAPI, DBpedia Spotlight, Extractiv, OpenCalais and Zemanta. Given an article and a particular tool, a user can assess the precision of the named entities extracted, their typing, the Linked Data URIs provided for disambiguation, and their subjective relevance for the text. All user interactions are stored in a database. We propose the NERD ontology that defines mappings between the types detected by the different NE extractors. The NERD framework then makes it possible to visualize the comparative performance of these tools with respect to human assessment.
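The evaluation arithmetic itself is simple: the precision of a tool is the fraction of its extracted entities that humans judged correct. A toy sketch with invented ratings (NERD persists these in its database):

```python
# Sketch: per-tool precision from stored human ratings of extracted entities.
from collections import defaultdict

ratings = [
    # (tool, entity, human_judged_correct) -- invented examples
    ("alchemy", "Barack Obama", True),
    ("alchemy", "White",        False),
    ("zemanta", "Barack Obama", True),
]

counts = defaultdict(lambda: [0, 0])  # tool -> [correct, total]
for tool, _, ok in ratings:
    counts[tool][0] += ok
    counts[tool][1] += 1

for tool, (correct, total) in counts.items():
    print(tool, "precision:", correct / total)
```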
-
Noisy Semantic Data Processing in Seoul Road Sign Management System
,
Zhisheng Huang,Jun Fang,Stanley Park and Tony Lee
,
[OpenAccess]
,
[Publisher]
The Seoul Road Sign Management (RSM) system provides the semantic integration of LOD's LinkedGeoData and OpenStreetMap with a Korean POI data set. It is an attempt to develop an intelligent road sign management system based on the LarKC platform. The RSM data set contains over 1.1 billion triples of semantic data. However, a significant amount of the RSM data is noisy (e.g., inconsistent, partial, or erroneous). We have equipped the RSM system with the capability of processing and reasoning with noisy semantic data, so that it is robust enough to return the intended answers in spite of the poor quality of the semantic data.
-
OWL to English: a tool for generating organised easily-navigated hypertexts from ontologies
,
Allan Third,Sandra Williams and Richard Power
,
[OpenAccess]
,
[Publisher]
It has frequently been observed that domain experts are not necessarily ontology experts, and that the production of ontologies would be aided if they could read and edit axioms in natural language. The SWAT Tools Verbaliser is available, via a web interface, for verbalising OWL ontologies as texts in a controlled fragment of English. Taking as input any OWL ontology, the verbaliser creates a lexicon containing entries for all the entities in the input, and uses it to generate an English sentence corresponding to each logical axiom. These sentences are then organised into a document structure similar to that of an encyclopaedia, with an entry providing a definition, typology and examples for each entity. The output is either organised, easily-navigable English text encoded in XML, or a copy of the input OWL in which each entity is annotated with its description entry. The generated texts have been evaluated in a number of ways which are briefly presented here.
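The axiom-to-sentence idea can be illustrated with a toy verbalizer in the spirit of this approach (the axiom encoding and lexicon rule below are simplified inventions, not the SWAT Tools implementation):

```python
# Toy verbalizer: one English sentence per logical axiom, driven by a lexicon
# derived from entity names.
import re

def lexicalize(entity):
    # "ElectricGuitar" -> "electric guitar"
    return re.sub(r"(?<!^)(?=[A-Z])", " ", entity).lower()

def verbalize(axiom):
    kind, a, b = axiom
    if kind == "SubClassOf":
        return f"Every {lexicalize(a)} is a {lexicalize(b)}."
    if kind == "DisjointClasses":
        return f"No {lexicalize(a)} is a {lexicalize(b)}."
    raise ValueError(f"unsupported axiom type: {kind}")

print(verbalize(("SubClassOf", "ElectricGuitar", "Guitar")))
# -> "Every electric guitar is a guitar."
```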
-
One Simple Ontology for Linked Data Sets
,
Lihua Zhao and Ryutaro Ichise
,
[OpenAccess]
,
[Publisher]
The Linking Open Data (LOD) cloud includes over 26 billion RDF triples from various domains. In order to access linked data sets, Semantic Web users have to understand the ontology schema of the data sets. However, understanding all the ontologies used in the LOD cloud is time-consuming and ultimately not feasible. A simple and easily understandable ontology that integrates ontology schemas from different data sets is a solution to this problem. This paper proposes an automatic ontology learning method that integrates ontologies from different linked data sets, and presents case studies to show the advantages of our approach.
-
OntoFM: A Personal Ontology-based File Manager for the Desktop
,
Jenny Rompa,George Lepouras,Costas Vassilakis and Christos Tryfonopoulos
,
[OpenAccess]
,
[Publisher]
Personal ontologies have been proposed as a means to support the semantic management of user information. Assuming that a personal ontology system is in use, new tools have to be developed at user interface level to exploit the enhanced capabilities offered by the system. In this work, we present an ontology-based file manager that allows semantic searching on the user’s personal information space. The file manager exploits the ontology relations to present files associated with specific concepts, proposes new related concepts to users, and helps them explore the information space and locate the required file.
-
RDFaCE -- The RDFa Content Editor
,
Ali Khalili and Soren Auer
,
[OpenAccess]
,
[Publisher]
-
RMonto: ontological extension to RapidMiner
,
Jędrzej Potoniec and Agnieszka Lawrynowicz
,
[OpenAccess]
,
[Publisher]
We present RMonto, an ontological extension to RapidMiner that enables machine learning with formal ontologies. RMonto is an easily extendable framework, currently providing support for unsupervised clustering with kernel methods and for (frequent) pattern mining in knowledge bases. One important feature of RMonto is that it works directly on structured, relational data. Additionally, its custom algorithm implementations may be combined with the power of RapidMiner through transformation/extraction from ontological data to attribute-value data.
-
RepOSE : A System for Debugging is-a Structure in Networked Taxonomies
,
Patrick Lambrix and Qiang Liu
,
[OpenAccess]
,
[Publisher]
-
Representing Text Mining Results for Structured Pharmacological Queries
,
Carina Haupt,Paul Groth and Marc Zimmermann
,
[OpenAccess]
,
[Publisher]
Several approaches integrating life science data using Semantic Web technologies have been described in the literature. However, these approaches have largely ignored the vast amount of content available only within the scientific literature. In this article, we present an RDF schema for text-mining results that enables SPARQL queries over textual and database data together. We show how real pharmacological queries can be answered over 4 billion text-mined triples.
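To illustrate what "queries over textual and database data together" can look like, here is an illustrative SPARQL query joining text-mined assertions with curated facts. The tm: schema (tm:asserts, tm:foundIn, tm:confidence) and the drug: vocabulary are stand-ins invented for this sketch, not the paper's actual schema:

```python
# Illustrative query: text-mined drug/target assertions filtered by mining
# confidence, joined with a curated database fact about the drug.
query = """
PREFIX tm:   <http://example.org/textmining#>
PREFIX drug: <http://example.org/drugbank/>

SELECT ?drug ?target ?doc WHERE {
  ?annotation tm:asserts [ tm:subject   ?drug ;
                           tm:predicate drug:inhibits ;
                           tm:object    ?target ] ;
              tm:foundIn    ?doc ;
              tm:confidence ?c .
  ?drug drug:approved true .          # curated database fact
  FILTER (?c > 0.8)
}
"""
print(query)  # run against a store loaded with both text-mined and DB triples
```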
-
SEMLEX - A Framework for Visually Exploring Semantic Query Log Analysis
,
Suvodeep Mazumdar,Khadija Elbedweihy,Amparo E. Cano,Stuart N. Wrigley and Fabio Ciravegna
,
[OpenAccess]
,
[Publisher]
With organisations, local bodies and governments now releasing large amounts of linked data, there is a great opportunity for users and software agents to look for structured information serving their information needs. Query logs, which preserve such information needs, can be harvested to understand what linked data consumers are looking for. Though statistical analyses of such query logs have been employed over the years to improve performance, visualising these analyses can provide a different mode of exploration that could be invaluable to researchers, developers and linked data providers for discovering hidden trends and patterns. This paper presents our approach to analysing query logs and introduces SEMLEX, a tool that facilitates visual exploration of semantic query log analysis.
-
SPARQL Execution as Fast as SQL Execution on Relational Data
,
Juan Sequeda and Daniel Miranker
,
[OpenAccess]
,
[Publisher]
Relational Database to RDF (RDB2RDF) systems execute SPARQL queries on relational data. Past studies have shown that RDB2RDF systems do not perform well; in other words, a SPARQL query on an RDB2RDF system executes much more slowly than its semantically equivalent SQL query. We therefore ask: what optimizations are needed to support effective SPARQL execution on relationally stored data? We experimented on Microsoft SQL Server, using the Barton and Berlin SPARQL Benchmarks, and Ultrawrap, an automatic RDB2RDF wrapping system architected to leverage the SQL optimizer. Our initial results identify two important optimizations for effective SPARQL execution using Ultrawrap: detection of unsatisfiable conditions and self-join elimination.
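A worked example of why self-join elimination matters, under the common assumption that SPARQL is translated against a triple-table view (the SQL text and view name below are illustrative, not Ultrawrap's generated code): a basic graph pattern with two triple patterns sharing a subject naively becomes a self-join, but when both rows come from the same underlying relational tuple the join collapses to a single scan.

```python
# Naive translation of { ?s foaf:name ?name . ?s foaf:mbox ?email } over a
# triple table t(s, p, o): a self-join.
naive_sql = """
SELECT t1.o AS name, t2.o AS email
FROM triples t1 JOIN triples t2 ON t1.s = t2.s
WHERE t1.p = 'foaf:name' AND t2.p = 'foaf:mbox';
"""

# After self-join elimination: both patterns map back to columns of the same
# wrapped relational row, so one scan of the base view suffices.
optimized_sql = """
SELECT name, email
FROM person_view
WHERE name IS NOT NULL AND email IS NOT NULL;
"""
print(naive_sql, optimized_sql)
```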
-
Scaling Data Linkage Generation with Domain-Independent Candidate Selection
,
Dezhao Song and Jeff Heflin
,
[OpenAccess]
,
[Publisher]
We propose a candidate selection algorithm for scalably detecting coreferent instance pairs from heterogeneous Semantic Web data sources. Our algorithm selects candidate pairs by computing a character-level similarity on disambiguating literal values that are chosen using domain-independent unsupervised learning. We index the instances on such values to efficiently look up similar instances. Our algorithm is evaluated on three instance categories in two RDF datasets.
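A minimal sketch of the index-then-compare pattern, assuming the disambiguating literal (here, a name) has already been chosen; the blocking key, threshold and data are invented, and difflib stands in for whatever character-level similarity the paper uses:

```python
# Sketch: index instances by a cheap blocking key over their disambiguating
# literal, then pair up instances whose literals are similar character-wise.
from difflib import SequenceMatcher
from collections import defaultdict

instances = {
    "dbpedia:Jeff_Heflin": "Jeff Heflin",
    "ex:author42":         "J. Heflin",
    "ex:author43":         "Marie Curie",
}

# Inverted index keyed by the first letter of the last name token.
index = defaultdict(list)
for uri, name in instances.items():
    index[name.split()[-1][0].lower()].append((uri, name))

def candidates(threshold=0.6):
    for bucket in index.values():
        for i, (u1, n1) in enumerate(bucket):
            for u2, n2 in bucket[i + 1:]:
                if SequenceMatcher(None, n1, n2).ratio() >= threshold:
                    yield (u1, u2)

print(list(candidates()))  # dbpedia:Jeff_Heflin paired with ex:author42
```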
-
ScienceWISE: A Web-based Interactive Semantic Platform for Scientific Collaboration
,
Alexey Boyarsky,Philippe Cudré-Mauroux,Gianluca Demartini,Oleg Ruchayskiy and Karl Aberer
,
[OpenAccess]
,
[Publisher]
-
SemPuSH: Privacy-Aware and Scalable Broadcasting for Semantic Microblogging
,
Pavan Kapanipathi,Julia Anaya and Alexandre Passant
,
[OpenAccess]
,
[Publisher]
Users of traditional microblogging platforms such as Twitter face drawbacks in terms of (1) privacy of status updates as a followee, with updates reaching undesired people, and (2) information overload as a follower, receiving uninteresting microposts from followees. In this paper we demonstrate distributed and user-controlled dissemination of microposts using SMOB (a semantic microblogging framework) and Semantic Hub (a privacy-aware implementation of the PuSH protocol). The approach leverages users' social graphs to dynamically create the group of followers who are eligible to receive a micropost. The restrictions used to create the groups are provided by the followee based on the hashtags in the micropost. Both SMOB and Semantic Hub are available as open source.
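The dissemination rule can be sketched as a simple filter: only followers whose social-graph-derived interests match the micropost's hashtags are eligible. The follower interests below are invented stand-ins for the FOAF-based data the system actually uses:

```python
# Sketch: followee tags a micropost; only followers whose interests match
# the hashtags receive it.
followers = {
    "alice": {"semanticweb", "linkeddata"},
    "bob":   {"football"},
}

def eligible(hashtags):
    tags = {t.lower() for t in hashtags}
    return [f for f, interests in followers.items() if tags & interests]

print(eligible(["LinkedData", "ISWC"]))  # -> ["alice"]
```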
-
Semantator: A Semi-automatic Semantic Annotation Tool for Clinical Narratives
,
Dezhao Song,Christopher Chute and Cui Tao
,
[OpenAccess]
,
[Publisher]
In this paper, we introduce Semantator, a semi-automatic tool for document annotation with Semantic Web ontologies. Given a loaded free-text document and an ontology, users can annotate document fragments with classes in the ontology to create instances, and relate the created instances with ontology properties. Semantator also enables automatic annotation by connecting to the NCBO Annotator and the clinical Text Analysis and Knowledge Extraction System (cTAKES). By representing annotations in OWL, Semantator has basic reasoning capabilities based upon the underlying semantics of owl:disjointWith and owl:equivalentClass.
-
Semantic Index: Scalable Query Answering without Forward Chaining or Exponential Rewritings
,
Mariano Rodriguez-Muro and Diego Calvanese
,
[OpenAccess]
,
[Publisher]
-
Semantic Navigator: Use of Semantic Data in Web Navigation
,
Jan Michelfeit and Tomáš Knap
,
[OpenAccess]
,
[Publisher]
Semantic web search engines can take advantage of machine-understandable data published on the Web to provide more precise search results and advanced query capabilities. Semantic data embedded in web documents (serialized as RDFa or microformats, for example) can be used in conjunction with a semantic web search engine to provide a better web navigation experience. We present Semantic Navigator, a Mozilla Firefox extension that brings the advantages of semantic search to ordinary users. Semantic Navigator detects semantic data in web documents and, with the aid of a semantic web search engine, enables users to easily navigate to related documents containing information about a selected resource or about one of its properties.
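The detection step can be approximated with a scan for RDFa attributes, as in this minimal sketch (a full RDFa parser does far more; this only spots candidate attributes so a tool knows which resources it could offer navigation for):

```python
# Sketch: sniff RDFa attributes in an HTML page using only the stdlib parser.
from html.parser import HTMLParser

class RDFaSniffer(HTMLParser):
    RDFA_ATTRS = {"about", "property", "typeof", "resource"}

    def __init__(self):
        super().__init__()
        self.hits = []

    def handle_starttag(self, tag, attrs):
        found = {k: v for k, v in attrs if k in self.RDFA_ATTRS}
        if found:
            self.hits.append((tag, found))

sniffer = RDFaSniffer()
sniffer.feed('<div about="http://dbpedia.org/resource/Prague">'
             '<span property="rdfs:label">Prague</span></div>')
print(sniffer.hits)  # tags carrying RDFa markup, with their attributes
```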
-
Semantic Smoothing for Twitter Sentiment Analysis
,
Hassan Saif,Yulan He and Harith Alani
,
[OpenAccess]
,
[Publisher]
Twitter has recently attracted much attention as a hot research topic in the domain of sentiment analysis. Training sentiment classifiers from tweet data often faces the data sparsity problem, partly due to the large variety of short forms introduced into tweets because of the 140-character limit. In this work we propose using semantic smoothing to alleviate the data sparseness problem. Our approach extracts semantically hidden concepts from the training documents and then incorporates these concepts as additional features for classifier training. We tested our approach using two different methods. One is shallow semantic smoothing, where words are replaced with their corresponding semantic concepts; the other interpolates the original unigram language model in the Naive Bayes (NB) classifier with the generative model of words given semantic concepts. Preliminary results show that with shallow semantic smoothing the vocabulary size is reduced by 20%. Moreover, the interpolation method improves upon shallow semantic smoothing by over 5% in sentiment classification and slightly outperforms NB trained on unigrams alone without semantic smoothing.
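The two variants can be sketched on a toy scale; the concept mapping and probabilities below are invented (the paper learns them from training data), and the interpolation follows the standard form p(w|c) = (1 - λ)·p_ML(w|c) + λ·Σ_k p(w|k)·p(k|c):

```python
# (1) Shallow smoothing: replace words with their semantic concepts.
concept_of = {"iphone": "SMARTPHONE", "galaxy": "SMARTPHONE"}
tweet = ["iphone", "great"]
smoothed = [concept_of.get(w, w) for w in tweet]  # ["SMARTPHONE", "great"]

# (2) Interpolation: mix the class unigram model with a concept-based
# generative model of words.
lam = 0.3
p_ml = {"iphone": 0.02}                        # ML unigram estimate for a class
p_w_given_k = {("iphone", "SMARTPHONE"): 0.10} # p(word | concept)
p_k_given_c = {"SMARTPHONE": 0.25}             # p(concept | class)

def p_smoothed(word):
    background = sum(p_w_given_k.get((word, k), 0.0) * pk
                     for k, pk in p_k_given_c.items())
    return (1 - lam) * p_ml.get(word, 0.0) + lam * background

print(smoothed, p_smoothed("iphone"))  # 0.7*0.02 + 0.3*0.025 = 0.0215
```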
-
Semantically-driven recursive navigation and retrieval of data sources in the Web of Data
,
Valeria Fionda,Claudio Gutierrez and Giuseppe Pirró
,
[OpenAccess]
,
[Publisher]
This paper introduces a semantically-driven recursive RDF link-based navigation system for the Web of Data. It combines the power of regular expressions with triggers based on SPARQL queries, to perform controlled exploration and retrieval of semantic data sources.
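The combination of regular expressions and query triggers can be sketched as a breadth-first traversal that only follows links whose predicates match an expression, firing a check at each visited source (the toy graph is invented, and a Python predicate stands in where the system would evaluate a SPARQL ASK query):

```python
# Sketch: regex-controlled navigation over a toy web of data.
import re
from collections import deque

web = {  # source -> list of (predicate, target) links
    "ex:paper1": [("ex:cites", "ex:paper2"), ("ex:hasAuthor", "ex:alice")],
    "ex:paper2": [("ex:cites", "ex:paper3")],
    "ex:paper3": [],
}

def navigate(start, predicate_regex, trigger):
    pattern, seen, queue = re.compile(predicate_regex), {start}, deque([start])
    while queue:
        node = queue.popleft()
        if trigger(node):            # SPARQL-ASK stand-in
            yield node
        for pred, target in web.get(node, []):
            if pattern.fullmatch(pred) and target not in seen:
                seen.add(target)
                queue.append(target)

# Follow citation links transitively, collecting every reached paper.
print(list(navigate("ex:paper1", r"ex:cites",
                    lambda n: n.startswith("ex:paper"))))
```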
-
Sewelis: Exploring and Editing an RDF Base in an Expressive and Interactive Way
,
Sebastien Ferre and Alice Hermann
,
[OpenAccess]
,
[Publisher]
Query-based Faceted Search (QFS), introduced in a research paper at ISWC'11, reconciles the expressiveness of querying languages (e.g., SPARQL) and the benefits of exploratory search found in faceted search. Because of the interactive nature of QFS, which is difficult to fully render in a research paper, we feel it is important to complement it with a demonstration of our QFS prototype, Sewelis (aka Camelis 2). An important addition to the research paper is the extension of QFS to the guided editing of RDF bases, where suggestions are based on existing data. This paper motivates our approach, briefly presents Sewelis, and announces the program of the demonstration. Screencasts of the demonstration, as well as material (program and data) to reproduce it, are available at http://www.irisa.fr/LIS/softwares/sewelis.
-
Synth – Linked Data Application Implementation Environment
,
Mauricio Henrique De Souza Bomfim and Daniel Schwabe
,
[OpenAccess]
,
[Publisher]
In this demo, we present Synth, a new development environment to implement Linked Data applications. It is best used in conjunction with the SHDM method, making it possible to take any RDF data available on the Linked Data cloud, extend it with one’s own data, and provide a Web application that exposes and manipulates this data to perform a given set of tasks, including navigation, as well as general business logic.
-
TOPICA: A Tool for Visualising Emerging Semantics of POIs based on Social Awareness Streams
,
Amparo E. Cano,Gregoire Burel,Aba-Sah Dadzie and Fabio Ciravegna
,
[OpenAccess]
,
[Publisher]
Topica is an application that enriches the Social Web with semantic data to enable collective perception of Points of Interest (POIs), which are human constructs that describe information about locations (e.g., restaurants, attractions, cities). Compared to existing applications for browsing POIs, Topica provides an extra layer of information by modelling hidden characteristics of POIs: (1) generating a Linked Data representation of the collective perception of a POI; (2) enhancing the POI representation by mashing up services that enrich the POI's related entities; and (3) providing a visual representation of the POIs adapted to suit user- and context-sensitive filters. Topica identifies topics relevant to a POI by extracting DBpedia categories from entities (e.g., People, Places) and keywords (e.g., Crete, Bonn) obtained from social awareness streams related to the POIs.
-
Three ways to sprinkle POWDER
,
Stasinos Konstantopoulos
,
[OpenAccess]
,
[Publisher]
In this paper, three alternative implementations of the POWDER W3C Recommendation are presented, compared, and discussed. The main issue with implementing POWDER is that POWDER inference accesses resources' URIs in order to mass-annotate regular-expression-delineated groupings of URIs.
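The core operation can be sketched as follows: a POWDER descriptor attaches RDF properties to every resource whose URI matches a regex-delineated group (the descriptor structure, regex and URIs below are invented for illustration):

```python
# Sketch: mass-annotate all URIs matching a descriptor's include regex.
import re

descriptor = {
    "includeregex": r"^http://example\.org/photos/.*",
    "properties": {"dc:rights": "CC-BY"},
}

uris = [
    "http://example.org/photos/1.jpg",
    "http://example.org/docs/readme.txt",
]

pattern = re.compile(descriptor["includeregex"])
annotations = [(uri, p, v)
               for uri in uris if pattern.match(uri)
               for p, v in descriptor["properties"].items()]
print(annotations)  # only the photos URI receives dc:rights CC-BY
```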
-
Tools for Pattern-Based Transformation of OWL Ontologies
,
Ondrej Svab-Zamazal,Enrico Daga,Marek Dudas and Vojtech Svatek
,
[OpenAccess]
,
[Publisher]
-
Towards Policy-aware Queries over Linked Data
,
Sebastian Speiser
,
[OpenAccess]
,
[Publisher]
-
TuneSensor: A Semantic-Driven Music Recommendation Service For Digital Photo Albums
,
Jiansong Chao,Haofen Wang,Wenlei Zhou,Weinan Zhang and Yong Yu
,
[OpenAccess]
,
[Publisher]
Digital photo album software such as iPhoto has enjoyed great popularity for years. More recently, online photo album services (e.g., Flickr and Picasa) have become more and more popular with the development of the social Web. In this paper, we demonstrate our effort, called TuneSensor, to recommend music for photo albums automatically. In particular, we exploit semantic data to represent both images and music. Furthermore, we leverage mining techniques to capture the semantic relatedness between these different types of multimedia data, which is the essential step for recommendation.
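One simple way to realize the relatedness step is set overlap between the semantic descriptors of an album and of each track, as in this toy sketch (the category sets and Jaccard scoring are invented stand-ins for the mined representations and the paper's actual relatedness measure):

```python
# Sketch: rank music tracks for a photo album by semantic-tag overlap.
album_tags = {"Beach", "Sunset", "Vacation"}
tracks = {
    "track:calm_waves": {"Beach", "Relaxation", "Sunset"},
    "track:heavy_riff": {"Rock", "Concert"},
}

def jaccard(a, b):
    return len(a & b) / len(a | b)

ranked = sorted(tracks, key=lambda t: jaccard(album_tags, tracks[t]),
                reverse=True)
print(ranked[0])  # -> "track:calm_waves"
```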
-
inContext Sensing: LOD augmented sensor data
,
Myriam Leggieri,Alexandre Passant and Manfred Hauswirth
,
[OpenAccess]
,
[Publisher]
In this demo paper, we present a system that shows how users with no expertise in sensor data can benefit from Linked Data and semantic annotations to make sense of raw sensor data. Our motivations are that (1) these users are becoming the main consumers of sensor data, but sensor conceptualisations do not consider their point of view, and (2) so far, no application dynamically creates Linked Data for sensors (the linked datasets are usually predefined).