Example Driven Data Exploration

Exploring Knowledge through Examples @ ESWC'20

Tutorial on Exploring Knowledge through Examples

Matteo Lissandrini (Aalborg University), Davide Mottin (Aarhus University), Themis Palpanas (University of Paris, IUF), Yannis Velegrakis (Utrecht University)

Exploration is one of the primordial ways to accrue knowledge about the world and its nature. As we accumulate, mostly automatically, data at unprecedented volumes and speed, our datasets have become complex and hard to understand. In this context, exploratory search provides a handy tool for progressively gathering the necessary knowledge by starting from a tentative query that can provide cues about the next queries to issue.

An exploratory query should be simple enough to avoid complicated declarative languages (such as SQL or SPARQL) and convoluted mechanisms, and at the same time retain the flexibility and expressiveness required to express complex information needs. Recently, we have witnessed a rediscovery of the so-called example-based methods, in which the user or the analyst circumvents query languages by using examples as input.

This shift in semantics has led to a number of methods that take as a query a set of example members of the answer set. The search system then infers the entire answer set based on the given examples and any additional information provided by the underlying database. In this tutorial, we present an excursus over the main example-based methods for exploratory analysis. We show how different data types require different techniques, and present algorithms that are specifically designed for relational, textual, and graph data. We conclude by providing a unifying view of this query paradigm and identifying new exciting research directions.
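To make the idea of "examples as a query" concrete, here is a minimal Python sketch for relational data. It is not one of the algorithms covered in the tutorial; it only illustrates the general principle under a strong simplifying assumption: the intended query is a conjunction of equality conditions, which we guess by keeping the attribute values that all examples share. The table contents and the function names (`shared_conditions`, `infer_answer_set`) are hypothetical and chosen for illustration only.

```python
from typing import Dict, List, Set

Row = Dict[str, str]

# Hypothetical toy table; rows and attribute names are illustrative only.
TABLE: List[Row] = [
    {"name": "Heraklion", "country": "Greece", "kind": "city"},
    {"name": "Athens", "country": "Greece", "kind": "city"},
    {"name": "Crete", "country": "Greece", "kind": "island"},
    {"name": "Aarhus", "country": "Denmark", "kind": "city"},
    {"name": "Aalborg", "country": "Denmark", "kind": "city"},
    {"name": "Odense", "country": "Denmark", "kind": "city"},
]


def shared_conditions(examples: List[Row], key: str = "name") -> Row:
    """Return the attribute/value pairs on which every example agrees.

    The key attribute is skipped, since it identifies a row rather than
    describing it.
    """
    first, rest = examples[0], examples[1:]
    return {
        attr: value
        for attr, value in first.items()
        if attr != key and all(row.get(attr) == value for row in rest)
    }


def infer_answer_set(table: List[Row], example_names: Set[str]) -> List[Row]:
    """Guess the full answer set from a few example members.

    Assumption: the user's intent is a conjunction of equality conditions,
    approximated by the values that all examples have in common.
    """
    examples = [row for row in table if row["name"] in example_names]
    if not examples:
        return []
    conditions = shared_conditions(examples)
    # Keep every row that satisfies all inferred conditions.
    return [
        row
        for row in table
        if all(row.get(attr) == value for attr, value in conditions.items())
    ]


if __name__ == "__main__":
    for row in infer_answer_set(TABLE, {"Aarhus", "Aalborg"}):
        print(row["name"])
```

In this sketch, the examples Aarhus and Aalborg agree on both country and kind, so the inferred answer set also contains Odense. The examples Heraklion and Crete agree only on country, so the inferred set widens to every Greek entry. Real systems have to handle ambiguity among many candidate queries, which this sketch ignores.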

Contents of the Tutorial

ESWC'20 in Heraklion, Greece
A new version of our tutorial will be presented at ESWC'20 in Heraklion, Greece.

Motivation

Exploratory search includes methods to efficiently extract knowledge from data repositories, even if we do not know exactly what we are looking for, nor how to precisely describe our needs. The need for new and effective exploratory search methods is particularly relevant given the abundance and richness of today’s large datasets (e.g., Linked Open Datasets). In common exploratory settings, the user progressively acquires knowledge by issuing a sequence of generic queries to gather intelligence about the data.

The existing body of work in data analysis assumes that the user is willing to pose several well-defined or structured queries to the underlying database in order to progressively gather the required information. This assumption stems from the intuition that the user is accustomed to data analysis techniques. Yet, this assumption does not always hold.

Objectives

We survey the main approaches for exploratory queries, highlighting the main differences among data models, and presenting in-depth insights into the current status of research in this area. The final goal is to provide a comprehensive overview of novel data-management techniques that can empower advanced exploratory search systems.

In particular, we will highlight the existing example-based methods that have already been studied to improve knowledge graph search, SPARQL query formulation, and data exploration of RDF data. Moreover, we aim to present techniques that have been studied in other research areas and that could be successfully applied in the Semantic Web domain.

Topic Summary

The first and second parts of the tutorial introduce the broad topic of data exploration, highlighting the difficulty of query languages for non-expert users and advocating the need for different query methods.

The third part of the tutorial discusses the current main techniques for textual and graph data, with an excursus on relational data as well, in order to provide a complete picture of the power of the approach.

The fourth part of the tutorial focuses on the latest developments in machine learning for progressively discovering user intent.

Outline

  1. Introduction, motivation, and formulation
  2. The origin: Example-based approaches for structured data (relational)
  3. Example-based approaches for semi-structured and unstructured data (graphs and text)
  4. Learning methods based on examples
  5. Challenges and Discussion