RDF and Linked Data Validation - ESWC'16 Tutorial

Slides

Examples and other material will be available at this repository


Abstract

RDF promises a distributed database of repurposable, machine-readable data. Although the benefits of RDF for data representation and integration are indisputable, it has not been embraced by everyday programmers and software architects who care about safely creating and accessing well-structured data. Semantic Web projects still lack some of the common tools and methodologies that are available in more conventional settings to describe and validate data. In particular, relational databases and XML have popular technologies for defining data schemas and validating data. RDF currently has no widely adopted analog.

Shape Expressions (ShEx) has been designed as an intuitive and human-friendly high level language for RDF validation.

In 2014, the W3C chartered a working group called RDF Data Shapes to produce a language for defining structural constraints on RDF graphs. The proposed technology is called SHACL (Shapes Constraint Language), and a First Public Working Draft was published in October 2015.

In this tutorial we will present both ShEx and SHACL using examples and RDF data modelling exercises.

Like the popular SPARQL by Example tutorial, this tutorial includes step-by-step instructions with examples followed by exercises. Participants can download validation tools to use locally or use web-based interfaces such as RDFShape or the W3C ShEx Workbench.

Overview

RDF is growing in popularity for both data transfer and data storage/recall. In both of these capacities, it is important to describe and verify conformance with a particular graph structure. While the Semantic Web is an environment where anybody can say anything about any topic, we still need to make sure that clinical, genetic, manufacturing, etc. databases capture data in a predictable way.

When data is recorded or exchanged, programs or human operators are expected to produce and interpret it. Processing that data safely requires that it follows a specified structure and that this structure can itself be described.

Non-RDF data storage systems offer and rely on schemas both to increase data integrity and to enable efficient storage and static query analysis for optimization. SQL's Data Definition Language completely constrains what may appear in an SQL database (with minor exceptions like some databases that don't ensure homogeneity in a column). XML's use of W3C XML Schema and Relax NG typically involves validation on data creation and ingestion. Even JSON Schema is growing in popularity as that developer community recognizes the need for basic structural description.

RDF, and graph stores in general, don't demand an initial schema definition like SQL, but operate more like XML where the basic language allows many structural constructs but specific applications impose further practical demands. In that sense ShEx and SHACL work with the open spirit of RDF (natively schema-less), while giving developers and data architects a tool to impose and validate some specific constraints.
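The following toy sketch in Python makes the idea of imposing constraints on otherwise schema-less RDF more concrete. It is not ShEx or SHACL and uses neither their syntax nor any RDF library. The `Constraint` class, the `validate` function, the plain-tuple triple model, and the choice of FOAF properties are all hypothetical illustrations. In this sketch a shape is a list of per-predicate cardinality and value checks. Predicates the shape does not mention are ignored, which loosely mirrors the open-by-default behaviour of shapes in both languages.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Set, Tuple

# A triple is (subject, predicate, object). IRIs are plain strings and
# literals are Python values; this is a toy model, not a real RDF library.
Triple = Tuple[str, str, object]


@dataclass
class Constraint:
    """Hypothetical per-predicate constraint: cardinality plus a value test."""
    predicate: str
    min_count: int
    max_count: Optional[int]  # None means "unbounded"
    value_ok: Callable[[object], bool]


def validate(node: str, graph: Set[Triple], shape: List[Constraint]) -> List[str]:
    """Check one focus node against a shape; an empty result means it conforms.

    Predicates not mentioned in the shape are ignored (an "open" shape).
    """
    errors: List[str] = []
    for c in shape:
        values = [o for (s, p, o) in graph if s == node and p == c.predicate]
        if len(values) < c.min_count:
            errors.append(f"{node}: needs at least {c.min_count} {c.predicate}, found {len(values)}")
        if c.max_count is not None and len(values) > c.max_count:
            errors.append(f"{node}: allows at most {c.max_count} {c.predicate}, found {len(values)}")
        for v in values:
            if not c.value_ok(v):
                errors.append(f"{node}: invalid value {v!r} for {c.predicate}")
    return errors


# Illustrative data using FOAF property IRIs.
FOAF = "http://xmlns.com/foaf/0.1/"
person_shape = [
    Constraint(FOAF + "name", 1, 1, lambda v: isinstance(v, str)),
    Constraint(FOAF + "age", 0, 1, lambda v: isinstance(v, int) and v >= 0),
]
graph = {
    (":alice", FOAF + "name", "Alice"),
    (":alice", FOAF + "age", 30),
    (":alice", FOAF + "knows", ":bob"),  # not in the shape, so ignored
    (":bob", FOAF + "age", -5),
}

print(validate(":alice", graph, person_shape))  # → []
for message in validate(":bob", graph, person_shape):
    print(message)  # missing foaf:name, then the negative foaf:age
```

Real ShEx and SHACL processors also cover node kinds, datatypes, and shapes that reference other shapes. This sketch attempts none of these.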

The practicalities of data exchange faced by the Open Services for Lifecycle Collaboration (OSLC) led to the development of Resource Shapes, a language for communicating the data structures managed by Linked Data Platform endpoints. Likewise, the Dublin Core Metadata Initiative defined Description Set Profiles for describing constraints and expectations about bibliographic records. Neither of these went through a standardization and implementation phase that led to widely deployed, general-purpose validation tools.

The ongoing work of the Shape Expressions community and the W3C RDF Data Shapes Working Group may help improve RDF adoption in industrial scenarios where there is a real need to ensure the structure of RDF data, both when it is produced and when it is consumed.

More information about ShEx is available at the ShEx Primer and about SHACL at the First Public Working Draft.

Topics

Goals

Audience

The audience should be comfortable either with using git and a JVM or a JavaScript VM such as Node.js, or simply with a web browser. A rudimentary knowledge of RDF and Turtle is expected. Like SPARQL by Example, this tutorial is intended to introduce the audience to a language that is new to them.

Tutoring team

Registration

To register, visit: http://2016.eswc-conferences.org/

Schedule

The tutorial will be given on 30th May, 9h-12:30h (see Conference Program)