This document explains how SPARQL might be used for querying information in a Web 2.0 environment and how the SPARQL Protocol works for remote queries.

The Web 2.0 and the Semantic Web are two ideas that are shaping the future of the Web. It is not yet clear which one will prevail, but most likely we will get a platform that combines the best ideas from both. Many experts claim that Web 2.0 is just a "marketing" name for the Semantic Web, although some differences remain. The main principle shared by both paradigms is the ability to extract and query information across an information space that includes Web sites, documents, databases, Web services, libraries and repositories. The Semantic Web introduced a new computing paradigm based on unambiguous metadata descriptions. These can describe not only things you can find on the Web but also things that reside in enterprise data stores, and even physical objects. The World Wide Web Consortium standardized these metadata descriptions as the Resource Description Framework (RDF) as early as 1999.

The SPARQL Protocol and RDF Query Language (SPARQL, pronounced "sparkle") is a query language designed to meet the requirements and design objectives described in "RDF Data Access Use Cases". It provides facilities to:

  • extract information in the form of URIs, blank nodes, plain and typed literals
  • extract RDF subgraphs, and
  • construct new RDF graphs based on information in the queried graphs.

As a data access language, SPARQL is suitable for both local and remote use. Using it locally is straightforward. For remote use, the SPARQL Protocol for RDF defines a more formal interface for conveying SPARQL queries from clients to query processors. HTTP and SOAP bindings of that interface have been defined to provide connectivity.

In this document I will explain how SPARQL can be used to query information and how the SPARQL Protocol works for remote queries. The reader is expected to be familiar with RDF concepts.

Evolution of objectives

Several standards cover storing and defining RDF data, but there had been no standardization work on querying or accessing it. Nor was there a formal, publicly standardized protocol for interacting with local or remote RDF storage servers. As a result, when the RDF storage model appeared, developers in commercial and open source projects created their own query languages for RDF data: over 20 at last count. A full list of query language implementations can be seen at http://www.w3.org/2001/11/13-RDF-Query-Rules/. These languages share neither a common syntax nor a common semantics. In fact, they cover a significant semantic range: from declarative, SQL-like languages, to path languages, to rule or production-like systems. SPARQL was designed to fill this gap.

SPARQL provides Web 2.0 users with a query language in much the same fashion as SQL provides relational database users with a query language.

The following requirements were taken into consideration when SPARQL was designed:

  • Graph pattern matching ability – the query language must include the capability to restrict matches on a queried graph by providing a graph pattern, which consists of one or more RDF triple patterns, to be satisfied in a query;
  • Variable binding results – It must be possible for queries to return zero or more bindings of variables. Each set of bindings is one way that the query can be satisfied by the queried graph;
  • Subgraph results – It must be possible for query results to be returned as a subgraph of the original queried graph;
  • Supportable local queries – The query language must be suitable for use in accessing local RDF data - that is, from the same machine or same system process;
  • Result limits – It must be possible to specify an upper bound on the number of query results returned;
  • Streaming results – It must be possible, when returning multiple unordered results, for the client to request that results be streamed. When the client requests streaming results, all the data in one result must be available to the client before all the data for the next result.
  • WSDL support – The protocol – including its interfaces, their operations, results, and types – must be described using WSDL. This is essential for remote queries.

SPARQL's requirements have now stabilized, and the SPARQL query language is a W3C Candidate Recommendation. This means it is on track to become a standard (a W3C Recommendation).

How to write SPARQL queries

An RDF graph is a set of triples; each triple consists of a subject, a predicate and an object. These triples can come from a variety of sources. The SPARQL query language is based on matching graph patterns. The simplest graph pattern is the triple pattern. It is like an RDF triple, but it may have a variable instead of an RDF term (an IRI, a literal or a blank node) in the subject, predicate or object position. Combining triple patterns gives a basic graph pattern. A basic graph pattern matches a graph only when every one of its triple patterns matches, with each variable bound to the same RDF term throughout.
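To make the matching idea concrete, here is a minimal Python sketch of a single triple pattern matched against an in-memory graph. It is an illustration, not how any particular SPARQL engine works. It assumes the following:

* terms are plain strings, with IRIs in angle brackets and literals in quotes;
* any term starting with "?" or "$" is a variable;
* the graph holds only the one triple from example1.rdf used below.

```python
def is_var(term: str) -> bool:
    """Variables start with '?' or '$' (assumed string representation)."""
    return term[:1] in ("?", "$")


def match_triple(pattern, triple):
    """Return a dict of variable bindings if `triple` matches `pattern`,
    or None if it does not."""
    bindings = {}
    for p, t in zip(pattern, triple):
        if is_var(p):
            name = p[1:]  # the '?' or '$' is not part of the variable name
            # A variable used twice in one pattern must bind to the same term.
            if bindings.setdefault(name, t) != t:
                return None
        elif p != t:
            return None  # a fixed term (IRI or literal) must match exactly
    return bindings


def match_pattern(pattern, graph):
    """All solutions of one triple pattern over a graph (a set of triples)."""
    return [b for t in graph if (b := match_triple(pattern, t)) is not None]


graph = {
    ("<http://example.org/book/book1>",
     "<http://purl.org/dc/elements/1.1/author>",
     '"Peter Mikhalenko"'),
}
pattern = ("<http://example.org/book/book1>",
           "<http://purl.org/dc/elements/1.1/author>",
           "?author")
print(match_pattern(pattern, graph))  # [{'author': '"Peter Mikhalenko"'}]
```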

The example below shows a SPARQL query to find the author of a book from the information in the given RDF graph. Let's take the following RDF information (example1.rdf):

<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/author> "Peter Mikhalenko" .

The query consists of two parts, the SELECT clause and the WHERE clause. The SELECT clause identifies the variables to appear in the query results, and the WHERE clause has one triple pattern (example1.sparql.txt):

Listing A

SELECT ?author
WHERE
{
<http://example.org/book/book1> <http://purl.org/dc/elements/1.1/author> ?author .
}

This is what we will get from this simplest query:

author
------------------
"Peter Mikhalenko"

The terms delimited by "<>" are IRI references (Internationalized Resource Identifiers, described in RFC 3987). IRIs are a generalization of URIs: every URI, and therefore every URL, is also an IRI. SPARQL provides two abbreviation mechanisms for IRIs: prefixed names and relative IRIs.

  • Prefixed names: The PREFIX keyword associates a prefix label with an IRI. A prefixed name is a prefix label and a local part, separated by a colon ":". It is mapped to an IRI by concatenating the local part to the IRI corresponding to the prefix.
  • Relative IRIs: The BASE keyword defines the base IRI used to resolve relative IRIs.
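Both mechanisms are purely textual, so they can be sketched in a few lines of Python. The function below is illustrative only. The name `expand` and the dictionary of prefixes are assumptions, not part of any SPARQL API. Relative IRIs are resolved with the standard library's `urljoin`, whose behaviour follows the RFC 3986 resolution rules in current Python versions.

```python
from urllib.parse import urljoin


def expand(term, prefixes, base=None):
    """Turn a prefixed name or an IRI reference into a full IRI in <...>.

    `prefixes` maps prefix labels (as declared with PREFIX) to IRIs;
    `base` plays the role of the IRI declared with BASE.
    """
    if term.startswith("<") and term.endswith(">"):
        iri = term[1:-1]
        # Relative IRIs are resolved against the base IRI, if there is one.
        return "<" + (urljoin(base, iri) if base else iri) + ">"
    label, colon, local = term.partition(":")
    if not colon or label not in prefixes:
        raise ValueError(f"undeclared prefix in {term!r}")
    # Prefixed name: concatenate the prefix IRI and the local part.
    return "<" + prefixes[label] + local + ">"


prefixes = {"dc": "http://purl.org/dc/elements/1.1/",
            "": "http://example.org/book/"}
print(expand("dc:author", prefixes))
print(expand(":book1", prefixes))
print(expand("<book1>", prefixes, base="http://example.org/book/"))
```

The first call yields `<http://purl.org/dc/elements/1.1/author>`. The second and third both yield `<http://example.org/book/book1>`: one reaches it through the empty prefix, the other through a base IRI.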

The general syntax for literals is a string enclosed in quotes (either double quotes "" or single quotes ''). The string is followed by either an optional language tag (introduced by @) or an optional datatype IRI or prefixed name (introduced by ^^).

Variables in SPARQL queries have global scope: wherever the same name is used in a query, it refers to the same variable. Variables are indicated by "?"; the "?" does not form part of the variable name. "$" is an alternative to "?", so in a query $var and ?var are the same variable.

Putting all of this together, the three examples example2.sparql.txt, example3.sparql.txt and example4.sparql.txt express the same query in different ways.

The same RDF data (example1.rdf) can also be written in the Turtle format, which allows URIs to be abbreviated with prefixes (example1.rdf.turtle.txt):

Listing B

@prefix dc: <http://purl.org/dc/elements/1.1/> .
@prefix : <http://example.org/book/> .
:book1 dc:author "Peter Mikhalenko" .

The term "binding" refers to a pair (variable, RDF term). Query results can be viewed as a table with one row per solution, and not every variable needs to have a binding in every row. This is what makes optional parts of a graph pattern possible. They are specified with the OPTIONAL keyword applied to a graph pattern.

Let’s take a piece of data (example2.rdf.turtle.txt):

Listing C

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .

_:a rdf:type foaf:Person .
_:a foaf:name "Peter" .
_:a foaf:mbox <mailto:test@peter.com> .
_:a foaf:mbox <mailto:peter@gmail.com> .

_:b rdf:type foaf:Person .
_:b foaf:name "Mary" .

The query with OPTIONAL pattern will look like this (example5.sparql.txt):

Listing D

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE { ?x foaf:name ?name .
OPTIONAL { ?x foaf:mbox ?mbox }
}

And the result will be the following:

name     mbox
-------  ------------------------
"Peter"  <mailto:test@peter.com>
"Peter"  <mailto:peter@gmail.com>
"Mary"

There is no value of mbox in the solution where the name is "Mary": it is unbound. This query finds the names of people in the data. If there is also a triple with predicate foaf:mbox and the same subject, the solution contains the object of that triple as well. In this example only a single triple pattern is given in the optional part of the query, but in general it can be any graph pattern. The whole optional graph pattern must match for it to add anything to the query solution.
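Semantically, OPTIONAL behaves like a left outer join between the solutions of the required pattern and those of the optional one. The Python sketch below shows that idea on the data from Listing C. To keep it short, the solutions of the two triple patterns are written out by hand rather than computed. The function names `compatible` and `left_join` are our own.

```python
def compatible(a, b):
    """Two solutions are compatible if shared variables have equal values."""
    return all(a[k] == b[k] for k in a.keys() & b.keys())


def left_join(required, optional):
    """Keep every required solution; extend it with each compatible optional
    solution, or keep it unextended (variables unbound) if none fits."""
    out = []
    for r in required:
        extended = [{**r, **o} for o in optional if compatible(r, o)]
        out.extend(extended or [dict(r)])
    return out


# Solutions of { ?x foaf:name ?name } over the data in Listing C.
names = [{"x": "_:a", "name": "Peter"}, {"x": "_:b", "name": "Mary"}]
# Solutions of { ?x foaf:mbox ?mbox } over the same data.
mboxes = [{"x": "_:a", "mbox": "mailto:test@peter.com"},
          {"x": "_:a", "mbox": "mailto:peter@gmail.com"}]

for row in left_join(names, mboxes):
    print(row["name"], row.get("mbox", "(unbound)"))
```

This prints two rows for Peter, one per mailbox, and a single row for Mary whose mbox is unbound. That is the same shape as the result table above.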

Results can also be returned in XML using the SPARQL Variable Binding Results XML Format. We will examine it later, together with the SPARQL Protocol.

The result of a query is the set of all pattern solutions that match the query pattern. It gives all the ways in which the query can match the graph being queried. Each result is one solution to the query, and a query may have zero, one or multiple results. Say, for example, we have the following data (example3.rdf.turtle.txt):

Listing E

@prefix foaf: <http://xmlns.com/foaf/0.1/> .

_:a foaf:name "John Hijacker" .
_:a foaf:mbox <mailto:jh@example.com> .
_:b foaf:name "Dmitry Povarenko" .
_:b foaf:mbox <mailto:dmitry@example.org> .

Then the query (example6.sparql.txt):

Listing F

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name ?mbox
WHERE
{ ?x foaf:name ?name .
?x foaf:mbox ?mbox }

will give the following result:

name                mbox
------------------  ---------------------------
"John Hijacker"     <mailto:jh@example.com>
"Dmitry Povarenko"  <mailto:dmitry@example.org>

The results enumerate the RDF terms to which the selected variables can be bound in the query pattern. SPARQL also has a number of syntactic forms that abbreviate common sequences of triples; for details, see the SPARQL specification.
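The query in Listing F is a basic graph pattern of two triple patterns that share the variable ?x, so a solution must bind ?x to the same subject in both. Below is a minimal Python sketch of that evaluation. It rests on three simplifying assumptions:

* terms are plain strings;
* prefixed names are left unexpanded;
* the blank node labels in the data are treated as ordinary constants.

```python
def extend(bindings, pattern, triple):
    """Extend `bindings` so that `pattern` matches `triple`, or return None."""
    result = dict(bindings)
    for p, t in zip(pattern, triple):
        if p.startswith("?"):
            # A variable already bound by an earlier pattern must agree.
            if result.setdefault(p[1:], t) != t:
                return None
        elif p != t:
            return None
    return result


def evaluate_bgp(patterns, graph):
    """Every triple pattern must match, with shared variables consistent."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [new
                     for s in solutions
                     for triple in graph
                     if (new := extend(s, pattern, triple)) is not None]
    return solutions


# The data from Listing E, as plain strings.
graph = [
    ("_:a", "foaf:name", '"John Hijacker"'),
    ("_:a", "foaf:mbox", "<mailto:jh@example.com>"),
    ("_:b", "foaf:name", '"Dmitry Povarenko"'),
    ("_:b", "foaf:mbox", "<mailto:dmitry@example.org>"),
]
# The WHERE clause of Listing F.
query = [("?x", "foaf:name", "?name"), ("?x", "foaf:mbox", "?mbox")]

for s in evaluate_bgp(query, graph):
    print(s["name"], s["mbox"])
```

Each pass over a pattern keeps only the solutions that can be extended consistently. This is why John's name never pairs with Dmitry's mailbox.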

An RDF literal is written in SPARQL as a string containing the lexical form of the literal, followed by an optional language tag or an optional datatype. There are convenience forms for numeric literals of type xsd:integer, xsd:decimal and xsd:double, and also for xsd:boolean. The data in example4.rdf.turtle.txt contains a number of RDF literals. The pattern in the following query has the solution :x, because 42 is shorthand for "42"^^<http://www.w3.org/2001/XMLSchema#integer>:

SELECT ?v WHERE { ?v ?p 42 }

The following query has a solution with the variable v bound to :y:

SELECT ?v WHERE
{ ?v ?p "abc"^^<http://example.org/datatype#specialDatatype> }
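The convenience forms are plain rewriting rules, so a small Python sketch can show which datatype a bare token gets. The regular expressions are a simplified reading of the SPARQL grammar's INTEGER, DECIMAL and DOUBLE tokens. Signs are folded into the same patterns, and a decimal here needs at least one digit after the point. The function name `expand_shorthand` is hypothetical.

```python
import re

XSD = "http://www.w3.org/2001/XMLSchema#"

# Simplified forms of the SPARQL grammar's numeric tokens.
INTEGER = re.compile(r"[+-]?[0-9]+")
DECIMAL = re.compile(r"[+-]?[0-9]*\.[0-9]+")
DOUBLE = re.compile(r"[+-]?([0-9]+\.[0-9]*|\.[0-9]+|[0-9]+)[eE][+-]?[0-9]+")


def expand_shorthand(token):
    """Rewrite a bare numeric or boolean token as a full typed literal."""
    if token in ("true", "false"):
        datatype = "boolean"
    elif DOUBLE.fullmatch(token):  # an exponent makes it a double
        datatype = "double"
    elif DECIMAL.fullmatch(token):  # a decimal point without an exponent
        datatype = "decimal"
    elif INTEGER.fullmatch(token):
        datatype = "integer"
    else:
        raise ValueError(f"not a numeric or boolean shorthand: {token!r}")
    return f'"{token}"^^<{XSD}{datatype}>'


for token in ("42", "4.2", "4.2e1", "true"):
    print(token, "->", expand_shorthand(token))
```

The first line printed is `42 -> "42"^^<http://www.w3.org/2001/XMLSchema#integer>`, which is the equivalence the query above relies on.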

Graph pattern matching creates bindings of variables. Solutions can be further restricted by constraining the values that variables may be bound to. Such value constraints are written with the FILTER keyword and take the form of boolean-valued expressions. The language also allows application-specific constraints on the values in a solution. Take the data in example5.rdf.turtle.txt and the query example7.sparql.txt. The result of the query will be:

title price
------------------ -----
"The Semantic Web" 23

Because of the constraint on the "price" variable, only book2 matches the query. Constraints can also be given inside an optional graph pattern, as the query example8.sparql.txt shows (using the same data, example5.rdf.turtle.txt). Its result will be:

title price
------------------ -----
"SPARQL Tutorial"
"The Semantic Web" 23

No price appears for the book with title "SPARQL Tutorial" because the optional graph pattern did not lead to a solution involving the variable "price".
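The difference between a constraint on the main pattern and one inside OPTIONAL can be sketched in Python. Since example5.rdf.turtle.txt and the two queries are not reproduced here, the data below is a hypothetical stand-in. The titles and book2's price of 23 come from the result tables above. Book1's price of 42 and the constraint price < 30 are assumptions, chosen only so that the sketch reproduces those tables.

```python
# Hypothetical stand-in for example5.rdf.turtle.txt (see the lead-in).
titles = {"book1": "SPARQL Tutorial", "book2": "The Semantic Web"}
prices = {"book1": 42, "book2": 23}


def constraint(price):
    """The boolean-valued expression; assumed to be FILTER (?price < 30)."""
    return price < 30


def filter_in_main_pattern():
    """Constraint on the required pattern: failing solutions are dropped."""
    return [(titles[b], prices[b])
            for b in titles if b in prices and constraint(prices[b])]


def filter_in_optional():
    """Constraint inside OPTIONAL: if it fails, only ?price stays unbound
    (None here); the solution for the title itself is kept."""
    return [(titles[b],
             prices[b] if b in prices and constraint(prices[b]) else None)
            for b in titles]


print(filter_in_main_pattern())
print(filter_in_optional())
```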

SPARQL provides a means of combining graph patterns so that one of several alternative graph patterns may match. If more than one alternative matches, all the possible pattern solutions are found. Pattern alternatives are written with the UNION keyword. For example, for the data in example6.rdf.turtle.txt the query example9.sparql.txt gives the following result:

title
--------------------------------
"SPARQL Protocol Tutorial"
"SPARQL"
"SPARQL (updated)"
"SPARQL Query Language Tutorial"

This query finds the titles of the books in the data. A title is found whether it is recorded with the title property from version 1.0 or from version 1.1 of Dublin Core (a standardized set of document properties).
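UNION can be thought of as evaluating each alternative separately and concatenating the solutions. The Python sketch below uses hypothetical triples, chosen only to reproduce the four titles above; example6.rdf.turtle.txt itself is not shown here. The two namespace IRIs are the Dublin Core 1.0 and 1.1 element namespaces.

```python
DC10 = "http://purl.org/dc/elements/1.0/"
DC11 = "http://purl.org/dc/elements/1.1/"

# Hypothetical (subject, predicate, object) triples; see the lead-in.
graph = [
    ("_:a", DC10 + "title", "SPARQL Query Language Tutorial"),
    ("_:b", DC11 + "title", "SPARQL Protocol Tutorial"),
    ("_:c", DC10 + "title", "SPARQL"),
    ("_:c", DC11 + "title", "SPARQL (updated)"),
]


def branch(predicate):
    """Solutions of one alternative: { ?book <predicate> ?title }."""
    return [{"book": s, "title": o} for s, p, o in graph if p == predicate]


def union(left, right):
    """UNION keeps the solutions of both alternatives, duplicates included;
    the order of the combined solutions carries no meaning."""
    return left + right


for solution in union(branch(DC10 + "title"), branch(DC11 + "title")):
    print(solution["title"])
```

Note that _:c contributes two solutions, one from each branch. This is why both "SPARQL" and "SPARQL (updated)" appear in the result.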

Query patterns generate an unordered collection of solutions. These solutions are then treated as a sequence, initially in no specific order, and any sequence modifiers are applied to create another sequence. Adding the DISTINCT keyword ensures that every combination of variable bindings (i.e. each solution) in the sequence is unique. For example, with the data in example7.rdf.turtle.txt and the query example10.sparql.txt, the result will be:

name
-------
"Alice"
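DISTINCT compares whole solutions, not individual values. Below is a minimal Python sketch, assuming a solution is a dict from variable names to values. The three "Alice" solutions are hypothetical: they stand in for several resources in example7.rdf.turtle.txt that might share that name.

```python
def distinct(solutions):
    """Drop repeated solutions, keeping the first occurrence of each."""
    seen = set()
    unique = []
    for solution in solutions:
        key = frozenset(solution.items())  # order of variables is irrelevant
        if key not in seen:
            seen.add(key)
            unique.append(solution)
    return unique


# Hypothetical solutions of SELECT ?name before DISTINCT is applied.
names = [{"name": "Alice"}, {"name": "Alice"}, {"name": "Alice"}]
print(distinct(names))  # [{'name': 'Alice'}]
```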

The ORDER BY clause takes a solution sequence and applies ordering conditions to it. An ordering condition can be a variable or a function call. Ordering is ascending by default. The direction can be set explicitly by enclosing the condition in ASC() or DESC(). If multiple conditions are given, they are applied in turn until one of them distinguishes between two solutions.
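Applying ordering conditions "in turn" is the same as a lexicographic multi-key sort. The Python sketch below relies on the stability of Python's sort: it sorts by the last condition first and by the first condition last. It assumes every variable used for ordering is bound and holds a plain Python value. SPARQL's own rules for comparing unbound variables, IRIs and blank nodes are not modelled, and the sample rows are hypothetical.

```python
def order_by(solutions, conditions):
    """Sort solutions by (variable, "ASC" | "DESC") conditions.

    Earlier conditions take precedence; later ones only break ties.
    """
    result = list(solutions)
    # sort() is stable, so sorting by the least significant condition first
    # and the most significant one last yields a multi-key ordering.
    for var, direction in reversed(conditions):
        result.sort(key=lambda s: s[var], reverse=(direction == "DESC"))
    return result


# Hypothetical solutions; equivalent to ORDER BY DESC(?age) ?name.
people = [{"name": "Bob", "age": 30},
          {"name": "Alice", "age": 25},
          {"name": "Carol", "age": 30}]
for s in order_by(people, [("age", "DESC"), ("name", "ASC")]):
    print(s["name"], s["age"])
```

Bob and Carol share the first key, so the second condition (name, ascending) decides their order; Alice comes last because of her lower age.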

The LIMIT form puts an upper bound on the number of solutions returned. If the number of actual solutions is greater than the limit, then at most the limit number of solutions will be returned. OFFSET causes the solutions generated to start after the specified number of solutions.
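OFFSET and LIMIT together select a window of the solution sequence, which is how result paging is typically done. Because the sequence otherwise has no defined order, they are most useful in combination with ORDER BY. Below is a minimal Python sketch using `itertools.islice`; the function name is hypothetical.

```python
from itertools import islice


def slice_solutions(solutions, offset=0, limit=None):
    """Skip `offset` solutions, then return at most `limit` of the rest
    (all of the rest when no LIMIT is given)."""
    stop = None if limit is None else offset + limit
    return list(islice(solutions, offset, stop))


ordered = ["a", "b", "c", "d", "e", "f", "g"]
# Pages of three solutions: LIMIT 3 OFFSET 0, LIMIT 3 OFFSET 3, ...
for page_start in range(0, len(ordered), 3):
    print(slice_solutions(ordered, offset=page_start, limit=3))
```

The last page holds only the single remaining solution, because LIMIT is an upper bound, not an exact count.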

The SELECT query form returns the variable bindings directly. The syntax SELECT * is an abbreviation that selects all of the variables in the query. For example, for the data in example8.rdf.turtle.txt and the query example11.sparql.txt, the result will be:

nameX nameY nickY
------- ------- -----
"Alice" "Bob"
"Alice" "Clare" "CT"

Results can be thought of as a table or result set, with one row per query solution. Some cells may be empty because a variable is not bound in that particular solution. Result sets can be accessed through a local API, but they can also be serialized into either XML or an RDF graph. In XML format, the same result set looks like this (example11.result.sparql.xml):

Listing G

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="nameX"/>
    <variable name="nameY"/>
    <variable name="nickY"/>
  </head>
  <results>
    <result>
      <binding name="nameX">
        <literal>Alice</literal>
      </binding>
      <binding name="nameY">
        <literal>Bob</literal>
      </binding>
    </result>
    <result>
      <binding name="nameX">
        <literal>Alice</literal>
      </binding>
      <binding name="nameY">
        <literal>Clare</literal>
      </binding>
      <binding name="nickY">
        <literal>CT</literal>
      </binding>
    </result>
  </results>
</sparql>
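On the client side, this format is easy to read with a generic XML parser. Below is a minimal sketch using Python's standard `xml.etree.ElementTree` module; the function name `parse_results` is our own. It returns the declared variables and one dict per result. An unbound variable, such as nickY in the first result, is simply missing from that row.

```python
import xml.etree.ElementTree as ET

NS = {"sr": "http://www.w3.org/2005/sparql-results#"}


def parse_results(xml_text):
    """Return (variables, rows) from a SPARQL XML results document."""
    root = ET.fromstring(xml_text)
    variables = [v.get("name")
                 for v in root.findall("sr:head/sr:variable", NS)]
    rows = []
    for result in root.findall("sr:results/sr:result", NS):
        row = {}
        for binding in result.findall("sr:binding", NS):
            # Each binding has exactly one <uri>, <literal> or <bnode> child;
            # only its text is kept here, not which kind of term it was.
            row[binding.get("name")] = binding[0].text
        rows.append(row)
    return variables, rows


# A compact copy of the document above (XML declaration omitted).
sample = """<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="nameX"/><variable name="nameY"/><variable name="nickY"/>
  </head>
  <results>
    <result>
      <binding name="nameX"><literal>Alice</literal></binding>
      <binding name="nameY"><literal>Bob</literal></binding>
    </result>
    <result>
      <binding name="nameX"><literal>Alice</literal></binding>
      <binding name="nameY"><literal>Clare</literal></binding>
      <binding name="nickY"><literal>CT</literal></binding>
    </result>
  </results>
</sparql>"""

variables, rows = parse_results(sample)
print(variables)
for row in rows:
    print(row)
```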

SPARQL Protocol

The SPARQL Protocol is defined in two ways:

* as an abstract interface independent of any concrete realization, implementation, or binding to another protocol;
* as HTTP and SOAP bindings of this interface.

The SPARQL Protocol is described abstractly with WSDL 2.0 in terms of a Web service that implements its interface, types, faults and operations, as well as by its HTTP and SOAP bindings. The current SPARQL Protocol description is published at the following address and can be used by any Web service processor or other application: http://www.w3.org/TR/rdf-sparql-protocol/sparql-protocol-query.wsdl.

Let's take a simple query (example12.sparql.txt) and see how it works over an HTTP connection. This is the HTTP GET request that a SPARQL client would send to a SPARQL Web service located, say, at http://sparql.service.com/sparql:

Listing H

GET /sparql/?query=PREFIX+dc:+%3Chttp://purl.org/dc/elements/1.1/%3E%0ASELECT+?book+?who%0AWHERE+{+?book+dc:creator+?who+} HTTP/1.1
Host: sparql.service.com
User-agent: sparql-client/0.1

The GET request carries the SPARQL query in URL-encoded form. Spaces are replaced by the '+' symbol, and newlines by %0A (the hexadecimal code of the line feed character). The angle brackets around the IRI become %3C and %3E. An HTTP server will return the following for a successfully handled query:

Listing I

HTTP/1.1 200 OK
Date: Fri, 06 May 2005 20:55:12 GMT
Server: Apache/1.3.29 (Unix) PHP/4.3.4 DAV/1.0.3
Connection: close
Content-Type: application/sparql-results+xml

<?xml version="1.0"?>
<sparql xmlns="http://www.w3.org/2005/sparql-results#">
  <head>
    <variable name="book"/>
    <variable name="who"/>
  </head>
  <results distinct="false" ordered="false">
    <result>
      <binding name="book"><uri>http://www.example/book/book5</uri></binding>
      <binding name="who"><bnode>r29392923r2922</bnode></binding>
    </result>
    ...
    <result>
      <binding name="book"><uri>http://www.example/book/book6</uri></binding>
      <binding name="who"><bnode>r8484882r49593</bnode></binding>
    </result>
  </results>
</sparql>
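A client does not have to encode the query by hand. The Python sketch below builds the same kind of GET request with the standard `urllib` modules. Some details are assumptions or differ from Listing H:

* the endpoint is the example address used above, not a real service;
* the Accept header is an assumption; it asks for the XML results format seen in Listing I;
* `urlencode` also percent-encodes characters such as ':', '?', '/' and '{' that Listing H leaves as they are, though both forms decode to the same query.

The request is only constructed here, not sent.

```python
from urllib.parse import parse_qs, urlencode, urlsplit
from urllib.request import Request

# Example endpoint from the text; it is not a real service.
ENDPOINT = "http://sparql.service.com/sparql/"

QUERY = """PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?book ?who
WHERE { ?book dc:creator ?who }"""


def build_query_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol query over HTTP GET."""
    # urlencode() uses quote_plus(): spaces become '+', newlines '%0A',
    # and characters such as '<' and '>' become '%3C' and '%3E'.
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+xml"})


request = build_query_request(ENDPOINT, QUERY)
print(request.get_method(), request.full_url)
# urllib.request.urlopen(request) would send it to a real endpoint.
# Decoding the query string gives back the original SPARQL text:
print(parse_qs(urlsplit(request.full_url).query)["query"][0])
```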

A query can also be sent over SOAP. The file example13.sparql.soap.txt contains an example of a SOAP query sent in an HTTP POST request, and example13.sparql.result.soap.txt contains the corresponding SOAP response.

An evolving protocol

This article is only an introduction to the SPARQL query language and its protocol bindings. SPARQL has already evolved into a rich, comprehensive query language suitable for Web 2.0 and Semantic Web platforms, and it is impossible to cover every aspect of the language and protocol here. For further details, please see the SPARQL specifications. There are a number of issues that SPARQL does not address yet. Most notably, SPARQL is read-only and cannot modify an RDF dataset. SPARQL actually consists of three separate specifications: the query language specification, the SPARQL data access protocol, and the XML format for query results.

