Linked Data Hack Days, 13th-14th January 2010, Bristol
Organised jointly by DevCSI and the Research Revealed project at ILRT, January’s Linked Data Hack Day event was very much a doing event, not a talking event.
Delegates were encouraged to share skills and knowledge, and to form into self-organising teams with a common goal to produce something practical, using linked data.
Around a quarter of those present were completely new to linked data and a quarter were experts, whilst the remaining half were somewhere in between. This diversity led to a range of discussions, mini projects and proof-of-concept demonstrations.
Why Did People Come?
Participants came from across the UK and Europe from a mixture of backgrounds. They were all encouraged at the outset to share their reasons for coming, their experiences, and what they hoped to learn from the event.
Some of their answers are captured in this short video:
Steer provided an introduction to the Research Revealed project, which co-hosted the event. He discussed applications of their work, including a dashboard which shows an overview of what types of research are going on within an institution, and a bookmarking tool which helps to demonstrate the impact of research outputs.
He observed that each of these applications is data-driven, but that the data required is stored all over the place: in central systems, personal websites, publishers, partner institutions and finance departments, to name but a few. Part of their project has involved getting this data exposed as linked data so that they can use it within their applications. The team made their exposed data available for participants at the Hack Day event to play with, and suggested some challenges they would like to see tackled, based on their experiences so far.
Overview of Initial Ideas
Rogers provided a brief overview of some of the ideas which had been shared in advance on the event wiki as potential projects. These included:
- Rapid Development – to focus on getting people started with RDF really quickly;
- Optimising queries – based on a problem experienced by the Research Revealed team;
- Comparing and contrasting semantic web frameworks – an opportunity for linked data newbies to team up with experts to examine the frameworks currently available, with an emphasis on trying some of them out to get practical experience;
- Linked data for HE – including several suggestions for possible projects examining the ways in which linked data could be used within an HE context, whilst considering practical sustainability issues, the architecture required and RDF management;
- Starting from scratch – a group for those who have lots of XML data and want to move forward.
An ideas wall was provided for participants to share their own ideas in addition to these, which could then lead to the formation of small groups to focus on these issues in a practical way.
A number of available data sets were also shared on the wiki to provide material for practical prototypes or proofs of concept. These included data from: data.gov.uk, JISC project data, openbiblio.net, oxpoints, the University of Bristol, the University of Southampton and the OU.
Introduction to SPARQL
Andy Seaborne, Epimorphics Ltd, Apache Jena
At the request of many newbies, Andy Seaborne provided a one-hour introduction to SPARQL, in which he covered the basics by defining SPARQL, describing the elements of RDF, and showing graphically how RDF works. He also discussed syntax and semi-structured data, and demonstrated some queries at sparql.org to give participants a practical example of how queries work and what is returned. This gave newcomers to linked data an excellent basis from which to work.
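To make the idea of a query concrete, here is a minimal Python sketch of how a single SPARQL triple pattern is matched against RDF triples. It is not taken from the talk: the data, the prefixed names and the `match` helper are all illustrative assumptions, and a real SPARQL engine does far more than this.

```python
# Toy illustration of matching one triple pattern against RDF triples.
# All names and data here are made up for illustration.

# An RDF graph is a set of (subject, predicate, object) triples.
GRAPH = {
    ("ex:alice", "foaf:name", "Alice"),
    ("ex:bob", "foaf:name", "Bob"),
    ("ex:alice", "foaf:knows", "ex:bob"),
}


def match(pattern, graph):
    """Return one variable binding (a dict) per triple that fits the pattern.

    Terms starting with '?' are variables; everything else must match exactly.
    """
    results = []
    for triple in graph:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # A variable repeated within the pattern must bind consistently.
                if binding.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            results.append(binding)
    return results


# Roughly: SELECT ?person ?name WHERE { ?person foaf:name ?name }
rows = match(("?person", "foaf:name", "?name"), GRAPH)
for row in sorted(rows, key=lambda r: r["?name"]):
    print(row["?person"], row["?name"])
```

The pattern plays the role of the WHERE clause, and each dictionary in `rows` corresponds to one row of the result table that a query service would return.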
Participants shared ideas for potential projects and areas for exploration…
This brainstorming led to a number of distinct groupings:
1. RDF Rapid Development: particularly of interest to newbies who wanted to get some basic practical hands-on experience;
2. Graphical SPARQL Queries: focusing on evaluating some tools and brainstorming features;
3. Semantic Web Frameworks: exploring the frameworks available through collaboration between newbies and experts;
4. Linked Data in Education: an umbrella group which included lots of potential projects;
5. Geo and Linked Data;
6. Characterising Large Data Sets, Stats and Visualisations.
Participants gathered together around topics of interest and discussed ideas for specific projects before splitting into subgroups, where appropriate, and getting stuck in.
End of Day One
From these initial groupings emerged four distinct groups, with several subgroups and an exchange of people and skills. By the end of day one, the groups had achieved the following:
Rapid Development Group
Those interested in exploring semantic web frameworks joined forces with this group to create a collaborative environment between newbies and experts. The group carried out an exercise to prove that they could run a couple of SPARQL queries using Fuseki and some data from the University of Southampton. Newbies reported that they had learned a lot and that the work so far had served as a good introduction to linked data, aided also by Andy Seaborne’s talk. The group came up with three possible directions in which they could diverge next:
- Answering questions about how to get XML data into a triple store – they observed that there are currently no common standard tools, so it would be useful to get a couple of XML templates to turn XML data into RDF as a way of getting started;
- Doing some exercises with cURL;
- Looking at how to “put a face on” raw RDF data in order to display it back to the user.
The group had also started to accumulate the bits of sample code and notes together as a shared resource to help get people started.
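As an illustration of the XML-to-RDF question raised above, the following is a minimal Python sketch that turns a small XML document into N-Triples. It is an assumption-laden example rather than the group's approach: they discussed XML templates, whereas this uses a plain script, and the input document, the `http://example.org/` namespace and the `xml_to_ntriples` helper are all hypothetical.

```python
import xml.etree.ElementTree as ET

# Hypothetical input and namespace, for illustration only.
XML = """<people>
  <person id="p1"><name>Ada</name><dept>Physics</dept></person>
  <person id="p2"><name>Grace</name></person>
</people>"""
BASE = "http://example.org/"


def literal(text):
    """Escape a string as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return '"' + escaped + '"'


def xml_to_ntriples(xml_text, base=BASE):
    """Turn each <person> into a subject and each child element into one triple."""
    lines = []
    for person in ET.fromstring(xml_text).iter("person"):
        subject = f"<{base}person/{person.get('id')}>"
        for child in person:
            predicate = f"<{base}vocab/{child.tag}>"
            lines.append(f"{subject} {predicate} {literal(child.text or '')} .")
    return lines


ntriples = xml_to_ntriples(XML)
print("\n".join(ntriples))
```

Each output line is one triple, which is a format most triple stores can load directly. In practice the predicates would come from an agreed vocabulary rather than being derived from element names.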
Linked Data in HE Group
There were lots of ideas within this topic, so the group fragmented into several discussions and smaller projects. These included:
- Developers from the universities of Southampton, Oxford, Bristol and the OU checking that they are using the same vocabularies to enable better connections;
- Creating a prototype RDF validator to check and recommend ways to improve the structure of a data set;
- The early development of data.ac.uk as a list of available university data, URIs and email contacts, which could potentially be handed over to JISC;
- Exploring how to transfer an RDF document into a Ruby object;
- Discussing how to get people to submit a simple profile and use this to share information across universities;
- A visualisation of courses at the OU, identifying clusters of courses and the combinations of courses taken, using Gephi;
- Examining JISC project data to align identifiers and predicates so that the datasets are consistent, then moving on to making it easier for people to query across the data sets.
Graphical SPARQL Queries Group
A small group held discussions about the problem space of getting non-experts to engage with and explore the data using SPARQL queries. They plan to continue by whiteboarding and possibly prototyping the type of tool that they require, based on these discussions.
Characterising Large Data Sets, Stats and Visualisations Group
Another small group examined the possible use of Graph Theory to analyse characteristics of live data sets – with a focus on making lists of resources and exploring tools, which they shared on the event wiki.
Day two began with a series of lightning talks presented by participants, discussing their current work and experiences. These included:
Mike Jones, ILRT
Gephi and Google Refine
Ben O’Steen, University of Oxford/University of Cambridge
Consuming Linked Data from Universities
Fouad Zablith, OU
Fuseki Linked Data API
Andy Seaborne, Epimorphics Ltd, Apache Jena
Project Catalogues with Simal
Sander van der Waal, OSS Watch
Representatives from several of the groups shared their experiences with me in more detail in these video interviews:
Ben O’Steen and Fouad Zablith discuss the visualisation work they carried out using Open University linked data to map student course choices….
Mark MacGillivray explains his collaborative linked data project and some of the associated issues identified by himself and his team mate, Dave Challis….
Juan Galan-Paez and Pedro Almagro-Blanco from the University of Seville, Spain, explain why they travelled to the UK to participate and what they had gained from the experience…
Libby Miller from the BBC discussed her insights from the event and some of the work carried out by her group to characterise large datasets using Graph Theory…
Elis Newton and Laura Barber, both broadcast media co-ordinators with the BBC, discuss why they came to the event and what the value was to them as complete newcomers to the linked data field…
Graham Klyne from the University of Oxford discusses how working with newcomers to the field in the Rapid Development with RDF group helped him reflect and learn whilst building up useful resources…
Chris Gutteridge from the University of Southampton describes new tools and collaborations, and explains how the higher education sector could move forward to embrace linked, open data….
End of Day Two
The event concluded with a number of demonstrations and presentations feeding back the results of each group’s efforts. These included updates from the following:
Rapid Development with RDF
The concrete outcome from this group was a new Google Code project, for which they are now inviting contributions to help build a useful resource of code snippets that new linked data developers can use to get going. The group also split into a number of subgroups based on language preferences, with some investigating quick and simple ways to abstract from the RDF so they can deliver something for their users, and others using command line scripting with cURL.
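For readers who want to try the command line route, the sketch below shows in Python what such a request looks like under the standard SPARQL Protocol: an HTTP GET with the query passed in a `query` parameter. The endpoint URL is an assumption (a local Fuseki server with a dataset named `ds`), and `build_sparql_request` is a hypothetical helper; the request is built but not sent, so the example runs without a server.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Assumed endpoint: a local Fuseki server with a dataset called "ds".
ENDPOINT = "http://localhost:3030/ds/query"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_request(endpoint, query):
    """Build (but do not send) a SPARQL Protocol GET request."""
    url = endpoint + "?" + urlencode({"query": query})
    # Ask for the standard SPARQL JSON results format.
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)

# To run it against a live endpoint, something like:
#   from urllib.request import urlopen
#   with urlopen(request) as response:
#       print(response.read().decode("utf-8"))
```

The same request can be made with cURL by combining `-G`, `--data-urlencode "query=..."` and an `Accept` header, which is likely close to what the command line subgroup was experimenting with.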
Characterising Large Datasets
The group took a metric (dating from 1907) that one group member had previously used to compare molecules (the Tanimoto coefficient), adapted it to RDF graph structures, and applied it to BBC data overnight. This demonstrated some interesting connections between molecular and programme data, and provoked new lines of investigation for participants to explore within their own work.
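For reference, the Tanimoto coefficient of two sets is the size of their intersection divided by the size of their union. The Python sketch below shows the calculation on two made-up feature sets. How the group actually derived features from the RDF graphs is not described in this report, so the example data and the choice to treat each resource as a set of predicates are assumptions.

```python
def tanimoto(a, b):
    """Tanimoto (Jaccard) coefficient of two sets: |A ∩ B| / |A ∪ B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(union)


# Hypothetical feature sets, e.g. the predicates used to describe two resources.
programme_a = {"dc:title", "po:genre", "po:broadcast", "po:episode"}
programme_b = {"dc:title", "po:genre", "po:series"}

score = tanimoto(programme_a, programme_b)
print(round(score, 2))  # 2 shared features out of 5 distinct ones -> 0.4
```

A score of 1 means the two feature sets are identical and 0 means they share nothing, which makes the measure easy to apply to any data that can be reduced to sets.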
Linked Data in HE
A number of further projects had evolved within this large group by day two, including:
JISC Project Data
A project which examined how JISC data can be aggregated in a harmonised way and then queried in different ways. The group looked at common schemas and how to mark up that data.
Mark MacGillivray and Dave Challis created a tool which allows transferable profiles so people can keep their own data up to date even when they move institutions and therefore lose access to source data. They demonstrated a basic version of this tool using University of Southampton researcher profiles, which are exposed as RDF as well as HTML. They showed a secure edit form for key fields of data, and explained that the RDF gets validated in the background when the user submits edits. The tool uses sameAs to allow URI mapping to an external site, and the external site will generate a backwards link correspondingly, thus helping to establish trust relationships between data providers.
MacGillivray and Challis also discussed possible use cases, including some outside of education. These included online verification of your qualifications and a mechanism offering one central update point when moving house to communicate with all banks and utility companies.
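The reciprocal sameAs linking used by the profile tool can be sketched as a simple check over triples: a mapping counts as confirmed only when both sites assert it. This Python example is a guess at the idea, not the tool's actual code. The URIs and the `reciprocated` and `one_way` helpers are hypothetical; only the `owl:sameAs` property URI is standard.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical triples gathered from a university site and an external profile site.
TRIPLES = {
    ("http://uni.example/person/1", OWL_SAME_AS, "http://profiles.example/u/42"),
    ("http://profiles.example/u/42", OWL_SAME_AS, "http://uni.example/person/1"),
    ("http://uni.example/person/2", OWL_SAME_AS, "http://profiles.example/u/99"),
}


def same_as_links(triples):
    """Collect (subject, object) pairs connected by owl:sameAs."""
    return {(s, o) for s, p, o in triples if p == OWL_SAME_AS}


def reciprocated(triples):
    """Links asserted in both directions, i.e. confirmed by both sides."""
    links = same_as_links(triples)
    return {(s, o) for s, o in links if (o, s) in links}


def one_way(triples):
    """Links asserted by one side only, which may deserve less trust."""
    return same_as_links(triples) - reciprocated(triples)


confirmed = reciprocated(TRIPLES)
pending = one_way(TRIPLES)
print(len(confirmed), "confirmed,", len(pending), "one-way")
```

Treating one-way links as unconfirmed is one possible policy; a real deployment would also need to decide how links are fetched and how often they are rechecked.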
The output from this event is shared on the event wiki, hosted by the ILRT. Here you will be able to find out more about how the projects have evolved since the event and find links to additional resources.