Event Report: Bibliohack
DevCSI teamed up with the Open Knowledge Foundation for two days of hacking and sharing ideas about open bibliographic metadata at Queen Mary University of London.
The event provided an opportunity to hack with open bibliographic datasets, to experiment with new tools, and to help improve existing systems to provide new ways for institutions to benefit from bibliographic data. It attracted a range of developers from the DevCSI community, bibliographic data specialists and librarians with a keen interest in making better use of open bibliographic data.
Workshops
Day one of the event saw a parallel stream of workshop sessions which provided a context for the practical work in the main hack room.
These sessions addressed the technical aspects of opening up cultural heritage data, examined some of the best open source tools available for doing that, and explored the best standards for preparing and exposing data for reuse.
Introduction to APIs and Linked Data
Preparing your Data for a Hackathon
Diverse Metadata Standards and the Europeana Solution
Case Study: Cambridge University Library
Reuse of Open Cultural Heritage Data
Case Study: The British Library
Lightning Talks
Participants in the hack element of the event shared their ideas and provided some background to their own work in the field in a series of lightning talks. These explored:
- Content and Data Mining and PDF extractor by Peter Murray Rust and Ross Mounce
- m-biblio Project by Mike Jones
- ORI/RJB by Ian Stuart
- Making a BibServer Parser by Etienne Posthumus
- IDFind: Identifying Identifiers by Emanuil Tolev
- BibServer: What we have been doing recently, how that ties into the open access index idea by Mark MacGillivray
- TEXTUS by Tom Oinn
- Pundit: Collaborative semantic annotations of texts by Simone Fonda
- Linked Data by Ian Stuart
Themes
Further discussion of these issues and the ideas shared via the event etherpad led to the realisation that many of the tools and solutions people wanted to work on throughout the event could be broadly brought under the umbrella of a Bibliographic Toolkit. Under this umbrella, a number of distinct groups formed to investigate specific issues.
These included…
BibServer Group
This group began exploring how to unite several tools, including PubCrawler, BibServer and BibSoup, to create a tool for abstracting and displaying bibliographic metadata. They planned to connect PubCrawler, which has collected 12 million bibliographic metadata records and will allow you to find out if papers are Open Access, with BibServer, which collects and displays the data, and BibSoup, which allows the creation of local community groups of references for specific interest areas. The group also planned to investigate how easy it was to get BibServer deployed on a variety of operating systems and to document that process. They hoped to test the tool using data from the German National Bibliography, with a subgroup working on how to parse this data ready for use.
In this short video interview, Mike Jones from the University of Bristol describes his side project associated with this group, which involved getting the m.biblio app he is currently developing to connect to BibServer…
This video is also available on Vimeo.
TEXTUS Group
This group focussed on discussing potential roles and developments for TEXTUS, an open source platform for working with collections of texts and associated metadata, including annotations. The goal was to create useful documentation about how the tool can be developed and extended, rather than working code.
In this short video interview, Simone Fonda from Net7 outlines the progress the group made as a result of these discussions and explains how he sees TEXTUS developing in the future…
This video is also available on Vimeo.
Open Access Index Group
This group set out to build a list of all the journals in the world, including their access policies, where available. Their intention was to create a search facility to query this data, a form to help crowd source further data and updates, and an API to access this data. The intention is to create a tool that will enable them to get a better idea of what is available to support other work.
In this short video interview, Ian Stuart from EDINA describes the work of the group in more detail, and explains the value he feels developers can bring to the process of creating open access services…
This video is also available on Vimeo.
Useful Tools
Throughout the event, participants shared links and information about a number of useful tools. Here is a quick reference guide to some of the tools mentioned:
BibServer
Github: https://github.com/okfn/bibserver
BibSoup
Link: http://bibsoup.net
TEXTUS
Github: https://github.com/okfn/textus
IDFind
Live here: http://idfind.cottagelabs.com/
Code here: https://github.com/CottageLabs/idfind/
Microsoft Academic Search
API documentation: http://academic.research.microsoft.com/
PubCrawler
Link: https://bitbucket.org/wwmm/pub-crawler
wget
Link: http://www.gnu.org/software/wget/manual/wget.html
PDFBox
Link: http://pdfbox.apache.org/
Final Outcomes
As the event drew to a close, each of the groups provided a short presentation to describe their progress:
Open Access Index Group
The group created an academic catalogue of journal titles, which they called ACat. This was the first step of an ambitious plan to create an index that will allow you to browse open access resources. During the course of the event, they collected 55,000 journal titles in an elastic database, then created a front end based on Facetview to provide a searchable interface. The journal title data was difficult to obtain, but now that the group has it, they can query various APIs for licensing information to develop the project further.
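To give a rough idea of what a faceted, searchable interface over journal records involves, here is a minimal in-memory sketch in Python. It is not the ACat or Facetview code: the sample records, the field names (`title`, `publisher`, `licence`) and the helper functions are all invented for illustration, and a real deployment would delegate both steps to the search index.

```python
from collections import Counter

# Hypothetical sample records: titles, publishers and field names are
# invented for illustration and are not taken from the ACat data.
JOURNALS = [
    {"title": "Example Journal of Chemistry", "publisher": "Publisher A", "licence": "CC-BY"},
    {"title": "Example Journal of Physics", "publisher": "Publisher A", "licence": None},
    {"title": "Sample Review of History", "publisher": "Publisher B", "licence": "CC-BY"},
]


def search(records, text="", **filters):
    """Return records whose title contains `text` and whose fields match `filters`."""
    text = text.lower()
    return [
        record
        for record in records
        if text in record["title"].lower()
        and all(record.get(field) == value for field, value in filters.items())
    ]


def facet_counts(records, field):
    """Count how many records share each value of `field` (missing values become 'unknown')."""
    return Counter(record.get(field) or "unknown" for record in records)


if __name__ == "__main__":
    hits = search(JOURNALS, "example")
    print([record["title"] for record in hits])
    print(facet_counts(hits, "licence"))
```

The facet counts are what a front end would display beside the result list, so that a reader can narrow a search by, say, licence once that information has been gathered.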
Annotation Tools
Tom Oinn from TEXTUS explained how the event allowed him to gain a much better idea of where to take the TEXTUS project next and how to get it to play with other projects, including BibServer.
At the moment you can annotate texts using comments, but Oinn hopes to move towards any annotation being able to contain references. Notably, he hopes to move towards creating personal reading lists, which begin to look like BibSoup instances, and to start using TEXTUS as an annotation tool in its own right.
BibServer Group
The group spent a lot of time getting BibServer instances up and running on different architectures, noting that releasing a high-quality, distributable BibServer is a high priority. Their work to identify the issues with different setups will help with this. The group also held a number of discussions to explore possible connections with other tools. The big unexpected outcome for this group was realising the value of linking BibServer with TEXTUS.
There were also several splinter projects from this core group, including work to connect BibServer and m.biblio, and efforts to add national bibliographic data from the UK, Germany, Spain and Sweden. The latter helped to identify the problem of character encoding, which is likely to crop up as BibServer gets used more outside the UK. As a result of this work, the developers will be insisting on UTF-8 for ingest from now on.
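As an illustration of what insisting on UTF-8 at ingest could look like, the following Python sketch decodes incoming bytes strictly and rejects anything that is not valid UTF-8. It is not BibServer's actual ingest code; the function name `decode_record` is hypothetical.

```python
def decode_record(raw: bytes) -> str:
    """Decode raw record bytes as UTF-8, rejecting any other encoding.

    `decode_record` is a hypothetical helper, not part of BibServer.
    """
    try:
        # Strict decoding raises on invalid bytes instead of guessing or replacing.
        return raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"record is not valid UTF-8 (problem at byte offset {exc.start})"
        ) from exc


if __name__ == "__main__":
    print(decode_record("Bücher".encode("utf-8")))
    try:
        decode_record("Bücher".encode("latin-1"))
    except ValueError as err:
        print(f"rejected: {err}")
```

Rejecting bad input at the door keeps mis-encoded characters out of the index, where they would be much harder to detect and repair later.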
You can watch the final group presentations in full:
This video is also available on Vimeo.
Participant Responses
There have been a number of blog posts documenting the technical details of the event if you want to know more:
Bringing the Open German National Bibliography to a BibServer by Etienne Posthumus and Adrian Pohl
Software Cooperative News: Bibliohack London by MJ Ray
Bibliographic References in Textus by Tom Oinn
If you have blogged about this event, please leave a comment with a link to your post.