Event Report: Bibliohack

Jun 25, 2012

DevCSI teamed up with the Open Knowledge Foundation for two days of hacking and sharing ideas about open bibliographic metadata at Queen Mary University of London.


The event provided an opportunity to hack with open bibliographic datasets, to experiment with new tools, and to help improve existing systems to provide new ways for institutions to benefit from bibliographic data.  It attracted a range of developers from the DevCSI community, bibliographic data specialists and librarians with a keen interest in making better use of open bibliographic data.




Day one of the event saw a parallel stream of workshop sessions which provided a context for the practical work in the main hack room.


These sessions addressed the technical aspects of opening up cultural heritage data, examined some of the best open source tools available for doing that, and explored the best standards for preparing and exposing data for reuse.


Introduction to APIs and Linked Data



Preparing your Data for a Hackathon



Diverse Metadata Standards and the Europeana Solution



Case Study: Cambridge University Library



Reuse of Open Cultural Heritage Data



Case Study: The British Library




Lightning Talks


Participants in the hack element of the event shared their ideas and provided some background to their own work in the field in a series of lightning talks. These explored:


  • Content and Data Mining and PDF extractor by Peter Murray-Rust and Ross Mounce
  • m-biblio Project by Mike Jones
  • ORI/RJB by Ian Stuart
  • Making a BibServer Parser by Etienne Posthumus
  • IDFind: Identifying Identifiers by Emanuil Tolev
  • BibServer: What we have been doing recently, how that ties into the open access index idea by Mark MacGillivray
  • TEXTUS by Tom Oinn
  • Pundit: Collaborative semantic annotations of texts by Simone Fonda
  • Linked Data by Ian Stuart




Further discussion of these issues and the ideas shared via the event etherpad led to the realisation that many of the tools and solutions people wanted to work on throughout the event could be broadly brought under the umbrella of a Bibliographic Toolkit.  Under this umbrella, a number of distinct groups formed to investigate specific issues.


These included…


BibServer Group


This group began exploring how to unite several tools, including PubCrawler, BibServer and BibSoup, to create a tool for abstracting and displaying bibliographic metadata.  They planned to connect PubCrawler, which has collected 12 million bibliographic metadata records and will allow you to find out if papers are Open Access, with BibServer, which collects and displays the data, and BibSoup, which allows the creation of local community groups of references for specific interest areas. The group also planned to investigate how easy it was to deploy BibServer on a variety of operating systems and to document that process.  They hoped to test the tool using data from the German National Bibliography, with a subgroup working on how to parse this data ready for use.


In this short video interview, Mike Jones from the University of Bristol describes his side project associated with this group, which involved getting the m.biblio app he is currently developing to connect to BibServer…



This video is also available on Vimeo.




This group focussed on discussing potential roles and developments for TEXTUS, an open source platform for working with collections of texts and associated metadata, including annotations.  The goal was to create useful documentation about how the tool can be developed and extended, rather than working code.


In this short video interview, Simone Fonda from Net7 outlines the progress the group made as a result of these discussions and explains how he sees TEXTUS developing in the future…



This video is also available on Vimeo.


Open Access Index Group


This group set out to build a list of all the journals in the world, including their access policies, where available. They planned to create a search facility to query this data, a form to help crowdsource further data and updates, and an API to access it. The aim is a tool that gives them a better idea of what is available, to support other work.


In this short video interview, Ian Stuart from EDINA describes the work of the group in more detail, and explains the value he feels developers can bring to the process of creating open access services…



This video is also available on Vimeo.



Useful Tools


Throughout the event, participants shared links and information about a number of useful tools.  Here is a quick reference guide to some of the tools mentioned:



BibServer

GitHub: https://github.com/okfn/bibserver


BibSoup

Link: http://bibsoup.net


TEXTUS

URL: http://textusproject.org

GitHub: https://github.com/okfn/textus


IDFind

Live here: http://idfind.cottagelabs.com/

Code here: https://github.com/CottageLabs/idfind/


Microsoft Academic Search

API documentation: http://academic.research.microsoft.com/


PubCrawler

Link: https://bitbucket.org/wwmm/pub-crawler


GNU Wget

Link: http://www.gnu.org/software/wget/manual/wget.html


Apache PDFBox

Link: http://pdfbox.apache.org/


Final Outcomes


As the event drew to a close, each of the groups provided a short presentation to describe their progress:


Open Access Index Group


The group created an academic catalogue of journal titles, which they called ACat.  This was the first step of an ambitious plan to create an index that will allow you to browse open access resources.  During the course of the event, they collected 55,000 journal titles in an elastic database, then created a front end based on Facetview to provide a searchable interface.  Although the journal title data was difficult to obtain, now that the group has it they can query various APIs for licensing information to develop the project further.
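To give a rough feel for what a faceted, searchable interface over journal records involves, here is a minimal in-memory sketch in Python. It is not the group's code and does not talk to their database or to Facetview: the records, the field names (`title`, `publisher`, `licence`) and the helper functions `search` and `facet_counts` are all made up for illustration.

```python
from collections import Counter

# Made-up journal records; the field names are assumptions, not ACat's schema.
journals = [
    {"title": "Journal of Open Examples", "publisher": "Example Press", "licence": "CC-BY"},
    {"title": "Annals of Placeholder Studies", "publisher": "Example Press", "licence": "unknown"},
    {"title": "Open Sample Letters", "publisher": "Sample House", "licence": "CC-BY"},
]


def search(records, text="", **filters):
    """Return records whose title contains text and which match every filter."""
    needle = text.lower()
    return [
        r for r in records
        if needle in r["title"].lower()
        and all(r.get(field) == value for field, value in filters.items())
    ]


def facet_counts(records, field):
    """Count how many records fall under each value of field."""
    return Counter(r.get(field, "unknown") for r in records)


# Free-text search narrowed by one facet value.
hits = search(journals, text="open", licence="CC-BY")
print([r["title"] for r in hits])

# Counts a front end could show next to each publisher facet.
print(facet_counts(journals, "publisher"))
```

A real index would push both the text match and the facet counts down to the search engine rather than looping over records in memory, but the two operations a faceted front end needs are the same: filter by text plus selected facet values, and count the values of each facet.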


Annotation Tools


Tom Oinn from TEXTUS explained how the event allowed him to gain a much better idea of where to take the TEXTUS project next and how to get it to play with other projects, including BibServer.


At the moment you can annotate texts using comments, but Oinn hopes to move towards any annotation being able to contain references. Notably, he hopes to move towards creating personal reading lists, which begin to look like BibSoup instances, and to start using TEXTUS as an annotation tool in its own right.


BibServer Group


The group spent a lot of time getting BibServers up and running on different architectures, noting that getting a high-quality, distributable BibServer out is a high priority.  Their work to identify the issues with different setups will help with this.  The group engaged in a number of discussions to explore possible connections with other tools. The big unexpected outcome for the group was realising the value of linking BibServer with TEXTUS.


There were also several splinter projects from this core group, including work to connect BibServer and m.biblio, and efforts to add national bibliographic data from the UK, Germany, Spain and Sweden. The latter helped to identify the problem of character encoding, which is likely to crop up as BibServer gets used more outside the UK.  As a result of this work, the developers will be insisting on UTF-8 for ingest from now on.
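As an illustration of what insisting on UTF-8 at ingest can mean in practice, here is a minimal Python sketch. It is not BibServer's actual ingest code: the helper name `decode_utf8_or_reject` and the sample title are hypothetical. The idea is simply to reject any record whose bytes do not decode as strict UTF-8, rather than guessing at the encoding.

```python
def decode_utf8_or_reject(raw: bytes) -> str:
    """Return the decoded text, or raise ValueError if raw is not valid UTF-8.

    Hypothetical helper for illustration; not part of BibServer.
    """
    try:
        # Strict decoding is the default: invalid byte sequences raise.
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        raise ValueError(
            f"record is not valid UTF-8 (byte offset {exc.start})"
        ) from exc


# A made-up German title, encoded two ways.
title = "Einführung in die Bibliothekswissenschaft"
good = title.encode("utf-8")
bad = title.encode("latin-1")  # 'ü' becomes the single byte 0xFC

print(decode_utf8_or_reject(good))
try:
    decode_utf8_or_reject(bad)
except ValueError as err:
    print("rejected:", err)
```

Failing loudly at this point keeps mis-encoded records out of the index, where they would otherwise surface later as garbled titles and author names.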


You can watch the final group presentations in full:



This video is also available on Vimeo.



Participant Responses


There have been a number of blog posts documenting the technical details of the event if you want to know more:


Bringing the Open German National Bibliography to a BibServer by Etienne Posthumus and Adrian Pohl


Software Cooperative News: Bibliohack London by MJ Ray


Bibliographic References in Textus by Tom Oinn


If you have blogged about this event, please leave a comment with a link to your post.
