Hacking Research Metadata
In this guest post, Alex Ball from UKOLN gives a preview of some of the issues he hopes will be explored at our forthcoming Managing Research Data Hack Days and describes his own work in the field, which could form the basis for the development of some useful MRD tools.
With research funders putting increasing pressure on institutions to manage and expose their research data better, I expect more and more institutions will be thinking about the schemas and systems they use for handling information about their research data. Ideally, institutions should be able to import such metadata from the data centres used by their researchers, collect metadata for locally held data, use the metadata for local discovery and reporting purposes, and export metadata to DataCite when they (the institutions) come around to minting DOIs. At the forthcoming MRD Hack Day it would be great to see some tools developed to help with those tasks, and I would be particularly pleased if some work I did on research metadata could provide some kind of basis for them.
A few years ago, JISC asked me to look into the feasibility of writing an application profile for describing scientific data. This was back when the Scholarly Works Application Profile (SWAP) was near the top of its hype curve, and application profiles based around generic resource types seemed very attractive (these days, they don’t). One problem was that the label ‘scientific data’ doesn’t pick out a coherent set of resources. In other words, a thing which is ‘scientific data’ can have more in common with something that isn’t (e.g. humanities data) than with something else that is. The other, more pressing problem was the ‘application’ bit, always a good thing to consider when designing an application profile. The application evoked by ‘scientific data’ is, naturally, doing science, and the things a crystallographer needs to know in order to use diffraction data are quite different to the things an astrophysicist needs in order to use spectral data.
In order to get anywhere, I had to think more generically on both counts. So what I ended up scoping was an application profile for a hypothetical research data catalogue, one that might be used by an institutional data repository or a national cross-search service. To see if this was in any way feasible, I looked at the metadata already used by (UK) data centres in their catalogues and compared them. The results were actually quite encouraging: I found 33 metadata elements that occurred in at least 3 of the 15 schemes to which I had access. Of these elements, one third occurred in 12 or more schemes: these were things like dataset name, date and identifier, agent, rights/restrictions, summary/description, dataset type and location. For the full list, see the scoping study report or the summary presentation.
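For anyone wanting to repeat this kind of comparison at the Hack Day, the counting step can be sketched in a few lines of Python. This is only a minimal illustration: the scheme names and their element sets below are made-up toy data, not the 15 schemes from the study, and `common_elements` is a hypothetical helper name.

```python
from collections import Counter

# Hypothetical toy data (NOT the schemes from the scoping study):
# each scheme name maps to the set of generic elements it uses.
schemes = {
    "scheme_a": {"dataset name", "identifier", "date", "agent"},
    "scheme_b": {"dataset name", "identifier", "summary/description"},
    "scheme_c": {"dataset name", "date", "agent", "location"},
    "scheme_d": {"dataset name", "identifier", "dataset type"},
}


def common_elements(schemes, min_schemes):
    """Return {element: count} for elements used by at least min_schemes schemes."""
    # Each scheme holds a set, so an element is counted at most once per scheme.
    counts = Counter(
        element for elements in schemes.values() for element in elements
    )
    return {element: n for element, n in counts.items() if n >= min_schemes}


shared = common_elements(schemes, min_schemes=3)

# List the most widely shared elements first, then alphabetically.
for element, n in sorted(shared.items(), key=lambda item: (-item[1], item[0])):
    print(f"{element}: {n} of {len(schemes)} schemes")
```

The hard part in practice is not the counting but deciding when two differently named fields in two schemes count as the same generic element; the sketch assumes that normalisation has already been done by hand.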
In the end, JISC decided not to go ahead with creating the application profile formally, but the work did feed into the discussions that would eventually result in the DataCite Metadata Schema. This schema underlies a search service across all datasets that have been given a DOI, and are in that sense published. Indeed, I found I could map the whole of version 2.2 of that schema (bar one element of internal administrative metadata) to 15 of the elements I’d identified. The most notable elements from my list that the DataCite schema is missing are those relating to spatiotemporal extent, which is important for environmental data, for example.
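A mapping of this kind is easy to express as a lookup table, which might be a useful starting point for an import or export tool. The sketch below is purely illustrative: the pairings in `CROSSWALK` are my own guesses at plausible matches between a handful of DataCite properties and the generic element names mentioned above, not the actual mapping from the report, and the record and its DOI are invented.

```python
# Illustrative crosswalk only (an assumption, not the mapping from the report):
# DataCite property name -> generic catalogue element.
CROSSWALK = {
    "Identifier": "identifier",
    "Title": "dataset name",
    "Creator": "agent",
    "Date": "date",
    "Rights": "rights/restrictions",
    "Description": "summary/description",
    "ResourceType": "dataset type",
}


def to_catalogue(record):
    """Translate a flat DataCite-style record into catalogue elements.

    Returns (mapped, unmapped): a dict keyed by catalogue element, and a
    list of source properties that have no entry in the crosswalk.
    """
    mapped, unmapped = {}, []
    for prop, value in record.items():
        element = CROSSWALK.get(prop)
        if element is None:
            unmapped.append(prop)
        else:
            mapped[element] = value
    return mapped, unmapped


# Invented example record; the DOI is a placeholder, not a real dataset.
record = {
    "Identifier": "10.1234/example",
    "Title": "Example diffraction dataset",
    "Creator": "A. Researcher",
    "Version": "1.0",
}

mapped, unmapped = to_catalogue(record)
print(mapped)
print("No catalogue element for:", unmapped)
```

Reporting the unmapped properties rather than silently dropping them makes gaps visible in either direction; the same approach, run the other way, would flag catalogue elements such as spatiotemporal extent that have nowhere to go in the target schema.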
Even though the scoping study did not produce a deployment-ready application profile for research metadata, I hope there’s enough in the report to indicate what one should look like and how it might be used in interoperation with data centres, DataCite and other repositories. If the MRD Hack Day can get some of that interoperation working, I know a lot of people will be made very happy, myself included.
There is still time to sign up to attend the MRD Hack Day. Full details about the event and the booking form are available here. We are also looking for ideas that developers could tackle during the event. If you have an idea you would like to see worked on during the event, please post it here.