OR2012 Developer Challenge:
Is this research readable?


Ben O’Steen from Cottage Labs worked with Cameron Neylon to present an idea for a functional survey of access to the published literature at the DevCSI Open Repositories 2012 Developer Challenge.
In their original entry, Ben gave the following description of the idea:

We spend a lot of time arguing over whether people have access, should have access, or would have access if they knew how to get it. Why don’t we actually just find out whether people really do have access to the published literature from where they sit when they’re doing their work? By carrying out a survey in which we functionally check whether a human being thinks they have access to a given work, we can look at how access, and its lack, affects the daily work of people interested in research. This will provide a dataset on access that can be used to support policy development and further technical work.

Crossref has recently released a beta API that allows the generation of a random DOI within a given date and journal range. We will build a corpus of around ten thousand random DOIs obtained by taking samples from a randomly distributed set of small date ranges over the past five years. We will record the DOI, date of release, and other bibliographic metadata. This is our test set.
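The sampling plan above can be sketched roughly as follows. This is an illustration only, not the entry's actual code: the function names, the seven-day window length, and the per-window sample size are assumptions, and the query URL follows the present-day public Crossref REST API (`sample` and `filter` parameters) rather than the beta API mentioned in the entry. The sketch only builds the query URLs; it does not contact the network.

```python
import random
from datetime import date, timedelta
from urllib.parse import urlencode


def random_date_windows(n_windows, window_days=7, years_back=5, today=None, seed=None):
    """Pick n_windows short date ranges at random from the past years_back years."""
    rng = random.Random(seed)
    today = today or date.today()
    earliest = today - timedelta(days=365 * years_back)
    # Latest allowed start offset, so that every window ends on or before today.
    max_offset = (today - earliest).days - window_days
    windows = []
    for _ in range(n_windows):
        start = earliest + timedelta(days=rng.randint(0, max_offset))
        windows.append((start, start + timedelta(days=window_days)))
    return windows


def sample_url(start, end, per_window=100):
    """Build a query URL asking for a random sample of works in one window.

    Assumption: the endpoint and parameter names are those of the current
    Crossref REST API, not necessarily the beta API the entry refers to.
    """
    params = {
        "sample": per_window,
        "filter": f"from-pub-date:{start.isoformat()},until-pub-date:{end.isoformat()}",
    }
    return "https://api.crossref.org/works?" + urlencode(params)


# 100 windows of 100 DOIs each gives roughly the ten thousand DOIs proposed.
windows = random_date_windows(100, today=date(2012, 7, 1), seed=1)
urls = [sample_url(start, end) for start, end in windows]
```

Drawing many small windows rather than one large sample spreads the corpus across the whole five-year period, which is the point of the "randomly distributed set of small date ranges" in the proposal.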

We will build a website that allows a survey participant to enter their location or affiliation. The IP range from which the participant originates will also be recorded (to test whether they are within an institutional IP range that corresponds to the claimed location/affiliation). The participant will then be presented with an embedded frame in which the site will attempt to resolve the DOI. The participant will be asked whether they see a null result or 404, an abstract, a request for payment, or the full text of the paper. Optionally, when a participant indicates that they cannot access the full text, we might attempt to identify an archived version of the paper in an institutional or disciplinary repository.
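As a minimal sketch of the two pieces of data described above, the four possible answers can be modelled as an enumeration, and the affiliation check can be done with Python's standard `ipaddress` module. The names `Outcome`, `INSTITUTION_RANGES`, and `ip_matches_affiliation` are hypothetical, and the address ranges are reserved documentation ranges, not real institutional ones.

```python
import ipaddress
from enum import Enum


class Outcome(Enum):
    """The four answers a participant can give after the DOI is resolved."""
    NOT_FOUND = "null result or 404"
    ABSTRACT = "abstract"
    PAYMENT_REQUEST = "request for payment"
    FULL_TEXT = "full text"


# Hypothetical lookup table; these CIDR blocks are reserved for documentation.
INSTITUTION_RANGES = {
    "Example University": ["192.0.2.0/24", "198.51.100.0/24"],
}


def ip_matches_affiliation(ip, affiliation, ranges=INSTITUTION_RANGES):
    """Return True if ip lies inside any range registered for the affiliation."""
    addr = ipaddress.ip_address(ip)
    return any(
        addr in ipaddress.ip_network(cidr)
        for cidr in ranges.get(affiliation, [])
    )
```

A response whose IP does not match the claimed affiliation need not be discarded; it may simply indicate someone working off campus, which is itself relevant to the question of access.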

The result will be an ongoing survey that can monitor degrees of and changes in access to the published literature. It will also produce a data corpus (to be released under ccZero) that will enable detailed analysis of access by demographics, location, and institution, and thus provide a coherent and valuable evidence base for policy and technical development.
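One simple analysis such a corpus would allow is the share of full-text answers per institution or location. The sketch below assumes each response is a dictionary with hypothetical `institution` and `outcome` keys; the record layout and the `"full_text"` label are illustrative, not part of the proposal.

```python
from collections import Counter, defaultdict


def full_text_rate_by_group(responses, group_key="institution"):
    """Fraction of responses in each group that reported full-text access."""
    counts = defaultdict(Counter)
    for response in responses:
        counts[response[group_key]][response["outcome"]] += 1
    return {
        group: tally["full_text"] / sum(tally.values())
        for group, tally in counts.items()
    }
```

Passing `group_key="location"` (or any other recorded field) would give the same breakdown along a different dimension.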




This video is also available on Vimeo.




Slides to follow


Further Development

Are you interested in collaborating with Ben and Cameron, or discussing how this idea could be taken further?

Please leave a comment on this page.

