MRD Hack Day Ideas

 

Thanks to all of those who contributed ideas for the MRD Hack Day before the event using the comments on this post. We summarised these during the event and used them as the basis of our brainstorming session.

Here is a complete record of all the ideas shared on the ideas wall during the brainstorming session:

SWORD related

  • Implementing SWORD server
  • Create experiment with SWORD (Tim Brody)
  • Post real time data using HTTP Range (Tim Brody)
  • Finish experiment with SWORD (Tim Brody)
  • RSS and XML Time Sequences
  • AtomPub publishes to feeds
  • Sending Data by reference
  • Allowing other protocols for transferring data to the server.
  • Server a file retrieval co-ordinates
  • DSpace SWORD large data ingest

 

Large data
  • DSpace SWORD large data ingest
  • Ingesting large data sets (Ian Wellaway and Dan Small)
  • Moving big data – any ideas?  (Jon Besson)
  • Need a walkthrough of options for storing files (5 GB) in a native format with a link accessible via a network share (Alex Henderson)
Metadata
  • Aggregator of research data metadata across institutions (OER Xpert for research) – (Lindsay Wood)
  • Managing ‘anecdotal’ data vs quantitative data…? Metadata explanations: User metadata is different to actual metadata…explore (John Nicholls)
  • How to ease the burden of adding metadata to data in subject areas where adding metadata is not the cultural norm? Policies? Tools? (Aaron Turner)
  • Online metadata designer tool to work with data deposit.
  • Propose a common metadata schema for repositories (e.g. institutional and other repositories share metadata, data held in one place only) (Brian Hole)
  • Exploring DataCite-compliant metadata from EPrints, DSpace, CRISes etc. (Alex Ball)
  • How best to deal with granular metadata, displaying for researchers (Alexis Wolton)
  • Metadata gets mapped to data. When this metadata is corrected it produces a second version, which is mapped to both the original data source and a corrected data source (Gerard Devine)
  • Devise effective pseudonymisation algorithm to identify patient data
Identifiers
  • Tracking impact of data through identifiers
  • ORCID and DataCite Interoperability (Researcher and dataset identifiers)
  • Devise really simple unique ID generator
  • DOIs – quickest and best method to mint DOIs from EPrints (Alexis Wolton)
  • Minting URIs to locate remote files (also just minting URIs)  (Jon Besson)
Other
  • Strategies for displaying groups of data files in EPrints (Alexis Wolton)
  • Examining Automatic Data Capture from instruments  (Kathleen Haigh-Hutchinson)
  • Concept mapping / visualisation of researchers, research projects and outputs (papers / data) (Lindsay Wood)
  • Look at ways to remotely access / interrogate data in a repository (e.g. .csv, .txt, .doc, .docx etc.), prototype (Brian Hole)
  • Breadcrumbs data search
  • Online tools for documenting
  • Differences between arts and sciences data (Tom Parsons)
  • Neat and rapid way to get different software packages to use shared ontologies (i.e. code sets)
  • Integrate heterogeneous applications in the cloud, such as CiviCRM, caTissue, Onyx and i2b2 (University of Leicester)
  • Electronic signatures / proof issues / validation of data submitted in a repository (Tom Parsons)
  • Workflow for depositing final research data (metadata requirements, minting DOIs (at which stage), documentation) (Tom Parsons)
  • Object Stores – what is the minimum necessary to turn an object store (Amazon S3) into a data repository (Jez Cope)
  • Digital Object Stores, MongoDB, CouchDB (Jon Besson)

38 Comments

  1. Mahendra Mahey

    The Orbital Project

    By Joss Winn

    The Orbital project is developing a pilot system for Engineering researchers to manage their data. Already, researchers are asking us where they can put their data so that it can be immediately cited (the urgent motivation for this is funding bids and preparation for the REF).
    How should we quickly mint URIs for public data that will be permanently citable, but flexible enough to change with our developing system? Can the DataCite API help us right now?

    If so, we’d like to use the MRD hackdays for implementing this feature.

    • Steve Welburn

      I’m interested in finding out more about allocating identifiers via DataCite.

    • If not, the Handle system could be used, as is the case with DSpace.

      For data citation purposes, it is also useful to provide an automatic suggested citation out of the metadata fields (preferably required ones) of the deposited item. Edinburgh DataShare does this, based on the fields:
      Data Creator(s): Family Name, Given Name/Initials; Family Name, Given Name/Initials. (Date Accessioned – YEAR ONLY). Title, Time Period [Item Type]. Data Publisher / Funder.
      A DOI or handle can be added as well. This makes it easy for a user to do the right thing and include it with the rest of their citations.
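
The suggested-citation pattern described above can be sketched in a few lines of Python. This is only an illustration of the field order quoted in the comment, not Edinburgh DataShare's actual implementation: the `Creator` and `Dataset` classes, their field names, and the example values (including the DOI) are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Creator:
    family_name: str
    given_name: str  # given name or initials


@dataclass
class Dataset:
    # Hypothetical required fields, mirroring the pattern quoted above.
    creators: List[Creator]
    year_accessioned: int
    title: str
    item_type: str
    publisher: str
    # Optional fields: a time period and a DOI or handle.
    time_period: Optional[str] = None
    identifier: Optional[str] = None


def suggested_citation(ds: Dataset) -> str:
    """Build a citation string from the metadata fields of a deposited item."""
    names = "; ".join(f"{c.family_name}, {c.given_name}" for c in ds.creators)
    # Avoid a doubled full stop when the last creator ends in an initial.
    names = names.rstrip(".") + "."
    title = f"{ds.title}, {ds.time_period}" if ds.time_period else ds.title
    parts = [
        names,
        f"({ds.year_accessioned}).",
        f"{title} [{ds.item_type}].",
        f"{ds.publisher}.",
    ]
    if ds.identifier:
        parts.append(ds.identifier)
    return " ".join(parts)


if __name__ == "__main__":
    example = Dataset(
        creators=[Creator("Smith", "J."), Creator("Jones", "A. B.")],
        year_accessioned=2012,
        title="Example survey data",
        item_type="dataset",
        publisher="Example University",
        time_period="2001-2010",
        identifier="https://doi.org/10.1234/example",  # made-up DOI
    )
    print(suggested_citation(example))
```

Run as a script, this prints `Smith, J.; Jones, A. B. (2012). Example survey data, 2001-2010 [dataset]. Example University. https://doi.org/10.1234/example`. Because the citation is built only from fields the depositor already supplied, it works best when those fields are required at deposit time.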

  2. Mahendra Mahey

    By Joss Winn

    The Orbital project is developing a pilot system for Engineering researchers to manage their data. Part of our work so far has been on aggregating staff profile data so that Orbital knows about a user when they log in.

    We’d like to assign a unique ID for each researcher who uses Orbital and have this in place for the first release in May.

    Is ORCID the way to go for this?

    What other people IDs should we be building into Orbital? Is ORCID ready for use now?

    What can we implement over the two days of the MRD hack event?

  3. Mahendra Mahey

    Dealing with real-time instrument data

    Who is interested in working on hacks in this area? Please reply to this post with your ideas.

    • We are working in this area at RAL, STFC (see the above link), so we are interested in getting to know others who also work in this area, i.e. dealing with real-time instrument data in laboratories!

      A few areas we are working on: 1) characterisation and visualisation of real-time data; 2) continuous monitoring and tracking of experiments; 3) sample registration and tracking; 4) automated data reduction and analysis; 5) data provenance, data publication, and semantic data repository; 6) scalability testing of laboratory data infrastructure; 7) data citation for aggregated datasets

      If you are working in these areas, please get in touch (erica.yang@stfc.ac.uk). I hope we can form a working group in one of these areas!

    • Are there any real-time instrument data feeds that we can use for the hack day?

    • Tim Brody

      I would like to tie this into SWORD 2, for auto-feeding data into (data) repositories.

  4. Mahendra Mahey

    Translating OAIS into user stories.

    Idea by Joss Winn

    Identify people who have investigated OAIS as a data model for datasets.
    Mapping to a top level generic DC based description of data sets.

    Possibly get Julie Allinson to speak about her work?

  5. Mahendra Mahey

    Using generic level DC metadata to describe data sets

    Use the DC scoping study developed by Alex Ball, particularly chapter 5, which outlines a generic DC description, see http://www.ukoln.ac.uk/projects/sdapss/

    Chapter 5, Existing metadata standards and profiles, http://www.ukoln.ac.uk/projects/sdapss/papers/ball2009sda-v11.pdf

    Is there any scope to develop a tool which would map to generic dataset metadata?

  6. Mahendra Mahey

    RIF-CS

    Are there any people interested in working with this standard? Please reply to this posting.

  7. Mahendra Mahey

    CERIF compatibility

    Would people who are working with CERIF, or are interested in working with it, like to identify some possible hacks in this space?

  8. Mahendra Mahey

    Implementing SWORD2

    Who is implementing SWORD 2 for datasets? Are there people interested in working in this area? Please reply to this posting.

    • We’re (University of Bath) interested in using SWORD2 to provide a straightforward deposit interface to multiple repositories from our Sakai-based VRE.

    • Steve Welburn

      I’m interested in finding out more about SWORD2.

    • OSS Watch is part of the DataFlow project, which involves two software projects:

      DataStage is a secure personalized ‘local’ file management environment for use at the research group level, appearing as a mapped drive on the end-user’s computer.

      DataBank is a scalable data repository designed for institutional deployment.

      You can push datasets from DataStage to DataBank using SWORD.

      I am not a developer on DataFlow myself, but I would be interested to see if we can connect DataStage and DataBank to other repositories.

    • Tim Brody

      In EPrints we have gone beyond SWORD by implementing full HTTP content negotiation. This allows items to be added/updated/removed using AtomPub, using formats supported via plugins. Happy to share what we’ve been up to!

  9. Mahendra Mahey

    Identifiers: ORCID API, DataCite/DOIs (BL DataCite API)

    Are there people interested in working with identifiers? Please reply to this posting.

  10. Mahendra Mahey

    DMPOnline APIs.

    Is anyone interested in working on these?

  11. Mahendra Mahey

    Object stores

    Is there anyone interested in this?

    • We’re (University of Bath) interested in how an object store such as the Hitachi Content Platform can be used to store both live research data (via a NAS interface – CIFS/NFS/etc) and archived data (via e.g. a web-based repository-type interface).

  12. Mahendra Mahey

    NoSQL

    Is there anyone interested in this?

    • We are interested in the use of NoSQL, specifically MongoDB. We’re interested in what others think of the technology, in seeing it in action, and in how it might be used in conjunction with a traditional RDBMS.

      • We’re using MongoDB at Lincoln to do all kinds of voodoo, and I’d be happy to share our experiences with it etc.

  13. Mahendra Mahey

    OAuth

    Is there anyone interested in this?

  14. John Kaye

    ODIN Project: ORCID and DataCite Interoperability Network

    This is an EU funded project starting in Sept 2012 with DataCite, ORCID, British Library, CERN and non-EU partners. British Library have been charged with completing a proof of concept for Social Science data and we will be recruiting to a project post in Summer 2012.

    We have chosen to look at Birth Cohort Studies research data (as they go back to 1946) and to track the impact of the data and their authors by tracking publications, authors, researchers, secondary analysis, derived data and policy impact.

    I’m interested in seeing if there are any possible technical development solutions to this proof of concept as I work up our approach.

  15. I am interested in all things data management. More specifically, I would like to explore ideas relating to documentation, documenting, technical authoring and online interactive resource development around data management training, teaching and planning. Does anyone have an intersecting line of interest?
