DevCSI | Developer Community Supporting Innovation
http://devcsi.ukoln.ac.uk

Citation Data Hack: Event Report
http://devcsi.ukoln.ac.uk/2012/11/05/citation-data-hack-event-report/
Mon, 05 Nov 2012 18:26:48 +0000, kpitkin

Citation Data Hack

“What use is citation data without context?”

Max Hammond opened the Future Citation Data Hack by emphasising that, rightly or wrongly, citations are the fundamental currency for measuring the impact of research. He approached DevCSI to run a hack event exploring the practical applications of citation data to help inform the JISC Citation Data Directions project he is conducting, which will examine the whole lifecycle of citation data.
Hammond observed that whilst he is particularly interested in the citation data creation process, understanding the end use of that data is crucial to inform more strategic decisions.

The event was designed to support this work by bringing together a group of domain experts, users and developers to explore ideas related to potential real world uses of citation data and to prototype potential solutions. In particular, Hammond challenged the group to consider: Are there new ideas for the intelligent use of citation data? Are there things we can do with this data beyond the obvious?

With these questions in mind, the participants were invited to map out some of the issues and identify the most useful developments they could work on together during the hack event.

Initial Discussions

The group began with a round table brainstorm of the issues that exist in this space, and identified some of the specific questions members of the group were keen to explore in more detail.

Points of note included:

  • Acquiring complete citation data is an issue. Publishers see the citation data as their added value, which makes them reluctant to give it away.
  • Grey literature and questions over what constitutes a citation mean that, in practice, it is impossible to have all citation data, so you will only ever have a sparse network. Under these circumstances, what can you do with only part of the data?
  • In different fields people cite in different ways. For example, applied research tends to attract few citations. The completeness of the record depends on the field.

One of the key areas of interest was the potential practical uses of citation data, and identifying who the users might be. Within this initial discussion, members of the group who considered themselves to be users of citation data shared their ideas for potential use scenarios, including:

  • Some form of rating for different types of citations to help judge impact. There is work going on in this area currently, but the group noted that this would require researchers to apply it rigorously in practice.
  • A Klout-style system for citations: This would need to assess positive and negative citations to provide a useful single-number score per article. However, the group discussion highlighted the issue of betweenness, where really valuable citations can sit between traditional boundaries in less typically valued places. This would be hard to address in a single score.
  •  A geographical visualisation of citations as a broad way to explore citation data. The group imagined a visualisation of citations across a world map, connecting global citations with funder information to allow users to explore impact. Further discussion of this idea highlighted that the current incompleteness of the data – including a lack of funder information and data about non-English citations – would be a major stumbling block to gaining an accurate picture in such a visualisation. The group discussed possible routes to remedy this situation, including how to encourage authors to include funding information, which is often omitted.

The discussion then moved towards identifying practical, achievable outcomes for this event, such as:

  • A comparison of several available citation datasets to identify similarities and establish whether researchers working in this area may end up with different results when using graphs from different citation datasets. The published material suggests citation datasets from different sources may be very different, which is why academics claim citations are not a good way to judge them. Could this be verified or explored more deeply using the datasets provided by event participants?
  •  A mind map of the wider picture of citation, including how information and money flow around the citation system. A collaborative page was made available throughout for individual contributions to this big picture.
  •  Experiments with visualisation tools to create citation timelines and measure the decay of citations over time.

There was also interest in taking the opportunity to discuss a number of higher level issues, which could inform the practical use of citation data. Some of the issues touched upon included:

  • Identifying the barriers to data mining to establish the wider picture around citations.
  • Questions surrounding the fundamental nature of citations, such as: What is a citation? What would we do differently if we could define this? If we decided that citations are not the best measure of quality, what is?
  • Which is most valuable: the most-read but least-cited paper, or the least-read but most-cited paper?

After these initial discussions, the participants split into several groups to pursue these ideas further, with several participants floating between groups to offer ideas and expertise where required.

Discussion Group

Throughout the first day of the event, a wide ranging discussion between various participants explored the more abstract issues associated with citation data – including questions relating to the nature of citation – specifically focusing on data citation and the intrinsic link between the nature of citation and the nature of the material being cited.

The key argument made was that data citation is still very much domain specific, with disciplines having different approaches to making the underlying data available (and therefore citable), or even preserving the data at all, depending on a number of practical and cultural considerations. Members of the group discussed the citation of less conventional materials, such as plant materials or animals in Life Sciences, which rely on a single physical “master” specimen. This discussion highlighted the fact that data isn’t just digits, and citation of data will vary hugely across different disciplines depending on the form of the data being cited. The group also explored examples from archaeology, where convention dictates that a site report is cited in place of an object itself, and questioned whether metadata could be cited in place of the data in other contexts too.

In considering the issue of how you cite differing materials, the question of persistence arose, which led to an extended discussion about data preservation and the need for persistent citations, but again, it was argued that this is largely domain specific. One of the major problems they identified was that of people thinking of the URL as an identifier, even though these offer no persistence.

The discussion wrapped up with reflections about the reason for the current differentiation between data and documents, questioning why the two things are treated differently. They concluded that the understanding of what constitutes a citation in relation to a publication is not really clearly understood and may be influenced by a number of cultural and social factors, making it difficult to apply to data. To be truly understood and assigned a value, a citation needs to be considered in context.

Comparing Large Citation Datasets

Karl Ward, CrossRef
Petr Knoth, Open University
Emma Tonkin, UKOLN
David King, Open University
Sheng Li, University of Birmingham

Two members of this group brought significant citation datasets with them to the event, and there was considerable interest in comparing them to identify similarities. The datasets differed in size and composition, so the group surmised that if they resulted in similar but differently sized graphs, this would allow people looking at smaller graphs to make general conclusions about their own data with more confidence. To test this, the group created a series of histograms of citations over time to identify general shapes. These could perhaps be compared against specific resources to identify features such as courtesy citations, set against the general shape of citations over time for generally relied-upon resources.

The two datasets came from the CORE project and CrossRef. Because the two datasets use different identifiers, the group had to begin by manually choosing papers that appear in both in order to make comparisons. The group also used data from Microsoft Academic Research to support their work. Through further discussion both within the group and with other event participants, they identified a list of statistical techniques and graph-based approaches that they could apply to this data, including calculating the half-life of a paper. However, they stressed the need for a statistician to become involved with the project to assist with many of these analyses.
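As a rough illustration of the kind of analysis described here, the sketch below builds a histogram of citations per year for a single paper and estimates its half-life. The report does not say how the group defined half-life, so the definition used (the number of years after publication by which half of the recorded citations had arrived) is an assumption, and the function names and sample years are invented for the example.

```python
from collections import Counter


def citations_per_year(citing_years):
    """Count how many citations a paper received in each calendar year."""
    return dict(sorted(Counter(citing_years).items()))


def half_life(publication_year, citing_years):
    """Years after publication by which half of the recorded citations had arrived.

    This is one possible definition, assumed for illustration only.
    """
    if not citing_years:
        return None
    histogram = citations_per_year(citing_years)
    target = len(citing_years) / 2
    running = 0
    for year, count in histogram.items():
        running += count
        if running >= target:
            return year - publication_year
    return None


# Invented data: a paper published in 2000 and the publication years of papers citing it.
example_years = [2001, 2001, 2002, 2002, 2002, 2003, 2005, 2008]
histogram = citations_per_year(example_years)
paper_half_life = half_life(2000, example_years)
```

Histograms built this way for the same paper in two datasets can then be compared bin by bin, provided both cover the same range of years.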

Karl Ward described their progress and ultimate aims in this short video interview:

Click here to view the embedded video.

You can also watch this video on Vimeo.

By the end of the event, the group had explored some overall metrics for assessing the similarities between the datasets. The aim was to help researchers understand the consequences of choosing one citation dataset over another without fully understanding the different methodologies that may have been used to compile each one. These metrics included the absolute maximum number of citations that a single node might have received, the absolute minimum, the average and so on. However, they noted that whilst this tells you a bit about your dataset as a whole, it does not tell you how accurate any given node might be.
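The whole-dataset metrics mentioned above can be sketched as follows. This is a minimal example, assuming the citation graph is available as a list of (citing, cited) pairs; the function name and the sample edges are hypothetical and are not taken from the group's code.

```python
from statistics import mean, median


def citation_summary(edges):
    """Summarise citations received per paper from (citing, cited) pairs."""
    received = {}
    for citing, cited in edges:
        # Papers that only cite others still count as nodes with zero citations.
        received.setdefault(citing, 0)
        received[cited] = received.get(cited, 0) + 1
    counts = list(received.values())
    return {
        "papers": len(counts),
        "max": max(counts),
        "min": min(counts),
        "mean": mean(counts),
        "median": median(counts),
    }


# Invented edge list: A, B and D cite C, and A also cites B.
edges = [("A", "C"), ("B", "C"), ("D", "C"), ("A", "B")]
summary = citation_summary(edges)
```

As the group observed, figures like these describe the dataset as a whole and say nothing about whether the count for any individual paper is accurate.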

To pursue this further, the group picked some specific nodes to study in greater detail, creating histograms of citations over time and considering some of the odd features of some of the data – including a bias towards the present in the CrossRef data. They speculated that this might be due to publishers only recently adding DOIs to papers, enabling them to confidently describe a paper containing a citation, but conceded that this issue needs to be examined in more detail than permitted in the time available at the hack event.

Having set these anomalous results aside until they can be fully explained, the group used the Earth Mover’s Distance algorithm to examine the differences between normalised results for one node, comparing the same paper in the data provided by CrossRef and Microsoft Academic Research. They noted that they will need to carry out this same process over a larger group of papers, for which they will need to be able to identify the same papers within both datasets using DOIs.
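For histograms over the same ordered yearly bins, the Earth Mover’s Distance reduces to the sum of absolute differences between the two cumulative distributions. The sketch below shows that one-dimensional case; it assumes equally spaced bins and is not the group's own implementation, and the sample counts are invented.

```python
from itertools import accumulate


def normalise(counts):
    """Scale a histogram so that its bins sum to 1."""
    total = sum(counts)
    if total == 0:
        raise ValueError("histogram has no citations")
    return [c / total for c in counts]


def earth_movers_distance(counts_a, counts_b):
    """1-D Earth Mover's Distance between two histograms over the same bins."""
    if len(counts_a) != len(counts_b):
        raise ValueError("histograms must share the same bins")
    p, q = normalise(counts_a), normalise(counts_b)
    # Running difference between the two cumulative distributions, bin by bin.
    cdf_gap = accumulate(pa - qb for pa, qb in zip(p, q))
    return sum(abs(gap) for gap in cdf_gap)


# Invented yearly citation counts for one paper as seen in different sources.
source_one = [0, 2, 5, 3]
source_two = [0, 4, 10, 6]   # same shape, twice the volume
shifted = [2, 5, 3, 0]       # same shape, one year earlier
same_shape = earth_movers_distance(source_one, source_two)
one_year_off = earth_movers_distance(source_one, shifted)
```

Normalising first means a larger dataset with the same citation pattern scores a distance of zero, which is the property the group needed when comparing datasets of different sizes.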

Going forward, the group would like to look at the general shape of more histograms and how different papers are used over time, and to identify papers that are in the space where they were almost influential, but might not appear in a rank of frequencies.

Watch the final report by Emma Tonkin, who summarises the outcomes of the group’s efforts in full in this presentation:

Click here to view the embedded video.

You can also watch this video on Vimeo.

Information Flow

Throughout the event, Max Hammond asked participants to contribute to a collaborative project to map out the flow of information and money within the citation data ecosystem. Various participants chipped in to add sections of the citation data lifecycle, and the influences and issues that impact on that lifecycle.

Watch the map evolve in this short video:

Click here to view the embedded video.

You can also watch this video on Vimeo.

Visualisation Beyond the Citation Border

Edward Minnett, Faculty of 1000
Tanya Gray, University of Oxford
Sheng Li, University of Birmingham
Tim Brody, University of Southampton
Paul Stokes, JISC

This group were particularly interested in creating visualisations of citation data, including visualisations over time and over geographical space. Their initial approach involved attempting to make modifications to an existing library in order to create tree graphs, but they quickly found this too complicated for the time constraints of the event. As a result, they decided to take a stock visualisation and change some of the parameters to make it more readable for citations, then plug in some of the data from opencitations.net to see what patterns emerged.

Several members of the group took the opportunity to explore new or unfamiliar technologies, including Node.js, which Tim Brody used for the first time at the event to develop a proxy around opencitations.net to act as middleware, which could then be applied to other datasets. Other members of the group practised using SPARQL queries to create a dynamic graph around the opencitations.net data.

Edward Minnett described the group’s progress and driving forces from his perspective in this short video interview:

Click here to view the embedded video.

You can also watch this video on Vimeo.

By the end of the event, the group had developed a middleware layer that can abstract the APIs of a variety of citation data sources so that their data can be used with any visualisation tool. In the future, this could be coupled with an identifier system like DOI to allow you to migrate seamlessly across citation databases within a visualisation.

The approach the group took to build this was to construct a Node.js-based server, which sends off SPARQL queries to opencitations.net to retrieve metadata about a particular article and all the items that article has been cited by. This is built into a standard format in the middleware, which in turn is passed to the chosen visualisation system.
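The shape of such a middleware step can be sketched as below. The group's server was written in Node.js; Python is used here only for illustration. The sketch assumes the source models citations with the CiTO `cites` property and returns results in the standard SPARQL JSON results format. The function names, the output record layout and the example.org identifiers are all hypothetical, and no network request is made.

```python
CITO_CITES = "http://purl.org/spar/cito/cites"  # assumed citation predicate


def build_cited_by_query(article_iri):
    """Return a SPARQL query for the articles that cite article_iri."""
    return (
        "SELECT ?citing WHERE { "
        f"?citing <{CITO_CITES}> <{article_iri}> . "
        "}"
    )


def to_standard_format(article_iri, sparql_json):
    """Flatten SPARQL JSON results into a source-independent record."""
    bindings = sparql_json["results"]["bindings"]
    return {
        "id": article_iri,
        "cited_by": [row["citing"]["value"] for row in bindings],
    }


# Hand-written response in the SPARQL JSON results shape, with invented IRIs.
sample_response = {
    "head": {"vars": ["citing"]},
    "results": {
        "bindings": [
            {"citing": {"type": "uri", "value": "http://example.org/article/2"}},
            {"citing": {"type": "uri", "value": "http://example.org/article/3"}},
        ]
    },
}
query = build_cited_by_query("http://example.org/article/1")
record = to_standard_format("http://example.org/article/1", sample_response)
```

Keeping the query builder separate from the normalising step is what would let another citation source be added later without changing the visualisation code.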

As part of this work, the group wanted to come up with a new visualisation that would help people in this space to think about citations on a timeline. They created a visualisation which connects articles cited by a particular node, and the articles that in turn cited that node, arranged on a timeline. They used the JavaScript InfoVis toolkit to create and demonstrate this visualisation in real time.

Watch the final report by Tim Brody, who summarises the outcomes of the group’s efforts in full in this presentation:

Click here to view the embedded video.

You can also watch this video on Vimeo.

Remote Participation

There was interest in the event from a number of people who were unable to attend in person. When Jimme Jardine realised he could not attend, he asked us to show a pre-prepared video to the group describing Qiqqa, his own project that was highly relevant to the discussions taking place at the event.

You can watch this video in full below:

Click here to view the embedded video.


You can also watch this video on YouTube.

Conclusions

The event was designed to directly feed into a JISC-funded project examining the life cycle of citation data by connecting the researchers directly to users and developers who can build things with citation data. The practical outcomes of the event as described in this report helped to provide insight into how citation data could be used and to identify some of the difficulties that exist with the current citation data infrastructure that prevent innovation.

Max Hammond summed up how useful the event proved for his project in this short video interview, in which he reflects on the differences between the needs of high level stakeholders and the developers on the ground who are looking to implement solutions based on citation data:

Click here to view the embedded video.

You can also watch this video on Vimeo.

Citation data matters, so we need to get this right.

DevCSI Developer Stakeholder Survey 2011 – 2012
http://devcsi.ukoln.ac.uk/2012/06/21/devcsi-developer-stakeholder-survey-2011-2012/
Thu, 21 Jun 2012 14:17:19 +0000, Mahendra Mahey

Evidence Base at Birmingham City University has once again been commissioned to undertake an important survey of developers working / studying in education (largely in universities and colleges) and their stakeholders on behalf of DevCSI:

http://svy.mk/devcsi12

The broad topics of this survey include: benchmarking developers across the sector; examining stakeholders’ views of software development; discovering examples of local innovation; and gathering suggestions about the ongoing future development of a developer community in UK education. The survey is very important for informing future work of the DevCSI project and should provide useful information as to the value and importance of developers to innovation in the education sector. It should take around 10-15 minutes to complete. So if you are a developer in education, you work with developers, or your work is affected by the work of developers, please fill in this important survey.

Each respondent will be able to enter a prize draw to win a £200 Amazon voucher or one of four £50 vouchers. If you would like to enter for your chance to win, please follow instructions at the end of the survey.

Thanks for your participation and good luck in the prize draw!

See the results of the last survey.

Please feel free to pass this blog posting on, or repost. Thank you.

DevCSI Challenge at Open Repositories 12, Edinburgh, Scotland.
http://devcsi.ukoln.ac.uk/2012/06/20/devcsi-challenge-at-open-repositories-12-edinburgh-scotland/
Wed, 20 Jun 2012 11:33:43 +0000, Mahendra Mahey

DevCSI is proud to announce that it is once again organising the Open Repositories Developer Challenge 2012 at the Seventh International Conference on Open Repositories 2012 in Edinburgh, Scotland – Open Repositories 2012. We are working closely with the Repositories Fringe and the challenge is kindly sponsored by Microsoft Research.

-Hash Tags-

#or2012
#or12dev
#devcsi

The challenge is taking place between Tuesday 10th July and Thursday 12th July 2012.

The challenge is:

‘Show us something new and cool in the world of Open Repositories’

The initial deadline for your entry will be Tuesday 10th July 2012, at 10am, which means you can start on your entry now!

For more information, see:

http://devcsi.ukoln.ac.uk/developer-challenges/developer-challenge-or-12/

You will then have a chance to pitch your idea to an audience and experts, who will give you feedback. You can then make changes to your idea and re-present it to a panel of judges the next day, Wednesday 11th July 2012. Winning entries will be presented to the conference on Thursday 12th July 2012.

Prize money of £1000 is available for the winners and runners up. There is an *additional* prize (a .NET Gadgeteer kit) for the entry that demonstrates the most innovative use of Microsoft technology; such an entry can also be submitted to the main challenge above. We hope to give the winning team time to meet up after the conference to complete their entry, so that the community can benefit from it.

Please feel free to circulate this blog post.

Event Report: Managing Research Data Hack Day
http://devcsi.ukoln.ac.uk/2012/05/11/event-report-managing-research-data-hack-day/
Fri, 11 May 2012 09:00:42 +0000, kpitkin


DevCSI worked with the JISC Managing Research Data Programme and the JISC Orbital Project to organise a Managing Research Data hack event in Manchester from 3rd-4th May, 2012. The event was designed to bring together software developers, project managers, data librarians and experts with an interest in the area of managing research data to share, talk, collaborate and create useful solutions.

Participants were encouraged to develop ideas, paper prototypes or even working code to address some of the issues raised by delegates from a range of different projects. A prize was available for the best idea, with the winners receiving their expenses paid to get together and develop their idea further. There were also opportunities to share skills throughout the event.

The event followed a relaxed hack event format, opening with a series of lightning talks from participants describing their projects and areas of interest, followed by a period of brainstorming, development into the evening, and reporting back to the group to gather feedback.

 

Lightning Talks

 

History Data Management Plan (HDMP) Project

John Nicholls, University of Hull

John Nicholls

Nicholls described himself as an example of “the reason we are all here.” He represents history researchers at the University of Hull, where he works as the data manager on the JISC-funded HDMP project. This involved working with the university’s library services to create useable data sets from the information collected by ordinary historians, and has resulted in the formulation of a history data management plan, which they are now using to help inform new projects so the researchers can put their data into a useable format from the outset. He was able to offer examples of the historical data for developers to experiment with during the event, but appealed for information about the tools available for managing data that an ordinary researcher could use, and asked how he might engage in the documenting process.

More information about this project is available here.

 

MongoDB

Nick Jackson, University of Lincoln

Jackson offered to run a crash course for those who were interested in storing and querying data in MongoDB, a NoSQL database. He provided a brief overview of the benefits of MongoDB, which he argued was massively scalable, agile and flexible, and explained why it is useful for handling research data.

Click here to view the embedded video.

This video is available on Vimeo.

Nick’s slides from this presentation are available here.

PIMMS (Portable Infrastructure for the Metafor Metadata System)

Gerard Devine, National Centre for Atmospheric Science (NCAS), University of Reading

Gerard Devine

PIMMS provides institutions with tools to capture information about the workflow of running simulations, from the design of experiments through to their implementation as simulation model runs.

Devine explained how this works within his own research area of climate modelling, where the outputs are so large and complex that past strategies for understanding the models and the limited available metadata are no longer sufficient. The PIMMS project has created a system to help document this climate data, including a web form to help describe all the aspects of the experiments and models according to a set vocabulary. These can then be used in portals which can understand the schema and expose the information in different ways.

He noted that mapping data to metadata has been a particular problem, so he was interested in working with people who have experienced similar issues or found solutions.

Database as a Service implemented in Oxford University

Asif Akram, Oxford University

Asif Akram

Akram outlined the Virtual Infrastructure with Database as a Service (VIDaaS) project, which allows users to create a project and upload, for example, Microsoft Access databases or Excel spreadsheets. The system then converts these into an online database in the cloud, which can be modified and shared using a simple user interface. He described the three tools they have created to make this process simple: a database migration tool, a Microsoft Access database converter, and an SQL Designer to help researchers create a working SQL database using drag-and-drop tools.

Further information is available here.

ORCID and DataCite Interoperability Network (ODIN)

John Kay, The British Library

John Kay

Kay described his role as a social sciences curator at the British Library and his work with DataCite, which creates persistent identifiers for datasets so they can be cited. They have just received funding for a project to take DataCite forward, which will include working on interoperability between other systems, such as ORCID.

He was able to offer some APIs for developers to play with at the event, and the opportunity to mint DataCite DOIs.

REWARD

Brian Hole, Ubiquity Press

Brian Hole

Brian Hole described the JISC-funded REWARD project, which aims to incentivise researchers to deposit their data without introducing any new steps into their everyday procedures. He provided an overview of how this worked, including submission to the Journal of Open Archaeological Data and ePrints. He outlined some of the recommendations resulting from the project, including the need for more training of library staff to customise ePrints to accept data more neatly.

Hole also provided an overview of the Journal of Open Archaeological Data and the benefits this gives to researchers as a peer reviewed journal which guides researchers through the process of finding an acceptable repository and issuing a data paper that can be cited in traditional papers.

He was particularly interested in collaborating with others to consider some of the issues identified by the project, including minting their own identifiers.

Further information is available here.

YouShare

Aaron Turner, University of York

Aaron Turner

YouShare is a HEFCE-funded project to provide an environment for researchers to share programs and data and apply programs to their data.

Turner described their current efforts to link the front end that researchers see to an archival system that is standards compliant. This will move the data set into various tiers of an archival system when it is not being used, then bring it back into a live system when people want to access it to carry out further experiments. He provided a demonstration of the interface, showing how to create workflows using YouShare and publish DOIs to facilitate citations.

He was looking to form collaborations to discuss issues associated with data ingest to archival systems during the hack event.

Further information is available here.

Data.bris

Damian Steer, University of Bristol

Damian Steer

Steer described the cluster system at the University of Bristol, where they have high performance computing and persistent storage for researchers. Research projects can apply for storage, nominate a data steward, and receive 5 TB of free storage. He observed that quite a few people are already using it, including arts and humanities researchers.

The data.bris project aims to create an interface to help researchers use some of this infrastructure and make deposits with metadata. They are looking to add submission via SWORD, despite internal policy questions, and intend to start looking at packaging data sets and integrating with the PURE system by Atira.

DataStage

Sander van der Waal, OSS

Sander van der Waal

Van der Waal outlined the two software components of the DataFlow project: Data Stage and Data Bank. Data Bank is an institutional repository system, and Data Stage is a step before that. As a researcher, before you are ready to publish your data set, you are working with data which needs to be stored and managed. Data Stage helps researchers on a local level to manage departmental data, in a similar way to Dropbox, by providing external back up and version control. Van der Waal explained that he would like Data Stage to push data to other SWORD compliant repositories, and appealed for people interested in connecting repositories using SWORD to collaborate.

Using dSpace

Ian Wellaway, University of Exeter

Ian Wellaway

Wellaway described work at the University of Exeter, where they are using dSpace with Oracle. He observed that the submission process is a bit clunky, so they have been looking at easy deposit. The problem they have encountered is that a lot of researchers have big data sets, which they are struggling to get into the repository over HTTP. This causes frustration and inevitably puts people off submitting. He appealed for help from people who have solved or have an interest in solving a similar problem.

DMPOnline

Monica Duke, DCC

Monica Duke

Duke introduced the DMPOnline tool developed by the Digital Curation Centre to help researchers create the data management plans, which are now requested by many funders. The DCC are looking to create an API for this, and she has been involved with some thinking about how people might interact with DMPOnline via this API. She was interested in talking further with any people who are interested in getting data in or out of DMPOnline, or think their system should be interacting with it. She also promoted a forthcoming workshop at Open Repositories 2012 which will be exploring this further.

Biomedical Research Infrastructure Software Service kit (BRISSkit)

Malcolm Newbury, Guildfoss

Malcolm Newbury

Newbury outlined the BRISSkit, a suite of applications to support the entire clinical study process, including CiviCRM to recruit participants, CA Tissue which tracks assets (blood, samples etc) and Informatics for Integrating Biology and the Bedside (I2B2) which decomposes information about each patient, adds an ontology and allows researchers to query the data based on that ontology. These tools have all been integrated so that information can travel between the applications, and the full application set can now be provisioned in the cloud.
Newbury observed that they still have some challenges, including generating the unique numbers that are attached to samples, and integrating the applications in a way that does not slow them down, so they are currently looking into open source ways of orchestrating that integration.

Ideas

There were a number of ideas shared before the event, which were summarised briefly before the group began to brainstorm new ideas on the ideas wall. A complete list of all the ideas shared before and at the event can be found on the MRD Hack Days Ideas page.

 

Teams

Several broad teams formed to discuss the ideas further and suggest potential projects to work on throughout the rest of the event.

Data Activity Stream

A group worked on a proof-of-concept for a centralised service for tracking activity data around research projects and individual datasets. This would allow researchers to see what others have been doing with particular data objects, together with a stream of information about activity within the project as a whole.
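As a rough illustration of the idea, the sketch below models an activity stream in memory. It is not the group's API: the class names, field names and verbs are all assumptions made for the example, and a real service would sit behind a web API with persistent storage.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Activity:
    """One event in the stream (all field names are hypothetical)."""
    actor: str        # who performed the action
    verb: str         # what they did, e.g. "deposited" or "downloaded"
    object_id: str    # the data object the action applies to
    project: str      # the research project the object belongs to
    when: datetime    # when the action happened


class ActivityStream:
    """In-memory stand-in for a centralised activity service."""

    def __init__(self):
        self._events = []

    def record(self, activity):
        """Append one activity to the stream."""
        self._events.append(activity)

    def for_object(self, object_id):
        """Activity around a single data object, newest first."""
        return sorted(
            (a for a in self._events if a.object_id == object_id),
            key=lambda a: a.when,
            reverse=True,
        )

    def for_project(self, project):
        """Activity across a whole project, newest first."""
        return sorted(
            (a for a in self._events if a.project == project),
            key=lambda a: a.when,
            reverse=True,
        )
```

The two query methods correspond to the two views described above: what others have done with a particular data object, and what has happened within the project as a whole.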

In this video interview, Nick Jackson explains the concept in more detail and outlines their progress during the event, which included building a working API.

This video is available on Vimeo.

 

SWORD 2

This group decided that the problem with SWORD 2 and big data is the resumption problem that is fundamental to HTTP. They discussed how they might send a SWORD request asking the server to get content via some other mechanism, such as a BitTorrent client, FTP or Dropbox. Discussion with the wider group generated positive feedback about BitTorrent as a good route to handle big data. The group experimented with this during the event to test their reasoning.
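One reason BitTorrent suits large transfers is that content is split into fixed-size pieces, each verified by its own SHA-1 hash, so an interrupted transfer only needs the missing or corrupt pieces to be fetched again. The sketch below illustrates that piece-level check only. It does not implement BitTorrent or SWORD, and the function names and piece size are assumptions chosen for the example.

```python
import hashlib

PIECE_SIZE = 256 * 1024  # assumed piece length; real torrents choose their own


def piece_hashes(data: bytes, piece_size: int = PIECE_SIZE) -> list[str]:
    """Split data into fixed-size pieces and return one SHA-1 digest per piece."""
    return [
        hashlib.sha1(data[i:i + piece_size]).hexdigest()
        for i in range(0, len(data), piece_size)
    ]


def missing_pieces(expected: list[str], received: bytes,
                   piece_size: int = PIECE_SIZE) -> list[int]:
    """Return the indexes of pieces that are absent or corrupt in a partial transfer."""
    got = piece_hashes(received, piece_size)
    return [
        i for i, digest in enumerate(expected)
        if i >= len(got) or got[i] != digest
    ]
```

With a list of expected piece hashes published alongside the deposit, a receiving server could call `missing_pieces` on whatever it has so far and request only those pieces, rather than restarting the whole upload.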

In this video interview, Damian Steer outlines the progress made by the group and the issues they encountered…

This video is available on Vimeo.

Damian’s presentation can be found here: http://www.slideshare.net/shellac/sword2-and-bittorrent

The issue of how to handle big data was discussed by several overlapping groups. In this interview, Jon Besson and Dan Small reflect on their connected discussions in this area…

This video is available on Vimeo.

Academic Dropbox

Also connected with the issue of big data, a separate group discussed the potential of an academic dropbox, using a client rather than a server-based pool approach. They explored a number of tools, including SparkleShare, and documented their survey of the issues in a series of blog posts.

In this video interview Joss Winn and Jez Cope reflect on some of these issues in more detail…

This video is available on Vimeo.

 

Metadata for Datasets

This group chose to explore existing metadata schemas to identify the minimum number of elements needed in a schema to accompany data transferred between repositories. They highlighted a potential use case involving a researcher who makes a deposit into a subject repository. From an institutional perspective it will be useful to know about all research outputs, so a basic common schema would allow information about the deposit to be shared between the subject repository, the institutional repository, and any other interested repository, such as the British Library. They also noted that this may be useful if the data is held in more than one place, helping to make it clear where the citable data is held and which versions are copies.
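To make the use case concrete, the sketch below shows what a very small shared record might look like. The fields are illustrative assumptions, not the elements the group identified, and the identifiers used with it would be whatever the repositories already assign.

```python
from dataclasses import dataclass, asdict
from typing import Optional
import json


@dataclass
class DatasetRecord:
    """Hypothetical minimal record shared between repositories."""
    identifier: str                 # identifier of this copy of the dataset
    title: str
    creator: str
    held_at: str                    # repository holding this copy
    copy_of: Optional[str] = None   # identifier of the citable original, if this is a copy

    @property
    def is_citable_copy(self) -> bool:
        """True when this record describes the citable original rather than a copy."""
        return self.copy_of is None


def notify_payload(record: DatasetRecord) -> str:
    """Serialise the record as JSON for sending to another repository."""
    return json.dumps(asdict(record), sort_keys=True)
```

A subject repository could send `notify_payload(record)` to the institutional repository when a deposit is made, and the `copy_of` field lets any receiving repository tell where the citable version is held.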

The group speculated that an extension of this work could allow people to “follow” a particular researcher in a social media style.

In this video interview, Brian Hole describes the progress they made during the hack event and how they see this developing in the future…

This video is available on Vimeo.

Other Work

During the event there were a number of discussions about issues associated with identifiers. Whilst these did not lead to a working project group, they covered useful ground and led to solutions to some of the specific problems participants brought to the hack event with them.

In this video interview, Gerard Devine from the PIMMs project describes one such outcome…

This video is available on Vimeo.

Alex Wolton from the University of Essex also discusses the progress he made during the event on some complex issues associated with his project…

This video is available on Vimeo.

Final Outcomes

 

Activity Data Stream Group (Rainbow Beam)

Nick Jackson, Julian Cheal, Harry Newton, Nick Syrotiuk

This video is available on Vimeo.

Bit Torrent Group

Sander van der Waal, Damian Steer, Tim Brody, Steve Wellburn

This video is available on Vimeo.

Metadata Group (URMe)

Brian Hole, Carlos Silva, Alex Ball, Thomas Parsons, John Kaye, John Bottomley, John Nicholls, Lindsay Wood, Asif Akram

This video is available on Vimeo.

The Prezi to accompany this presentation is available here.

Academic Dropbox

Joss Winn and Jez Cope

Joss and Jez produced several blog posts documenting the issues they researched during the event:

 

Conclusions

One of the key outcomes from the event was a consensus about the need for a different paradigm to deal with moving and managing big data, compared to smaller data sets or multiple small data sets. Exploring these issues and identifying where projects and institutions are encountering similar issues proved to be one of the most useful outcomes for all participants.
 

Participant Responses

 
A number of participants blogged about this event from their own perspectives:

BiblioHack http://devcsi.ukoln.ac.uk/2012/05/10/bibliohack/ Thu, 10 May 2012 16:03:52 +0000 Mahendra Mahey

The Open Knowledge Foundation’s Open Biblio group, and Working Group on Open Data in Cultural Heritage, along with DevCSI, present BiblioHack: an open Hackathon to kick-start the summer months, running from Wednesday 13th to Thursday 14th June 2012.

We’ll be meeting at Queen Mary, University of London, East London, and any budding hackers are welcome, along with anyone interested in opening up metadata and the open cause – this free event aims to bring together software developers, project managers, librarians and experts in the area of Open Bibliographic Data. A workshop will run alongside the coding on the 13th, and a meet-up on the evening of the 12th is open to all whether you’re attending the Hackathon or not.

What is BiblioHack?

BiblioHack will be two days of hacking and sharing ideas about open bibliographic metadata.

There will be opportunities to hack on open bibliographic datasets and experiment with new prototypes and tools. The focus will be on building things and improving existing systems that enable people and institutions to get the most out of bibliographic data.

If you’re a non-coder there are sessions for you too. We will be running a hands-on workshop addressing the technical aspects of opening up cultural heritage data, looking at best-of-breed open source tools for doing that, preparing your data for a hackathon, and the best standards for storing and exposing your data to make it more easily reusable.

When and where?

  • The main hackathon will take place over two days between 13th and 14th June at Queen Mary University of London
  • On the morning of the 13th June we’ll be running the workshop addressing the technical challenges of opening up metadata. So for those unable to participate in the hack due to time constraints or lack of coding know-how – this is for you!
  • On the 12th June – Tuesday evening (details TBC but will be a pub in central / east London!) – we’ll also be hosting a meet-up for anyone attending the hack or interested in open data more generally. Whether it’s open bibliographic data, spending or government data that floats your boat, all tribes are welcome!

 

Who is organising the event?

 

Who else is involved?

We’ve already lined up a whole host of speakers and groups who’ll be attending both the hack and the workshop. The list so far includes UK Discovery, CKAN, Europeana, Total Impact, Neontribe and The British Library, with many more to be added in the coming days…

You’re giving your time and expertise – what do you get if you attend the whole hack?

  • Accommodation at QMUL overnight on the 13th
  • Food and refreshments across the 3 days
  • The chance to work with experts in their fields
  • Admiration and respect from your peers
  • We could expound at length, but… go on, you know you want to (it’s free!)

 

How can I sign up?

  • Register here for the 2 day hack
  • Register here for workshop only
  • Register here for Meet-up only

Please note that if you wish to attend all 3 events you should sign up for each. The workshop will run in parallel with the hacking on the morning of the 13th.

More questions?

Contact Naomi Lillie on admin [@] okfn.org

Dev8D: The best of Dev8D http://devcsi.ukoln.ac.uk/2012/02/16/dev8d-the-best-of-dev8d/ Thu, 16 Feb 2012 17:34:59 +0000 gibbons

This year’s Dev8D has now come to an end, so sit back and take a look at what happened over the three days…

Dev8D: Kinect-ing with Quadrocopters http://devcsi.ukoln.ac.uk/2012/02/16/dev8d-kinect-ing-with-quadrocopters/ Thu, 16 Feb 2012 12:21:22 +0000 gibbons

DevXS last year was a platform for a University of Aberystwyth-led team to show off their multi-mobileOS quadrocopter. Mere months down the line, Dave Tarrant is demonstrating the speed of technological evolution with a 3D motion control system for the flying machine, using an Xbox Kinect.

Dave believes that a potential application for this system is to teach younger children and those with learning difficulties about spatial awareness:

“There are certain beneficial applications, if you like, of the technology. For example, educating young children who have learning difficulties about 3D spatial awareness by being able to interact in 3D. The Kinect is great already for playing games and interacting using full body motion, rather than a controller that you necessarily have to learn, because it is very intuitive”.

Dev8D: Day two reflection with Mahendra and Christopher http://devcsi.ukoln.ac.uk/2012/02/16/dev8d-day-two-reflection-with-mahendra-and-christopher/ Wed, 15 Feb 2012 23:47:24 +0000 gibbons

As the sun sets on the second day of Dev8D 2012, organisers Mahendra Mahey and Christopher Gutteridge reflect on the day’s events.

The duo talk about the success of fledgling developer, Alex Bilby, after his talk on HTML5 attracted an unexpected amount of interest, as well as discussing lightning talks, the new unconference format and tomorrow’s ‘Best of Dev8D’ programme.

Dev8D: Meet the developers: Half way point http://devcsi.ukoln.ac.uk/2012/02/15/dev8d-meet-the-developers-half-way-point/ Wed, 15 Feb 2012 15:49:22 +0000 gibbons

Lunch time on Wednesday marked the half way point in this year’s Dev8D, and we caught up with some of the developers to see how they think the event is going, and what they have enjoyed the most so far.

Dev8D: 3D Printing with Graham Klyne http://devcsi.ukoln.ac.uk/2012/02/15/dev8d-3d-printing-with-graham-klyne/ Wed, 15 Feb 2012 11:57:57 +0000 gibbons

One of the most exciting and promising pieces of technology on display over the last few years at Dev8D has been the 3D printing RepRaps: machines capable of replicating up to 50% of their own components.

This year, Oxford University’s Graham Klyne shows off 2nd and 3rd generation machines, which have been built from their respective parent generation machines.

While a large scale commercial application is yet to be found for the RepRaps, Graham believes the 3D printing industry will follow that of digital image printing.

“I can imagine that we can see an ecosystem build up which would have parallels with what we can see with printing digital photos. The possibility of creating your own images – you can print them at home if you want to and if you have the right kind of equipment, but you can also take them and get them printed fairly cheaply at a store. There is a real possibility that we can see a much more competitively priced bureau service for realisation of 3D design”.
