UoL Library Blog

Develop, debate, innovate.

Posts Tagged ‘curation’

DataCite and the Research Data Challenge

Posted by gazjjohnson on 30 May, 2012

Last Friday (25th May) I took my second trip of the week to London (having been at the Symplectic User Conference on Monday). This time it was the gentle stroll from St Pancras to the British Library Conference Centre to participate in the first JISC/BL DataCite workshop. Billed as an introduction to data citation and DataCite, this seemed an ideal follow-up to the Research Data Management Forum event in Southampton back in March. As the role of the LRA Manager migrates to look increasingly at how we will manage, share and curate research data outputs as well as publications, it was the sort of thing that I felt I really needed.

Data Citation

Following the housekeeping and welcome from the BL’s Lee-Ann Coleman and JISC’s Simon Hodson (owner of the finest waxed moustache I’ve seen in many a moon), Lee-Ann kicked off with an overview of data citation: what it is and why it is important. The fact that there is an expectation from the RCUK that research data will be shared, to assist in validation of research conducted by their funded investigators, is perhaps the main driver. At the same time HEIs want oversight of their research outputs, and as such the curation of their organisation’s data resource is important to them for building on earlier work and enabling collaborative research to evolve organically. Given that many academics in adjoining offices are often unaware of what colleagues are producing, increasing this transparency and accessibility to a rich, queryable and reusable research resource is believed to be of value not only in progressing collaboration but in enabling genuinely novel research from pre-existing work.

Lee-Ann cited some examples, including the importance of data sharing in speeding up the sequencing and generation of a vaccine for the African strain of avian flu. Her other examples were also from STEM fields, which slightly concerned me given that two-thirds of research here at Leicester is in disciplines outside this domain, whose researchers in my experience often need greater assistance in capturing and sharing technological resource. Lee-Ann stressed that one question HEIs needed to address was what counts as critical or worthy data to curate. A microbiologist might see all the raw data output from an instrument as worthy of this, and yet for many other people it would be the processed data, given context and analysis, that would be of value.

What is DataCite?

Next up was Elizabeth Newbold (British Library), who gave an overview of what DataCite is. Founded in 2009, it is a registration agency, effectively an allocating agent for DOIs (which I had never realised are based on the Handle system that I use daily in the LRA). However, it was made very plain that DataCite does not work directly with researchers: they are expected to deposit their data (in whatever way possible) to an appropriate data centre, and then come to DataCite to “mint” a DOI. Minting of DOIs was a new phrase for me, but clearly one that I can see slipping into my regular conversations about this subject here at Leicester.

It was noted that the UK Data Archive had a strong definition of what data was (termed data collections): groups of all outputs from a single project source. It was commented that other data centres across the country were working along similar lines and methodologies.

DataCite Infrastructure & Working with DataCite

After an excellent lunch (BL London catering never fails to delight) Ed Zukowski (British Library) gave a very useful, if in part quite detailed and technical, overview of both DataCite and DOIs. Handles are the technology that underpins them, DOI actually being a trademarked derivative. DOIs importantly point to landing pages, not to the objects themselves (akin to our implementation of Handles on the LRA), and in practice, using the DataCite front-end, they take around a minute to mint. He went on to detail how DataCite resolves contents from DOIs minted via them, but I think I’ll wait and link to the slides once available rather than try to make sense of my slightly confused notes. I was content to see that the service worked, rather than worry about the technicality.
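
As a rough illustration of the landing-page point, here is a minimal Python sketch of how a DOI breaks down into a prefix and a suffix, and how it becomes a link through the public resolver at doi.org. This is not DataCite's own tooling or anything shown on the day: the function names are hypothetical, and the DOI in the usage note is a made-up example rather than a real dataset.

```python
from urllib.parse import quote

# Public DOI resolver; it redirects to whatever URL was registered for the DOI,
# which is normally a landing page rather than the data file itself.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix and suffix (hypothetical helper)."""
    doi = doi.strip()
    if doi.lower().startswith("doi:"):
        doi = doi[4:]
    prefix, sep, suffix = doi.partition("/")
    # Every DOI prefix starts with "10." and is followed by "/" and a suffix.
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def landing_page_link(doi: str) -> str:
    """Build the resolver link for a DOI (hypothetical helper)."""
    prefix, suffix = split_doi(doi)
    # Percent-encode any awkward characters in the suffix, keeping slashes.
    return DOI_RESOLVER + prefix + "/" + quote(suffix, safe="/")
```

Calling `landing_page_link("10.5555/example-dataset.v1")` builds the link `https://doi.org/10.5555/example-dataset.v1`. Following such a link is what hands the request over to the resolver; the sketch itself makes no network call.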

Following this Elizabeth Newbold returned to talk briefly about working with DataCite and the data client responsibilities. In terms of their metadata schema there were only 4 required elements needed to make it work. However, locally people may well augment this with many more fields as they feel appropriate for discovery and description. I confess one nagging worry I have is who will create this metadata. Is it a task we will anticipate a PI will perform at the conclusion of a project? Personally I have concerns over the quality, accuracy, uniformity and standardisation of such input, going on my experience of manually created records submitted to the LRA via IRIS. From the academics’ perspective I can see the challenge being that this will be seen as yet another piece of administrative trivia that they are expected to deal with, and achieving the cultural change to embed this into their standard workflows will be challenging, with some serious and time-consuming carrot-whipping. Given the struggle to work deposit of publications into our open access repository into their routine over the past four years, it is a serious challenge and the scale of this should not be underestimated!

Elizabeth noted that metadata created must be shared under a Creative Commons Zero licence, noting that for example the British Library OPAC makes data available for sharing and reuse in this way. There were some concerns from those present in the room that this might cause problems in cases where institutions, funders or even publishers laid claim to such data. Another speaker also highlighted the problem of having data (with a minted DOI) and then having a third party mint a different DOI for it, which could interfere with metrics of access as well as uniformity of reference. There didn’t appear to be a clear consensus or answer to these concerns, and the discussions broke up over tea.

Challenges Around Managing Research Data

The final session of the day was a workshop format where we were broken into small groups, then smaller groups, and then finally into pairs (!) to discuss and document what we perceived as the challenges around managing research data. I think it was a shame we were so subdivided, since while I had a valuable chat with my counterpart I would have relished a broader chat with a slightly larger group. Given that there was a wide disparity between the roles of delegates (from publishers to project managers to editors to directors of service through to repository managers) I feel we lost some of the benefit that we could have achieved through putting more of these diverse heads together. I also sensed a slight bias in the broader discussion when each pair’s issues were categorised and resolutions discussed: it did feel as though the answer to “How do we solve this problem?” was intimated to be “DataCite”. It wasn’t in our room, although in at least one of the other two larger groups DataCite seemed ready to answer more of their challenges.

Conclusion

My slight concerns over the value of the final session aside, this was an eye-opening and valuable day. It has for me perhaps opened up more questions than answers, although some answers were provided as well. What I think it offered was a chance to gauge where other people are on the research data management question and, more importantly, it gave shape to the bigger operational and strategic questions that we need to be asking ourselves within our organisations. As such the day was most certainly worthwhile, and my thanks to all the speakers, organisers and delegates for a thought-provoking day.

Further reading

A Twitter archive of discussions around the day is also available.

Posted in Leicester Research Archive, Research Support

Creating PDF/As – issues with protocols

Posted by gazjjohnson on 13 July, 2010

In the last few days I’ve been working on the protocols for converting supplied PDFs into PDF/A format, something I’ve been meaning to get around to for some time. PDF/A is the format in which we ideally want to be storing PDFs on the LRA; and while it isn’t the best digital curation format for our purposes and workflows it is the most practical solution.

However, I’ve hit on a snag that’s made me pull back. When using Adobe Pro/Distiller to convert them across, any text copied and pasted out of the converted PDF/A document displays as symbols or gibberish in Word or even Notepad. This is a problem for us in terms of creating the abstracts on the LRA, but more importantly I am concerned that this might in some way interfere with search crawlers indexing the full text of the PDFs. Perhaps I’m wrong, but I’ll leave that for someone more technically minded to respond to!

For interest here’s the conversion protocol as it currently stands:

 

  1. Open the PDF in Acrobat Pro
  2. Select File | Print
  3. Select printer Name Adobe PDF
  4. Click Properties
  5. Under Default settings select PDF/A-1b:2005 (RGB)
  6. Untick the box Rely on system fonts only: do not use document fonts
  7. Click OK
  8. Now click OK to proceed to printing the PDF/A
  9. You will be prompted for a location and an alternative name so as not to overwrite the original.

 

As such I’m holding off for the time being on converting supplied PDFs until I can find a solution. If anyone is aware of one, I’d appreciate hearing about it!
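
For anyone wanting to check a batch of converted files rather than copy and paste from each one by hand, the Python sketch below is one rough way to flag text that has come out as symbols. It is only a heuristic under stated assumptions: it takes text that has already been extracted from the PDF by some other tool (the extraction step is not shown), it assumes the garbling shows up as symbol, control or private-use characters rather than plausible-looking letters, and the function names and the 0.8 threshold are hypothetical choices, not part of any Adobe workflow.

```python
import unicodedata


def readable_ratio(text: str) -> float:
    """Fraction of non-space characters that are letters, digits or punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    # Unicode general categories starting with L, N or P are letters,
    # numbers and punctuation; symbols, controls and private-use
    # characters fall outside these groups.
    readable = sum(1 for c in chars if unicodedata.category(c)[0] in ("L", "N", "P"))
    return readable / len(chars)


def looks_garbled(text: str, threshold: float = 0.8) -> bool:
    """Flag text whose readable share falls below the (arbitrary) threshold."""
    return readable_ratio(text) < threshold
```

A file whose extracted text is flagged this way would be worth opening and checking manually before the PDF/A replaces the original on the LRA.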

Posted in Leicester Research Archive, Open Access | 4 Comments

The DCC Charter & Statement of Principles

Posted by gazjjohnson on 9 February, 2009

The Digital Curation Centre has released a draft Charter and Statement of Principles, which is open for public comment. Well worth a read if you get a moment. The DCC has always been a really useful centre for advice on the long-term preservation of, and access to, digital domain data. I know I run screaming from trying to get my head round some of the issues, and am more than grateful that there are people out there willing to tackle this major challenge.

Helpfully, in the middle there’s a very nice little statement about just what digital curation is:

Digital curation is maintaining and adding value to a trusted body of digital information for current and future use; it encompasses the active management of data throughout the information lifecycle.

That’s rather a good little statement I thought.

I’m attending a meeting in London on the 27th Feb hosted by the DCC where they’ll be talking about some of the issues we all face and the support on offer in much greater detail. But in the meantime you can give feedback to the DCC on their work at: www.dcc.ac.uk/feedback-charter

Posted in Technology & Devices, Wider profession