DataCite and the Research Data Challenge

Posted by gazjjohnson on 30 May, 2012

Last Friday (25th May) I took my second trip of the week to London (having been at the Symplectic User Conference on Monday).  This time it was the gentle stroll from St Pancras to the British Library Conference Centre to participate in the first JISC/BL DataCite workshop.  Billed as an introduction to data citation and DataCite, this seemed an ideal follow up to the Research Data Management Forum event in Southampton back in March.  As the role of the LRA Manager migrates to look increasingly at how we will manage, share and curate research data outputs as well as publications it was the sort of thing that I felt I really needed.

Data Citation

Following the housekeeping and welcome from the BL’s Lee-Ann Coleman and JISC’s Simon Hodson (owner of the finest waxed moustache I’ve seen in many a moon), Lee-Ann kicked off with an overview of Data Citation; what it is and why it is important.  The fact that there is an expectation from the RCUK that research data will be shared, to assist in validation of research conducted by their funded investigators, is perhaps the biggest driver.  At the same time HEIs want oversight of their research outputs, and as such the curation of their organisation’s data resource is important to them for building on earlier work and enabling collaborative research to evolve organically.  Given that many academics in adjoining offices are often unaware of what colleagues are producing, increasing this transparency and accessibility to a rich, queryable and reusable research resource is believed to be of value in not only progressing collaboration but enabling genuinely novel research from pre-existing work.

Lee-Ann cited some examples, including the importance of data sharing in speeding up the sequencing and generation of a vaccine for the African strain of Avian flu.  Her other examples were also in the STEM field, which slightly concerned me, given that two-thirds of research here at Leicester is in disciplines outside this domain; researchers who in my experience often need greater assistance in capturing and sharing technological resource.  Lee-Ann stressed that one question that needed to be addressed by HEIs was: what is critical/worthy data to curate?  A microbiologist might see all the raw data output from an instrument as worthy of this, and yet for many other people it would be the processed data, given context and analysis, that would be of value.

What is DataCite?

Next up was Elizabeth Newbold (British Library) who gave an overview of what DataCite is.  Founded in 2009 it is a registration agency, effectively an allocating agent for DOIs (which I had never realised are based on the Handle system that I use daily in the LRA).  However, it was made very plain that DataCite does not work directly with researchers; they are expected to deposit their data (in whatever way possible) to an appropriate data centre, and then come to DataCite to “mint” a DOI.  Minting of DOIs was a new phrase for me, but clearly one that I can see slipping into my regular conversations about this subject here at Leicester.

It was noted that the UK Data Archive had a strong definition of what counted as data (termed data collections): groups of all outputs from a single project source.  It was also commented that other data centres across the country were working along similar lines and methodologies.

DataCite Infrastructure & Working with DataCite

After an excellent lunch (BL London catering never fails to delight) Ed Zukowski (British Library) gave a very useful, if in part quite detailed and technical, overview of both DataCite and DOIs.  Handles are the technology that underpins them, with DOI actually being a trademarked derivative.  DOIs importantly point to landing pages, not to the objects themselves (akin to our implementation of Handles on the LRA), and in practice, using the DataCite front-end, each takes around a minute to mint.  He went on to detail how DataCite resolves contents from DOIs minted via them, but I think I’ll wait and link to the slides once available rather than try and make sense of my slightly confused notes.  I was content to see that the service worked, rather than worry about the technicality.
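
To make the "DOIs point to landing pages" idea a little more concrete, here is a minimal sketch of how a DOI breaks down into a prefix and a suffix, and how it becomes a resolvable link.  This is my own illustration rather than anything shown on the day: the DOI used is made up, the function names are hypothetical, and I am assuming the public resolver at https://doi.org/ as the base address.

```python
from urllib.parse import quote

# Assumption: the public DOI resolver. It redirects to whatever landing
# page the minting data centre has registered for the DOI.
DOI_RESOLVER = "https://doi.org/"


def split_doi(doi: str) -> tuple[str, str]:
    """Split a DOI into its prefix (the registrant) and suffix (the item)."""
    prefix, sep, suffix = doi.strip().partition("/")
    if not sep or not prefix.startswith("10.") or not suffix:
        raise ValueError(f"not a DOI: {doi!r}")
    return prefix, suffix


def resolver_url(doi: str) -> str:
    """Build the link a reader would follow to reach the landing page."""
    prefix, suffix = split_doi(doi)
    # Percent-encode anything awkward in the suffix, but keep slashes.
    return f"{DOI_RESOLVER}{prefix}/{quote(suffix, safe='/')}"


# Hypothetical DOI, for illustration only.
print(resolver_url("10.1234/example-dataset"))
```

The point of the indirection is that the citation stays the same even if the data centre later moves the landing page; only the registered target behind the DOI changes.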

Following this Elizabeth Newbold returned to talk briefly about working with DataCite and the data client responsibilities.  In terms of their metadata schema there were only four required elements needed to make it work.  However, locally people may well augment this with many more fields as they feel appropriate for discovery and description.  I confess one nagging worry I have is who will create this metadata?  Is it a task we will anticipate a PI will perform at the conclusion of a project?  Personally I have concerns over the quality, accuracy, uniformity and standardisation of such input, going on my experience of manually created records submitted to the LRA via IRIS.  From the academics’ perspective I can see the challenge being that this will be seen as yet another piece of administrative trivia that they are expected to deal with, and achieving the cultural change to embed this into their standard workflows will be challenging, with some serious and time-consuming carrot-whipping.  Given the struggle to work deposit of publications into our open access repository into their routine over the past four years, it is a serious challenge and the scale of this should not be underestimated!

Elizabeth noted that metadata created must be shared under a Creative Commons Zero licence, noting that for example the British Library OPAC makes data available for sharing and reuse in this way.  There were some concerns from those present in the room that this might cause problems in cases where institutions, funders or even publishers laid claim to such data.  Another speaker also highlighted the problem of having data (with a minted DOI) and then having a third party mint a different DOI for it, which could interfere with metrics of access as well as uniformity of reference.  There didn’t appear to be a clear consensus or answer to these concerns, and the discussions broke up over tea.

Challenges Around Managing Research Data

The final session of the day was a workshop format where we were broken into small groups, and then smaller groups, and then finally into pairs (!) to discuss and document what we perceived as the challenges around managing research data.  I think it was a shame we were so subdivided, since while I had a valuable chat with my counterpart I would have relished a broader chat with a slightly larger group.  Given that there was a wide disparity between the roles of delegates (from publishers to project managers to editors to directors of service through to repository managers) I feel we lost some of the benefit that we could have achieved through putting more of these diverse heads together.  I also sensed a slight bias in the broader discussion when each pair’s issues were categorised and resolutions discussed – it did feel like the expected answer to “How do we solve this problem?” was intimated to be “DataCite”.  It wasn’t in our room, although in at least one of the other two larger groups DataCite seemed ready to answer more of their challenges.


My slight concerns over the value of the final session aside, this was an eye-opening and valuable day.  It has for me perhaps opened up more questions than answers, although some of those were provided as well.  Importantly what I think it offered was a chance to gauge where other people are on the research data management question and more importantly it gave shape to the bigger operational and strategic questions that we need to be asking ourselves within our organisations.  As such the day was most certainly worthwhile, and my thanks to all the speakers, organisers and delegates for a thought-provoking day.

Further reading

A twitter archive of discussions around the day is also available.

JISC Information Environment Event April 2011

Posted by gazjjohnson on 8 April, 2011

Here are my notes and comments on the event I attended at the University of Aston as an invited speaker by the JISC on Thursday 7th April – resources from the event can be found here.

Neil Jacobs from the JISC opened the day and gave it some context – taking us from the HE environment of 2009 and the days of the Digital Britain Report to 2011 and the current circumstances.  He detailed the various strands of the programme: Repositories, Preservation, Geospatial Data and Infrastructure, Library Management Systems, Activity Data, Developer Community, Infrastructure for Resource Discovery, Scholarly Communications, Rapid Innovation and Linked Data.

HE today is beginning to look to bibliometrics for research excellence and impact, which are fairly significant drivers.  Moves towards starting/supporting innovation and entrepreneurship need to be watched closely.  The event as a whole aimed to share the highlights of learning from the various strands of the programme.

Session 1: Learning from Other Institutions

David Millard (University of Southampton) spoke first, focussing on lessons learned from how educational repositories were not working.  They spoke to teachers: real teachers didn’t understand the terminology or files from OERs, let alone work with digital resources themselves.  Research repositories on the other hand give a real service to the researchers that they get (I might question that for some academics!).  They looked to sharing sites (YouTube/SlideShare etc) which give teaching resources a home and have community and organisation – but it’s not through altruism for many people.  They developed software called EdShare, a post-learning object repository, that offered various advantages – not trying to force people to model their courses or materials in one particular way.  It also had light, non-restrictive metadata.  They tried to make the educational repository part of the living cycle, and want BlackBoard to feed EdShare, which feeds iTunesU as well.

Kamalsudhan Achuthan was up next (filling in at short notice)  talking about improving research information management, something close to my heart with the current local work towards implementing and integrating a CRIS.  The final report from the project can be found here.

William Nixon gave the next talk, on embedding repositories into practice.  One of the outcomes of the project has been building the relationships between the repository and research office staff.  He noted that the future is embedding the repository within the institutional systems, although interoperability is not automatically easy.  The aim might well be to have an invisible repository moment, when it is seamlessly integrated into the whole.  The repository was used to gather a lot of the information for the mini-REF that Glasgow ran, including impact and other metrics.  Embedding and integrating is about adding value, enabling reuse, reducing duplication and exploiting new opportunities.  Advocacy has evolved (as at Leicester) to where it’s about working with the Research Office and other people across the campus; which I would say is a very good thing.  At the same time the project showed that there are different needs for the different disciplines.  He finished by suggesting that the job of a repository manager is moving into new, and exciting, territory.

Damian Steer closed the morning by talking about information architecture.  Interestingly he touched on data sources such as blogs and newspaper reports on the work, which would contribute towards demonstrating an impact for the REF.  Behind the scenes at Bristol they use linked data from the Semantic Web.

After lunch Ben Showers from JISC, Nick Woolley (King’s College) and I talked about various resource and time saving activities.  I was presenting the highlights from my recent survey (my thanks to all those who responded) rather than talking from personal experience!  You can access my slides here.  Ben’s talk (Why you shouldn’t bother with advanced search) is also online.  While the session (which was repeated) was not exactly well attended, there was a spirited debate following the talks on both occasions.

Finally Margaret Coutts from the JISC Infrastructure and Resources Committee came on to deliver the keynote.  Among the comments she made was that it is important to remember that research repositories are not solely for archiving for the REF, nor are teaching repositories solely for exploiting the content – they should both work in that area.  There is a need to develop life-cycle management for the documents within them as well.  Academics are now more ready to come forward and expose all the extra effort they put into preparing journals – unpaid contributions – and to ask the question: just what are publishers doing for us?  Will they challenge the publishers?  It is uncertain, as there is a desire not to damage peer review in the process.

The change in scholarly communications is a long game, and not one that will happen in the next few years, although there will be work in the right direction.  Work on LMS indicates that shared systems may well generate shared efficiencies and reduce costs.

One of the big growth areas in the coming years was suggested to be teaching and OERs, where platform rather than standard will be more important.  Likely there will be pressure for more sharing of these both within and without institutions, although there will be some items for local access as well as those for fully open access.  Digital Preservation is something that keeps falling off the edge.  We know what digital preservation is, but it keeps being postponed because there are other more pressing things – and this is a time bomb.  We need to address this as a community sooner rather than later.

Urgency for solutions is going to increase.  Are there quick wins we can gain from the JISC projects that can be put out to the sector?

Rachel Bruce then capped the day off by looking at the way ahead for JISC, which even though it has reduced funding is still charged with enabling innovation but at the same time ensuring that lessons learned and applications developed are able to be taken up by the LIS community.

RSP Winter School: Day 2

Posted by gazjjohnson on 14 February, 2011

(You can read about Day 1 here.)

Day two opened with slightly less overcast skies, and Jackie Wickham giving an overview of the work of the RSP; past and future.  This was followed by Max Wilkinson from the British Library talking about their Datasets Programme, which I was especially looking forward to hearing.  It was interesting to hear about an area which, by common accord, most of the room wasn’t doing a great deal about practically.  That data sets are of a volume orders of magnitude greater than the publications that most repositories deal with is no surprise, and that most repository software packages are not especially great at handling them wasn’t either.  I was heartened to hear that the BL are working in this area, and appear to be thinking about it at a national level.  I must confess that personally I’d expect that a national solution for a data set repository is more likely to be effective than a local one; but thinking that and seeing it happen are two very different things.

Then Prof Keith Jeffery from euroCRIS/STFC gave a talk which…well it was very information rich.  I described the talk afterwards as akin to the “last 30 minutes of 2001, only without a monolith”.  Keith was nominally talking about euroCRIS but this was almost submerged in the presentation that whipped past with terms half known and unknown.  There was certainly worth in hearing someone as plugged in at the national level to STEM work as Keith; it was unfortunate that his talk wasn’t really pitched at a sufficiently practical level for those in the room.  I shall however look forward to re-reading his slides (assuming the RSP shares them) at my leisure, over perhaps a day.

Next up was Mark Cox from King’s College talking about the Readiness4REF project.  Leicester has been slightly involved in this project, with respect to CERIF, so some of what Mark talked about was familiar to me.  I came out of this talk taking away the message that making sure your repository is CERIF compliant will make it faster, more effective and ready to interact with the wider community; which can only be a good thing.

And then he was followed by Theo Andrew from EDINA who presented what I can only describe as THE talk of the conference for me.  Theo outlined a world where a lot of work is repeated at different institutions, where three co-authors at different unis are each asked to make a deposit of a copy of their paper, with varying levels of success and engagement.  The Repository Junction project proposes to streamline this, so that when one academic deposits, the software seeks out the repositories of the other authors and punts the paper into their verification and deposit workflows.  William Nixon (Glasgow) referred to it as a killer app and to be frank I think if it works he won’t be proved wrong.  Theo’s only working with a limited number of institutions but the plans are to expand out to a larger group; and I, like many in the room I imagine, would be only too happy to be involved!  I’ll be following the project blog with interest.
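
For anyone who prefers to see the idea as code, here is a minimal sketch of the brokering step as I understood it from the talk: one deposit comes in, and a copy is pushed into the review queue of each co-author's home repository.  This is purely my own illustration and not the project's actual software; the class names, the `broker` function and the registry of institutions are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Repository:
    """A hypothetical institutional repository with a staff review queue."""
    name: str
    review_queue: list = field(default_factory=list)


@dataclass
class Deposit:
    """A deposited paper, with one institution listed per co-author."""
    title: str
    author_institutions: list


def broker(deposit, depositing_institution, registry):
    """Forward a deposit to the other authors' repositories.

    `registry` maps an institution name to its Repository. Returns the
    names of the repositories that received a copy for verification.
    """
    notified = []
    # dict.fromkeys removes duplicate institutions but keeps their order.
    for institution in dict.fromkeys(deposit.author_institutions):
        if institution == depositing_institution:
            continue  # this repository already holds the paper
        repo = registry.get(institution)
        if repo is None:
            continue  # no known repository for this co-author
        repo.review_queue.append(deposit)
        notified.append(repo.name)
    return notified


# Made-up institutions, for illustration only.
registry = {
    "Uni A": Repository("Repo A"),
    "Uni B": Repository("Repo B"),
}
paper = Deposit("A shared paper", ["Uni A", "Uni B", "Uni C"])
print(broker(paper, "Uni A", registry))
```

Note that in this sketch the copy lands in a review queue rather than going live, which matches the point that each repository keeps its own verification and deposit workflow.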

After a delicious lunch (which made me glad I skipped breakfast) Balviar Notay from JISC spoke about the Take Up and Embedding Programme projects, which was I admit a bit of a blur of acronyms.  All the same some interesting work is going to be carried out under this banner.

She was followed by a workshop session fronted by Jackie Wickham and four willing helpers, which ran into the early evening.  Four facilitators (Miggie Pickton, Nicky Cashman, Jill Golightly and Rachel Proudfoot) moved around four groups and spent 30 minutes with each discussing issues related to their own projects, locales and experiences.  The small group format allowed for a more intimate level of discussion than might have been enjoyed in the whole group.  I must confess that the first couple of these sessions did little for me (other than further developing my sense that Glasgow has done so much that many of us will struggle to ever achieve their level of success!).  However, the sessions with Nicky and Rachel were much more suited to my personal interests and certainly clarified one or two ideas I’ve been having of late about the LRA and our future direction.

The day’s sessions were followed by the conference dinner, and repository related discussions and exchanges which lasted long into the night (I lasted ’til around 11.30 but then had to call it a night).  An intensive, packed day with a lot for me to reflect on and revisit now I’m back at Leicester.

(You can read about Day 3 here)

RSP Winter School 2011: Day 1

Posted by gazjjohnson on 10 February, 2011

Wednesday saw me make the long trek up to the top of the Lake District to Armathwaite Hall hotel and the RSP’s winter school.  Like the summer school these are small, intimate gatherings of repository workers to share experiences and learn from one another.  It’s all a little more focussed than your average conference and more akin to an OU summer school sort of thing.  We’re expected to work, not just simply sit here and listen; although it would also be nice to wander the grounds or visit the spa…if there was just the time!

Day one kicked off with a tasty lunch before Jackie of the RSP opened proceedings (in place of Bill Hubbard who unfortunately had called off sick).  Following an ice breaker (which involved a lot of movement and talking) we had the keynote from Salford University’s Vice-Chancellor Martin Hall.  Martin is very switched on to the modern electronic communication environment (the first VC to tweet) and gave an impassioned overview of the importance of open access to the modern scholarly institution – underlined with the economic importance of it as much as the research world.

He suggested the future for repositories is increasingly going to be centralised and at a national level, and that local institutional repositories may in time go the way of the dinosaur.  That said, he admitted this would take many years to arrange, and given the competitive nature of many institutions might be easier said than done.

It was great to hear a senior institutional manager who really understands the role of open access and repositories, and a great way to really kick the school off.  After Martin the event went from the sublime to…well me.  I was drafted in at the last minute to give a talk reflecting on the summer school 2010.  I can’t claim it’s the most polished talk I’ve given, but it seemed to go down well.

We had a short evening break, and then we moved into a debate between Green and Gold open access as the final route.  Personally I still think the truth is a hybrid model, but it certainly was good to hear a former RSP member (Dominic Tate) debate with one of the newest (Emily Nimmo).  Then we moved onto dinner and informal discussions.

Oh, and tweets from the event are tagged #rspws11.

And day two looks even more packed.

Read about Day 2 and Day 3 here

Open Access Week 2010: Monday

Posted by gazjjohnson on 18 October, 2010

It’s day 1 of our 5 day celebration at the LRA for Open Access Week 2010.  Today we went live with our article in the daily ebulletin, which goes to every member of the university.  I’ve also been out and about meeting a couple of researchers as part of our Repository in Your Office push – my thanks to both Aldo Rona and Helen Atkinson for asking us in.  Plenty of slots left in the week for anyone else who’d like us to swing by and collect electronic versions of their publications!

I thought I should also highlight that the JISC has made some OA Week material available too, updating it on a daily basis.

Random paper of the day: Encounters with Vortices in a Turbine Nozzle Passage

IPR Workshop

Posted by taniarowlett on 4 October, 2010

Last week I attended the Strategic Content Alliance (SCA) IPR workshop hosted by JISC.  The event was run exceptionally well by Naomi Korn and Sarah Fahmy and included overviews of the current progress of the Gowers review, the mounting case law surrounding the issue of privacy vs public interest and the remaining uncertainties relating to the future impact of the Digital Economy Bill. 

There was a great deal of discussion within the group throughout the day, but three main themes emerged:

  • the increasing emphasis on developing a business model to generate income from the rights we hold, whilst continuing to participate in the open access movement.  How can we best generate commercial income but still make our works open access?
  • the clear need for legislation on the use of orphan works.  Currently the decision to digitise and/or use works where there is no clear and contactable rights holder is determined by the level of risk associated with their use, and the decisions vary widely from person to person and institution to institution.
  • the weighing up of benefit vs the cost of clearing obscure works.  Naomi made a good point in saying that the more obscure an item, the more cultural and educational interest it may hold.  However, for those of us involved in trying to clear rights for any use of these items it can be extremely time consuming, for perhaps little perceived gain.

With Universities and public bodies finding themselves having to justify their use of public funds, the question is whether we should be pursuing activities which provide us with maximum ‘internal’ educational use; making more of our creative output available to the public; or trying to use the unique resources we hold (specialised archive material, high profile academic research) to generate alternative forms of funding.  And which of these is the most cost-effective?

Repositories and CRIS WRN event article

Posted by gazjjohnson on 23 August, 2010

Nick Sheppard, Leeds Met University (aka MrNick on twitter) has written a good article in the most recent Ariadne about the Welsh Repository Network/JISC workshop back in May looking at the interaction between CRISes* and repository systems.  As I was unable to get to this event due to prior commitments, it was good to have a chance to catch up on the discussions.

I was interested to note that a CRIS (Current Research Information System) can go by many names – given that the UoL Research Office often refer to them as RIMS – Research Information Management Systems.  They’re not alone as many universities seem to have renamed them as RMAS or ERA and the like.  But at their heart they are systems that not only gather in research publication data (and much more), but actively link to other systems – chief among them from my perspective being the repository.

The question “Is an IR a subset of a CRIS?” posed by one speaker (Simon Kerridge, ARMA) is an interesting one.  Having seen a number of recent CRIS vendor demos, it is one that is clearly approached in different ways by different organisations.  Some very much see the IR as a satellite system, fed largely (but not entirely) by the CRIS.  For others it is more of a subsumed system – with a visible front end peeking out, but the rest of the body absorbed by the greater whole.  I must confess so long as the workflows for such issues as rights verification and data management are still handled by the elite repository administration team I don’t have an especial problem either way.  However, if a CRIS/Repository union means that a repo is just a reflection of the CRIS data set, locked down without the additional resources embodied and ingested by the IR over and above the REF related items; well then I’m a little more uneasy.

The talk from St Andrews’ Data architect Anna Clements (which came with some interesting but not readily comprehensible diagrams) brought up the CERIF standard.  Interesting that St Andrews has been pursuing links to their repository for far longer than many other institutions, which has demonstrated the advantages of working closely together with research support personnel (something I’ve benefited from here at Leicester in the past two years and can heartily concur).

Meanwhile William Nixon and Valerie McCutcheon of Glasgow gave a very useful overview of the integration of the repository with a CRIS.  I was able to plot from my own experiences whereabouts we are in this process here at Leicester.  They raised a valuable point about author authorities – something that has long concerned me as an issue to which I don’t have a ready solution.  In some regards I’m hoping the CRIS implementation here will allow us to tackle and resolve this at that point – given that unique IDing of authors is something that is key for bibliometrics and REF returns alike.  I notice William doesn’t appear to have offered a solution though in his talk, which is perhaps a slight concern for me.  I wonder how difficult it is going to be to match an author of a non-REF item that routes into the repository from beyond the CRIS with the institutional verified author list.  And what about external additional authors?  I suspect this is going to be a major issue for me and my team to resolve and one that I’d welcome external insight on.
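
To show why the author matching worries me, here is a deliberately naive sketch of one way it might be attempted: reduce every name to a surname plus first initial and look it up against the verified list.  This is only my own illustration under loud assumptions, not how any CRIS actually does it; the staff IDs, the names and both function names are invented.

```python
import unicodedata


def name_key(name: str) -> tuple[str, str]:
    """Reduce 'Surname, Forenames' or 'Forenames Surname' to
    (surname, first initial), lower-cased and with accents stripped."""
    text = unicodedata.normalize("NFKD", name)
    text = "".join(c for c in text if not unicodedata.combining(c))
    text = text.replace(".", " ").lower()
    if "," in text:
        surname, _, forenames = text.partition(",")
    else:
        parts = text.split()
        if not parts:
            return "", ""
        surname, forenames = parts[-1], " ".join(parts[:-1])
    return " ".join(surname.split()), forenames.strip()[:1]


def match_authors(record_authors, verified):
    """Map each author string on a record to candidate staff IDs.

    `verified` is a hypothetical {staff_id: name} authority list. An empty
    list means no match (perhaps an external author); more than one ID
    means the match is ambiguous and needs a human to decide.
    """
    index = {}
    for staff_id, name in verified.items():
        index.setdefault(name_key(name), []).append(staff_id)
    return {author: index.get(name_key(author), []) for author in record_authors}


# Invented authority list, for illustration only.
verified = {"u001": "Smith, Jane A.", "u002": "Müller, Karl"}
print(match_authors(["J. A. Smith", "K. Muller", "A. N. Other"], verified))
```

Even this toy version shows the problem: two staff members called J. Smith would both come back as candidates, and external co-authors simply fall through with no match, which is exactly where the manual effort would land on the repository team.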

Finally my old friend Jackie Knowles talked about the pitfalls of implementation – most of which I am, thankfully, already well aware.  I think we definitely need more of these warts and all case study examples though; as at the end of the day those of us working at the sharp end of repository/CRIS interlinking will need to know how to work around so many of them.

It sounds like this was an excellent day (and perhaps in serious need of repeating in the near future!) and the article is a definite must read for anyone about to establish, or already working towards, a CRIS/Repository interlink.

E-infrastructure in the arts and humanities

Posted by emmakimberley on 7 July, 2010

Yesterday I attended a workshop on the uptake of e-infrastructure services in the arts and humanities at KCL. In a growth area mainly developing through the science disciplines, the aim of the day was to assess various services and resources that can be used by arts and humanities researchers as well as looking at the barriers that prevent use. The workshop facilitators had recently run an e-uptake study, comprising interviews with researchers and research support professionals, and are interested in exploring the potential use of e-infrastructure in arts and humanities subjects. A database of their findings is available here.

e-Infrastructure services explored include: digital curation, text mining, the UK Data Archive, various grid computing services for researchers, Virtual Vellum, and a JISC-funded virtual research environments project working on ancient documents (eSAD).

 Some main points from the day:

  • Infrastructure includes tools and resources, but also needs to include training and dissemination opportunities.
  • Barriers to use include lack of funding, lack of knowledge of projects, and lack of understanding of which technologies may be useful.
  • Training alone won’t encourage use of services
  • It’s easy to provide support…
  • …but hard to provide the kind of support that helps potential users know which technologies to use.
  • Worked examples of uses in each discipline are essential: proof of value will encourage further use.

The consensus was that these resources have great potential to help arts and humanities researchers, but that there are still many barriers (both practical and psychological) discouraging engagement.

JISC Conference: April 13th 2010

Posted by gazjjohnson on 15 April, 2010

This Tuesday I travelled down to the Queen Elizabeth II Conference Centre in a very sunny Westminster to attend the annual JISC Conference.  This event draws a lot of senior people from across the educational sector; and it’s possible to run into more than a few VCs over coffee.  It’s also a rich opportunity to hear from the broadest cross section of educational computing projects.  What follows are my notes.

The day was introduced by Malcolm Reed and then JISC Chair Sir Timothy O’Shea, who spoke about JISC’s current value as well as what the impact of the UK election and reduced funding means we as a sector will be dealing with.  The next 10 years will be difficult, as environmental impact as well as funding will affect HE computing.  He highlighted an article in the Guardian (14/Apr/2010) on HE, commenting that it complemented the lively pre-conference debate among 150 people yesterday, led by the JISC Vice-Chair.  He suggested that we each go back with one key thing to implement.

Martin Bean, VC OU: The Learning Journey: From Informal to Formal

An anarchist at heart who sought to spark discussions and possibly put a few backs up; with inimitable Australian bravado.  Distance education is on fire – because you cannot build enough brick and mortar institutions to keep pace with growth in HE; and thus need to look at alternative delivery modes.  But a third of HE students are in private institutions – we are going to see a growth in private organisations providing this kind of educational role.

Challenges for the custodians – need to educate citizens for new kinds of work.  STEM is key for a competitive workforce for the next 10-50-100 years for innovation.  Need to think about transformation of information into meaningful knowledge.  John Naisbitt’s book Megatrends was mentioned.  Learning in the workplace needs to become essential, and be supported more by HEIs.

Modern students need constant stimulation and hate complexity (among other aspects of their desires), but does this mean we need to dumb down our degrees, or shouldn’t we adapt to modern student expectations?  Is there nothing to be said for a proper old fashioned solid and complex education, I wondered, and where does that take us in terms of teaching critical thinking?

What can be done to break down the barriers?  Multichannel.  YouTube and iTunes university – 342,000 downloads a week for the OU – in the top 10 in U channel; and most of that traffic comes from outside the UK.  The pay-off is that many of their new students first encounter the OU in this way and are drawn in by the brand.  Informal learning, a more cooperative environment and flexibility are all needed from educational institutions.  Lifelong learners need the ability to move in and out of HE formally and informally.  He commented that the D.E. Act is going to seriously interfere with this ability to evolve and use new patterns of education, research and training.

Living with IPR – the web, the law and academic practice

Charles Oppenheim opened with a passionate and scholarly dismantling of the appallingly poorly debated and rushed through Digital Economy Bill (now Act).  Then Jason Miles-Campbell (his sporran is a wifi hot spot, allegedly) from JISC Legal spoke.  In the next five years there are unlikely to be changes for copyright protected items, so you need to find an exemption.  He gave an overview of the small changes in the law and clarifications under law for reuse of items.  On the Digital Economy Act and what’s going to happen to institutions: it will be some time before we see whether we are subscribers or ISPs, as there will need to be case law.  Note that the D.E. Act calls for a graduated response to infringement.

He talked about the Newzbin vs big media companies case.  Newzbin was indexing infringing material, and in the court case they were found to be infringing.  The court noted what would be needed to claim an exemption for such a thing; Newzbin was effectively authorising infringement, and encouraged it by employing editors.  As few as 11 words can be enough to count as a substantial part.  It is no good making a large amount of material available to staff if they’re unsure whether they can legally use it.  Patchwork licenses are a problem, with different aspects of resources covered by different legislation.  This may mean we need to ditch some resources that we won’t be able to use.  We need to make life easy, but we also need to be able to take risk decisions.  It is like driving: there are times when 32mph in a 30 zone can be okay, but you have to make the judgement call.

Naomi Korn and Emma Beer, Copyright Consultants, spoke next about orphan works, those where the author is unknown or untraceable.  They are a significant barrier to public access, due to the length of implicit copyright.  The internet is a major source of orphan works.  Items hundreds of years old can still be in © until the end of 2039!  In one project 302 staff hours were spent to obtain only 8 permissions for use in the British Library sound archive, a massive staff effort for little effective impact.  EU MILE Project – registry of image orphan works.  EU ARROW Project – accessible registries of rights information and orphan works.  One thing is clear: dealing with orphan works, even for major bodies and projects, requires a lot of work and staff time, something that those of us working in open access are well aware of.  Clause 43 of the D.E. Bill tried to offer an exemption.  The D.E. Act means that for now you should only use orphan works within a risk management framework, as it is not clear quite what the impact of this will be.

Project OOER (Organising Open Educational Resources) – best name of the day?  Barriers to sharing include different levels of IPR awareness, licensing awareness etc.

Open Access Session, Neil Jacobs (Chair)

The session discussed the report authored by Charles Oppenheim et al late last year.  Moves to electronic only can help reduce costs in the scholarly communications sector.  Alma Swan gave an overview of the work looking at three models: gold, green, and the role of repositories as locations of quality assurance and publication, which Alma described as more futuristic.  Libraries do things differently, and this affected the model that they created.  Though universities increase in size, the benefits don’t necessarily increase with them.  The Salford VC and the Librarian of Imperial College spoke about how they’ve gone about making a strong fiscal case for open access at their institutions.

Community Collections and the power of the crowd, Catherine Grout

In a fascinating session looking at crowdsourcing and citizen science we heard from Kate Lindsay (Oxford, WWI Poetry Digital Archive), Arfon Smith (Oxford, Galaxy Zoo), William Perrin (Web innovator and Community Activist) and Katherine Campbell (BBC, History of the World) about 4 very different areas of community engagement.  These ranged from sourcing and augmenting first world war artefacts from across the country (including a roadshow – turn up and digitise!), through to the power of Galaxy Zoo’s galactic classification project, which I’m proud to say I’m one of the thousands involved in.  What was clear from these two talks is that the scale of what is achievable is amplified many, many times beyond what can be achieved through more conventional team based approaches, and that the successes far outweigh the concerns over quality (indeed the “normalisation” of so many repeated analyses à la Wikipedia was touched on).

William took a different approach, building up a resource from the ground up and using it as a focus for drawing a community together physically as well as virtually.  He showed some excellent examples of what you can do when a community, rather than just one activist, develops a local Web resource (I am reminded of the local Sileby village Website as an example of how NOT to approach this – locked down and run by a small clique).

For the Twitter overview see here, here and here.

JISC Legal Copyright Day March 31st 2010

Posted by gazjjohnson on 12 April, 2010

On the last day before Easter I escaped from the office to head to a rather chilly University of Sheffield to attend a JISC Legal Copyright event. As neither of my copyright officers was available to attend, I hoped I’d glean some much needed insight into the latest developments in copyright legislation and practice. What follows are my notes from the event with a few comments where appropriate.

Digital Images, John Hargreaves, JISC Digital Media
Formerly the organisation was known as TASI. It is based at the University of Bristol. John gave an overview of their role and services, highlighting their new two-weekly online surgery which is open to all. He opened with a note on the relative impenetrability of copyright law (which in the light of the last session of the day I can heartily concur with); however, in this session he aimed to demystify aspects of image © law.

Despite what many people assume, just because an image is on the internet doesn’t mean you can use it. Since all images are inherently copyrighted, normally to the creator, there is always a rights issue unless the creator/rights owner has clearly waived it, and indeed even then there might be some constraints.

He highlighted the vast growth in digital images: more user generated content, more sharing, ease of access and the proliferation of web 2 services like Flickr and Google Images, services that allow dissemination. Traditional legislation is unclear in a digital context, and laws are also constantly changing and tightening. The suggestion today is that the balance of rights lies with the rights holders, not public access; something that seems to fly in the face of the open access agenda. Copyright in images varies with format, and something that isn’t born digital might have several different rights holders (original photographer, owner of a photo in a gallery, digitiser etc). The length of time that these rights last can also differ for each format.

While rights normally stay with the rights holder, if you create something while contracted to work for an organisation you, as the creator, might not hold the rights. One line that I liked from John was that copyright exemptions aren’t rights to use; they are defences if you are challenged over your use.

So to avoid some of these problems you should make use of trusted resources, such as JISC Image collections. Commercial sites exist as well, although there might well be per-use or subscription fees to pay. Some sites deal with copyright exempt issues, stock.xchng for example. He also mentioned Flickr and the advanced Creative Commons search for images for re-use. However, some people may well mount images in which they don’t own the copyright, so assume the owner doesn’t understand copyright. Look through their images and see if the images in a user’s collection have the same look and feel, which is a good guide to whether they are the creator of them.

If you want to use images, draw up your own license, or at least a clear description of how you would like it to work and the uses to which you will put the images. Even if you don’t directly use it with the rights holder it will help form part of your audit trail documentation, and will clarify discussions. You should consider the various possible rights within an image, e.g. moral rights, data protection, expired rights due to age and clear statements of ownership. Joint ownership can be an issue where you need to clear the rights with more than one location. Sample copyright permission letters are available that you can use.

For anything you or your users create, check that permissions to include images are covered. Consider how long a period of time permission is for (forever for a printed document, or a period of time for a web site, for example). You also need to think of any related rights that might need to be cleared up at the same time. Is it appearing on the web, and will you archive the images or the document in some way? What do you expect the users of your object to be able to do with the images? Indeed, if you have these issues clear in your head you are making it much easier for the rights holder to grant clear permissions. And all of this must be clearly documented: permissions, what you can/can’t do, who can use it, what can be done with it, what time limits exist and the context of use of the object.

Creating image metadata to associate with the image and your use of it can be valuable. It allows you to attach the rights and permissions to the object so it can be passed to other people with these usage restrictions clearly accessible. Finally John talked about the importance of asking for the size/resolution of an image and how this will impact on where you can use it effectively. Print and screen have different requirements, and if you want high resolution images you are unlikely to find them on free sites; the likelihood is there will be fees to pay.

Music Copyright, Beverley Dodd, Birmingham City University
Fundamentals of music: 1) © is the traditional, well established copyright for music, lyrics, artwork etc. 2) (p) applies to the sound recording itself (p = phonographic). Different copyright laws apply to music around the world, e.g. in the UK the life+70 year rule applies, but there are changes planned. The exemptions are very limited for music copyright. For examination purposes students can perform any music behind closed doors, but photocopying of music is not allowed. She noted that music in shops now has to have a license paid for it; so does that mean more muzak?

The power shift in the digital age is towards the rights holder, the major corporations, extending (p) on sound recordings from 50 to 95 years, which is a pretty horrific approach. But this has come about because the record companies own the recordings but not the original songs, which remain the property of the artist.

CLA licenses do not cover printed music, including the words. Some music cannot be purchased; it can only be hired from publishers. PRS for Music (Performing Right Society) is the main collecting society in Britain. Live performed music must be declared to them and be licensed, even if given for free or for charity. This is even more true for music used to communicate to the public in digital media. License charges vary depending on the size and type of performance. Note that in the US there are some exemptions for some public places, e.g. Bars and Grills.

There was a suggestion of using the old postal method of protecting copyright, a sealed envelope with composition inside date stamped, for musicians to record their rights; which seemed horribly antiquated.

The PRS are very litigious and have even challenged people who work to music on their own, or in private, or play it to horses. Note that YouTube and PRS had a spat in 2009 which saw all premium UK music videos dropped from the site for a period. Some police constabularies (e.g. Wiltshire) refuse to pay the PRS fees and claim an exemption. Even a singing granny in Scotland was slapped with a £1000 fee, although they backed down after a slew of negative publicity. The key here is they will pursue just about anyone they consider requires a license. There is a code of practice for universities available from the PRS.

At BCU they have a conservatoire, and so music copyright and reuse rights are very important to them. Future music © trends, as noted, are towards tightening up and locking down. She noted that wifi and the Digital Economy Bill mean that universities will be required to police and cut access for any illegal use as defined by the UK’s restrictive copyright laws.

eTheses at the University of Sheffield: a case study, Clare Scott
EThOS kicked off by aiming to digitize 5000 high use theses across the country, with 500 supplied from Sheffield. Not all of these were digitized due to issues at the BL. EThOS soft launched in 2008. At Sheffield it works in a very similar manner to Leicester, including allowing for a period of embargo. Deposit is mandated for all students registered from 2008. 3 faculties are broadly in support, 2 have particular issues, and 1 is strongly opposed. Issues that have come up included:

  • Prior publication concerns
  • Book publication
  • 3rd party copyright and finding permissions
  • Plagiarism.

In practice hard copy submission will continue for 5 years (until 2013) and will be reviewed at that point. So far on a day to day basis it hasn’t been a massive change.

Benefits to students include readability and accessibility on a global scale. Hopefully this means their impact will be more immediate and that (eventually) download statistics will be visible. It also offers a taste of self-marketing and promotion for the student. It has helped students when they come to publish, as they are seeking earlier the copyright permissions that they would otherwise struggle to obtain. Embargo reasons are much the same as ours, including political sensitivity. All theses have to be uploaded, even those embargoed, as they can go into the dark archive and not be made visible, but it does mean that an electronic version is available. There were problems with commercial exploitation of material when a commercial company took every one of the medical departments’ theses, so they need to make sure any license doesn’t allow for this, to avoid conflicts with academics’ later work.

Sheffield are paying £8,000 a year towards the £40 per thesis digitisation fee. The pay up front model is causing problems and concerns from students, who expect the university will pay. They don’t ask author permission, and in terms of older materials don’t worry about copyright and other issues, relying instead on a takedown policy. Librarians get asked to download theses and add them to stock, but permission for this is not given. The result is that a lot of questions remain, like changing to asking author permissions, or the desire from alumni to see their theses live. Third party copyright questions will continue to rise, as will the question of whether the training is sufficient to equip the students with the skills to deal with the issues.

Copyright & the cultural sector, Tim Padfield
Developments in copyright law: in policy terms copyright is the most important IPO legislation, ahead of patents, which actually bring in more money. Libraries and archives are regarded as trusted intermediaries between rights owners and users, which means it should be easier for us to seek permissions. A contract can override copyright, and this can be a problem.

Digital Economy Bill and orphan works: anyone can become an authorised body to license orphaned works, via application to the Secretary of State. However, every work must be investigated before it can become an orphaned work, so this doesn’t really help facilitate mass digitisation.

Exemptions, including reprographic copying, are to cover films and audio, and to allow external access to VLEs. Exemptions don’t apply if there’s a licensing scheme in place. It is notable that just because an organisation does education, this does not mean it is classed as an educational establishment for the purposes of the exemption. Fair dealing is designed to expand to all forms of media beyond text, but only for work carried out by students or staff at a prescribed educational establishment (for private study or research).

Undefined terms and concepts, Tanya Aplin
The final session of the day was rather a disappointment, as it was delivered at an expert academic practitioner level and as such was all but incomprehensible to me.  While doubtless there were some in the room who could follow the legalese, considering the accessibility of the rest of the day’s sessions this was a shame. The one piece of advice I did manage to glean was on the role of originality – the less original a work is, the easier it is to reuse fairly.

Repositories and the Cloud – conference report

Posted by gazjjohnson on 24 February, 2010

Yesterday I travelled (via a slightly circuitous route) to London and the Magic Circle Headquarters to attend the JISC/Eduserv event Repositories and the Cloud.  Billed as a technical and policy event, I was a little concerned that it would be too high level, but as things turned out it seems my understanding of such things is better than I thought.  I can only capture my main thoughts on the day, as 10 minutes in my netbook power died and I had to drop back onto my HTC Magic to continue my Web 2 participation.

The day was split into three main chunks.

In the first session Michele Kimpton (DuraSpace Foundation), Alex Wade (Microsoft Research) and Les Carr (EPrints) talked about their repository platforms, and how they are looking to engage with the cloud.  I was especially interested in what Michele had to say; since DSpace and Fedora are now under one umbrella corporation I anticipate big changes in what we use currently as our repository platform.  Talking to other delegates at the event it seems the smart money is backing Fedora to emerge as the sole winner, but I guess we’ll have to wait and see.  I have to confess I’d never even heard of Windows Azure before today, and as it turns out I wasn’t the only one.  That Microsoft are making a play into the repository market is fairly significant, as increasingly it seems that proprietary software platforms may be in all repository managers’ futures.

A lot of what they focussed on (from my P.O.V.) was the use of off-site cloud storage for large chunks of repository contents: for preservation, for reasons of perceived economic savings and for maintenance of access.  I was quite surprised as well to hear just what a big player Amazon are in this market; it seems their ambitions in the library and information sphere go far beyond the supply of cheap books.

After a lot of lunch time discussions we moved onto the second session with Terry Harmer (Belfast eScience Centre).  Terry’s presentation seemed more technical, and so I must confess some of the finer points were probably lost on me.  However, he did raise a point about trans-national data storage and security that interested me.  The fact that under British law it’s not permissible to just host any data about people off-shore is a definite challenge to moving into the cloud.  Then again, how many of us are already doing such a thing, if only in a small way?  He touched on the need to use either EU based servers, or ensure that (for example) US hosts had signed up to safe harbour (or harbor I imagine) agreements, whereby they would agree to respect key elements of UK and EU data protection laws, where they would otherwise not apply.

He also touched on the risks involved with using a major host (for example, once again, Amazon): being a bigger target, there is a greater risk from griefers or hackers in general of DDoS attacks (taking out access to your data) or worse.  As he pointed out, most IRs are far too small to bother with; although I know that this doesn’t make them invisible or inviolate from assault.

The third part of the day broke out into two discussion sessions, one technical and one policy.  Paul Miller facilitated the policy one I attended, and there followed two hours of wide ranging discussions about using cloud resources for repositories.  A straw poll in the room revealed that most participants were using cloud resources already on a daily or regular basis, and a sizable number were doing this for work based activities and even with the blessing of their management.  However, a lot of the group noted a certain corporate or personal reservation about trusting to the cloud to the Nth degree.  One point that particularly resonated for me was the comment that “We’ve been spending a lot of time and energy advocating local repositories as something controllable, accessible locally mounted to placate certain academic worries – to suddenly shift to trans-national locations might well start ringing alarm bells for many of them.”  A feeling was expressed by some in the meeting, and in the discussions that followed, that while cloud computing options were certainly exciting, the repository field itself is still insufficiently mature to go for them wholesale just yet.

The event concluded with a reception where the discussion continued into the evening, or until the Magic Circle threw us out.  I spent some of the time planning some possible collaborative work with the RSP team in among sharing experiences and feedback on the day.

Overall this was a challenging day of thought and discussion.  Am I convinced that cloud computing is the saviour of repositories? No.  Do I think real economic savings can be made with them?  Not yet.  Do I think they’re relevant to the future of the repository community?  Almost certainly.  It will be a field well worth the watching, and doubtless there will be more about it in the months or years to come.

A twitter stream from the event is available.

[Edit: Presentations from the event can be found here]

Innovations in Reference Management Part 3

Posted by selinalock on 25 January, 2010

Moving Targets: the role of web preservation in supporting sustainable citation (Richard Davis & Kevin Ashley)

This was a rather different talk to most of the others at the event, as it looked more at the question of how we can cite the preserved version of ephemeral types of data, such as blogs, that we often see on the web these days.

  • Some web preservation already happening: URI/DOI/Handles & other solutions, Wayback machine and UK Webarchive.
  • Are we educating people to use links to sustainable archives? Should we be recommending linking to the UK Webarchive version and not the original version?
  • Used the example of citing a blog post that might disappear.
  • Will our “collections” look different in future, will they be blog type posts rather than journal articles or books?
  • Talked about the JISC project ArchivePress which allows you to use an RSS feed to create a preserved blog archive: this will allow Universities to create their own repository of blogs. For example, it could integrate with Research Repositories that use applications like DSpace. Should the Leicester Research Archive be looking into preserving research blogs as well as other research outputs?
  • Heidelberg University and others have created a Citation Repository for transitory web pages: this was specifically to deal with the problem that their researchers were having when researching China, due to the volatile nature of the Chinese internet. There might be rights issues with this approach but many of the original web pages had disappeared.
  • Should we be teaching people about sustainable resources/publishing as part of our information literacy efforts?
  • Can argue that citing a URL is like citing the shelfmark of a book in a library, as it’s the location of the information rather than the information itself. Should we be looking for a better citation system?
  • Possible solutions: Institutions can offer archive mechanisms, authors need to use archive mechanisms, if a blog is being preserved then it needs to expose that permanent citable link for people to use (e.g. ArchivePress link) and permalinks should be a bit more “perm”!
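
To make the RSS-to-archive idea a little more concrete, here is a minimal sketch in Python of what harvesting a feed into a preserved store might look like. This is not ArchivePress's actual code (I haven't seen it); the sample feed, the `archive_feed` function and the in-memory dictionary standing in for the archive are all hypothetical, and the sketch assumes a plain RSS 2.0 feed. Each post is stored under its permalink and is never overwritten once captured.

```python
import xml.etree.ElementTree as ET

# Hypothetical RSS 2.0 feed, standing in for a real research blog.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Hypothetical research blog</title>
    <item>
      <title>First post</title>
      <link>http://example.org/blog/first-post</link>
      <pubDate>Mon, 25 Jan 2010 09:00:00 GMT</pubDate>
      <description>Some text.</description>
    </item>
  </channel>
</rss>
"""


def archive_feed(feed_xml, archive):
    """Copy each feed item into `archive`, keyed by its link.

    Posts already in the archive are left untouched, so the stored copy
    stays fixed even if the live post later changes or disappears.
    Returns the number of newly archived posts.
    """
    root = ET.fromstring(feed_xml)
    added = 0
    for item in root.iter("item"):
        link = item.findtext("link")
        if link is None or link in archive:
            continue
        archive[link] = {
            "title": item.findtext("title", default=""),
            "published": item.findtext("pubDate", default=""),
            "content": item.findtext("description", default=""),
        }
        added += 1
    return added


archive = {}  # a real service would use persistent storage instead
first_run = archive_feed(SAMPLE_FEED, archive)
second_run = archive_feed(SAMPLE_FEED, archive)
```

Running the harvest twice over the same feed adds the post only once, which is the property a citable archive copy would need.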

Help me Igor – taking references outside traditional environments (Euan Adie,

Euan gave an overview of some of the projects they are working on as part of the remit:

  • Looked at how referencing might be achieved if you were using GoogleWave as a collaborative tool to write articles etc.
  • Decided to create a 3rd party GoogleWave widget called Igor.
  • Igor lets you fetch references from Connotea or PubMed and insert them into the Wave: you do this by typing a command in Wave.
  • Igor uses an open API to retrieve data (XML or RDF) and is only a proof of concept widget at the moment. It is open source and people are welcome to develop it further.
  • Euan did point out that the formats that most reference software uses (RIS/BibTeX) are not very easy to use with web APIs.
  • Mentioned ScienceBlogs: an initiative to aggregate well known science blogs through E.g. finds if blogs link to Nature articles (via html, DOI, PubMed): blogs already comment on articles when they’re published so Nature wants to link the comments/blog posts to the articles.
  • Have an API available that allows you to feed in an article DOI and see what blogs aggregated through mention that article.
  • Mobile devices: have made Mac app Papers available on iPhone. thinks people are not as likely to read articles on mobiles but save the reference for later instead.
  • always willing to experiment and collaborate with other projects.
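
As an illustration of Euan's point that RIS is awkward for web APIs, here is a small Python sketch that turns RIS text into JSON, the sort of structure a web service would more naturally hand around. It is my own sketch, not anything shown at the event and not part of Igor; the `parse_ris` function and the sample record are hypothetical, and it only handles the basic "two-letter tag, two spaces, hyphen" line layout.

```python
import json


def parse_ris(text):
    """Parse RIS text into a list of records.

    Each record is a dict mapping a two-letter RIS tag to the list of
    values seen for that tag (tags such as AU can repeat).
    """
    records = []
    current = {}
    for line in text.splitlines():
        # RIS lines look like "TY  - JOUR": tag, two spaces, hyphen, value.
        if len(line) < 5 or line[2:5] != "  -":
            continue
        tag = line[:2]
        value = line[5:].strip()
        if tag == "ER":  # ER marks the end of a record
            if current:
                records.append(current)
            current = {}
        else:
            current.setdefault(tag, []).append(value)
    return records


# Hypothetical record, for illustration only.
SAMPLE = """TY  - JOUR
AU  - Example, Ann
AU  - Sample, Bob
TI  - A hypothetical article
PY  - 2010
ER  - 
"""

records = parse_ris(SAMPLE)
as_json = json.dumps(records, indent=2)
```

Even this toy version shows the mismatch: RIS is line-oriented with repeatable tags and an explicit end marker, so it has to be reshaped before it fits the nested structures that web APIs tend to exchange.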

