UoL Library Blog

Develop, debate, innovate.

Posts Tagged ‘formats’

EPUB vs PDF

Posted by gazjjohnson on 8 July, 2011

An interesting question from my boss this morning about the EPUB format, which I confess I know little about, and especially how it contrasts with PDF.  This is on the back of one of our departments increasingly looking towards making material available on eReaders rather than our VLE (Blackboard).  My thanks to the folks on Twitter who have chipped in with the following bits of insight.

  • EPUB is basically a zipped bag of XML and CSS with slightly improved Dublin Core (DC) metadata in it. Best for reflowable text, unlike PDF.
  • PDF is written in stone so doesn’t flow well on ereader devices.  Best ereader for PDF is iPad. EPUB flows.
  • Calibre makes EPUB
  • EPUB will work better on e-readers like Kindle – PDFs work but are difficult to read
  • Think there is linked data potential in the metadata.
  • http://bit.ly/g7CzSe v.3 is particularly interesting from a metadata perspective
  • Not just for ereaders IMO. Range of advantages, including reusability & accessibility
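
To make the "zipped bag of XML and CSS" description a bit more concrete, here is a minimal Python sketch using only the standard library. It builds a tiny EPUB-like archive in memory and then reads the Dublin Core metadata back out. The function names (`build_sample_epub`, `read_epub_metadata`), the file names under `OEBPS/` and the sample title and author are all made up for illustration, and the archive is not a complete, valid EPUB, so treat this as a rough picture rather than a reference implementation.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace
CONTAINER_NS = "urn:oasis:names:tc:opendocument:xmlns:container"


def build_sample_epub() -> bytes:
    """Build a tiny EPUB-like zip in memory (illustrative, made-up content)."""
    container = (
        '<?xml version="1.0"?>'
        f'<container version="1.0" xmlns="{CONTAINER_NS}">'
        '<rootfiles><rootfile full-path="OEBPS/content.opf" '
        'media-type="application/oebps-package+xml"/></rootfiles></container>'
    )
    opf = (
        '<?xml version="1.0"?>'
        '<package xmlns="http://www.idpf.org/2007/opf" version="2.0">'
        f'<metadata xmlns:dc="{DC}">'
        '<dc:title>Sample Module Handbook</dc:title>'
        '<dc:creator>Example Author</dc:creator>'
        '<dc:language>en</dc:language>'
        '</metadata></package>'
    )
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        # The mimetype entry is written first and stored uncompressed.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", container)
        zf.writestr("OEBPS/content.opf", opf)
        zf.writestr("OEBPS/chapter1.xhtml",
                    "<html><body><p>Hello</p></body></html>")
        zf.writestr("OEBPS/style.css", "p { margin: 1em; }")
    return buf.getvalue()


def read_epub_metadata(data: bytes) -> dict:
    """List the files in the archive and collect its Dublin Core fields."""
    with zipfile.ZipFile(io.BytesIO(data)) as zf:
        names = zf.namelist()
        # container.xml points at the package (.opf) file holding the metadata.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f".//{{{CONTAINER_NS}}}rootfile")
        opf = ET.fromstring(zf.read(rootfile.get("full-path")))
    metadata = {}
    for el in opf.iter():
        if el.tag.startswith(f"{{{DC}}}"):
            metadata[el.tag.split("}", 1)[1]] = el.text
    return {"files": names, "metadata": metadata}


if __name__ == "__main__":
    info = read_epub_metadata(build_sample_epub())
    print(info["files"])
    print(info["metadata"])
```

Pointing `read_epub_metadata` at the bytes of a real .epub file should work the same way in principle, since the lookup goes through `META-INF/container.xml` rather than assuming a fixed path, but I have only tried it here against the made-up sample.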

So there you are – all the wiser now.  The link above is actually well worth following as it does give quite a clear view.  Is it enough information for the boss?  I don’t know, but I’ll pass it along and see what else she’d like to know.

Posted in Service Delivery, Technology & Devices | 13 Comments »

Creating PDF/As – issues with protocols

Posted by gazjjohnson on 13 July, 2010

In the last few days I’ve been working on the protocols for converting supplied PDFs into PDF/A format, something I’ve been meaning to get around to for some time. PDF/A is the format in which we ideally want to be storing PDFs on the LRA (Leicester Research Archive), and while it isn’t the best digital curation format for our purposes and workflows, it is the most practical solution.

However, I’ve hit a snag that’s made me pull back.  After converting with Acrobat Pro/Distiller, any text copied and pasted out of the resulting PDF/A document displays as symbols or gibberish in Word or even Notepad.  This is a problem for us in terms of creating the abstracts on the LRA, but more importantly I am concerned that this might in some way interfere with search crawlers indexing the full text of the PDFs.  Perhaps I’m wrong, but I’ll leave that for someone more technically minded to respond to!
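
One way to check converted files in bulk, rather than copying and pasting by eye, would be a simple heuristic along these lines. This is only a sketch: `looks_garbled` is a hypothetical helper of my own naming, the 0.8 threshold is a guess, and it assumes English-language text that has already been extracted from the PDF by some other tool (the extraction step is not shown).

```python
import string

# Characters expected in ordinary English prose, besides letters and digits.
COMMON_PUNCT = set(string.punctuation)


def looks_garbled(text: str, threshold: float = 0.8) -> bool:
    """Guess whether extracted text is gibberish.

    Returns True when fewer than `threshold` of the non-space characters
    are plain ASCII letters, digits or punctuation. Empty text also counts
    as garbled, since nothing usable was extracted.
    """
    chars = [ch for ch in text if not ch.isspace()]
    if not chars:
        return True
    readable = sum(
        1 for ch in chars
        if ch.isascii() and (ch.isalnum() or ch in COMMON_PUNCT)
    )
    return readable / len(chars) < threshold


if __name__ == "__main__":
    print(looks_garbled("This thesis examines open access repositories."))
    print(looks_garbled("\uf054\uf068\uf069\uf073 \u25a1\u25a1\u25a1\u25a1"))
```

A check like this would wrongly flag legitimate text in other scripts or with lots of symbols, so it would need adjusting before being relied on for anything beyond a quick first pass.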

For interest here’s the conversion protocol as it currently stands:

  1. Open the PDF in Acrobat Pro
  2. Select File | Print
  3. Under printer Name, select Adobe PDF
  4. Click Properties
  5. Under Default settings select PDF/A-1b:2005 (RGB)
  6. Untick the box Rely on system fonts only: do not use document fonts
  7. Click OK
  8. Now click OK again to proceed to printing the PDF/A
  9. You will be prompted for a location and an alternative name so as not to overwrite the original.

As such I’m holding off on converting supplied PDFs for the time being, until I can find a solution. If anyone is aware of one, I’d appreciate hearing about it!

Posted in Leicester Research Archive, Open Access | 4 Comments »