UoL Library Blog

Develop, debate, innovate.


Posted by gazjjohnson on 8 July, 2011

Interesting question from my boss this morning about the EPUB format, especially how it contrasts with PDF, which I confess I know little about.  This comes on the back of one of our departments increasingly looking towards making material available on eReaders rather than our VLE (Blackboard).  My thanks to the folks on Twitter who have kicked in the following bits of insight.

  • EPUB is basically a zipped bag of xml and css with slightly improved DC metadata in it. Best for reflowable text, unlike PDF.
  • PDF is written in stone so doesn’t flow well on ereader devices.  Best ereader for PDF is iPad. EPUB flows.
  • Calibre makes EPUB
  • EPUB will work better on e-readers like kindle – PDFs work but difficult to read
  • Think there is linked data potential in the metadata.
  • v.3 is particularly interesting from a metadata perspective
  • Not just for ereaders IMO. Range of advantages Inc. Reusability & accessibility
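To make the "zipped bag of XML and CSS" point a bit more concrete, here is a minimal Python sketch that opens an EPUB as an ordinary zip file, follows META-INF/container.xml to the package (OPF) document, and pulls out the Dublin Core metadata. The function name `read_epub_metadata` is made up for illustration, and the sketch assumes a well-formed, DRM-free EPUB; it is not a full parser.

```python
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used by the EPUB container file and by Dublin Core elements.
CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}
DC_NS = "http://purl.org/dc/elements/1.1/"


def read_epub_metadata(epub_path):
    """Return (file_names, dublin_core_dict) for the EPUB at epub_path.

    Hypothetical helper for illustration only.
    """
    with zipfile.ZipFile(epub_path) as zf:
        names = zf.namelist()
        # META-INF/container.xml points at the package (OPF) document.
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        opf_path = rootfile.get("full-path")
        opf = ET.fromstring(zf.read(opf_path))

    metadata = {}
    # Collect every Dublin Core element (dc:title, dc:creator, ...).
    for element in opf.iter():
        if element.tag.startswith("{" + DC_NS + "}"):
            key = element.tag.split("}", 1)[1]
            metadata.setdefault(key, []).append((element.text or "").strip())
    return names, metadata
```

Calling `read_epub_metadata("some-book.epub")` (the file name is a placeholder) returns the list of files inside the archive plus a dictionary keyed by Dublin Core element name, which is a quick way to see what metadata a given EPUB actually carries.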

So there you are – all the wiser now.  The link above is actually well worth following as it does give quite a clear view.  Is it enough information for the boss?  I don’t know, but I’ll pass it along and see what else she’d like to know.

13 Responses to “EPUB vs PDF”

  1. Nick said

    Hi Gaz

    Sounds like a similar use-case to us – we recently got a request to provide digitised CLA material to be accessed on a Kindle. I haven’t been directly involved myself but after some experimentation I think colleagues decided that RTF is the best compromise in the short term as it works better than PDF on a Kindle (presumably there are workflow issues with digitising/converting to EPUB?)

    • Thanks for that Nick, those are some interesting points. I tend to agree (but always happy to be corrected) that RTF seems a better format all in all.

      • Nick said

        Just realised that you’ve quoted my tweet about ePub working on Kindle – I was mistaken I think. It doesn’t!

  2. benosteen said

    Also worth pointing out that some level of javascript support is in both – PDF (javascript + fixed layout with no DOM = failure. Mainly used for forms and viruses) and in ePub (burgeoning support, but as core text is in HTML, javascript will be far more useful here.)

    I’m waiting for charts and graphs that you can interact with, reflowable and reorderable data columns, word-highlighting… This is why I think ePub has a *very* bright potential future for things like academic outputs. Especially as blog posts to ePub is quite a straightforward action which helps strengthen (IMO) the newer way of publishing work online first and having an open peer review.

    • That does sound very promising Ben – so essentially we’re seeing this as the flavour of the moment; but what happens if something better rolls along? Competing standards?

  3. Theo said

    Some quick thoughts in no particular order:

    1. Converting PDF -> EPUB can be a pain as page layout formatting is not always correctly transferred
    2. PDF is basically a print format; if you want to print something out, EPUB is not really for you
    3. Kindles don’t ‘read’ EPUB so you need to convert to .mobi (using Calibre, for example)
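A minimal sketch of the Calibre route Theo mentions, assuming Calibre is installed and its `ebook-convert` command-line tool is on the PATH. The helper names `build_convert_command` and `convert` are hypothetical, and no conversion options are passed, so the output quality is whatever Calibre's defaults give you.

```python
import shutil
import subprocess
from pathlib import Path


def build_convert_command(source, target_format="mobi"):
    """Build the ebook-convert command line: input file, then output file."""
    source = Path(source)
    # Calibre picks the output format from the output file's extension.
    target = source.with_suffix("." + target_format)
    return ["ebook-convert", str(source), str(target)]


def convert(source, target_format="mobi"):
    """Run the conversion and return the path of the converted file."""
    if shutil.which("ebook-convert") is None:
        raise RuntimeError("Calibre's ebook-convert was not found on PATH")
    command = build_convert_command(source, target_format)
    subprocess.run(command, check=True)  # raises if Calibre reports an error
    return Path(command[-1])
```

Something like `convert("coursepack.epub")` (a placeholder file name) would then write `coursepack.mobi` next to the original, ready to copy onto a Kindle.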

    • I thought converting PDF to anything was a pain 🙂 I’m interested in this additional functionality that EPUB appears to have; should we be considering it, outside of ereaders, as a repository format of choice? Or does it bring with it a whole host of DP issues?

      Useful point about Kindles!

  4. Hi Gaz

    One way of looking at EPub is as an encapsulated, offline Web site. The HTML zipped up in an EPub can work just like a website with functioning links (e.g. between text and Index, TOC, cross-references, footnotes).

    There might be cases for a PDF where the formatting has to be just-so, but I think in terms of sustainability, usability and reusability, Epub is the way to go. There’s a good case for creating EPub/Kindle editions in place of the Xeroxed study packs many institutions and courses use.

    EPubs arguably need a bit more crafting and testing than a PDF. I’ve been toying with a little EPub edition myself, see what you think:
    (unzip it to see how it’s made). It looks pretty good on iPad. Converting to Kindle’s less successful as Kindle doesn’t support CSS, but I have a plan to use some XSLT to get round that. This is only a hobby project though, so it takes a back seat to the rest of my life! 😦

    I think Pete Sefton’s deliberations on Scholarly HTML are relevant too
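Richard's "unzip it to see how it's made" can also be run in reverse. The sketch below zips up a one-chapter, EPUB 2 style package using only Python's standard library: a `mimetype` entry stored first and uncompressed, a container file pointing at the package document, the package document with its Dublin Core metadata, an NCX table of contents, and one XHTML chapter. The function name `build_epub`, the file layout under `OEBPS/`, and the default identifier are assumptions for illustration, and the result has not been checked with a validator, so treat it as a starting point rather than a finished edition.

```python
import zipfile
from xml.sax.saxutils import escape

CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

PACKAGE_OPF = """<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="2.0" unique-identifier="bookid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:title>{title}</dc:title>
    <dc:language>en</dc:language>
    <dc:identifier id="bookid">{identifier}</dc:identifier>
  </metadata>
  <manifest>
    <item id="ncx" href="toc.ncx" media-type="application/x-dtbncx+xml"/>
    <item id="chapter1" href="chapter1.xhtml" media-type="application/xhtml+xml"/>
  </manifest>
  <spine toc="ncx">
    <itemref idref="chapter1"/>
  </spine>
</package>
"""

TOC_NCX = """<?xml version="1.0" encoding="UTF-8"?>
<ncx xmlns="http://www.daisy.org/z3986/2005/ncx/" version="2005-1">
  <head><meta name="dtb:uid" content="{identifier}"/></head>
  <docTitle><text>{title}</text></docTitle>
  <navMap>
    <navPoint id="chapter1" playOrder="1">
      <navLabel><text>{title}</text></navLabel>
      <content src="chapter1.xhtml"/>
    </navPoint>
  </navMap>
</ncx>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>{title}</title></head>
  <body><h1>{title}</h1><p>{body}</p></body>
</html>
"""


def build_epub(path, title, body, identifier="urn:example:demo-book"):
    """Write a one-chapter EPUB to path (hypothetical helper, illustration only)."""
    quote = {'"': "&quot;"}
    fields = {
        "title": escape(title, quote),
        "body": escape(body, quote),
        "identifier": escape(identifier, quote),
    }
    with zipfile.ZipFile(path, "w", zipfile.ZIP_DEFLATED) as zf:
        # The mimetype entry goes first and is stored without compression.
        zf.writestr("mimetype", "application/epub+zip",
                    compress_type=zipfile.ZIP_STORED)
        zf.writestr("META-INF/container.xml", CONTAINER_XML)
        zf.writestr("OEBPS/content.opf", PACKAGE_OPF.format(**fields))
        zf.writestr("OEBPS/toc.ncx", TOC_NCX.format(**fields))
        zf.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML.format(**fields))
    return path
```

A CSS file or images would be added the same way: write them into the archive and list them in the manifest, much as you would add files to a small web site.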

    • I like that definition Richard – that makes it pretty sound in my head! How easy is it to create EPUBs from scans though? We scan to PDF for our digitised course packs, but then what if people want them in a tastier more mobile format? I’ve passed on the links to our Coursepacks Officer for her to have a play with.

      • Images can be embedded just like in a web page: <img src="./images/p001.jpg"/> etc. I’m not sure of the exact issues around scaling/resizing, but it looks as though you can (for example) embed them at thumbnail scale and have them zoom full-page when clicked/touched.

        Personally I think it’s verging on the criminal to provide PDFs of text documents that are /just/ scanned images. Images should at least be OCR’d so that the text becomes searchable (any number of cheap or free packages can do this with image based PDFs – I use But then, if the OCR output is good, creating HTML/EPub instead of PDF oughtn’t to be too time-consuming. (But I don’t know what the CLA police would have to say on the matter 😉

        • Far as I can tell (and I’m not a (C) guru) CLA’s okay with OCRing, problem is the time it takes to do. Number of scans we do here, vs the staff resource there’s barely enough time to scan (and administer) let alone run OCRing routines on everything. That said I’ve flagged it with my team as something we really ought to try and fit in somewhere/when

  5. Dear Gareth,

    I agree. I have tried using PDFs with my Sony eReader and the result is mainly eye strain! EPUB would be my format of choice and there are a number of applications, including the brilliant Calibre, that can handle the conversion very successfully. However, the big downside is the fact that the Kindle currently doesn’t support it (although there are always rumours that it will ‘soon’). Given the number of Kindle owners, RTF may be a better compromise format.

    In an ideal world I’d like WRAP content to be coded in HTML with the options to download in EPUB, MOBI, PDF, RTF etc. ArXiv has a similar system with formats specific to the Sciences field. But I’m well aware (a) of the technical implications of this and (b) of the possible response from researchers already reluctant to give us something in a ‘secure’ format! But I can dream!

    • Cheers Yvonne – yeah, I’ve trialled a few ereaders with PDFs – not ideal for reading a thesis on a train. Then again I’m not a massive fan of ereaders per se (I read a lot in the bath and they’re too much of a risk!).

      Peter Murray-Rust would jump for joy hearing you say WRAP should be HTML (prefer XML personally) 🙂
