UoL Library Blog

Develop, debate, innovate.

Creating PDF/As – issues with protocols

Posted by gazjjohnson on 13 July, 2010

In the last few days I’ve been working on the protocols for converting supplied PDFs into PDF/A format, something I’ve been meaning to get around to for some time. PDF/A is the format in which we ideally want to store PDFs on the LRA; while it isn’t the best digital curation format for our purposes and workflows, it is the most practical solution.

However, I’ve hit a snag that’s made me pull back. When I use Adobe Acrobat Pro/Distiller to convert the files, any text copied out of the resulting PDF/A document and pasted into Word, or even Notepad, displays as symbols or gibberish. This is a problem for us in terms of creating the abstracts on the LRA, but more importantly I am concerned that it might in some way interfere with search crawlers indexing the full text of the PDFs. Perhaps I’m wrong, but I’ll leave that for someone more technically minded to respond to!
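One rough way to spot the problem without eyeballing every file is to measure how much of the copied text is made up of characters that are neither letters, digits nor punctuation. The Python sketch below is only an illustration: the function names and the 0.3 threshold are assumptions, not part of any Adobe tool, and it expects text that has already been copied or extracted from the PDF by some other means. It will not catch text that has been remapped to the wrong but otherwise ordinary letters.

```python
import unicodedata


def garbled_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are not letters, digits or punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    suspicious = 0
    for c in chars:
        # Unicode general categories: L* letters, N* numbers, P* punctuation.
        # Anything else (control codes, private-use glyphs, symbols) is
        # counted as suspicious.
        if unicodedata.category(c)[0] not in ("L", "N", "P"):
            suspicious += 1
    return suspicious / len(chars)


def looks_garbled(text: str, threshold: float = 0.3) -> bool:
    """Hypothetical check: the threshold is an assumed value, tune it on real samples."""
    return garbled_ratio(text) > threshold
```

Pasting a paragraph from the converted file into `looks_garbled(...)` gives a quick yes/no that could be recorded alongside each deposit.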

For interest here’s the conversion protocol as it currently stands:

 

  1. Open the PDF in Acrobat Pro
  2. Select File | Print
  3. Select printer Name Adobe PDF
  4. Click Properties
  5. Under Default settings select PDF/A-1b:2005 (RGB)
  6. Untick the box Rely on system fonts only: do not use document fonts
  7. Click OK
  8. Now click OK to proceed to printing PDF/A
  9. You will be prompted for a location and an alternative name so as not to overwrite the original.
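For step 9, a consistent naming rule helps avoid overwriting the original by accident. The sketch below shows one possible convention in Python; the `_pdfa` suffix and the function name are assumptions made for illustration and are not part of the protocol above.

```python
from pathlib import Path


def pdfa_output_path(original: str, suffix: str = "_pdfa") -> Path:
    """Suggest a name for the converted file that never collides with an existing one."""
    src = Path(original)
    # e.g. thesis.pdf -> thesis_pdfa.pdf, kept in the same folder
    candidate = src.with_name(f"{src.stem}{suffix}{src.suffix}")
    counter = 2
    # If that name is already taken, append a counter: thesis_pdfa_2.pdf, ...
    while candidate.exists():
        candidate = src.with_name(f"{src.stem}{suffix}_{counter}{src.suffix}")
        counter += 1
    return candidate
```

The returned path can then be typed into the save prompt, so the supplied PDF and its PDF/A copy sit side by side.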

 

As such, I’m holding off on converting supplied PDFs until I can find a solution. If anyone is aware of one, I’d appreciate hearing it!

4 Responses to “Creating PDF/As – issues with protocols”

  1. Andrew Norman said

    I’ve found in the past that printing to the PDF “printer” is not the best way to convert documents. If you have the document in Acrobat Pro, “Save as…” seems to be more reliable. There’s also a tool called “Preflight” (in the “Advanced” menu) which allegedly will check compliance with various standards including PDF/A, and convert documents if they don’t comply. However, this doesn’t seem to work when applied to a document that Acrobat 9 has itself saved as PDF/A (it reports umpteen ways it doesn’t comply, and refuses to fix some of them, apparently relating to dodgy XML).

    I think the moral of this story (which is much the same moral as I remember learning when trying to do PostScript programming many years ago) is that PDF “standards” are such a godawful mess that even Adobe’s own tools can’t deal with them properly. I don’t know whether saving your incoming PDF files as encapsulated PostScript and then using Distiller to convert them to PDF/A might give better results, but it may be worth a try.

    • Yes, Preflight was one of the reasons I dropped back to printing: it spent 5 minutes converting a thesis (via Save As PDF/A) and then moaned about Preflight causing errors. Had a brief look at Preflight and quickly ran away!

      Think I agree with you on the “run away from PDF” asap. If only there were a good, reliable, open and universal standard for archival documents – that was also readable by most end users! I was reading a blog from Brian Kelly the other day where he argued the toss with a lot of different ones (docx, HTML, XML, PDF etc) and from what I could see in the discussion there wasn’t really a clear conclusion.

  2. Leonard Rosenthol said

    As mentioned, the correct way to convert a PDF to a PDF/A in Adobe Acrobat is through either Preflight or the much simpler File->Save As->PDF/A. However, if you have the original material (Word, etc.) then doing direct PDF/A creation from that material is ALWAYS going to be best.

    You don’t mention what version of our products you are using, but like any software, the latest versions (9.3.2 in this case) will always be better than earlier versions. I certainly recommend that, if you are not current, you consider updating.

    As the PDF standardization process is open to all with no participation costs, if you feel that the standards need correction, PLEASE join us! Contact the folks at AIIM (aiim.org) to volunteer.

    Leonard Rosenthol
    PDF Standards Architect, Adobe Systems
    ISO Project Leader for PDF/A (ISO 19005)
