Creating PDF/As – issues with protocols
Posted by gazjjohnson on 13 July, 2010
Over the last few days I’ve been working on the protocols for converting supplied PDFs into PDF/A format, something I’ve been meaning to get around to for some time. PDF/A is the format in which we ideally want to store PDFs on the LRA; while it isn’t the best digital curation format for our purposes and workflows, it is the most practical solution.
However, I’ve hit a snag that’s made me pull back. After converting a PDF with Adobe Acrobat Pro/Distiller, any text copied and pasted out of the resulting PDF/A document displays as symbols or gibberish in Word, or even Notepad. This is a problem for us in terms of creating the abstracts on the LRA, but more importantly I am concerned that it might in some way interfere with search crawlers indexing the full text of the PDFs. Perhaps I’m wrong, but I’ll leave that for someone more technically minded to respond to!
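As a rough way of spotting the problem without eyeballing every file, here is a minimal sketch in Python that scores a pasted or extracted string by how much of it is ordinary letters, digits and punctuation. The function names `readable_ratio` and `looks_garbled` and the 0.85 threshold are my own assumptions for illustration, not part of any Adobe tool, and the check only catches text that comes out as symbols or private-use characters; text that maps to the wrong but otherwise valid letters would slip through.

```python
import unicodedata

# Punctuation categories treated as "ordinary" text (hypothetical choice).
_PUNCTUATION = {"Po", "Pd", "Ps", "Pe", "Pi", "Pf"}


def readable_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are letters, digits or punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        # No text at all is treated as unreadable.
        return 0.0

    def is_ordinary(c: str) -> bool:
        category = unicodedata.category(c)
        return category[0] in ("L", "N") or category in _PUNCTUATION

    return sum(1 for c in chars if is_ordinary(c)) / len(chars)


def looks_garbled(text: str, threshold: float = 0.85) -> bool:
    """Heuristic only: True when too few characters look like ordinary text."""
    return readable_ratio(text) < threshold
```

In use, you would paste a paragraph copied from the converted PDF/A into `looks_garbled(...)` and compare it with the same paragraph copied from the original PDF.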
For interest, here’s the conversion protocol as it currently stands:
- Open the PDF in Acrobat Pro
- Select File | Print
- Under printer Name, select Adobe PDF
- Click Properties
- Under Default settings select PDF/A-1b:2005 (RGB)
- Untick the box Rely on system fonts only: do not use document fonts
- Click OK
- Now click OK again to print to PDF/A
- You will be prompted for a location and an alternative name so as not to overwrite the original.
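The last step, choosing an alternative name so the original is never overwritten, is the one part of the protocol that is easy to make consistent. A minimal sketch in Python follows, assuming a naming convention of my own (a `_pdfa` suffix, then `_2`, `_3` and so on if that name is already taken); the function name `pdfa_output_path` is hypothetical and nothing here drives Acrobat itself.

```python
from pathlib import Path


def pdfa_output_path(original: Path, suffix: str = "_pdfa") -> Path:
    """Suggest a name for the converted file that does not clash with existing files."""
    # First choice: thesis.pdf -> thesis_pdfa.pdf (same folder as the original).
    candidate = original.with_name(f"{original.stem}{suffix}{original.suffix}")
    counter = 2
    # If that name is taken, try thesis_pdfa_2.pdf, thesis_pdfa_3.pdf, ...
    while candidate.exists():
        candidate = original.with_name(
            f"{original.stem}{suffix}_{counter}{original.suffix}"
        )
        counter += 1
    return candidate
```

The suggested name can then be typed into the save prompt, so converted files sit alongside their originals under a predictable name.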
I’m therefore holding off on converting supplied PDFs until I can find a solution. If anyone is aware of one, I’d appreciate hearing about it!