PDF/A is an open preservation standard published by ISO (ISO 19005-1:2005). It is not a particularly useful preservation format for complex files like spreadsheets, databases, or webpages, but it is quite good for static, text-based documents, both digitized and born-digital. Some of its advantages include:
- the look and feel of the original is retained;
- any fonts required to accurately render the document are embedded within the PDF/A (unlike most file formats, which just point to a place on your hard drive where the necessary font may or may not reside);
- it contains extractable text (for digitized documents, of course, this is only true if you used OCR software at the time of capture); and
- it helps to ensure authenticity by being very difficult to modify.
I'm not sure I came away from this course with a comprehensive understanding of the PDF/A standard, but what I do know is that implementing it as a preservation standard is not as simple as choosing a "save as" command (which is, sadly, kind of what I pictured). Documents must be prepared for conversion if they contain problematic features, metadata about the structure of the document must be added, and the resultant PDF/A must be visually inspected for accuracy and validated for conformance to the standard. And that's just the beginning - as Geoff stressed at the end of the course, the format isn't everything; preservation programs require work. We still need conversion procedures, version control, environmental controls, descriptive and technical metadata, regular backups, and vigilance in the face of continued change and obsolescence.
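To make the "validated for conformance" step a little more concrete, here is a minimal Python sketch of the very first thing a check might look at: whether a file even claims to be PDF/A. As I understand it, a PDF/A file declares itself in its XMP metadata through the `pdfaid:part` and `pdfaid:conformance` properties, and that metadata is stored uncompressed, so a plain byte search can find it. The function name `claimed_pdfa_level` is my own invention, and this only reports what the file says about itself. It is not a validator; real conformance checking needs a dedicated tool.

```python
import re

# XMP can serialise a property as an element (<pdfaid:part>1</pdfaid:part>)
# or as an attribute (pdfaid:part="1"); these patterns accept both forms.
_PART = re.compile(rb'pdfaid:part(?:>|=["\'])\s*(\d)')
_CONFORMANCE = re.compile(rb'pdfaid:conformance(?:>|=["\'])\s*([A-Za-z])')


def claimed_pdfa_level(pdf_bytes):
    """Return the PDF/A level a file claims (e.g. 'PDF/A-1b'), or None.

    Hypothetical helper: it reads the self-declared identification only
    and says nothing about whether the file actually conforms.
    """
    part = _PART.search(pdf_bytes)
    if part is None:
        return None  # no PDF/A identification found in the metadata
    level = part.group(1).decode("ascii")
    conformance = _CONFORMANCE.search(pdf_bytes)
    if conformance:
        level += conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + level
```

To try it on a real file, read the file in binary mode, for example `claimed_pdfa_level(open("scan.pdf", "rb").read())`, where `scan.pdf` is a placeholder name. A result of `None` means the file makes no PDF/A claim at all; any other result still has to be confirmed by proper validation and a visual inspection.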
One question I came away with, as I thought about how this might relate to my work at the Library, is whether PDF/A might really replace TIFF as a preservation format for scanned documents. I can see what the advantages might be, but I'm wondering if there are some disadvantages as well. Is this something that other archives have thought about or are already implementing? I'd be interested to hear what others think.