Tuesday, November 5, 2013

Class No. 9: Accessioning and Ingest of Electronic Records

At the end of October I took my ninth (and final!) DAS course, Accessioning and Ingest of Electronic Records, taught by Erin Faulder, Digital Archivist at Tufts University. This was an in-person workshop from the Tactical and Strategic tier of the DAS curriculum, held on the Radcliffe campus in Cambridge, MA. The goals of the course were to introduce accessioning and ingest as they apply to digital materials, to go over some current practices and resources, and to provide students with a foundation that could be used to develop policies and workflows for our own institutions.

I thought the course provided a good overview of the issues we face as we start to accession and ingest electronic records. Some of the steps we talked about were definitely specific to born-digital accessions: talking to donors about how to handle previously deleted files that are recovered by the archive, performing virus scans on incoming media, and performing checksums on files as they are received are some examples of tasks we may never have attempted before. However, I kept thinking that most of what we discussed could also be applied to the analog world. Archivists already understand the importance of having both overarching institutional policies and explicit agreements with individual donors to govern things like what your institution will and won't accept, how material should be transferred from the donor to the institution, and how your institution will verify that the material it received is the material the donor intended and agreed to send. We have processes in place to determine the level of description necessary at the point of accession, we consider storage requirements when accepting new material (physical space can be just as limited as storage space for digital files), and we sometimes take steps to quarantine new material (mold can be just as damaging to our existing holdings as a computer virus if it is allowed to spread). The specifics will be different - and probably more challenging, at least at first - when the material being sent is born digital, but the concepts are the same. While it is incredibly useful to have workshops like this that focus on born-digital records, it is equally important to emphasize the fact that much of what we already know about how to be archivists still applies in the digital world.
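
For readers who have not seen a checksum in practice, here is a minimal Python sketch of what "performing checksums on files as they are received" can look like. It is an illustration only, not drawn from the course materials: the function names are hypothetical, and SHA-256 is an assumed choice of algorithm. The idea is to record a checksum for every file at the moment of transfer, then recompute later and compare.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(transfer_dir: Path) -> dict[str, str]:
    """Map each file's path (relative to the transfer folder) to its checksum."""
    return {
        path.relative_to(transfer_dir).as_posix(): sha256_of(path)
        for path in sorted(transfer_dir.rglob("*"))
        if path.is_file()
    }


def changed_files(before: dict[str, str], after: dict[str, str]) -> list[str]:
    """List paths that are missing, new, or whose checksum differs."""
    return sorted(
        name
        for name in before.keys() | after.keys()
        if before.get(name) != after.get(name)
    )
```

A manifest built by the donor before transfer and another built by the archive on receipt can be passed to `changed_files`; an empty list suggests that what arrived is what was sent.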

Just a few notes about what stood out for me in a positive way about this course:
  • The OAIS Reference Model diagram is referenced in almost every DAS course, but here it seemed more concrete and accessible than before, probably because we were focusing specifically on the actions taken during the first two phases of the workflow.
  • In response to a late-day question about a specific tool a student had tried to use but didn't quite understand, the instructor made an excellent point that I think should be made in every DAS course: don't let the tools guide your decisions. Rather, figure out what you want to accomplish and then pick a tool that will do exactly that. Starting with the tool will frustrate you, and if it's an expensive tool that doesn't work out it will frustrate your administration as well, making future expenditures less likely.
I've now completed all of the required courses for the DAS certificate, and I'm registered to take the comprehensive exam next week here at the JFK Library. Maybe I'll see some of you there. I don't think I'll be allowed to say much about the exam itself, but after it's over I will write a final blog entry to reflect on this whole experience. Thank you, as always, for reading, and please don't hesitate to comment if you have any questions or feedback for me.

Thursday, June 13, 2013

Class No. 8: Inreach and Outreach for Digital Archives

This week I attended my eighth DAS course, Inreach and Outreach for Digital Archives, taught by Fynnette Eaton. The course is part of the Tactical and Strategic tier of the DAS curriculum, and was an all-day, in-person event held at the Radcliffe campus in Cambridge. The objectives of the course were to identify the relevant stakeholders surrounding digital archives at our institutions; to learn how to articulate the importance of digital preservation to those stakeholders; to effectively communicate with donors about their born-digital material; and to think about ways to build a digital archives program within the context of our specific institutions. I thought the workshop successfully achieved these objectives.

Though the slides and the discussion focused on managing born-digital material, the general themes of this course would probably be applicable to any collaborative project in virtually any setting. Given the relative lack of born-digital material in my particular institution, I appreciated that; it meant that the discussions were relevant to me, and I was able to participate without feeling like I was thinking only in the abstract. The broad themes as I saw them were these:
  • It is imperative to understand the political and social culture of your particular environment before undertaking any project that will require participation and buy-in from staff and management;
  • Think carefully about who will need to be involved, whether directly or indirectly, and communicate with them early and often. In the context of a digital archives initiative potential stakeholders include management, IT staff, donors, fellow staff, and end users;
  • A collaborative project must provide a clear benefit to every party involved, and the expectations and goals of the project must be well understood;
  • It is a good idea to have a “champion,” somebody who is well-connected, well-liked, and trusted in your institution who can promote your idea;
  • It is important to develop and manage the image you want your stakeholders to have of your project. The instructor referred to this as “branding,” which sounded a little foreign to my ears, but I understood the point and agreed with it. For example, if as a University Archivist you want the professors at your institution to think of the Archives as the natural place to transfer their born-digital files, it’s up to you to give them that idea by promoting the Archives as a safe repository for electronic records, and also by educating them about what files might be suitable for eventual transfer. 
More than any other DAS course I’ve taken thus far, this was a true workshop. We spent a considerable amount of time thinking and talking about the challenges we face at each of our institutions, and worked individually, with partners, and in small groups to brainstorm potential strategies for moving forward with a digital archives program. I thought the instructor did a great job of listening to our ideas and concerns, asking thoughtful questions, and offering useful suggestions. 

I'll be writing again in July, when I'm scheduled to take my ninth and final DAS course. I don't know when or where I will be able to take the comprehensive exam, but as I know more about that I will share it here. Thanks for reading, and as always please share any comments or questions.

Thursday, March 7, 2013

Class No. 7: A Beginner's Guide to Metadata

I recently took my seventh DAS course: A Beginner's Guide to Metadata, taught by Greg Colati and Jessica Branco Colati. This was another webinar course, recorded live back in 2008. I was a little concerned that the information would be too basic and, after five years, slightly out of date, but overall I felt that it was a well-organized, informative presentation. The goals were to provide a basic overview of metadata - what it is, where it comes from, what its components are, and how to choose the "right" schema(s) for one's own organization.

Metadata has of course existed for thousands of years in the form of analog inventories and cataloging systems, but the word "metadata" comes to us from the IT world (incidentally, the capitalized term "Metadata" is actually a registered trademark of the Metadata Company - who knew?). Metadata was originally considered "cataloging for geeks," "cataloging for guys," or just "cataloging for non-librarians," but over time it has become firmly entrenched in the lexicon of archivists and librarians. Simply defined, it is a tool for the identification, management, and use of information resources. It must be structured according to a schema, it must describe an information resource, and it should be usable by both machines and humans. Once created, it is never perfect or truly complete; it must be continually improved upon and eternally maintained. Put another way, metadata is language, made up of syntax, structure, and semantics. As archivists we should aim to be "multilingual metadata speakers."

Having defined metadata, the instructors described the various "typologies" of metadata, such as primary, secondary, or tertiary; descriptive, administrative, technical, or structural; global, community, or local; and embedded or associated. They then explored the idea of metadata as language by picking apart a few lines of an EAD finding aid. The syntax for this particular piece of code was XML, which governed the fact that some information was in brackets while other information was not. The structure was EAD (Encoded Archival Description), which governed the specific elements that could be used, and the hierarchical order in which they appeared. The semantics were governed by DACS (Describing Archives: A Content Standard), which informed the content of the finding aid - the way the name of the creator was formed, for example, or the format of the collection title. The instructors noted that the syntax and structure of our metadata will likely change over time, but hopefully the semantics will remain relatively stable. I thought this was a helpful exercise, in the same way that the basic rules and components of grammar must be understood more explicitly when we attempt to learn a foreign language.
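
To make the three layers a little more concrete, here is a small Python sketch. The EAD fragment is made up for illustration (it is not the one shown in the webinar), and the `describe` function is a hypothetical helper. The parser enforces the XML syntax, the element names such as `<did>` and `<unittitle>` come from the EAD structure, and the way the name and title are written inside those elements is where a content standard like DACS comes in.

```python
import xml.etree.ElementTree as ET

# A made-up EAD fragment, for illustration only.
SAMPLE = """
<archdesc level="collection">
  <did>
    <unittitle>Jane Doe Papers</unittitle>
    <origination><persname>Doe, Jane, 1900-1980</persname></origination>
    <unitdate>1920-1975</unitdate>
  </did>
</archdesc>
"""


def describe(xml_text: str) -> dict[str, str]:
    """Return the text of each element found directly under <did>."""
    # Syntax: ET.fromstring raises ParseError if the XML is not well formed.
    root = ET.fromstring(xml_text.strip())
    # Structure: EAD says the identifying elements live inside <did>.
    did = root.find("did")
    if did is None:
        return {}
    # Semantics: the values themselves are formed according to a content standard.
    return {child.tag: "".join(child.itertext()).strip() for child in did}


if __name__ == "__main__":
    for element, value in describe(SAMPLE).items():
        print(f"{element}: {value}")
```

Swapping the XML for another syntax, or EAD for another structure, would change the parsing code but not the carefully formed name and title, which is one way to read the instructors' hope that semantics outlast syntax and structure.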

The last part of the course covered general things to consider when choosing a metadata schema (or schemas) for your own institution. This decision requires four steps:
  1. Identify your needs. What kinds of objects do you have, and what kind of information do you need to collect about them? What do you want to be able to do with your metadata? How does your audience expect to be able to find and interact with your holdings?
  2. Identify your resources. Creating quality metadata is costly, but keep in mind that tasks left for sometime in the distant future will likely never get done. Do you want to spend your resources providing a high level of access to a few things, or a low level of access to a lot of things? If your institution follows MPLP, can you justify item level metadata for digital objects? And how do you define an "item"? Does user-created metadata have a place in our catalogs? It may be free to obtain, but there are costs involved in monitoring it for quality control and accuracy.
  3. Test your vernacular. Is the schema applicable? Is it useful? Does it meet the needs of your primary audience? Can you successfully communicate with it?
  4. Optimize your efforts. Look at the interoperability, shareability, reusability, and "archivability" of the schema. Remember that when moving from one metadata language to another some pieces of information will translate directly, some will be aggregated with others into a broader term, and some will be lost altogether.
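
The point in step 4 about translating between metadata languages can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the local field names are invented, the targets are simple Dublin Core elements, and the `CROSSWALK` table and `crosswalk` function are hypothetical rather than anything presented in the webinar.

```python
# Hypothetical crosswalk from a made-up local schema to simple Dublin Core.
CROSSWALK = {
    "collection_title": "title",    # translates directly
    "creator_name": "creator",      # translates directly
    "photographer": "contributor",  # aggregated into a broader term
    "interviewer": "contributor",   # aggregated into a broader term
    # "shelf_location" has no target element, so it is lost
}


def crosswalk(record: dict[str, str]) -> tuple[dict[str, list[str]], list[str]]:
    """Convert one record; return the converted fields and the names of lost fields."""
    converted: dict[str, list[str]] = {}
    lost: list[str] = []
    for field, value in record.items():
        target = CROSSWALK.get(field)
        if target is None:
            lost.append(field)
        else:
            converted.setdefault(target, []).append(value)
    return converted, lost
```

Returning the list of lost fields alongside the converted record makes the cost of a migration visible, rather than letting information disappear silently.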
For archivists with a good understanding of traditional archival practice but with limited experience creating or interacting with encoded metadata, or for those who might like to revisit the basics, I think this webinar is a great starting point. As always, please feel free to comment if you have questions or feedback, and thank you for reading!

Monday, July 2, 2012

Class No. 6: Using and Understanding PDF/A as a Preservation Format

Last week I attended my sixth DAS course, a live webinar titled "Using and Understanding PDF/A as a Preservation Format." The course, taught by Geoff Huth, covered some basic information about preservation standards in general, specific information about the purpose and requirements of PDF/A (and its various versions), and some practical information about how to create and validate PDF/A files.

PDF/A is an open preservation standard controlled by ISO (ISO 19005-1:2005). It is not a particularly useful preservation format for complex files like spreadsheets, databases, or webpages, but it is quite good for text-based, static documents, both digitized and born-digital. Some of its advantages include:
  • the look and feel of the original is retained;
  • any fonts required to accurately render the document are embedded within the PDF/A (unlike most file formats, which just point to a place on your hard drive where the necessary font may or may not reside);
  • it contains extractable text (for digitized documents, of course, this is only true if you used OCR software at time of capture); and
  • it helps to ensure authenticity by being very difficult to modify. 
Because the PDF/A standard is expressly designed to persist over time, it requires that certain "non-archival" features be stripped out of a document before it can be converted into a valid PDF/A file. This applies to anything that might be unstable in the long term, such as embedded audio or video, encryption, compression, transparencies, executable files, or references to external content, though with each new version of the standard it seems that more features are allowed. There are several different "flavors" of PDF/A, each with its own list of requirements. For example, to create a valid PDF/A-1a you will need to include metadata that preserves the logical structure of the document, specifies the language of the text, and preserves the text stream in reading order, whereas a PDF/A-1b preserves the visual appearance of the original but requires less descriptive metadata (the "b" stands for basic, the "a" for accessible; a document that only conforms to the standard at the basic level is less accessible as a result). The PDF/A-2 allows for electronic signatures and JPEG2000 compression and sets requirements for XMP metadata, and within that there is a PDF/A-2a, b, and u (for Unicode). The PDF/A-3 was recently ratified as well, which is very similar to PDF/A-2 but supports the maintenance of the original file by allowing it to be embedded within the PDF/A.

I'm not sure I came away from this course with a comprehensive understanding of the PDF/A standard, but what I do know is that implementing it as a preservation standard is not as simple as choosing a "save as" command (which is, sadly, kind of what I pictured). Documents must be prepared for conversion if they contain problematic features, metadata about the structure of the document must be added, and the resultant PDF/A must be visually inspected for accuracy and validated for conformance to the standard. And that's just the beginning - as Geoff stressed at the end of the course, the format isn't everything; preservation programs require work. We still need conversion procedures, version control, environmental controls, descriptive and technical metadata, regular backups, and vigilance in the face of continued change and obsolescence.

One question I came away with as I thought about how this might relate to my work at the Library is whether the PDF/A might really replace the TIFF image as a preservation format for scanned documents. I can see what the advantages might be, but I'm wondering if there are some disadvantages as well. Is this something that other archives have thought about or are already implementing? I'd be interested to hear what others think.

Tuesday, April 10, 2012

Class No. 5: Standards for Digital Archives

Last week I took a Foundational DAS webinar, "Standards for Digital Archives," taught by Mahnaz Ghaznavi. As its name suggests, this course provided an overview of the many standards that are available for use with digital archives. The underlying theme was that standards are good, and that you should adopt the ones that fit the needs of your institution. The course began with an example of an electronic record that could have benefited from the use of standards: a word processing file created in an obsolete, proprietary format that displayed as a nonsensical mishmash of special characters. Had the file been converted to an open, standard, more persistent format, the information contained in the document could have been retained.

Though sometimes we create our own standards - a local set of topical subject headings, for example -  the best standards are those that are published and maintained by a standards setting body (such as ISO, W3C, NISO, ANSI, or NIST). There are standards to guide us in almost any activity that we engage in as archivists:
  • records retention and appraisal (ISO 15489); 
  • the ingestion, management, preservation, and access of digital or physical archives (ISO 14721, better known as OAIS); 
  • linking objects with their associated metadata (METS);
  • capturing preservation data about our objects (PREMIS); 
  • capturing descriptive metadata about our objects (Dublin Core);
  • migrating our objects into more stable formats (JPEG 2000, PDF/A); and
  • making sure our digital objects are stored in a secure manner (TRAC).
Given that it would have been impossible to delve into these standards in any detail within the confines of a ninety minute webinar, I think the instructor was able to convey some useful information about the options that are available to help manage digitized or born-digital archival assets. She advised us to learn from what other institutions have done and are doing, whether successfully or not, and to recognize that digital preservation is a moving target. To implement any of these standards one would need significantly more guidance, but this course can serve as the first step to becoming aware of what is possible.

Because SAA generously allows multiple people to view their webinars for the cost of one registration (though each attendee must pay for his or her examination fee), we had a good-sized audience of full time staff and interns in a conference room at the Library. For our interns, most of whom are current graduate students in Library Science at Simmons College, it seemed like much of the information presented echoed what they've already learned in class. I took that as a positive sign that graduate programs are adapting to our increasingly digital world. Archives students graduating now will start their careers already armed with skills and knowledge that more established professionals must actively seek out (by pursuing the DAS certificate, for instance). Of course it has always been thus, everywhere and in every profession, but my perspective until recently has been that of the recent graduate; now that I have been out of school for almost ten years, I find that I am suddenly among those who must rush to catch up or be left behind.

Thursday, March 29, 2012

Class No. 4: Electronic Records: The Next Step

I recently completed my fourth DAS course, an on-demand webinar titled "Electronic Records: The Next Step," taught by Geoffrey Huth, Director of Government Records Services at the New York State Archives. One of my classmates had very recently taken Huth's full-day, in-person course on Basic Electronic Records, which is part of the Foundational tier of DAS courses. Though this webinar is part of the Tactical and Strategic tier - the next tier up - it apparently didn't contain much information that was not already covered in the basic course. Given this, the two courses might be best presented as an either/or choice: the basic course for true beginners, and this webinar for those who already have some familiarity with the issues surrounding electronic records.

The structure of the course mirrored the archival lifecycle, which - as I have learned in all of my DAS courses - is consistent regardless of format: Appraisal, Ingest, Processing and Preservation, Maintenance, Access, and Planning. Though much of the material was familiar, I find that I need to hear this kind of information over and over again before it truly sinks in. I took away the following main points:
  • Appraise ruthlessly. It will cost approximately five times more to store a digital file than it does to store a physical object. We cannot and should not keep everything, in the physical world or in the digital world. If you cannot manage or even access the files, if you cannot maintain their original functionality, or if you do not have sufficient metadata to make sense of them, consider whether they are worth keeping. 
  • Define acceptable file formats (uncompressed, unencrypted) and external media devices, as well as acceptable methods of transfer for your institution. This way you will have processes in place to handle any electronic records that you receive.
  • Make sure that the donor retains a second copy of all electronic files until your copy is verified.
  • Always accession electronic records on a quarantined (i.e. non-networked) computer. Run your virus software, wait a month, and then run it again. 
  • Preservation options for electronic records include migration, normalization, emulation, and output to some sort of hard copy, generally paper or microfilm. 
    • Normalization, which involves converting files to a "normal" format that is open and persistent (PDF/A, for example), is the most likely solution. 
    • Emulation, wherein the file is never converted to another format, is a less practical choice, as the original environment of each file would need to be perpetually maintained. I see how this is completely impractical, but if you had the resources and the know-how it might be interesting to have a fleet of computers running defunct operating systems and software programs so that records could be accessed as they were originally created.
    • Output to paper or microfilm might be an acceptable solution if you've got just one or two electronic files, and if those files are simple word processing documents. If retaining the functionality of a record is important (links in a website or formulas in a spreadsheet, for example), obviously a hard copy is not going to be sufficient.  
  • One thing that I found slightly alarming was Huth's assertion that the world, with the exception of the archival community, is turning away from TIFF and toward JPEG 2000 as a standard. Is this true, and if so, what will that mean for digital archives (like JFK's) that are full of TIFF images?
  • Access seems like the trickiest piece of this puzzle. Is access provided online, or just in the reference room? If electronic records are closely related to physical records, how do you provide meaningful access to both at once?  
  • Just as we should define the formats we will accept when accessioning records, we should define the formats we are willing to provide to our users. It should be up to the user to convert our normalized file into whatever format he or she may require.
  • Though our inclination may be to ignore electronic records and digitization, the truth is that if you're not working with the digital world, you're not working in the real world. 
  • You can't do everything at once, but do something, and do it now.
In the spirit of that last point, I am going to try to do something with the electronic records that are stored on this device, which was found by my colleague in an unexpected place in our stacks:

[Image: floppy disk]

First I'll need a quarantined computer with a disk drive that will fit this floppy disk, and then I'll need to figure out what program was used to create whatever documents are stored on it. In this case my guess is that they'll be word processing documents that most likely exist in hard copy in the collection already, in which case this disk probably won't be of much importance to the collection.  However, rather than just sticking it somewhere in the stacks and pretending it doesn't exist (as we did originally), I'm going to use what I've learned in my DAS courses to deal with it properly.
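
One way to start figuring out what program created a file is to look at its first few bytes, since many formats begin with a recognizable signature. The Python sketch below is a minimal illustration of that idea. The short signature table and the `guess_format` function are my own hypothetical example, covering only a handful of formats likely to turn up on an old word processing disk; dedicated format identification tools cover far more formats and should be preferred for real work.

```python
# A few well-known file signatures ("magic bytes"); far from exhaustive.
SIGNATURES = [
    (b"\xd0\xcf\x11\xe0\xa1\xb1\x1a\xe1", "OLE2 compound file (e.g. Word 97-2003 .doc)"),
    (b"\xffWPC", "WordPerfect document"),
    (b"{\\rtf", "Rich Text Format"),
    (b"%PDF-", "PDF"),
]


def guess_format(first_bytes: bytes) -> str:
    """Guess a file's format from its leading bytes; 'unknown' if no match."""
    for magic, label in SIGNATURES:
        if first_bytes.startswith(magic):
            return label
    return "unknown"
```

In practice the bytes would come from a copy or disk image rather than the original media, for example `guess_format(open(path, "rb").read(16))`, where `path` is a placeholder for one of the copied files.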

Tuesday, March 27, 2012

Changes to DAS Course Examination Policies

SAA recently made some changes to the DAS course examination policies, and I thought it might be useful to highlight them here.

The exams, and the rules governing them, now differ depending on the length of the course. Until now students were given two hours to complete each exam, regardless of the length of the course, and some exams had as few as five questions. Now for a web seminar, which is the shortest type of course, the exam will consist of ten questions, and participants will be given just one hour to complete them. In contrast, the exam for a two-day course - the longest type of course currently offered - will now consist of 30 questions, but participants will have up to four hours to complete them.

This seems like a sensible way to acknowledge the disparate amount of material that can be covered in a 90 minute webinar versus a one- or two-day, in-person course. I wonder if the next step might be to weight these courses differently, given this disparity, or perhaps to offer significantly longer webinars to increase the complexity of remote courses for the benefit of those who are not able, for whatever reason, to travel.

The revised Course Examinations page also provides some details about the comprehensive exam, though I'm not sure whether it's new information. It explains that the comprehensive exam covers the seven Core Competencies of the DAS Curriculum, and that each DAS course addresses at least two of these competencies. Any combination of the required number of courses from the four tiers of study should theoretically provide students with the knowledge necessary to pass the exam. The seven Core Competencies are:
  1. Understand the nature of records in electronic form, including the functions of various storage media, the nature of system dependence, and the effect on integrity of records over time.
  2. Communicate and define requirements, roles, and responsibilities related to digital archives to a variety of partners and audiences.
  3. Formulate strategies and tactics for appraising, describing, managing, organizing, and preserving digital archives.
  4. Integrate technologies, tools, software, and media within existing functions for appraising, capturing, preserving, and providing access to digital collections.
  5. Plan for the integration of new tools or successive generations of emerging technologies, software, and media.
  6. Curate, store, and retrieve original masters and access copies of digital archives.
  7. Provide dependable organization and service to designated communities across networks.
More information about the DAS Curriculum can be found here.