Archive processing and organization
The Digital Archive is a managed storage environment for your content. Your content will be stored in the format and directory structure in which you send it to OCLC for processing. As each batch of content is received by OCLC, the Digital Archive performs an Ingest process. During Ingest processing, the Archive:
- checks the content against the electronic shipping manifest,
- checks each file for viruses,
- verifies each file’s data format using JHOVE, and
- creates a digital fingerprint for each file so the Archive can later perform independent “fixity” checks to verify that no bits in the file have been altered.
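The sketch below illustrates two of these checks on a local copy of a batch: comparing the files present against a list of expected paths, and computing a fingerprint for each file. It is not OCLC's implementation. It assumes SHA-256 as the fingerprint algorithm and a manifest reduced to a set of relative paths, and the function names `fingerprint` and `check_batch` are hypothetical. Virus scanning and JHOVE format validation are not shown.

```python
import hashlib
from pathlib import Path


def fingerprint(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the SHA-256 hex digest of a file (assumed algorithm), read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_batch(batch_dir: Path, manifest: set[str]) -> dict:
    """Compare files on disk with a hypothetical manifest of relative POSIX paths.

    Returns manifest entries with no file, files not in the manifest, and a
    fingerprint for every file that appears in both.
    """
    on_disk = {
        p.relative_to(batch_dir).as_posix()
        for p in batch_dir.rglob("*")
        if p.is_file()
    }
    return {
        "missing": sorted(manifest - on_disk),
        "unexpected": sorted(on_disk - manifest),
        "fingerprints": {
            name: fingerprint(batch_dir / name) for name in sorted(on_disk & manifest)
        },
    }
```

Running a check like this before shipping can surface missing or stray files early; the Archive Accession Report remains the authoritative record of what was ingested.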
At the conclusion of Ingest processing, an Archive Accession Report is created for you to review.
The Digital Archive accepts files of any data format for ingest and storage. The choice of archival data format depends on your local practice and collection policy. One purpose of the Digital Archive is to let you retrieve exactly the file you sent to OCLC for archiving.
You may include metadata files along with content files. If you do, you may want to use either a directory structure or a naming convention that associates each metadata file with the content file it describes. Metadata files are treated as preservation objects like any other file.
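As one illustration of a naming convention, the sketch below assumes a hypothetical rule in which a metadata file shares its directory and base name with the content file it describes (for example, `p001.tif` and `p001.xml`), and it treats `.xml` as the only metadata extension. The function `pair_by_stem` and the `METADATA_SUFFIXES` setting are hypothetical; the Archive does not require any particular convention.

```python
from collections import defaultdict
from pathlib import PurePosixPath

# Hypothetical: the file extensions that this convention treats as metadata.
METADATA_SUFFIXES = {".xml"}


def pair_by_stem(filenames: list[str]) -> dict[str, dict[str, list[str]]]:
    """Group file names by directory plus base name (extension removed).

    Each group lists its content files and its metadata files, so a content
    file with an empty "metadata" list is one that has no associated metadata.
    """
    groups: dict[str, dict[str, list[str]]] = defaultdict(
        lambda: {"content": [], "metadata": []}
    )
    for name in filenames:
        path = PurePosixPath(name)
        key = str(path.with_suffix(""))
        kind = "metadata" if path.suffix.lower() in METADATA_SUFFIXES else "content"
        groups[key][kind].append(name)
    return dict(groups)
```

For example, `pair_by_stem(["vol1/p001.tif", "vol1/p001.xml", "vol1/p002.tif"])` groups the first two names under `vol1/p001` and leaves `vol1/p002` with no metadata file.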
While your data is stored in the Archive, the system performs regular checks on the health of your content. The results of those checks are summarized in a File Integrity Report, which you can use to confirm that your content remains unaltered in the Archive. See Reports.
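How the Archive schedules and records its checks is not described here, but the idea behind a fixity check can be sketched: recompute each file's fingerprint and compare it with the value recorded earlier. This minimal sketch assumes SHA-256 fingerprints stored in a dictionary keyed by relative path. The function `integrity_summary` and its `ok`/`changed`/`missing` categories are hypothetical and do not reflect the actual layout of the File Integrity Report.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()


def integrity_summary(root: Path, stored: dict[str, str]) -> dict[str, list[str]]:
    """Re-fingerprint each recorded file and classify it against its stored value."""
    report: dict[str, list[str]] = {"ok": [], "changed": [], "missing": []}
    for name, expected in sorted(stored.items()):
        path = root / name
        if not path.is_file():
            report["missing"].append(name)   # file no longer present
        elif sha256_of(path) == expected:
            report["ok"].append(name)        # bits unchanged since recording
        else:
            report["changed"].append(name)   # fingerprint no longer matches
    return report
```

Any file in the `changed` or `missing` list would be a candidate for follow-up; a real preservation system would also record when each check ran.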
Content in the Archive may be disseminated in two ways. You can disseminate individual files online using a web browser, or you can e-mail a request for a bulk dissemination. See Dissemination.
Your content is organized in the Archive using a hierarchy of identifiers that lets you uniquely identify individual files and organize your collections. Institution Name is at the top of the hierarchy and File Name is at the bottom:
- Institution Name (specifically, your OCLC symbol) – The broadest identifier for all your content. Your OCLC symbol uniquely identifies every file in the Archive that belongs to you.
- Server URL (e.g., CONTENTdm “host name”¹) – Identifies the source server from which content was extracted or where the access copy of the content can be found. Server URL lets you group content in the Archive by the source server that holds the associated access copies.
- Collection Name (e.g., CONTENTdm “collection alias”¹) – Identifies a collection of content. This identifier must be unique within a Server URL.
- Archival Volume Name (e.g. CONTENTdm “volume ID”) – Identifies the batch of content you sent to OCLC. This identifier must also be unique within a Collection.
- File Name – Identifies a specific file within an archival volume. This identifier must be unique within an archival volume.
¹ CONTENTdm servers use this information to track, in the metadata of each “access” file, a unique link to the associated “preservation” file in the Archive.
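Because each identifier only has to be unique within its parent, it is the full path from Institution Name down to File Name that identifies a single file. The sketch below models that path as a hypothetical `ArchiveKey` record, with a hypothetical `ArchiveIndex` that rejects duplicate paths and lists the files under any prefix of the hierarchy. The field names, the OCLC symbol `XYZ`, and the host `cdm.example.org` are illustrative assumptions, not part of the Archive's interface.

```python
from dataclasses import astuple, dataclass


@dataclass(frozen=True, order=True)
class ArchiveKey:
    """One file's position in the hierarchy (field names are illustrative)."""
    institution: str  # OCLC symbol
    server_url: str   # e.g., a CONTENTdm host name
    collection: str   # unique within a server URL
    volume: str       # archival volume name, unique within a collection
    file_name: str    # unique within an archival volume


class ArchiveIndex:
    """Registry that enforces uniqueness of the full identifier path."""

    def __init__(self) -> None:
        self._keys: set[ArchiveKey] = set()

    def add(self, key: ArchiveKey) -> None:
        # The same file name may recur in different volumes; only an
        # identical full path counts as a duplicate.
        if key in self._keys:
            raise ValueError(f"duplicate file: {key}")
        self._keys.add(key)

    def files_in(self, *prefix: str) -> list[ArchiveKey]:
        """Return files whose leading identifiers equal prefix, in sorted order."""
        n = len(prefix)
        return sorted(k for k in self._keys if astuple(k)[:n] == prefix)


idx = ArchiveIndex()
idx.add(ArchiveKey("XYZ", "cdm.example.org", "maps", "vol001", "map1.tif"))
idx.add(ArchiveKey("XYZ", "cdm.example.org", "maps", "vol002", "map1.tif"))
print(len(idx.files_in("XYZ", "cdm.example.org", "maps")))  # 2
```

Here `map1.tif` appears in two volumes without conflict, while adding either full path a second time raises `ValueError`. Querying with a shorter prefix, such as only the OCLC symbol, groups content at a higher level of the hierarchy.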