Maryland State Archives Scanning Standards

The goal of scanning is to create a complete and accurate reproduction of the original. To meet this expectation, the Maryland State Archives adopted the following minimum standards for digitization. These standards adhere to the Federal Agencies Digital Guidelines Initiative (FADGI) "Technical Guidelines for Digitizing Cultural Heritage Materials," which were updated in 2022.

Covered Record Types

This webpage contains standards for digitizing the most common types of records. If you have an item not covered in this document, please contact the Maryland State Archives for further information via email at msa.helpdesk@maryland.gov.

Protecting the Original Record

If there are concerns that digitization will damage the original because of its condition or equipment limitations, please refer to the resources section for additional information or contact the Maryland State Archives for assistance via email at msa.helpdesk@maryland.gov.

Missing Pages/Placeholders

If a page of the original is missing, we strongly recommend that the agency create and insert a placeholder image identifying the missing document and the reason that it was not imaged. A placeholder is an excellent tool for making sure that every page of a document is accounted for.

Quality Control

Someone other than the person who scanned the record should perform a quality check of each image to ensure that:

  • the image is an accurate reproduction of the original and is clear and readable
  • the directory and filename for the image accurately identify the record imaged

Master Copy/Access Copy

Images created with the intent of replicating an original document are considered master images or master copies. Master copies should be of high quality and follow recommended standards when possible. Master copies are not to be used on a regular basis. Access copies are generally copies of master files whose main purpose is to provide access to users. Access copies are normally lower in quality, resulting in a smaller file size that allows easier sharing and access.

File Formats

The file format is the specific way digital information is made and stored by the computer. Not all programs are compatible with all file formats. Below are a few common formats:

  • TIFF files can be opened in many different programs. TIFF files are typically stored uncompressed or with lossless compression and are commonly used for master copies. Files in TIFF format end with a .tif or .tiff extension.
  • JPEG is a lossy compression format for color and grayscale images. Depending on the degree of compression, the loss of detail may or may not be visible to the eye. Files in JPEG format end with a .jpg or .jpeg extension.
  • JPEG 2000 uses image compression to produce both lossy and lossless digital files. Lossless images may compete with TIFF files for archival quality masters. Files in JPEG 2000 format use .jp2, .jpf and other extensions.
  • PDF contains an image of a page, including text and graphics. PDF files are widely used for read-only file sharing. Adobe Acrobat is, by far, the most popular PDF file application. If the digitization is intended for long-term or permanent storage, PDF/A is preferred over standard PDF.
  • PDF/A is a subset of PDF that is designed for long-term archiving of electronic documents. Files are 100% self-contained and do not rely on outside sources for document information.
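
As a rough illustration of the list above, the following Python sketch maps file extensions to the formats described here. The lookup table and the function name `identify_format` are assumptions made for this example and are not part of the Archives' standards. An extension is only a hint about what a file actually contains.

```python
from pathlib import Path

# Extension-to-format lookup based on the formats described above.
# Illustrative only: the compression notes summarize the text, and common
# alternate spellings (.tiff, .jpeg) are included as an assumption.
FORMATS = {
    ".tif": ("TIFF", "lossless"),
    ".tiff": ("TIFF", "lossless"),
    ".jpg": ("JPEG", "lossy"),
    ".jpeg": ("JPEG", "lossy"),
    ".jp2": ("JPEG 2000", "lossy or lossless"),
    ".jpf": ("JPEG 2000", "lossy or lossless"),
    ".pdf": ("PDF or PDF/A", "varies"),
}


def identify_format(filename):
    """Return (format name, compression note) for a filename, or None if unknown."""
    suffix = Path(filename).suffix.lower()
    return FORMATS.get(suffix)


print(identify_format("deed_0001.TIF"))
print(identify_format("notes.docx"))
```

Note that the extension alone cannot distinguish PDF from PDF/A, or a lossy JPEG 2000 file from a lossless one, so a check like this is a first pass only.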

Minimum Digitization Guidelines

This is a simplified chart of common digitization guidelines. For a more in-depth review, see the resources section.

| Record Type | Acceptable File Format of Master Copy | Acceptable File Format of Access Copy | Minimum Resolution |
|---|---|---|---|
| Printed Text | TIFF, PDF/A, PDF | All | 300 ppi |
| Manuscripts | TIFF, JPEG 2000 | All | 300 ppi |
| Books | TIFF, JPEG 2000 | All | 300 ppi |
| Photographs | TIFF, JPEG 2000 | All | 300 ppi |
| Photograph Negatives, 35 mm to 4" x 5" | TIFF, JPEG 2000 | All | 8" x 10": 300 ppi; 4" x 5": 600 ppi; 35 mm negative: 2100 ppi |
| Oversize items: Maps, Plats, Posters, etc. | TIFF, JPEG 2000 | All | 300 ppi |

Note (Printed Text): If the digitization is intended for long-term or permanent storage, TIFF or PDF/A is strongly recommended over PDF.

Note (Oversize items): If an oversize document includes a lot of fine detail, a resolution of 400-600 ppi is recommended.
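
The chart above can also be expressed as a small lookup table, which may be useful when checking a batch of scans. This is a minimal sketch: the dictionary keys, the name `GUIDELINES`, and the function `check_scan` are hypothetical and chosen for this example only.

```python
# Minimum guidelines transcribed from the chart above.
# Keys and names are hypothetical, chosen for this example.
IMAGE_MASTERS = {"TIFF", "JPEG 2000"}

GUIDELINES = {
    "printed text": {"master_formats": {"TIFF", "PDF/A", "PDF"}, "min_ppi": 300},
    "manuscripts": {"master_formats": IMAGE_MASTERS, "min_ppi": 300},
    "books": {"master_formats": IMAGE_MASTERS, "min_ppi": 300},
    "photographs": {"master_formats": IMAGE_MASTERS, "min_ppi": 300},
    "negative 8x10": {"master_formats": IMAGE_MASTERS, "min_ppi": 300},
    "negative 4x5": {"master_formats": IMAGE_MASTERS, "min_ppi": 600},
    "negative 35mm": {"master_formats": IMAGE_MASTERS, "min_ppi": 2100},
    "oversize": {"master_formats": IMAGE_MASTERS, "min_ppi": 300},
}


def check_scan(record_type, master_format, ppi):
    """Return a list of problems; an empty list means the minimums are met."""
    rule = GUIDELINES[record_type]
    problems = []
    if master_format not in rule["master_formats"]:
        problems.append(
            f"{master_format} is not a listed master format for {record_type}"
        )
    if ppi < rule["min_ppi"]:
        problems.append(f"{ppi} ppi is below the {rule['min_ppi']} ppi minimum")
    return problems


print(check_scan("printed text", "PDF/A", 300))
print(check_scan("photographs", "PDF", 200))
```

The first call returns an empty list because both minimums are met. The second returns two problems, one for the format and one for the resolution.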

In-House vs. Outsourcing

Most agencies do not have the appropriate scanning equipment, software, or staff expertise to execute a digitization project. Evaluation of your resources will help determine if your digitization process should be done in-house or outsourced to a vendor who specializes in digital imaging. Vendors provide digitizing services, technical advice, and sometimes the long-term maintenance of the resulting files. Before talking to vendors, be familiar with digitization technology and have a clear idea of your project and its goals.

Questions to ask internally before contacting a vendor:

  • How much material will be digitized? What type of materials will be digitized?
  • Can the materials leave your site? What precautions are necessary to ensure the security of the materials?
  • What is the physical condition of the materials? Do they need to be prepared for scanning (removing staples and paperclips)? Do they have any special handling requirements that would keep them from being outsourced? Can they be transported easily?
  • What is the required quality of the digital images? High or low resolution? Black and white or color?
  • What is the desired end product? A document management system? A searchable online collection? Who is the intended audience? Staff members? Researchers? The general public?
  • Why are you digitizing the materials? What file format(s) fit your requirements? Do you need both master and access copies? How will each be created? And when? Do the access copies need to be watermarked?
  • What will happen to the original paper documents that were imaged? Do they need to be kept for any reason? Local access? Retention schedules? If not, how will they be properly disposed of?

The Northeast Document Conservation Center has a page that highlights issues relating to outsourcing and vendor relations; it is listed in the Resources section.

Resources

Federal Agencies Digital Guidelines Initiative (FADGI) Digitization Guidelines
http://www.digitizationguidelines.gov/guidelines/digitize-technical.html

National Archives Guidelines for Digitizing Archival Materials for Electronic Access
https://www.archives.gov/preservation/technical/guidelines.html

SERI Tools and Resources Committee: Managing Digital Projects
https://www.statearchivists.org/viewdocument/seri-tools-and-resources-committee

International Press Telecommunications Council (IPTC) Metadata
https://iptc.org/standards/photo-metadata/

Library of Congress Preservation Guidelines for Digitizing
https://www.loc.gov/preservation/care/scan.html

Society of American Archivists: External Digitization Standards
https://www2.archivists.org/standards/external/123

Northeast Document Conservation Center: Outsourcing and Vendor Relations
https://www.nedcc.org/free-resources/preservation-leaflets/6.-reformatting/6.7-outsourcing-and-vendor-relations

National Archives: Table of File Formats
https://www.archives.gov/records-mgmt/policy/transfer-guidance-tables.html#digitalstillimage

Digitization Terms

Below are terms that you will encounter frequently with any digitization project.

Digitization - Digitization is a process by which a document or photo is scanned and converted to a digital format. After scanning, the original document or photo is represented by a series of pixels. The image can then be kept on a network or transferred onto a variety of storage options.

Pixels - A pixel is the smallest unit of a digital image or graphic that can be displayed and represented on a digital display device. A pixel is the basic logical unit in digital graphics. Pixels are combined to form a complete image, video, text, or any other visible thing on a computer display.

PPI/DPI - Technically speaking, PPI (pixels-per-inch) is the way that image resolution is properly described; it affects the size and quality of the image. DPI (dots-per-inch) is better suited to describing the resolution of printers and printed output. PPI and DPI are often used interchangeably.

Resolution - The quality of a digital image depends in part on the initial scanning resolution. Resolution is expressed as the number of pixels per inch used to represent an image (PPI, often written as DPI). The higher the image resolution, the larger the file size. Optical resolution is the actual resolution that the digitization equipment (scanner, digital camera) is capable of capturing. Interpolation is the computer filling in, or guessing, pixel values to make up the difference between what can actually be captured and what is being requested. Interpolation is rarely recommended when scanning, but it works well for printing images for large posters.
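
The relationship between physical size, resolution, and pixel count can be worked out with simple arithmetic. The sketch below is illustrative only; the function names `pixel_dimensions` and `is_interpolated` are made up for this example.

```python
def pixel_dimensions(width_in, height_in, ppi):
    """Pixel width and height of a scan of an original measured in inches."""
    return round(width_in * ppi), round(height_in * ppi)


def is_interpolated(requested_ppi, optical_ppi):
    """True when the requested resolution exceeds what the device can capture."""
    return requested_ppi > optical_ppi


# A letter-size page (8.5 x 11 inches) scanned at 300 ppi.
print(pixel_dimensions(8.5, 11, 300))  # (2550, 3300)

# Doubling the resolution doubles each dimension and quadruples the pixel count.
print(pixel_dimensions(8.5, 11, 600))  # (5100, 6600)

# Requesting 1200 ppi from a device with 600 ppi optical resolution
# means the extra pixels would be interpolated.
print(is_interpolated(1200, 600))  # True
```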

Compression - Compression is the reduction of image file size for processing, storage, and transmission. The quality of the image may be affected by the compression techniques used and the level of compression applied. In selecting a compression technique, it is necessary to consider the attributes of the original object. Some compression techniques are designed to compress text; others are designed to compress pictures. There are two types of compression: Lossless Compression and Lossy Compression.

  • Lossless compression allows the original data to be reconstructed from the compressed data with no loss of information.
  • Lossy compression results in a smaller file size, but uses inexact approximations and partial data discarding to represent the content.
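
The difference between the two types can be demonstrated with a toy example. The sketch below uses Python's standard `zlib` module for lossless compression and a crude rounding step as a stand-in for lossy compression. Real image formats such as JPEG use far more sophisticated techniques, so this only illustrates the principle.

```python
import zlib

# Stand-in for a run of 8-bit grayscale pixel values (0-255).
original = bytes([10, 10, 10, 11, 12, 200, 201, 203, 203, 203] * 100)

# Lossless: decompressing returns exactly the bytes that went in.
packed = zlib.compress(original)
restored = zlib.decompress(packed)
print(restored == original)         # True
print(len(packed) < len(original))  # True

# Lossy (toy example): round each value down to a multiple of 16 before
# compressing. The discarded detail cannot be recovered afterwards.
quantized = bytes((value // 16) * 16 for value in original)
lossy_restored = zlib.decompress(zlib.compress(quantized))
print(lossy_restored == original)   # False
```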

Color Depth - (also known as Bit Depth) The number of possible shades or tonal gradations that a color can have from black to white. A bitonal image is 1-bit (2^1, or 2 colors). Grayscale images are typically 8-bit (2^8, or 256 values). Color images are typically 24-bit (3 colors, 8 bits per color, about 16.7 million values).

  • Bitonal - (bilevel, binary, or 1-bit) Bitonal means that each pixel in the image file can only have one of two tonal values, black or white (the tonal value can be stored in one bit of digital data, hence 1-bit or binary). Bitonal images are easier for OCR software to interpret. Because of the limited color range, bitonal images are dramatically smaller than grayscale or color files.
  • Grayscale - a black-and-white form of continuous tone imagery. Unlike bitonal images, where only two tonal values can be described, grayscale images are (typically) composed of 256 shades of gray (2^8, or 8-bit), varying from black at the weakest intensity to white at the strongest. High-end scanners are capable of capturing 12-bit (2^12) and 16-bit (2^16) grayscale. Grayscale images are also called monochromatic, as they only capture one channel of color.
  • Color - ('truecolor') The representation of color images on a monitor is done with the RGB (red-green-blue) color model. Whereas grayscale uses one color channel, color images use 3 channels (one each for red, green, and blue). Typically, each color channel has 8 bits or 256 values from darkest to lightest, resulting in 24-bit color. On Macintosh computers, 24-bit color is referred to as "millions of colors" because 256 x 256 x 256 = 16,777,216 possible color combinations. As with grayscale, high-end scanners can also capture 36-bit (12 bits per channel) and 48-bit color (16 bits per channel).
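
To see how bit depth affects both the number of available values and the file size, consider the following sketch. The function names are hypothetical, and the size estimate covers raw pixel data only: it ignores file headers, row padding, and any compression.

```python
def tonal_values(bit_depth):
    """Number of distinct values a pixel can take at the given bit depth."""
    return 2 ** bit_depth


def uncompressed_bytes(width_in, height_in, ppi, bit_depth):
    """Approximate size of the raw pixel data for a scan, in bytes."""
    pixels = round(width_in * ppi) * round(height_in * ppi)
    return (pixels * bit_depth + 7) // 8  # round up to whole bytes


# A letter-size page (8.5 x 11 inches) at 300 ppi in each color mode.
for name, bits in [("bitonal", 1), ("grayscale", 8), ("color", 24)]:
    size_mb = uncompressed_bytes(8.5, 11, 300, bits) / 1_000_000
    print(f"{name}: {tonal_values(bits):,} values, about {size_mb:.1f} MB")
```

For this page size the estimate comes to about 1.1 MB for bitonal, 8.4 MB for grayscale, and 25.2 MB for 24-bit color, which is why bitonal files are so much smaller.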

Metadata - Usually defined as "data about data," metadata is used to describe an object (digital or otherwise), its relationships with other objects, and how the object has been and should be treated over time. Its basic elements are a structured format and a controlled vocabulary, which together allow for a precise and comprehensible description of content, location, and value. Metadata often includes items like file type, file name, creator name, date of creation, and the record’s classification.
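
As one possible way to picture a structured metadata record, the sketch below stores the items mentioned above in a fixed set of fields and serializes them to JSON. The class name, field names, and sample values are all placeholders invented for this example; they do not represent a schema required by the Archives.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ImageMetadata:
    """Hypothetical record holding the metadata items listed above."""
    file_name: str
    file_type: str
    creator: str
    date_created: str  # ISO 8601 date string
    classification: str


# Placeholder values for illustration only.
record = ImageMetadata(
    file_name="example_0001.tif",
    file_type="TIFF",
    creator="Example Agency",
    date_created="2024-01-31",
    classification="Example Series",
)

metadata_json = json.dumps(asdict(record), indent=2)
print(metadata_json)
```

Using a fixed set of fields is a simple form of structured format. A controlled vocabulary would further restrict the values allowed in fields such as `file_type` or `classification`.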

OCR - Optical character recognition (OCR) is the process of electronically translating a scanned image of text into machine-readable text. A program reads the character content within the image and creates a digital version of the text. This allows the text to be searched and indexed, or used in other processes. The accuracy of OCR depends on a number of factors, including image quality.


This web site is provided as a courtesy of the Maryland State Archives. As you develop your records management program, you should consult with your agency’s Records Officer.


This web site is presented for reference purposes under the doctrine of fair use. When this material is used, in whole or in part, proper citation and credit must be attributed to the Maryland State Archives. PLEASE NOTE: The site may contain material from other sources which may be under copyright. Rights assessment, and full originating source citation, is the responsibility of the user.


© Copyright December 04, 2024 Maryland State Archives