Appalachian College Association

Scanning

ALICE Digitization Committee

Digital Library of Appalachia Project

SCANNING GUIDELINES

Scanning is used to create a digital surrogate of most two-dimensional materials, including printed documents, manuscripts, drawings, and photographs.  In selecting the appropriate scanning procedure, the archivist must determine whether it is important to maintain the look and feel of the original, and whether use of the material may be improved through various modifications.  In some cases, the library may decide to make multiple surrogates of a single item to serve various purposes.  Included here are procedures for (A) creating text files, (B) creating files that preserve the original format of a document, (C) creating files that enhance or manipulate the original format, and (D) creating image files.

These guidelines are meant to suggest a process a library could follow in scanning resources for inclusion in the Digital Library of Appalachia.  Individual users will have to take time to acquaint themselves with the features of project software so that they may respond to particular problems.  However, the following procedures should be adequate for the majority of scanning work undertaken by ACA libraries, and may be adapted for local use. 
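The choice among the four procedures can be sketched as a short decision function.  This is only an illustration: the field names and the order in which the questions are asked are assumptions made for the example, not rules set by the committee, and a library may well weigh these traits differently.

```python
from dataclasses import dataclass


@dataclass
class Item:
    """Traits of an item to be scanned (hypothetical field names)."""
    chiefly_image: bool = False        # photograph, illustration, map
    needs_manipulation: bool = False   # e.g. sharpening, contrast, touch-up
    format_significant: bool = False   # layout or handwriting carries meaning
    has_pictures: bool = False         # images mixed with text


def choose_procedure(item: Item) -> str:
    """Return the letter of the guideline (A-D) that appears to fit the item.

    The order of the checks is an assumption made for this sketch.
    """
    if item.chiefly_image:
        return "D"  # image files (.tif and .jpg)
    if item.needs_manipulation:
        return "C"  # .pdf built from pages adjusted in Photoshop
    if item.format_significant or item.has_pictures:
        return "B"  # .pdf scanned directly in Acrobat
    return "A"      # text file created through OCR


# A magazine article whose photographs should be kept with the text.
print(choose_procedure(Item(has_pictures=True)))
```

An item may also be run through more than one procedure when the library wants multiple surrogates of it.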

A.  For text documents where original format is not significant.

This procedure creates a text file of documents through optical character recognition (OCR).  The pilot project provided TextBridge Pro OCR software.  Text files are smaller than image files, and the full text becomes keyword searchable.  Use this scanning procedure for documents without pictures, and for documents where the formatting of the text does not add significant editorial meaning. 

1.  Make sure computer and scanner are properly connected and operational.  Place first page of document in the scanner.  Open TextBridge software.   TextBridge provides a scanning wizard program under the Process menu.  The wizard is recommended for most applications.

2.  Use the default settings for layout (any page) and type (any type) as offered by the wizard.  Do not retain pictures (for example advertisements on a magazine page that is otherwise just text) or formatting.  Do not ask the program to proofread (it is easier to do so in a word processor).  When the wizard is finished, TextBridge will initiate a pre-scan of the document in the scanner. 

3.  Check settings for the scanner.  Image type should be set to Text (Background Removal) and Destination should be set to OCR.  Resolution should be set to 300 dpi. 

4.  Once settings are established, select Preview from the scanner program.  The scanner will once again pre-scan the document.  After this preliminary scan, click and drag with the mouse to select the page outline.  Select Scan from the scanner program.

5.  After scanning of one page is complete, the program provides opportunity to scan additional pages.  Click on the Next Page button and repeat step 4 until the document is entirely scanned.  After the final page, select No More Pages.

6.  TextBridge saves the scanned document as a Rich Text Format (.rtf) file by default.  Give the document an appropriate name, and save to a temporary location.  Close TextBridge.

7.  Open the saved .rtf document in a word processor such as Microsoft Word.  While the OCR program will have converted most of the text into a usable document, some proofreading and clean-up work is typically required.  This is especially true when the program encounters special formatting such as drop caps at the beginning of paragraphs.   Most word processing programs will help identify misspelled words and other problems that occur in the OCR process.  When cleanup is complete, save the revised file as text. 

8.  Optional.  The text file can be converted to a .pdf file if desired, and if Adobe Acrobat is installed and operational.  This allows for a universal delivery platform, and is the preferred format in the Digital Library of Appalachia.  To create a .pdf file from text, select Print from the File menu of the word processor, and select Acrobat Distiller as the printer.

B.  For multi-page text documents where the original format is important, and where images are mixed with text.

This procedure creates portable document format (.pdf) files of documents with Adobe Acrobat software provided in the pilot project.  These files can become quite large, and keyword searching may be cumbersome.  Use this scanning procedure for holograph manuscripts, documents with pictures, and for documents where the formatting of the text does add significant editorial meaning.  Single page documents may also be scanned as image files, as described in guideline D below.

1.  Make sure computer and scanner are properly connected and operational.  Place first page of document in the scanner.  Open Adobe Acrobat software.   Check the settings by selecting Tools from the top menu bar, then Distiller.  The job options should be set to eBook.  Click on the Settings tab, and under Job Options verify that the box for Optimize for Fast Web View is checked.

2.  Start the scanning process from the File menu by selecting Import, then Scan. 

3.  Select the device (your scanner), the format (single sided works for most documents), and the destination (new .pdf document) from the Acrobat scan window.  Click on Scan.

4.   Acrobat will initiate a pre-scan of the document.  Adjust the settings for the scanner.  The image type will most often be Color Document or Black & White Document.  You may wish to scan a color document in black & white to reduce file sizes.  Change resolution to 400 dpi for pages with photo images.  Pages with text and line drawings may be scanned at 350 dpi.  Avoid skew by placing the originals squarely on the scanner.  Rescan a skewed image rather than rotating it after scanning.

5.  Once scanner settings are established, click and drag with the mouse over the preview image to select the page outline.  Select Scan from the scanner program.

6.  After scanning of one page is complete, the program provides opportunity to scan additional pages.  Click on the Next button and repeat step 5 until the document is entirely scanned.  After the final page, select Done.

7.  The document will display in the Acrobat window on the screen.  Page through the document to verify that all pages are present and properly scanned.  Pages may be deleted, inserted, or otherwise modified for correction using the options on the Document menu from the top menu bar. 

8.  Compress the file with the Distiller.  From the File menu, select Print, and select Acrobat Distiller as the printer.  Assign an appropriate name and location to the file when prompted.
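To see why these files can become quite large, and why scanning a color document in black & white reduces file size, the uncompressed size of one scanned page can be estimated from its dimensions, the resolution, and the bit depth.  The sketch below assumes a letter-size page (8.5 x 11 inches), 24 bits per pixel for color, and 1 bit per pixel for black & white; actual scanner modes may differ, and the figures describe the raw scan before any compression is applied.

```python
def uncompressed_bytes(width_in: float, height_in: float,
                       dpi: int, bits_per_pixel: int) -> int:
    """Estimate the raw size in bytes of one scanned page."""
    width_px = round(width_in * dpi)
    height_px = round(height_in * dpi)
    total_bits = width_px * height_px * bits_per_pixel
    return (total_bits + 7) // 8  # round up to whole bytes


# Assumed page size and bit depths; resolutions are those given in step 4.
PAGE = (8.5, 11.0)
CASES = [
    ("Color, 400 dpi (photo pages)", 400, 24),
    ("Color, 350 dpi (text and line drawings)", 350, 24),
    ("Black & white, 400 dpi", 400, 1),
]

for label, dpi, bits in CASES:
    size = uncompressed_bytes(PAGE[0], PAGE[1], dpi, bits)
    print(f"{label}: {size / 1_000_000:.1f} MB")
```

Under these assumptions a single color page at 400 dpi is about 44.9 MB before compression, while the same page in 1-bit black & white is about 1.9 MB.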

C.  For text documents where some manipulation of the image is important.

This procedure creates portable document format (.pdf) files of documents with Adobe Photoshop and Acrobat software provided in the pilot project.  These files can become quite large, and keyword searching may be cumbersome.  Use this scanning procedure for documents that require some manipulation.  For example, newspaper text may need to have images sharpened or contrast adjusted to improve legibility.  Some pages may have foxing or staining, and require additional touch-up.

1.  Make sure computer and scanner are properly connected and operational.  Place first page of document in the scanner.  Open Adobe Photoshop software.   Note:  At least one library preferred Adobe ImageReady to Photoshop because of the accuracy of the cross-hairs in its selection tool.

2.  Start the scanning process from the File menu by selecting Import, then Epson Twain (or name of scanner in use).

3.  Photoshop will initiate a pre-scan of the document.  Adjust the settings for the scanner.  The image type will most often be Color Document or Black & White Document.  Change resolution to 400 dpi for pages with photo images.  Pages with text and line drawings may be scanned at 350 dpi.  Avoid skew by placing the originals squarely on the scanner.  Rescan a skewed image rather than rotating it after scanning.

4.  Once scanner settings are established, click and drag with the mouse over the preview image to select the page outline.  Click on Scan from the scanner program.

5.  When page has been scanned, close scanner software.  Use Photoshop tools to manipulate image as needed. 

6.  To scan additional pages, repeat steps 2-5.  All pages will remain visible in Photoshop as separate documents.

7.  Begin creating Acrobat file from Photoshop by selecting image window with the first page of the document.  From the File menu, print this page to Acrobat Distiller.  Assign an appropriate name and location to save the file as prompted.

8.  Close the first page image in Photoshop.  Do not save changes here; they are saved in Acrobat.  Select the second page image in Photoshop and print this page to Acrobat Distiller as above.  Assign it a temporary name and save to the desktop.  Repeat for any additional pages.

9.  Assemble pages into a single file in Acrobat.  Open the .pdf file of the first page of the document as saved in step 7.  From the Document menu on the top menu bar, select Insert Pages.  Select the file for the second page temporarily saved to the desktop in step 8.  Repeat for any additional pages.

10.  Once all pages have been added, save .pdf file with appropriate name and location.

D.  For photographs, illustrations, maps, and other items that are chiefly images.

This procedure creates image files (.jpg and .tif) of items with Adobe Photoshop (or alternatively ImageReady) software provided in the pilot project.  These files can become quite large, and are usually prepared for web delivery in a reduced .jpg format.  Use this scanning procedure for pictures that require some manipulation.  For example, photographs may need adjustments in contrast, brightness, and color levels.

1.  Make sure computer and scanner are properly connected and operational.  Place document in the scanner.  Open Adobe Photoshop software.   Note:  At least one library preferred Adobe ImageReady to Photoshop because of the accuracy of the cross-hairs in its selection tool.

2.  Start the scanning process from the File menu by selecting Import, then Epson Twain (or name of scanner in use).  [For pictures taken with digital camera, use the Open command on the File menu and proceed to step 5.]

3.  Photoshop will initiate a pre-scan of the document.  Adjust the settings for the scanner.  The image type will most often be Color Photograph or Black & White Photograph.  Change resolution to 600 dpi. Line drawings may be scanned at 400 dpi.  We start with a fairly high resolution scan, because it is always possible to resample an image downward to decrease file size, but resolution cannot be improved without rescanning.  Avoid skew by placing the originals squarely on the scanner.  Rescan a skewed image rather than rotating it after scanning.

4.  Once scanner settings are established, click and drag with the mouse to select the page outline.  Click on Scan from the scanner program.

5.  When page has been scanned, close scanner software.  Use Photoshop tools to manipulate image as needed.  Most frequently used options are available from the Image menu on the top menu bar.

6.  Save the file in .tif format, with an appropriate name and location, for archival purposes.

7.  Reduce the file for web delivery.  From the Image menu, select Image Size and click on the Auto option.  Choose Best quality for the image.

8.  The image will appear re-sized in Photoshop.  From the File menu, select Save for Web.  This will bring up an alternative view of the image in Photoshop as it will appear in .jpg format.  Settings for the image should be set at JPEG High.  Again, assign an appropriate name and location.  This is the file to be used in the DLA database.
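The reduction in steps 7 and 8 relies on the point made in step 3: an image can always be resampled downward, but resolution cannot be added afterward.  The arithmetic can be sketched as follows.  The 4 x 6 inch photograph and the 800-pixel limit for the web copy are assumptions chosen for illustration; Photoshop's Auto option makes its own choice, and the guidelines do not specify a target size.

```python
from __future__ import annotations


def fit_within(width_px: int, height_px: int, max_edge: int) -> tuple[int, int]:
    """Scale pixel dimensions down so the longer edge is at most max_edge.

    The aspect ratio is preserved.  An image already small enough is
    returned unchanged, since enlarging it would not add real detail.
    """
    long_edge = max(width_px, height_px)
    if long_edge <= max_edge:
        return width_px, height_px
    scale = max_edge / long_edge
    return max(1, round(width_px * scale)), max(1, round(height_px * scale))


# Hypothetical 4 x 6 inch photograph scanned at 600 dpi (archival .tif).
archival = (4 * 600, 6 * 600)
web = fit_within(archival[0], archival[1], 800)  # assumed web limit

print("archival pixels:", archival)
print("web pixels:", web)
```

In this example the archival scan is 2400 x 3600 pixels and the web copy is 533 x 800 pixels.  The full-resolution .tif saved in step 6 remains available if a larger derivative is needed later.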