How do I OCR a document?
1. From the Start menu, click Programs, and open Caere
OmniPage Pro 10.0. You may also double click the shortcut icon on your desktop
if you have one.
2. Once the OmniPage screen displays, look at the far left
corner at the top of the screen to be sure that the default setting has the
program ready for Manual OCR (see Figure 1).

Figure 1: Manual OCR button highlighted
3. Click the Load File button to bring up the scanned
image you need to OCR (see Figure 2).

Figure 2: Load File button
4. From the Load File dialog box, select the L (l:) drive, the Digitization
folder, the proper project folder, the proper volume folder, the images folder,
then the tiff folder.
5. In the tiff folder, double click the page number of the
article you need to OCR.
For example, if you need to OCR an article in Kappler that is on page 821 in
volume VI, use the Look in box inside the Load File dialog box (see Figure 3)
to select the following folders and files: (l:)>Kappler>vol6>images>tiff>v6p0821

Figure 3: Look in box in Load File dialog box
If you need to load more than one page at a time, see the
Helpful Hints box for this chapter.
6. On the left side of the OmniPage screen is a view of the
loaded files. To the right of this frame is the frame that holds a larger view
of the page you select from the frame on the left of the screen (see Figure 4).

Figure 4: OmniPage screen with Thumbnail View frame and Image View frame visible.
7. Now it is time to "zone" the document. Visualize a box
around the text or image you need and place (don't click anything, yet!) your
mouse cursor in the upper left corner of that box. Now press the left button on
your mouse and hold the button down as you draw a box around the text you need
to OCR.
You can draw more than one box if necessary. For instance, you may want text
from two different areas of the scanned page in the same document without
including all of the text on that page.
For example, when scanning a document with margin notes, you
might want the text directly above the margin notes and the text to the side of
the margin notes, but you don't want the margin notes yet (for more information
about OCR'ing text with margin notes, read the next section).
In this case you would draw a box around the text directly
above the margin notes, and a box around the text to the side of the margin
notes (see Figure 5).

Figure 5: Two different zoning boxes around different text
8. Once you have zoned all the necessary text, it is time to
save it. To do this, click the Save as File button (see Figure 6)
at the top of the OmniPage screen, in the same area as the Load File button.

Figure 6: Save as File button
9. After you press the Save as File button, the Save
As dialog box displays. Using the Save in box at the top of the Save As
dialog box, go to the L drive, the proper project folder, the proper volume
folder, then the text folder (see Figure 7).

Figure 7: Save As dialog box with Save in box highlighted.
If you've been scanning, you've gotten used to going into the images folder,
but that is only for opening and saving images. Now you are going to OCR,
so be sure you save this document in the text folder.
For example, if you had zoned part of page 821 of Volume VI in
Kappler, you should follow this pattern in the Save in box:
(l:)>Digitization>Kappler>vol6>text files
10. After you reach the text files area, you must now name
your document. Name it according to its volume number and page number (a
four-digit number). Using the example from the paragraph above, the file name
would look like this: v6p0821.
Note: Also name the files according to their order on the page. More than one
article may fit on one page for some collections, but only name the ones that
start on that page.
For example, if page 821 had three articles that started on
that page, you would name the first one v6p0821, the second one
v6p0821b, and the third one v6p0821c.
You do not need to include "a" after the first file's
name because we just assume that if there is no letter after the file name, it
is the first one on that page.
If you have text that continues onto the next page, this is
part of the article from the previous page, so do not designate the next
article on the next page as "b."
In Figure 8 below you can see how the text at the top of the page is
continued from the previous page. Because it is continued, it does not count as
the first article of this page, so do not designate the next article "b." The
next article is actually the first one of the new page, so it will have the
invisible designation of "a."

Figure 8: Text continued onto next page does not qualify as the first article on the new page.
11. Now you must designate what type to save your file as. Save your document
as Text Only with Linebreaks (see Figure 9).

Figure 9: Save As dialog box with Save as type box highlighted
12. Now you can actually save your document, so after
typing in the correct file name and designating it as a Text Only with
Linebreaks document, click OK.
13. After you click OK, two more boxes display before you
actually get to your saved document.
The first one is the OmniPage Pro dialog box (see
Figure 10). Always click yes for this box, or, because yes is
its default setting, you can just press Enter.

Figure 10: OmniPage Pro dialog box with yes button selected
The second box is the Zoning Instructions dialog box (see Figure 11).
Always click the Use Only Current Zones button, or, because this button
is the program's default setting, you can just press Enter.

Figure 11: Zoning Instructions dialog box with Use only current zones button selected
14. Your saved document will automatically open in Notepad,
but you may want to close this program and open the document in another word
processing program, such as Microsoft Word, if you have more experience with
that program.
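
The folder and file-naming rules in steps 4, 9, and 10 can be summarized in a short Python sketch. This is only an illustration, not part of OmniPage or of the project's tools: the function names are hypothetical, the root folder is assumed from the examples above (L drive, Digitization folder), and no file extension is added because the steps above do not specify one.

```python
from pathlib import PureWindowsPath

# Assumed root, taken from the examples in this chapter.
ROOT = PureWindowsPath("L:/Digitization")


def base_name(volume: int, page: int, order: int = 1) -> str:
    """Build a name such as v6p0821, v6p0821b, or v6p0821c.

    `order` counts only the articles that START on the page. The first
    one gets no letter (the "a" is implied); text continued from the
    previous page is not counted at all.
    """
    if not 1 <= order <= 26:
        raise ValueError("order must be between 1 and 26")
    suffix = "" if order == 1 else chr(ord("a") + order - 1)
    # The page number is always written with four digits.
    return f"v{volume}p{page:04d}{suffix}"


def image_path(project: str, volume: int, page: int) -> PureWindowsPath:
    """Location of the scanned page image to load (steps 4 and 5)."""
    return ROOT / project / f"vol{volume}" / "images" / "tiff" / base_name(volume, page)


def text_path(project: str, volume: int, page: int, order: int = 1) -> PureWindowsPath:
    """Location where the OCR'd text is saved (steps 9 and 10)."""
    return ROOT / project / f"vol{volume}" / "text files" / base_name(volume, page, order)


if __name__ == "__main__":
    print(image_path("Kappler", 6, 821))
    print(text_path("Kappler", 6, 821, order=2))
```

For the Kappler example, the second article that starts on page 821 of volume VI would be named v6p0821b and saved under the text files folder of vol6.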

How do I OCR text with margin notes?
Some projects such as Kappler have documents with margin
notes. Depending on the length of the article, it may contain from one to more
than 300 margin notes.
We have found that it is sometimes easier to OCR and save the
margin notes in a separate document from the rest of the text, copy and paste
the margin notes into the text document, then proof the entire document.
The following steps will tell you how to OCR text with margin
notes, but you should already be familiar with OCR'ing in general, so if you
have not already read the beginning of this chapter, do so now so you will
know the basic procedures and terms in these steps.
1. In OmniPage, load all the pages of the article with
margin notes.
2. Zone only the margin notes on each page.
3. Click the Save as File button (see Figure 12).

Figure 12: Save as File button
4. DO NOT name this file according to its volume number
and the page number the margin notes start on.
You will save all the margin notes for each new article in the
same file, so name this file anything you want, such as junk file.txt,
your name.txt, or margin notes.txt.
5. Type in your file name in the File name area of the
Save As dialog box (see Figure 13).

Figure 13: Save As dialog box
6. Save the file as a Text Only with Linebreaks file type
in the Save as type area of the Save As dialog box (see Figure 13 above).
7. Right click inside each zone, and select Clear from the
right-click menu (see Figure 14).

Figure 14: Right-click menu inside each zone
8. Clear the zones around the margin notes on each page.
9. Zone the rest of the text on each page of the article.
10. Save in the text files folder of the proper project folder
and proper volume folder according to the text's volume number and page number
(and the order in which it appears on that page--b, c, d, and so forth).
11. Open the margin notes file in Word, or some other word
processor.
12. Highlight and copy all of the margin notes. The Copy
function can be found either in the Edit menu (see Figure 15) in the
toolbar, or by clicking the Copy icon (see Figure 16).
Figure 15: Edit menu
Figure 16: Copy icon
13. Open the text document that the margin notes belong in.
14. Paste the margin notes into the text document at the
beginning. The Paste function can also be found either in the Edit menu, or by
clicking the Paste icon, next to the Copy icon (see Figure 17).

Figure 17: Paste icon
15. Indicate within the body text of the article where the
margin notes belong.
16. Proof and organize the entire document.
Note: The way you organize the margin notes within the text may vary from
project to project.
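
If you prefer not to copy and paste by hand, steps 11 through 14 could be approximated with a small Python script. This is a minimal sketch under stated assumptions: the function names are hypothetical, both files are assumed to be plain text files saved as described above, and the script only places the margin notes at the beginning of the text document. Marking where each note belongs (step 15) and proofing (step 16) still have to be done by hand.

```python
from pathlib import Path


def prepend_margin_notes(notes: str, article: str) -> str:
    """Return the article text with the margin notes placed at the beginning.

    A blank line separates the notes from the article so they are easy
    to find when you organize the document.
    """
    return notes.rstrip("\n") + "\n\n" + article


def merge_files(notes_file: Path, article_file: Path) -> None:
    """Read both files and rewrite the article file with the notes on top."""
    notes = notes_file.read_text()
    article = article_file.read_text()
    article_file.write_text(prepend_margin_notes(notes, article))
```

The margin notes file itself is left unchanged, so it can be reused or discarded afterward.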