One of the neatest facets of the George Eastman Legacy Collection is the personal and business correspondence of George Eastman. Our vault contains 154 boxes of loose, incoming letters and 40 bound volumes containing Eastman’s outgoing letters in Letterpress format.
Over the years, hundreds of researchers have come to peruse Eastman’s letters as they researched his business sense, love for music, home and gardens, philanthropy, travels, and much more. To preserve the fragile Letterpress volumes and aid people in their research, two years ago we embarked on an ambitious project: scanning the copies of George Eastman’s outgoing letters in high resolution so that they could eventually be OCR’d and made word-searchable.
With the help of Kirtas Technologies and Iris Resources, the 40 bound volumes were carefully photographed. It was a tedious process, as a clean, white sheet of paper had to be placed behind each thin, translucent page so the blue text would be legible. Once we had our high-resolution images (each volume contained either 500 or 1,000 pages), an indispensable volunteer of ours, Peter Thomas, ran the images through Photoshop, cropping unneeded space and heightening the contrast so the text would be as easy to read as possible. Then he ran the images through Abbyy, an Optical Character Recognition (OCR) program that “reads” the text and generates word-searchable PDFs.
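For readers curious what "heightening the contrast" means in practice, here is a minimal Python sketch of one common approach, a linear contrast stretch on grayscale pixel values. It is only an illustration of the idea, not a description of what Photoshop does internally or of the settings Peter used; the function name `stretch_contrast` and its percentile parameters are assumptions made for this example.

```python
def stretch_contrast(pixels, low_pct=2.0, high_pct=98.0):
    """Linearly rescale 8-bit grayscale values (0-255).

    `pixels` is a list of rows, each a list of integers.
    The values found at the low_pct and high_pct percentiles are
    mapped to 0 and 255; everything outside that range is clipped.
    Hypothetical helper for illustration only.
    """
    flat = sorted(value for row in pixels for value in row)
    if not flat:
        return []

    # Pick the darkest and lightest reference values by percentile.
    last = len(flat) - 1
    lo = flat[int(last * low_pct / 100)]
    hi = flat[int(last * high_pct / 100)]

    # A flat image has no contrast to stretch; return an unchanged copy.
    if hi <= lo:
        return [list(row) for row in pixels]

    scale = 255.0 / (hi - lo)
    return [
        [max(0, min(255, round((value - lo) * scale))) for value in row]
        for row in pixels
    ]


if __name__ == "__main__":
    # A tiny, made-up "page" of faint mid-gray values.
    faint_page = [[100, 120], [140, 160]]
    print(stretch_contrast(faint_page, low_pct=0, high_pct=100))
```

Faint text on a thin page occupies a narrow band of gray values; spreading that band across the full range makes the strokes darker and the background lighter, which generally gives OCR software more to work with.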
While Peter spent months doing his part in Photoshop and Abbyy, some volunteers and I spent the same period typing up the handwritten indexes that Eastman’s secretary, Alice K. Whitney, had created in the back of each volume after wrapping up Eastman’s affairs. Now we have a complete, chronological index of every letter that Eastman wrote through his secretaries. (If Eastman personally handwrote a letter to someone, we wouldn’t have a copy unless it has been donated to us.)
Now when a researcher asks whether Eastman wrote to a particular person or company, I can quickly look up the name in our spreadsheet and see if and when Eastman corresponded with them. The image number listed in that spreadsheet lets me pull up the matching PDF(s) in seconds, saving time and sparing the original volumes from handling.
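For anyone building a similar index, the lookup can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not our actual spreadsheet: the column names (`recipient`, `date`, `volume`, `image`), the function name `find_letters`, and the sample rows are all invented for the example.

```python
import csv
import io

# Made-up sample rows; the column layout is an assumption, not the real index.
SAMPLE_INDEX = """recipient,date,volume,image
Sample Person,1901-03-14,1,0042
Sample Company,1901-05-02,1,0117
Sample Person,1899-11-30,1,0009
"""


def find_letters(index_file, name):
    """Return index rows whose recipient contains `name`.

    Matching is case-insensitive, and results are sorted by date
    (dates are assumed to be stored as YYYY-MM-DD text, which sorts
    correctly as plain strings).
    """
    needle = name.strip().lower()
    reader = csv.DictReader(index_file)
    matches = [row for row in reader if needle in row["recipient"].lower()]
    return sorted(matches, key=lambda row: row["date"])


if __name__ == "__main__":
    for row in find_letters(io.StringIO(SAMPLE_INDEX), "sample person"):
        print(row["date"], row["recipient"],
              "volume", row["volume"], "image", row["image"])
```

Storing the image number as text keeps any leading zeros intact, so it can be matched directly against scan file names.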
Because the text on the thin paper is often blurry, the OCR is not perfect, but we’re convinced we did the best we could with the technology currently available to us. Where the OCR works well, we can search the PDFs for particular words like Labrador or Brownie and find where Eastman discusses particular subjects.
We intend to make these searchable letters available online in a yet-to-be-determined format. Stay tuned.