Technical Difficulties in the Egyptian Gazette

Oct 6, 2016 1 min read technical

Encoding the Egyptian Gazette has been a learning experience filled with obstacles.

When I first produced my image scans using the library’s microfilm of the Egyptian Gazette, I thought that by feeding the material into the OCR software, my computer would automatically spit back the entire transcription completed error-free. At first I attempted to use FineReader on my roommate’s laptop, hoping it would yield results, but when her computer began to lag and I realized how time consuming the process would be, I switched to using my Mac.

When the class discovered Cisdem PDF Converter for Mac, I quickly realized that while some pages contained minimal errors, others required extensive corrections to yield an accurate reading of the text. The pages, most of which were rife with errors, took around 2 and a half hours each to fully type and XML, and even then many advertisements and charts are still not completed.

After I had completed the vast majority of the transcription, I learned that by scanning in a higher resolution and stitching multiple images together, I probably could have produced the paper and a much faster rate.

Nevertheless, I’m continuing to work on completing certain advertisements and charts, as well as TEI-tagging people and places.

XML OCR encoding

Technical Difficulties in the Egyptian Gazette

Celita Summa

Student