Tips for getting great OCR conversions

As impressive as Optical Character Recognition (OCR) is, it’s not perfect, and some care is needed to get the results you expect. Just as converting a PDF to Excel takes some upfront prep work, converting scanned files comes with a few unwritten rules of its own. Below you’ll find a handy checklist that applies equally well to PDF to DOCX and PDF to XLSX.

To optimize your document for the purposes of OCR:

  • Manually adjust page rotation where necessary. In addition to making the final document easier to read, this will also improve the accuracy of the extracted text.

  • Use high-resolution images. PNG is the preferred format, though JPEGs also work, and the text should be readable without too much eye strain. The clearer the image, the better the conversion result.

  • Make sure the source formatting closely matches the output format (e.g. tables laid out the way they would be in Excel). This is probably the biggest factor in faithfully recreating your original PDF as either DOCX or XLSX.
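
If you prepare many scans at once, the first two checklist items can be roughly automated before you upload anything. The Python sketch below is only an illustration: the `ScannedPage` fields, the function names, and the 300 DPI threshold are assumptions made for this example (300 DPI is a commonly cited target for OCR, not a figure from this article), and it uses nothing beyond the standard library.

```python
from dataclasses import dataclass

# Assumed threshold: 300 DPI is a commonly cited OCR target,
# not a requirement stated in this article.
MIN_DPI = 300


@dataclass
class ScannedPage:
    """Hypothetical description of one scanned page."""
    width_px: int       # image width in pixels
    height_px: int      # image height in pixels
    width_in: float     # physical page width in inches
    height_in: float    # physical page height in inches
    rotation_deg: int   # clockwise rotation of the scan relative to upright


def effective_dpi(page: ScannedPage) -> float:
    """Return the lower of the horizontal and vertical pixel densities."""
    return min(page.width_px / page.width_in, page.height_px / page.height_in)


def rotation_fix(page: ScannedPage) -> int:
    """Return the clockwise rotation (in degrees) that makes the page upright."""
    return (360 - page.rotation_deg % 360) % 360


def preflight(page: ScannedPage) -> list[str]:
    """List the checklist items this page fails; an empty list means it passes."""
    issues = []
    fix = rotation_fix(page)
    if fix:
        issues.append(f"rotate {fix} degrees clockwise")
    dpi = effective_dpi(page)
    if dpi < MIN_DPI:
        issues.append(f"resolution is about {dpi:.0f} DPI, below {MIN_DPI}")
    return issues


if __name__ == "__main__":
    # A US Letter page scanned sideways at 150 DPI.
    page = ScannedPage(1275, 1650, 8.5, 11.0, 90)
    for issue in preflight(page):
        print(issue)
```

A page that comes back with an empty list is upright and at or above the assumed resolution; anything else is worth rescanning or rotating first. The third checklist item, formatting that matches the output, still needs a human eye.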