Why can't some of the date characters in a PDF be recognized correctly?

Problem:
You have PDF documents in which most text characters are read correctly, but a date is not. The forward slashes are read as 't' or some other character.

For example, the date in the PDF looks like this: 1/20/2016
DocuWare is interpreting the date as: 1t20t2016

It is possible that the PDF has already been OCR'd by another process. MFPs (multifunction printers) can be set to perform OCR while creating the PDF, which makes the PDF searchable.
This places a Text Shot layer in the PDF. When DocuWare detects this Text Shot layer, it does not re-OCR the document. It simply reads the Text Shot.
If the process that created the Text Shot layer made a recognition error, DocuWare simply passes that error on. It cannot correct it.

Solution:
The first step in investigating problems with OCR is to ascertain whether the document has a Text Shot layer. This can be accomplished by opening the document in a PDF reader such as Foxit or Acrobat.
Perform a search for a word in the document. If the search returns no result then the PDF is not searchable.

If it does return a result then the PDF is searchable.

If it is searchable then you should now search for the problem date.
In this example we search for 1/20/2016.

There is no result returned because the Text Shot is incorrect.
Confirm this by searching for 1t20t2016. This search finds the date in the document.
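
The manual checks above can also be sketched as a small script. This is only an illustration, not a DocuWare feature: it assumes the text layer has already been extracted from the PDF into a plain string by some other tool, and the function names `misread_pattern` and `diagnose` are hypothetical. It looks for the expected date first, then for the same date with each slash replaced by a single other character.

```python
import re


def misread_pattern(expected_date: str) -> re.Pattern:
    """Build a regex for the date with each '/' replaced by one other character."""
    parts = expected_date.split("/")
    # Any single character that is neither whitespace nor a digit may stand in
    # for a slash (for example the 't' in 1t20t2016).
    body = r"[^\s\d]".join(re.escape(part) for part in parts)
    # The lookarounds keep the match from starting or ending inside a longer number.
    return re.compile(r"(?<!\d)" + body + r"(?!\d)")


def diagnose(text_layer: str, expected_date: str) -> str:
    """Classify a text layer (assumed to be extracted already) for one expected date."""
    if not text_layer.strip():
        # No text layer at all: the PDF is not searchable.
        return "not searchable"
    if expected_date in text_layer:
        return "date found"
    match = misread_pattern(expected_date).search(text_layer)
    if match:
        return f"misread as {match.group(0)}"
    return "date missing"


if __name__ == "__main__":
    print(diagnose("Invoice date 1t20t2016 total 100.00", "1/20/2016"))
```

A "misread as ..." result corresponds to the case described here: the text layer exists, but the process that created it recognized the slashes incorrectly.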

The only solution is to resolve the OCR errors in the process that is performing the OCR, or to turn that OCR off completely and send the documents to DocuWare as non-searchable PDFs. DocuWare will then perform its own OCR.
For existing documents already delivered to DocuWare, you will need to enter the correct values manually in place of the incorrectly read data.

This article is valid for DocuWare versions: 6.5, 6.6, 6.7, 6.8, 6.9, 6.10
