Posted Thu, 07 Jun 2018 19:13:30 GMT by Joe Kaufman Bell Laboratories Inc Senior System Architect

Hey all,


Simple question: What is the best Windows font to print on paper so that, when the page is scanned into a DocuWare document and stored in a full-text cabinet, the full-text engine converts the text comprehensively and accurately to a searchable form? This is probably a fairly generic OCR question, but does one use serif, sans serif, proportional, non-proportional...? Is there some font best suited for getting scanned in most accurately (not counting barcodes, of course)?

We are getting a little lazy on some scan sheets for ancillary information (multiple fields, hard to index, etc.), but I want to still have the full-text index things as well as possible when we turn on full-text for this cabinet. This is nothing mission critical, just want to take the best shot possible from the outset.



Joe Kaufman

Posted Fri, 08 Jun 2018 15:59:40 GMT by Casey Miller Solutions Manager

I like to use Sans Serif. You don't have any end strokes on the letters, and those strokes can sometimes confuse the engine into thinking an I is a 1, and so forth. But honestly, if you have any kind of mixed values (letter/number), then the DW OCR engine is not going to get it right. Example: PO12345. 90% of the time it will read P012345. It's maddening, and I haven't figured out how to correct it.
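
One workaround that sometimes helps with this kind of O/0 mix-up is to clean up the value after OCR, when the expected shape of the field is known. Below is a minimal Python sketch, assuming the value is always a fixed number of letters followed by digits (like PO12345). The function name `fix_po_number`, the `prefix_len` parameter, and the confusion tables are made up for illustration; nothing here is a DocuWare feature, and it should only be applied to fields already known to hold such an ID, not to general full-text.

```python
import re

# Look-alike characters, mapped in the direction we expect for each position.
# These tables are illustrative assumptions, not an exhaustive list.
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1", "S": "5", "B": "8"})
TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})


def fix_po_number(token, prefix_len=2):
    """Repair a token expected to be `prefix_len` letters followed by digits.

    Returns the repaired token, or the original token unchanged if the
    result still does not fit the expected letters-then-digits shape.
    """
    head = token[:prefix_len].translate(TO_LETTER)  # letters expected here
    tail = token[prefix_len:].translate(TO_DIGIT)   # digits expected here
    candidate = head + tail
    if re.fullmatch(r"[A-Za-z]{%d}\d+" % prefix_len, candidate):
        return candidate
    return token


print(fix_po_number("P012345"))  # the misread from the example above
```

This only works because the position tells us whether a letter or a digit belongs there; for free text there is no such hint, so the ambiguity stays.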

Posted Fri, 08 Jun 2018 16:43:25 GMT by Joe Kaufman Bell Laboratories Inc Senior System Architect


Good point about what serifs can do... I am not so worried about distinct ID/key fields, as those we will barcode (only three values). We aren't even storing the other data in indexes, just have full-text turned on. 


I may go with Courier New (even with the serifs) because it is so easy to read with the human eye and I can control layout better since each character is the same width. Though, you are right that an invoice number might have a mix of zeroes and O's. Have you done any sort of scientific test on what gets read better, or is it just from years of experience that you know sans-serif works better?


I might keep our large description field (just regular words) as Courier but switch the detail listing of vendor codes and invoice numbers to a non-proportional sans serif font of some kind...
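
On the "scientific test" idea: one way to compare fonts would be to print the same known text in each font, scan it, and compute a character error rate against the original. Here is a minimal Python sketch of that measurement, assuming you can get the OCR output for each scan as a plain string. The function names are hypothetical, and the sample strings in the usage lines are invented purely to show the call, not real results for any font.

```python
def edit_distance(a, b):
    """Levenshtein distance: minimum single-character inserts, deletes,
    and substitutions needed to turn string a into string b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth, ocr_text):
    """Edit distance divided by the length of the known-good text."""
    if not truth:
        raise ValueError("truth text must not be empty")
    return edit_distance(truth, ocr_text) / len(truth)


def rank_fonts(truth, ocr_by_font):
    """Return (font, error rate) pairs, lowest error rate first."""
    scored = ((font, char_error_rate(truth, text))
              for font, text in ocr_by_font.items())
    return sorted(scored, key=lambda pair: pair[1])


# Invented sample data, only to show usage:
samples = {"Font A": "P012345", "Font B": "PO12345"}
for font, rate in rank_fonts("PO12345", samples):
    print(font, round(rate, 3))
```

A lower rate means fewer character-level mistakes; running it over a page with plenty of mixed letters and digits would stress the zero-versus-O cases in particular.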




Posted Fri, 08 Jun 2018 17:00:30 GMT by Casey Miller Solutions Manager

I have been working with ABBYY for a while now, and after a lot of testing, Consolas was the most effective and accurate font. It is a monospaced, unambiguous font that was designed to be machine-readable but is still easy to read with the human eye. A more common free font that is close to this is Calibri.

Posted Fri, 08 Jun 2018 17:04:51 GMT by Joe Kaufman Bell Laboratories Inc Senior System Architect

Just read a similar comment over on Stack Overflow -- thanks again!



Joe Kaufman

Posted Wed, 13 Jun 2018 06:26:05 GMT by Simon H. Hellmann Wedderhoff IT GmbH Systemadministrator

Hello Joe,

I mostly use Arial or Arial Black without problems. Frutiger also seems to work nicely.

The Consolas idea sounds good; I will try that out.

Greetings from Germany,

Simon H. Hellmann

DocuWare System Consultant

Posted Thu, 28 Jun 2018 13:55:10 GMT by Craig Williams President/ CEO

Have you tried OCR-A, OCR A Extended, OCR-B from Seagull?

Posted Thu, 28 Jun 2018 14:05:44 GMT by Joe Kaufman Bell Laboratories Inc Senior System Architect


No, I just went with Courier... The OCR fonts are a little too non-human for my taste, and it was just as important for the fields I listed to be human-readable as it was for them to get full-texted properly.



Joe Kaufman
