Simple question: what is the best Windows font to print on paper so that, when the page is scanned into a DocuWare document stored in a full-text cabinet, the full-text engine converts the text to searchable form comprehensively and accurately? This is probably a fairly generic OCR question, but should one use serif, sans serif, proportional, non-proportional...? Is there a font best suited to being scanned in most accurately (not counting barcodes, of course)?

We are getting a little lazy on some scan sheets for ancillary information (multiple fields, hard to index, etc.), but I still want the full-text engine to index things as well as possible when we turn on full-text for this cabinet. This is nothing mission critical; I just want to take the best shot possible from the outset.



I like to use sans serif. You don't have any end strokes on the letters, and those can sometimes confuse the engine into thinking an I is a 1 and so forth. But honestly, if you have any kind of mixed values (letters and numbers), the DW OCR engine is not going to get it right. Example: PO12345. 90% of the time it will read P01234. It's maddening, and I haven't figured out how to correct it.
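One possible workaround, outside of DocuWare itself, is to clean up the recognized text afterwards when the format of the code is known. The sketch below is only an illustration under an assumed format (two letters followed by digits, as in PO12345); the function name and the `prefix_len` parameter are hypothetical, and it does not touch the OCR engine or the full-text index.

```python
# Hypothetical post-processing sketch: repair letter/digit look-alikes in a
# code whose layout is known in advance (assumed: 2 letters, then digits).
TO_LETTER = str.maketrans({"0": "O", "1": "I"})
TO_DIGIT = str.maketrans({"O": "0", "o": "0", "I": "1", "l": "1"})


def normalize_code(token, prefix_len=2):
    """Force the first prefix_len characters to letters and the rest to digits."""
    head, tail = token[:prefix_len], token[prefix_len:]
    return head.translate(TO_LETTER).upper() + tail.translate(TO_DIGIT)


if __name__ == "__main__":
    # The misread from the example above: letter O recognized as zero.
    print(normalize_code("P012345"))
```

This only helps when every code really follows the assumed layout; anything with letters in the numeric part would be damaged by it.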

Good point about what serifs can do... I am not so worried about distinct ID/key fields, as those we will barcode (only three values). We aren't even storing the other data in indexes, just have full-text turned on. 


I may go with Courier New (even with the serifs) because it is so easy to read with the human eye and I can control layout better since each character is the same width. Though, you are right that an invoice number might have a mix of zeroes and O's. Have you done any sort of scientific test on what gets read better, or is it just from years of experience that you know sans-serif works better?
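For anyone who wants to run such a test, one common way to score it is the character error rate: print the same known text in each font, scan it, and compare what the OCR returns against what was printed. The sketch below is a minimal, assumed setup in plain Python; the function names are made up for illustration, and getting the recognized text out of the OCR engine is left to you.

```python
def levenshtein(a, b):
    """Edit distance: minimum insertions, deletions, substitutions to turn a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth, recognized):
    """Edit distance divided by the length of the printed (ground-truth) text."""
    if not truth:
        raise ValueError("ground-truth text must not be empty")
    return levenshtein(truth, recognized) / len(truth)


if __name__ == "__main__":
    # The misread quoted earlier in the thread: 2 edits over 7 characters.
    print(char_error_rate("PO12345", "P01234"))
```

Running the same sample sheet through each candidate font and comparing the rates would give a number to go on instead of a hunch.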


I might keep our large description field (just regular words) as Courier but switch the detail listing of vendor codes and invoice numbers to a non-proportional sans serif font of some kind...




I have been working with ABBYY for a while now, and after a lot of testing, the Consolas font was the most effective and accurate. It is a monospaced, unambiguous font that was designed to be machine-readable but easy to read with the human eye. A more common free font that is close to this is Calibri.

Just read a similar comment over on Stack Overflow -- thanks again!



I mostly use Arial or Arial Black without problems. Frutiger also seems to work nicely.

The Consolas idea sounds good, I will try that out.

Have you tried OCR-A, OCR A Extended, OCR-B from Seagull?

No, I just went with Courier... The OCR fonts are a little too non-human for my taste, and for the fields I listed, being human-readable was just as important as getting full-texted properly.



