Posted Thu, 07 Jun 2018 19:13:30 GMT by Joe Kaufman, Bell Laboratories Inc, Senior System Architect

Hey all,

Simple question: what is the best Windows font to print on paper so that, when the page is scanned into a DocuWare document and stored to a full-text cabinet, the full-text conversion turns the text into searchable form as completely and accurately as possible? This is probably a fairly generic OCR question, but does one use serif, sans serif, proportional, non-proportional...? Is there some font best suited for being scanned in most accurately (not counting barcodes, of course)?

We are getting a little lazy on some scan sheets for ancillary information (multiple fields, hard to index, etc.), but I still want the full-text to index things as well as possible when we turn on full-text for this cabinet. This is nothing mission critical; I just want to take the best shot possible from the outset.

Thanks!

Joe Kaufman

Posted Fri, 08 Jun 2018 15:59:40 GMT by Casey Miller, Solutions Manager

I like to use sans serif. Sans-serif letters have no end strokes, and those strokes can sometimes confuse the engine into reading an I as a 1, and so forth. But honestly, if you have any kind of mixed values (letters and numbers), the DW OCR engine is not going to get it right. Example: PO12345. 90% of the time it will read P01234. It's maddening, and I haven't figured out how to correct it.
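One possible workaround, sketched under assumptions that are not from this thread: if the identifier format is known in advance (here, two letters followed by five digits, as in PO12345), the recognized text can be post-processed so that look-alike characters are coerced to whatever the position expects. The function name `normalize_id` and the confusion tables are hypothetical; nothing here is part of DocuWare or of any OCR engine.

```python
import re
from typing import Optional

# Hypothetical confusion tables: characters OCR commonly swaps.
TO_LETTER = str.maketrans({"0": "O", "1": "I", "5": "S", "8": "B"})
TO_DIGIT = str.maketrans({"O": "0", "I": "1", "l": "1", "S": "5", "B": "8"})


def normalize_id(token: str, letters: int = 2, digits: int = 5) -> Optional[str]:
    """Coerce an OCR token to `letters` alphabetic + `digits` numeric characters.

    Returns None when the token cannot be made to fit the assumed format,
    for example when a character was dropped during recognition.
    """
    if len(token) != letters + digits:
        return None
    # Positions that must be letters: map look-alike digits back to letters.
    prefix = token[:letters].translate(TO_LETTER).upper()
    # Positions that must be digits: map look-alike letters to digits.
    suffix = token[letters:].translate(TO_DIGIT)
    candidate = prefix + suffix
    pattern = rf"[A-Z]{{{letters}}}\d{{{digits}}}"
    return candidate if re.fullmatch(pattern, candidate) else None
```

This only helps when the layout of the value is fixed and known. It cannot recover a dropped character, and it would be wrong for free-form values where both O and 0 are legitimate in the same position.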

Posted Fri, 08 Jun 2018 16:43:25 GMT by Joe Kaufman, Bell Laboratories Inc, Senior System Architect

Casey,

Good point about what serifs can do... I am not so worried about distinct ID/key fields, as those we will barcode (only three values). We aren't even storing the other data in indexes; we just have full-text turned on.

I may go with Courier New (even with the serifs) because it is so easy to read with the human eye and I can control layout better since each character is the same width. Though you are right that an invoice number might have a mix of zeroes and O's. Have you done any sort of scientific test on what gets read better, or is it just from years of experience that you know sans serif works better?
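For anyone who wants to run such a test, here is a minimal sketch of how it could be scored, assuming you print the same known text in each candidate font, scan it, and copy out the recognized text. The metric is character error rate (edit distance divided by the length of the known text); the function names are made up for illustration and are not part of DocuWare.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum single-character insertions,
    deletions, and substitutions needed to turn `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute or match
        prev = curr
    return prev[-1]


def char_error_rate(truth: str, ocr: str) -> float:
    """Fraction of the known text that the OCR output got wrong."""
    if not truth:
        raise ValueError("ground truth must not be empty")
    return edit_distance(truth, ocr) / len(truth)
```

Calling `char_error_rate(known_text, recognized_text)` once per font on the same page content gives one number per font, and the lowest number wins. A single page is a small sample, so several pages and scanner settings would make the comparison more trustworthy.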

I might keep our large description field (just regular words) as Courier but switch the detail listing of vendor codes and invoice numbers to a non-proportional sans-serif font of some kind...

Thanks!

JoeK

Posted Fri, 08 Jun 2018 17:00:30 GMT by Casey Miller, Solutions Manager

I have been working with ABBYY for a while now, and after a lot of testing, Consolas was the most effective and accurate font. It is a monospaced, unambiguous font that was designed to be machine-readable but still easy to read with the human eye. A more common free font that is close to this is Calibri.

Posted Fri, 08 Jun 2018 17:04:51 GMT by Joe Kaufman, Bell Laboratories Inc, Senior System Architect

Just read a similar comment over on Stack Overflow -- thanks again!

Thanks,

Joe Kaufman

Posted Wed, 13 Jun 2018 06:26:05 GMT by Simon H. Hellmann, Wedderhoff IT GmbH, System Administrator

Hello Joe,

I mostly use Arial or Arial Black without problems. Frutiger also seems to work nicely.

The Consolas idea sounds good; I will try that out.

Greetings from Germany,

Simon H. Hellmann

DocuWare System Consultant

Posted Thu, 28 Jun 2018 13:55:10 GMT by Craig Williams, President/CEO

Have you tried OCR-A, OCR A Extended, or OCR-B from Seagull?

Posted Thu, 28 Jun 2018 14:05:44 GMT by Joe Kaufman, Bell Laboratories Inc, Senior System Architect

Craig,

No, I just went with Courier... The OCR fonts are a little too non-human for my taste, and for the fields I listed, being human-readable was just as important as getting full-texted properly.

Thanks,

Joe Kaufman
