Posted Thu, 25 Jul 2019 05:30:27 GMT by Patrick Keough
- I have a customer who wants to store about 12,000 documents using DW Import templates. There are two types of documents: checks and letters. The checks are related to the letters by a matching account number and dollar amount (required for the Stapler job).
- He wants to extract the account number and the dollar amount contained in both.
- I have two questions...
-   FIRST, the 17-digit account number is printed on both document types with hyphens for ease of reading, but the hyphens are not used in the DW file cabinet. The ACCOUNT NUMBER field in DW is a VARCHAR 20 field with no mask assigned (I guess they just manually type the 17 digits when storing a new document to the file cabinet). What is the best way to OCR the account number and store it w/o the hyphens? I cannot find a DW Import template equivalent of the way you could configure the DW Windows Desktop Client REC2 module to ignore specified characters.
- I thought perhaps the answer was to create a mask for the field, but nobody seems to understand REGTXT well enough to suggest the correct filter syntax. I'm not sure this would work, and until I have the correct REGTXT I can't test it.
- SECOND, the letters contain a dollar amount as part of the text. The dollar amounts vary from $.nn up through $nnn.nn. The dollar amount is immediately followed by a comma that sets the amount off in the sentence to which it belongs. Again, in the old DW Windows Desktop REC2 module we could set an OCR zone and have it grab everything between the "$" and the comma, but I can find no similar function in Web Client DW Import.
- I cannot use the character count filters because the character counts vary based on the dollar amount.
- I see that there is a substitution function in DW Import template, but I can't figure out how to make it work.
- Any suggestions?
Posted Thu, 25 Jul 2019 08:49:05 GMT by Phil Robson
Patrick,
You won't be able to remove the hyphens during import. There simply is not sufficient flexibility to drop those characters wherever they appear. Masking won't help either: DocuWare does not alter incoming data to fit a mask. The data has to be formatted as per the mask, otherwise it will fail.
As long as this is not a Cloud installation, you may need to consider a very carefully written database trigger to remove the hyphens.

As for the $ Amount, obviously you can only capture that if the value is in the same place on every document, but the OCR does have the ability to extract text between known characters. Maybe we need to see one of these documents to advise further.


Phil Robson
Senior Director Client Services, Americas
Posted Fri, 26 Jul 2019 15:16:59 GMT by Patrick Keough
The answer was to remove the hyphens in post capture using a MySQL script.

The script I used was:

UPDATE table_name SET column_name = REPLACE(column_name, '-', '');

The first set of quotes inside the parentheses identifies what I wanted to remove, and the second set of quotes identifies what I wanted it replaced with. In this case, that means remove the hyphen and replace it with nothing.
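
For anyone who wants to try the REPLACE statement before running it against a live database, here is a minimal sketch in Python. It uses an in-memory SQLite database as a stand-in for MySQL (REPLACE takes the same arguments in both), and the table name `documents` and column name `account_number` are made up for illustration; they are not the real file cabinet names. The WHERE clause is an optional extra that limits the update to rows that actually contain a hyphen.

```python
import sqlite3

# Hypothetical names: "documents" and "account_number" stand in for the
# real file cabinet table and field, which are not given in this thread.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, account_number VARCHAR(20))"
)
conn.executemany(
    "INSERT INTO documents (account_number) VALUES (?)",
    [("1234-56789-01234-567",), ("12345678901234567",)],
)

# Same REPLACE call as in the post, restricted to rows containing a hyphen.
conn.execute(
    "UPDATE documents "
    "SET account_number = REPLACE(account_number, '-', '') "
    "WHERE account_number LIKE '%-%'"
)
conn.commit()

# Collect the stored values so the result can be inspected.
rows = [r[0] for r in conn.execute(
    "SELECT account_number FROM documents ORDER BY id"
)]
print(rows)
```

Both rows should come back as the bare 17 digits. On a production system, take a backup first, since the update rewrites the field in place.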

I still have to figure out the correct way to snip OCR text using the recognized characters before and after it. I'm not having any luck with it so far.
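
If the OCR text of a letter can be exported and processed outside DocuWare, the "everything between the $ and the comma" rule can be sketched with a regular expression. This is only an illustration in Python, not a DW Import template setting; the function name `extract_amount` and the sample sentences are made up, and the pattern assumes amounts from $.nn up through $nnn.nn as described in the first post.

```python
import re

# "$", then zero to three digits, a decimal point, two digits, then the comma
# that sets the amount off in the sentence. Only the amount itself is captured.
AMOUNT = re.compile(r"\$(\d{0,3}\.\d{2}),")


def extract_amount(text):
    """Return the first dollar amount found between "$" and a comma, or None."""
    match = AMOUNT.search(text)
    return match.group(1) if match else None


# Hypothetical sample sentences, not taken from the customer's letters.
print(extract_amount("We received your payment of $.75, and applied it."))
print(extract_amount("Enclosed is a check for $123.45, dated July 1."))
```

The captured value excludes both the "$" and the trailing comma, so it can be compared directly with the amount read from the check.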
