[tesseract-ocr] OCR various fields of bank check in TIFF format

Keith Smith Tue, 08 Aug 2023 09:56:33 -0700

Hello,

I have several X9.37 files and would like to use tesseract to OCR the check 
images (TIFF format) and compare the OCR results with the corresponding fields 
in the X9.37 records. If the OCR results do not match the X9.37 values, I'd 
like to flag the check for manual review.
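The flag-for-review step can be sketched as a field-by-field comparison. This is a minimal sketch, assuming the OCR results and the X9.37 values have already been extracted into dictionaries keyed by the field names from the JSON example in question 1; `compare_fields`, `CheckReview` and `item_id` are hypothetical names. Whitespace is ignored because OCR frequently adds or drops spaces.

```python
from dataclasses import dataclass, field


@dataclass
class CheckReview:
    """Result of comparing one check's OCR output with its X9.37 values."""
    item_id: str
    # field name -> (ocr_value, x937_value) for every disagreement
    mismatches: dict = field(default_factory=dict)

    @property
    def needs_manual_review(self) -> bool:
        return bool(self.mismatches)


def normalize(value: str) -> str:
    """Drop all whitespace, since OCR often inserts or loses spaces."""
    return "".join(value.split())


def compare_fields(item_id: str, ocr: dict, x937: dict) -> CheckReview:
    """Flag each X9.37 field whose OCR value is missing or differs."""
    review = CheckReview(item_id)
    for name, expected in x937.items():
        got = ocr.get(name)
        if got is None or normalize(got) != normalize(expected):
            review.mismatches[name] = (got, expected)
    return review
```

Only the fields present in the X9.37 dictionary are checked, so any extra OCR output is ignored rather than flagged.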


The fields I would like to OCR from each TIFF image are:

* the MICR line fields including routing number, On Us, and Auxiliary On Us;
* the legal check amount (in cursive);
* the courtesy check amount.
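Once the MICR line has been OCR'd as a single string, the MICR fields listed above can be split out on the E-13B delimiter symbols. A minimal sketch, assuming a US-style layout (auxiliary On-Us, then the routing number between two transit symbols, then On-Us, then an optional amount field) and assuming the raw OCR text has already been mapped to ASCII placeholders: `T` for transit, `U` for on-us and `A` for amount. Those placeholders, the `/` substitution and the function names are hypothetical; how mcr.traineddata actually emits the special symbols needs to be checked against its real output. The external processing code digit is not handled.

```python
from dataclasses import dataclass

# Hypothetical ASCII placeholders for the E-13B special symbols. This sketch
# assumes the raw OCR text was already mapped to them; check what your
# traineddata actually emits for transit, on-us and amount.
TRANSIT, ON_US, AMOUNT = "T", "U", "A"


@dataclass
class MicrFields:
    auxiliary_on_us: str
    routing: str
    on_us: str
    amount: str


def parse_micr(line: str) -> MicrFields:
    """Split a US-style MICR line: [aux On-Us] T routing T On-Us [A amount A].

    The external processing code digit is not handled separately.
    """
    line = line.replace(" ", "")
    first = line.find(TRANSIT)
    second = line.find(TRANSIT, first + 1) if first >= 0 else -1
    if second < 0:
        raise ValueError("expected two transit symbols around the routing number")

    aux = line[:first].replace(ON_US, "")  # aux On-Us is delimited by on-us symbols
    routing = line[first + 1:second]
    rest = line[second + 1:]

    amount = ""
    if AMOUNT in rest:
        cut = rest.index(AMOUNT)
        rest, amount = rest[:cut], rest[cut:].strip(AMOUNT)

    # Hypothetical convention: write the on-us symbol as "/" so the value can
    # be compared with an On-Us string taken from the X9.37 record.
    on_us = rest.replace(ON_US, "/")
    return MicrFields(aux, routing, on_us, amount)
```

For example, `parse_micr("U001234U T021000021T 123456789U1001 A0000012345A")` yields auxiliary On-Us `001234`, routing `021000021`, On-Us `123456789/1001` and amount `0000012345`.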

I have tried running tesseract 5 as follows:
      tesseract --tessdata-dir tessdata input output
where my "tessdata" directory contains eng.traineddata, mcr.traineddata, 
and ocr.traineddata, and "input" contains some of my TIFF check images.
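One detail about the command above: without `-l`, tesseract uses `eng` by default, so mcr.traineddata is never consulted. Below is a sketch of OCR'ing one field at a time, assuming each field (MICR line, courtesy amount box, and so on) has already been cropped into its own image; the cropping step, file names and helper names are hypothetical. `stdout` as the output base, `--tessdata-dir`, `-l` and `--psm` are standard tesseract options, and `--psm 7` treats the image as a single text line.

```python
import subprocess


def tesseract_cmd(image: str, lang: str, psm: int = 7,
                  tessdata_dir: str = "tessdata") -> list:
    """Build a tesseract 5 command line for one cropped field image.

    "stdout" as the output base prints the text instead of writing a file;
    --psm 7 treats the image as a single line of text.
    """
    return ["tesseract", image, "stdout",
            "--tessdata-dir", tessdata_dir, "-l", lang, "--psm", str(psm)]


def ocr_field(image: str, lang: str, psm: int = 7) -> str:
    """Run tesseract on one field image and return the recognized text."""
    result = subprocess.run(tesseract_cmd(image, lang, psm),
                            capture_output=True, text=True, check=True)
    return result.stdout.strip()
```

Usage might look like `ocr_field("micr_line.tif", "mcr")` for the MICR strip and `ocr_field("courtesy.tif", "eng")` for the courtesy amount box (hypothetical file names).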

I have the following questions:

1. This, of course, just writes free-form text to the "*.ocr.txt" file. Is 
there a standard way to generate output in JSON format, similar to:

{
   "onUs": "...",
   "auxiliaryOnUs": "...",
   "legalAmount": "...",
   "courtesyAmount": "..."
}

2. Is there a standard way of converting the "legalAmount" to a numeric 
value?

3. The results I am getting for the MICR line fields are very poor. What is 
recommended for the best results? These checks use the E-13B font.

4. If I need to do my own training, what is the best way to create the 
ground truth for my use case?
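For question 1: tesseract's built-in output formats include plain text, hOCR, TSV, ALTO and PDF, and as far as I know none of them is JSON, so the JSON has to be assembled by the caller. A minimal sketch, assuming each field has been OCR'd separately with the `tsv` config (for example `tesseract field.tif stdout --psm 7 tsv`); the function names and `FIELDS` list are hypothetical. Level-5 rows in the TSV are words, and their mean `conf` value is returned as well, since it could feed the manual-review decision.

```python
import csv
import io
import json

FIELDS = ["onUs", "auxiliaryOnUs", "legalAmount", "courtesyAmount"]


def words_from_tsv(tsv_text: str) -> tuple:
    """Return (text, mean word confidence) from tesseract's TSV output."""
    rows = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                          quoting=csv.QUOTE_NONE)
    words, confs = [], []
    for row in rows:
        text = (row.get("text") or "").strip()
        if row["level"] == "5" and text:  # level 5 rows are individual words
            words.append(text)
            confs.append(float(row["conf"]))
    mean_conf = sum(confs) / len(confs) if confs else 0.0
    return " ".join(words), mean_conf


def fields_to_json(tsv_by_field: dict) -> str:
    """Assemble per-field OCR text into the JSON shape from question 1."""
    result = {name: words_from_tsv(tsv_by_field.get(name, ""))[0]
              for name in FIELDS}
    return json.dumps(result, indent=3)
```

Fields with no TSV input come out as empty strings, which a comparison step can then treat as a mismatch.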
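For question 2, I am not aware of such a converter built into tesseract, so the words have to be parsed after OCR. A minimal sketch, assuming the handwritten legal amount has already been recognized as clean English text such as "One thousand two hundred thirty-four and 56/100"; the function name is hypothetical, misspellings from handwriting recognition raise `ValueError` rather than being corrected, and the vocabulary only goes up to millions. "xx/100" and "no/100" are treated as zero cents.

```python
import re
from decimal import Decimal

UNITS = {w: i for i, w in enumerate(
    "zero one two three four five six seven eight nine ten eleven twelve "
    "thirteen fourteen fifteen sixteen seventeen eighteen nineteen".split())}
TENS = {w: 10 * i for i, w in enumerate(
    "twenty thirty forty fifty sixty seventy eighty ninety".split(), start=2)}
SCALES = {"thousand": 1_000, "million": 1_000_000}
FILLER = {"and", "dollar", "dollars", "only", "no", "xx"}


def legal_amount_to_decimal(text: str) -> Decimal:
    """Convert e.g. 'One thousand two hundred thirty-four and 56/100'."""
    text = text.lower().replace("-", " ")
    cents = 0
    m = re.search(r"(\d{1,2})\s*/\s*100", text)
    if m:
        cents = int(m.group(1))
        text = text[:m.start()]  # everything after the fraction is ignored
    total = current = 0
    for word in re.findall(r"[a-z]+", text):
        if word in UNITS:
            current += UNITS[word]
        elif word in TENS:
            current += TENS[word]
        elif word == "hundred":
            current *= 100
        elif word in SCALES:
            total += current * SCALES[word]
            current = 0
        elif word in FILLER:
            continue
        else:
            raise ValueError(f"unrecognized word: {word!r}")
    return Decimal(total + current) + Decimal(cents) / 100
```

Using `Decimal` rather than `float` keeps the cents exact, so the result can be compared directly with the courtesy amount.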
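For question 3, one cheap safeguard, independent of model quality, is the ABA check digit: the nine routing digits, weighted 3, 7, 1, 3, 7, 1, 3, 7, 1, must sum to a multiple of 10. Because every weight is coprime to 10, any single misread digit fails the test, so a failing routing number can go straight to manual review. This sketch only validates the routing number; it does not improve recognition itself, and the function name is hypothetical.

```python
import re

WEIGHTS = (3, 7, 1) * 3


def routing_checksum_ok(routing: str) -> bool:
    """ABA check-digit test for a 9-digit US routing (transit) number."""
    if not re.fullmatch(r"[0-9]{9}", routing):
        return False  # wrong length or a non-digit such as a misread "O"
    total = sum(w * int(d) for w, d in zip(WEIGHTS, routing))
    return total % 10 == 0
```

For example, 021000021 gives 0×3 + 2×7 + 1×1 + 0×3 + 0×7 + 0×1 + 0×3 + 2×7 + 1×1 = 30, which passes.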
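For question 4: tesstrain, the training workflow for tesseract 5, expects single-line images (`.tif` or `.png`), each paired with a transcription file of the same base name ending in `.gt.txt`, by default under `data/<MODEL_NAME>-ground-truth/`. Since the X9.37 records already carry the MICR values, one option is to crop the MICR line from each image and use the verified record values as transcriptions. A minimal sketch, assuming the crops already exist; the function name and the example model name `micr` are hypothetical, and each transcription must match exactly what is printed on the line, including the special symbols, which the X9.37 fields may not preserve.

```python
import shutil
from pathlib import Path


def write_ground_truth(samples, gt_dir: str) -> list:
    """Create tesstrain-style pairs: <name>.tif plus <name>.gt.txt.

    `samples` is an iterable of (line_image_path, transcription) pairs.
    Returns the paths of the .gt.txt files written.
    """
    out = Path(gt_dir)
    out.mkdir(parents=True, exist_ok=True)
    written = []
    for image_path, text in samples:
        image = Path(image_path)
        target = out / image.name
        if image.resolve() != target.resolve():
            shutil.copy2(image, target)  # keep the image next to its transcription
        gt = out / (image.stem + ".gt.txt")
        gt.write_text(text.strip() + "\n", encoding="utf-8")  # one line per file
        written.append(gt)
    return written
```

Called with `gt_dir="data/micr-ground-truth"`, this produces the directory layout that tesstrain's `MODEL_NAME=micr` would look for by default.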

Thank you in advance,
Keith

