OK, so EasyOCR is using CRAFT for text detection 
(https://pypi.org/project/craft-text-detector/), and it gives much better 
results for my image. Here is the image with bounding boxes from 
CRAFT: 
https://github.com/apismensky/ocr_id/blob/main/outputs/AR_text_detection.png
It also produces a folder with a bunch of crops of the original 
image: https://github.com/apismensky/ocr_id/tree/main/outputs/AR_crops
which can be fed to tesseract one by one using psm=7 (treat the image as a 
single text line). This gives the following output: 
crop_0.png:     5ARKANSAS DRIVER’S LICENSE
crop_1.png: 
crop_2.png:     9¥ CLASS LD
crop_3.png:     4a DLN. 999999999: pos 03/05/1960
crop_4.png: 
crop_5.png:     1 SAMPLE
crop_6.png:     2NICK
crop_7.png: 
crop_8.png:     8123 NORTH STREET
crop_9.png:     CITY, AR 12345
crop_10.png:     4bEXP
crop_11.png:     4aiss
crop_12.png:     03/05/2026 \/"— \
crop_13.png:     03/05/2018
crop_14.png:     1SSEX 16HGT
crop_15.png:     18 EYES
crop_16.png:     5'-10*
crop_17.png:     M
crop_18.png:     BRO
crop_19.png:     9a END NONE
crop_20.png:     12 RESTR NONE
crop_21.png:     Vick Cample
crop_22.png:     5 DD 8888888888 1234
CRAFT + tesseract result:     5ARKANSAS DRIVER’S LICENSE      9¥ CLASS LD   
  4a DLN. 999999999: pos 03/05/1960      1 SAMPLE     2NICK      8123 NORTH 
STREET     CITY, AR 12345     4bEXP     4aiss     03/05/2026 \/"— \     
03/05/2018     1SSEX 16HGT     18 EYES     5'-10*     M     BRO     9a END 
NONE     12 RESTR NONE     Vick Cample     5 DD 8888888888 1234
which is way better than when tesseract tries to detect the bounding 
boxes itself. 
The whole script is here: 
https://github.com/apismensky/ocr_id/blob/main/craft.py
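
For readers who do not want to open the script, here is a minimal sketch of 
the crop-then-recognize step only. It is not the actual craft.py: the helper 
names (tesseract_single_line, crop_index, ocr_crops, join_results) are made up 
for illustration, and it assumes the crops are already on disk as crop_N.png 
and that pytesseract and Pillow are installed.

```python
import re
from pathlib import Path
from typing import Callable, Dict


def tesseract_single_line(path: Path) -> str:
    """OCR one crop as a single text line (Tesseract page segmentation mode 7)."""
    # Imported lazily so the helpers below can be used without the OCR stack.
    from PIL import Image
    import pytesseract

    with Image.open(path) as img:
        return pytesseract.image_to_string(img, config="--psm 7").strip()


def crop_index(path: Path) -> int:
    """Extract N from crop_N.png so that crop_10 sorts after crop_9."""
    match = re.search(r"(\d+)$", path.stem)
    return int(match.group(1)) if match else -1


def ocr_crops(
    crop_dir: str, ocr: Callable[[Path], str] = tesseract_single_line
) -> Dict[str, str]:
    """Run `ocr` on every crop_*.png in crop_dir, in numeric order."""
    paths = sorted(Path(crop_dir).glob("crop_*.png"), key=crop_index)
    return {p.name: ocr(p) for p in paths}


def join_results(results: Dict[str, str]) -> str:
    """Concatenate the non-empty per-crop strings in crop order."""
    return " ".join(text for text in results.values() if text)
```

Usage would be something like join_results(ocr_crops("outputs/AR_crops")). 
The numeric sort matters because a plain alphabetical sort would put 
crop_10.png before crop_2.png and scramble the reading order.
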
I'm also using psm=0 (orientation and script detection only) to detect the 
image rotation angle and fix the rotation before applying CRAFT.
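
A rough sketch of that rotation step, under some assumptions: it uses 
pytesseract.image_to_osd (which runs tesseract in psm 0) and Pillow, the 
function names parse_osd_rotation and fix_rotation are hypothetical, and it 
assumes the "Rotate" value in the OSD report is the clockwise correction to 
apply. It may differ from what the script linked above does.

```python
import re
from typing import Optional


def parse_osd_rotation(osd_text: str) -> Optional[int]:
    """Return the 'Rotate' angle in degrees from Tesseract OSD text, or None."""
    match = re.search(r"^Rotate:\s*(\d+)", osd_text, flags=re.MULTILINE)
    return int(match.group(1)) if match else None


def fix_rotation(image_path: str):
    """Detect the page orientation and return an upright PIL image."""
    # Imported lazily so parse_osd_rotation works without the OCR stack.
    from PIL import Image
    import pytesseract

    img = Image.open(image_path)
    # image_to_osd runs tesseract in orientation/script detection mode (psm 0).
    osd = pytesseract.image_to_osd(img)
    angle = parse_osd_rotation(osd) or 0
    # Assumption: 'Rotate' is a clockwise correction. PIL rotates
    # counter-clockwise for positive angles, hence the negation.
    return img.rotate(-angle, expand=True) if angle else img
```

OSD can fail on images with very little text, so in practice the call 
probably wants a try/except that falls back to the unrotated image.
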

Would it be possible to use CRAFT in tesseract for bounding boxes? 

On Tuesday, September 5, 2023 at 9:32:56 AM UTC-6 Alexey Pismenskiy wrote:

> Hai, could you please tell me what you are doing for pre-processing? 
> Do you have any source code you can share? 
> Are those results consistently better for images scanned with different 
> quality (resolution, angles, contrast etc)? 
>
>
> On Monday, September 4, 2023 at 2:02:27 AM UTC-6 nguyenng...@gmail.com 
> wrote:
>
>> Hi, 
>> I would like to hear others' opinions on your questions too. 
>> In my case, when I try using Tesseract for Japanese train tickets, I have to 
>> do a lot of steps for preprocessing (remove background colors, noise + line 
>> removal, increase contrast,  etc.) to get satisfactory results. 
>> I am sure what you are doing (locating text boxes, extracting them, and 
>> feeding them one by one to tesseract) can get better accuracy results. 
>> However, when the number of text boxes increases, it will undoubtedly 
>> affect your performance. 
>> Could you share the PSM mode for getting those text boxes' locations? I 
>> usually use the AUTO_OSD to get the boxes and expand them a bit at the 
>> edges before passing them to Tesseract. 
>>
>> Regards
>> Hai
>>  
>> On Saturday, September 2, 2023 at 7:03:49 AM UTC+9 apism...@gmail.com 
>> wrote:
>>
>>> I'm looking into OCR for ID cards and driver's licenses, and I found out 
>>> that tesseract performs relatively poorly on ID cards, compared to other OCR 
>>> solutions. For this original image: 
>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png 
>>> the results are: 
>>>
>>> tesseract: "4d DL 999 as = Ne allo) 2NICK © , q 12 RESTR oe } lick: 5 DD 
>>> 8888888888 1234 SZ"
>>> easyocr:  '''9 , ARKANSAS DRIVER'S LICENSE CLAss D 4d DLN 999999999 3 
>>> DOB 03/05/1960 ] 2 SCKPLE 123 NORTH STREET CITY AR 12345 ISS 4b EXP 
>>> 03/05/2018 03/05/2026 15 SEX 16 HGT 18 EYES 5'-10" BRO 9a END NONE 12 RESTR 
>>> NONE Ylck Sorble DD 8888888888 1234 THE'''
>>> google cloud vision: """SARKANSAS\nSAMPLE\nSTATE O\n9 CLASS D\n4d DLN 
>>> 9999999993 DOB 03/05/1960\nNick Sample\nDRIVER'S LICENSE\n1 SAMPLE\n2 
>>> NICK\n8 123 NORTH STREET\nCITY, AR 12345\n4a ISS\n03/05/2018\n15 SEX 16 
>>> HGT\nM\n5'-10\"\nGREAT SE\n9a END NONE\n12 RESTR NONE\n5 DD 8888888888 
>>> 1234\n4b EXP\n03/05/2026 MS60\n18 EYES\nBRO\nRKANSAS\n0"""
>>>
>>> and word accuracy is:
>>>
>>>              tesseract  |  easyocr  |  google
>>> words         10.34%    |  68.97%   |  82.76%
>>>
>>> This is "out of the box" performance, without any preprocessing. I'm not 
>>> surprised that google vision is that good compared to the others, but easyocr, 
>>> which is another open source solution, performs much better than tesseract 
>>> in this case. I have a whole project dedicated to this, and all the other 
>>> results are much better for easyocr: 
>>> https://github.com/apismensky/ocr_id/blob/main/result.json, all input 
>>> files are files in 
>>> https://github.com/apismensky/ocr_id/tree/main/images/sources
>>> After digging into it for a little bit, I suspect that bounding box 
>>> detection is much better in google (
>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_google/AR.png) 
>>> and easyocr (
>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png), 
>>> than in tesseract (
>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_tesseract/AR.png).
>>>  
>>>
>>> I'm pretty sure about this, because when I manually cut out the text boxes 
>>> and feed them to tesseract it works much better. 
>>>
>>>
>>> Now questions: 
>>>
>>> - What is the part of the codebase in tesseract that is responsible for 
>>> text detection and which algorithm is it using? 
>>> - What is impacting bounding box detection in tesseract so that it fails on 
>>> these types of images (complex layouts, background noise, etc.)?
>>> - Is it possible to use the same text detection procedure as easyocr or 
>>> improve the existing one?  
>>> - Would it be possible to switch the text detection algorithm based on the 
>>> image type, or make it pluggable so that the user can choose from several 
>>> options (A, B, C...)?
>>>
>>>
>>> Thanks. 
>>>
>>
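
On the word accuracy figures quoted above: the thread does not define the 
metric, so the following is only one plausible way to compute such a score, 
not necessarily how the numbers in the table were produced. It counts how many 
ground-truth words also appear in the OCR output, ignoring order. The function 
name word_accuracy is made up for illustration. The three percentages in the 
table would be consistent with 29 reference words (3, 20 and 24 matches).

```python
from collections import Counter


def word_accuracy(ground_truth: str, ocr_text: str) -> float:
    """Fraction of ground-truth words that also occur in the OCR output.

    Words are split on whitespace and matched as a multiset
    (order ignored, case-sensitive).
    """
    expected = Counter(ground_truth.split())
    found = Counter(ocr_text.split())
    total = sum(expected.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, found[word]) for word, count in expected.items())
    return matched / total
```

With this definition a fused token such as "2NICK" scores zero against the 
reference words "2" and "NICK", which is one reason results on ID cards can 
look worse than they read to a human.
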

-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/31110930-d356-42f4-a921-5ca5a62444f8n%40googlegroups.com.
