OK, so EasyOCR is using CRAFT for text detection (https://pypi.org/project/craft-text-detector/), and it gives much better results for my image. Here is the image with bounding boxes from CRAFT: https://github.com/apismensky/ocr_id/blob/main/outputs/AR_text_detection.png

It also produces a folder with a bunch of crops of the original image (https://github.com/apismensky/ocr_id/tree/main/outputs/AR_crops), which can be fed to tesseract with psm=7, giving this output:

crop_0.png: 5ARKANSAS DRIVER’S LICENSE
crop_1.png:
crop_2.png: 9¥ CLASS LD
crop_3.png: 4a DLN. 999999999: pos 03/05/1960
crop_4.png:
crop_5.png: 1 SAMPLE
crop_6.png: 2NICK
crop_7.png:
crop_8.png: 8123 NORTH STREET
crop_9.png: CITY, AR 12345
crop_10.png: 4bEXP
crop_11.png: 4aiss
crop_12.png: 03/05/2026 \/"— \
crop_13.png: 03/05/2018
crop_14.png: 1SSEX 16HGT
crop_15.png: 18 EYES
crop_16.png: 5'-10*
crop_17.png: M
crop_18.png: BRO
crop_19.png: 9a END NONE
crop_20.png: 12 RESTR NONE
crop_21.png: Vick Cample
crop_22.png: 5 DD 8888888888 1234

CRAFT + tesseract result:

5ARKANSAS DRIVER’S LICENSE
9¥ CLASS LD
4a DLN. 999999999: pos 03/05/1960
1 SAMPLE
2NICK
8123 NORTH STREET
CITY, AR 12345
4bEXP
4aiss
03/05/2026 \/"— \
03/05/2018
1SSEX 16HGT
18 EYES
5'-10*
M
BRO
9a END NONE
12 RESTR NONE
Vick Cample
5 DD 8888888888 1234

which is way better than when tesseract tries to detect bounding boxes itself. The whole script is here: https://github.com/apismensky/ocr_id/blob/main/craft.py

I'm also using psm=0 to detect the image rotation angle and fix the rotation before applying CRAFT.
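For reference, the pipeline described above (psm=0 deskew, then CRAFT crops, then per-crop OCR with psm=7) can be sketched roughly like this. This is only a sketch, not the actual craft.py: the `Craft` constructor arguments, the crop directory layout, and the helper names are assumptions based on the craft-text-detector package docs.

```python
import glob

def osd_rotation(osd_report: str) -> int:
    """Pull the suggested rotation (degrees) out of tesseract's --psm 0 OSD report."""
    for line in osd_report.splitlines():
        if line.startswith("Rotate:"):
            return int(line.split(":", 1)[1])
    return 0

def ocr_with_craft(image_path: str, out_dir: str = "outputs") -> str:
    # Third-party imports kept local so the pure helper above stays dependency-free.
    import pytesseract
    from PIL import Image
    from craft_text_detector import Craft

    # 1. Detect page rotation with tesseract OSD (psm 0) and undo it first.
    img = Image.open(image_path)
    angle = osd_rotation(pytesseract.image_to_osd(img))
    if angle:
        img = img.rotate(-angle, expand=True)
        img.save(image_path)  # CRAFT reads the image from disk

    # 2. Run CRAFT; it exports per-region crops under out_dir.
    craft = Craft(output_dir=out_dir, crop_type="poly", cuda=False)
    craft.detect_text(image_path)
    craft.unload_craftnet_model()
    craft.unload_refinenet_model()

    # 3. OCR each crop as a single text line (psm 7), in numeric crop order
    #    (crop_10.png must come after crop_2.png, so sort by index, not name).
    lines = []
    crops = sorted(glob.glob(f"{out_dir}/*_crops/crop_*.png"),
                   key=lambda p: int(p.rsplit("_", 1)[1].split(".")[0]))
    for crop_path in crops:
        text = pytesseract.image_to_string(Image.open(crop_path),
                                           config="--psm 7").strip()
        if text:
            lines.append(text)
    return "\n".join(lines)
```

`osd_rotation` parses the `Rotate:` line that `image_to_osd` returns; the rest only runs if pytesseract and craft_text_detector are installed.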
Would it be possible to use CRAFT in tesseract for bounding boxes?

On Tuesday, September 5, 2023 at 9:32:56 AM UTC-6 Alexey Pismenskiy wrote:

> Hai, could you please tell me what you are doing for pre-processing?
> Do you have any source code you can share?
> Are those results consistently better for images scanned with different
> quality (resolution, angles, contrast, etc.)?
>
> On Monday, September 4, 2023 at 2:02:27 AM UTC-6 nguyenng...@gmail.com wrote:
>
>> Hi,
>> I would like to hear others' opinions on your questions too.
>> In my case, when I try using Tesseract on Japanese train tickets, I have to
>> do a lot of preprocessing steps (removing background colors, noise and line
>> removal, increasing contrast, etc.) to get satisfactory results.
>> I am sure what you are doing (locating text boxes, extracting them, and
>> feeding them one by one to tesseract) can get better accuracy.
>> However, when the number of text boxes increases, it will undoubtedly
>> affect your performance.
>> Could you share the PSM mode you use for getting those text boxes' locations?
>> I usually use AUTO_OSD to get the boxes and expand them a bit at the
>> edges before passing them to Tesseract.
>>
>> Regards,
>> Hai
>>
>> On Saturday, September 2, 2023 at 7:03:49 AM UTC+9 apism...@gmail.com wrote:
>>
>>> I'm looking into OCR for ID cards and driver's licenses, and I found
>>> that tesseract performs relatively poorly on ID cards compared to other OCR
>>> solutions.
>>> For this original image:
>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png
>>> the results are:
>>>
>>> tesseract: "4d DL 999 as = Ne allo) 2NICK © , q 12 RESTR oe } lick: 5 DD
>>> 8888888888 1234 SZ"
>>>
>>> easyocr: '''9 , ARKANSAS DRIVER'S LICENSE CLAss D 4d DLN 999999999 3
>>> DOB 03/05/1960 ] 2 SCKPLE 123 NORTH STREET CITY AR 12345 ISS 4b EXP
>>> 03/05/2018 03/05/2026 15 SEX 16 HGT 18 EYES 5'-10" BRO 9a END NONE 12 RESTR
>>> NONE Ylck Sorble DD 8888888888 1234 THE'''
>>>
>>> google cloud vision: """SARKANSAS\nSAMPLE\nSTATE O\n9 CLASS D\n4d DLN
>>> 9999999993 DOB 03/05/1960\nNick Sample\nDRIVER'S LICENSE\n1 SAMPLE\n2
>>> NICK\n8 123 NORTH STREET\nCITY, AR 12345\n4a ISS\n03/05/2018\n15 SEX 16
>>> HGT\nM\n5'-10\"\nGREAT SE\n9a END NONE\n12 RESTR NONE\n5 DD 8888888888
>>> 1234\n4b EXP\n03/05/2026 MS60\n18 EYES\nBRO\nRKANSAS\n0"""
>>>
>>> and word accuracy is:
>>>
>>>         tesseract | easyocr | google
>>> words   10.34%    | 68.97%  | 82.76%
>>>
>>> This is "out of the box" performance, without any preprocessing. I'm not
>>> surprised that google vision is that good compared to the others, but
>>> easyocr, which is another open-source solution, performs much better than
>>> tesseract in this case. I have a whole project dedicated to this, and all
>>> the other results are much better for easyocr:
>>> https://github.com/apismensky/ocr_id/blob/main/result.json; all input
>>> files are in
>>> https://github.com/apismensky/ocr_id/tree/main/images/sources
>>> After digging into it for a little bit, I suspect that bounding box
>>> detection is much better in google
>>> (https://github.com/apismensky/ocr_id/blob/main/images/boxes_google/AR.png)
>>> and easyocr
>>> (https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png)
>>> than in tesseract
>>> (https://github.com/apismensky/ocr_id/blob/main/images/boxes_tesseract/AR.png).
>>> I'm pretty sure about this, because when I manually cut out the text boxes
>>> and feed them to tesseract, it works much better.
>>>
>>> Now, questions:
>>>
>>> - What part of the tesseract codebase is responsible for text detection,
>>> and which algorithm does it use?
>>> - What impacts bounding box detection in tesseract so that it fails on
>>> these types of images (complex layouts, background noise, etc.)?
>>> - Is it possible to use the same text detection procedure as easyocr, or
>>> to improve the existing one?
>>> - Maybe it's possible to switch the text detection algorithm based on the
>>> image type, or make it pluggable so the user can choose from several
>>> options A, B, C...
>>>
>>> Thanks.

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/31110930-d356-42f4-a921-5ca5a62444f8n%40googlegroups.com.
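A side note on the word-accuracy figures quoted in the thread: a simple bag-of-words overlap against a hand-made ground-truth transcript is enough to produce that kind of score. A minimal sketch follows; the exact scoring behind the numbers above is an assumption here (result.json in the repo is the authoritative source), and `word_accuracy` is an illustrative helper, not part of the project.

```python
from collections import Counter

def word_accuracy(ocr_text: str, truth_text: str) -> float:
    """Fraction of ground-truth words that also appear in the OCR output,
    counted with multiplicity (bag-of-words), case-insensitive."""
    truth = Counter(truth_text.lower().split())
    got = Counter(ocr_text.lower().split())
    matched = sum(min(count, got[word]) for word, count in truth.items())
    total = sum(truth.values())
    return matched / total if total else 0.0
```

For example, `word_accuracy("nick sample 123", "NICK SAMPLE 123 STREET")` matches 3 of 4 ground-truth words, i.e. 0.75. A bag-of-words score ignores word order, which is forgiving toward engines that scramble reading order but detect the words themselves.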