Thanks for sharing, Hai.
It looks like CRAFT can detect regions despite the dark background:
https://github.com/apismensky/ocr_id/blob/main/images/boxes_craft/black_background_text_detection.png

It also produces a crop for each detected text region, and the crops can be
OCR-ed separately and then joined back together into the final result.
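
For reference, this is roughly how the crop-and-join step can be wired up (a
minimal sketch, not the exact code in ocr_id.py; it assumes the
craft-text-detector PyPI package and pytesseract, and that CRAFT returns
quadrilateral boxes):

import cv2
import numpy as np
import pytesseract
from craft_text_detector import Craft

craft = Craft(output_dir=None, crop_type="box", cuda=False)
prediction = craft.detect_text("invoice.png")  # placeholder file name

image = cv2.imread("invoice.png")
texts = []
for quad in prediction["boxes"]:
    quad = np.asarray(quad)
    x0, y0 = int(quad[:, 0].min()), int(quad[:, 1].min())
    x1, y1 = int(quad[:, 0].max()), int(quad[:, 1].max())
    crop = image[y0:y1, x0:x1]
    # OCR each crop on its own; --psm 7 treats the crop as a single text line
    texts.append(pytesseract.image_to_string(crop, lang="jpn", config="--psm 7").strip())

craft.unload_craftnet_model()
craft.unload_refinenet_model()
print(" ".join(texts))
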
When I ran your example with
https://github.com/apismensky/ocr_id/blob/main/ocr_id.py I got the
following output:

CRAFT + crop result: 下記 の と お り 、 御 請求 申し 上 げ ま す 。 株 式 会 社 ス キャ ナ 保 存 スト 
レー ジ プ ロ ジェ クト 件 名 T573-0011 2023/4/30 大 阪 市 北 区 大 深町 3-1 支払 期限 山口 銀行 本 店 
普通 1111111 グラ ン フ ロン ト 大 阪 タ ワーB 振込 先 TEL : 06-6735-8055 担当 : ICS 太 郎 
66,000 円 (税込 ) a 摘要 数 重 単位 単 価 金額 サン プル 1 1 式 32,000 32,000 サン プル 2 1 式 
18000 18,000 2,000 2,000' 8g,000' 2,000
crop word_accuracy: 48.78048780487805

I also tried creating a map of the boxes as a .uzn file and passing it to
tesseract, but the results are worse:
CRAFT result: 下記 の と お り 、 御 請求 申し 上 げ ま す 。

株 式 会 社 ス キャ ナ 保 存

スト レー ジ プ ロ ジェ クト

〒573-0011

2023/4/30

大 阪 市 北 区 大 深町 3-1

山口 銀行 本 店 普通 1111111

グラ ン フ ロン ト 大 阪 タ ワーB

TEL : 06-6735-8055

担当 : ICS 太 郎

66,000 円 (税込 )

サン プル 1

1| 式

32,000

32,000

サン プル 2

1| 式

18000

18,000

2,000

2,000.

8,000

8,000

craft word_accuracy: 36.58536585365854. 

Apparently 金額 ("amount") is missing; sorry, my Japanese is a little bit
rusty :-)
My impression is that when I pass the map of .uzn text regions to tesseract,
it applies a single transformation to pre-process the whole image, whereas
when I pass each region as an individual image it preprocesses each one
separately, applying the best strategy for each region. Of course it is
slower that way.
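
In case it matters, this is roughly how I set up the .uzn experiment (a
sketch from memory, not the exact code; the box coordinates and file names
are placeholders). My understanding is that tesseract only reads an
image.uzn file sitting next to image.png when run with --psm 4, with one
"left top width height label" line per zone:

import subprocess

# Hypothetical boxes from CRAFT, as (left, top, width, height) tuples
boxes = [(40, 30, 320, 42), (40, 90, 500, 40)]

# Write invoice.uzn next to invoice.png: one zone per line (UNLV zone format)
with open("invoice.uzn", "w") as f:
    for left, top, width, height in boxes:
        f.write(f"{left} {top} {width} {height} Text\n")

# --psm 4 should make tesseract segment the page using the zones in the
# matching .uzn file instead of its own layout analysis
subprocess.run(
    ["tesseract", "invoice.png", "stdout", "-l", "jpn", "--psm", "4"],
    check=True,
)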
 
On Wednesday, September 6, 2023 at 7:07:52 PM UTC-6 nguyenng...@gmail.com 
wrote:

> Hi Apismensky,
>
> Here are the code and a sample I used for preprocessing. I extracted the
> ticket region of the train ticket from a picture taken by a smartphone, so
> the angle, distance, brightness, and many other factors can change the
> picture quality.
> I would say scanned images or images taken by a fixed-position camera have
> more consistent quality.
>
> Here is the original image:
>
> [image: sample_to_remove_lines.png]
>
> import cv2
> import numpy as np
>
> # Try to remove lines
> org_image = cv2.imread("/content/sample_to_remove_lines.png")
> cv2_show('org_image', org_image)
> gray = cv2.cvtColor(org_image, cv2.COLOR_BGR2GRAY)
>
> # Otsu threshold, inverted so text and lines become white on black
> thresh = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)[1]
> cv2_show('thresh Otsu', thresh)
>
> # Removing noise dots with a small morphological opening
> opening = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, np.ones((2, 2), np.uint8))
> cv2_show('opening', opening)
>
> thresh = opening.copy()
> mask = np.zeros_like(org_image, dtype=np.uint8)
>
> # Extract horizontal lines
> horizontal_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (60, 1))
> remove_horizontal = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, horizontal_kernel, iterations=2)
> cnts = cv2.findContours(remove_horizontal, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
> cnts = cnts[0] if len(cnts) == 2 else cnts[1]
> for c in cnts:
>     cv2.drawContours(mask, [c], -1, (255, 255, 255), 8)
> # cv2_show('mask extract horizontal lines', mask)
>
> # Extract vertical lines
> vertical_kernel = cv2.getStructuringElement(cv2.MORPH_RECT, (1, 70))
> remove_vertical = cv2.morphologyEx(thresh, cv2.MORPH_OPEN, vertical_kernel, iterations=2)
> cnts = cv2.findContours(remove_vertical, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
> cnts = cnts[0] if len(cnts) == 2 else cnts[1]
> for c in cnts:
>     cv2.drawContours(mask, [c], -1, (255, 255, 255), 8)
>
> cv2_show('mask extract lines', mask)
>
> result = org_image.copy()
> # Loop through the pixels of the original image and whiten the ones
> # that are marked as line pixels in the mask
> for y in range(mask.shape[0]):
>     for x in range(mask.shape[1]):
>         if np.all(mask[y, x] == 255):  # If pixel is white in mask
>             result[y, x] = [255, 255, 255]  # Set pixel to white
>
> cv2_show("result", result)
>
> gray = cv2.cvtColor(result, cv2.COLOR_BGR2GRAY)
> _, simple_thresh = cv2.threshold(gray, 195, 255, cv2.THRESH_BINARY)
> cv2_show('simple_thresh', simple_thresh)
>
>
> In the above code, you can ignore the cv2_show function; it is just my
> custom helper for displaying images.
> You can see that the idea is to remove some noise, remove the lines, and
> then apply a simple threshold.
> [image: extracted_lines.png]
>
> [image: removed_lines.png]
>
>
> [image: ready_for_locating_text_box.png]
>
> I would say that, from this point, the AUTO_OSD page segmentation mode of
> Tesseract can also give the text boxes for the above picture; you also need
> to check the RIL level (maybe RIL.WORD or RIL.TEXTLINE) to get the right
> granularity of text boxes.
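>
> Roughly like this (just a sketch, assuming the tesserocr Python bindings;
> the file name refers to the preprocessed image above):
>
> from tesserocr import PyTessBaseAPI, PSM, RIL
>
> # AUTO_OSD: automatic page segmentation with orientation and script detection
> with PyTessBaseAPI(psm=PSM.AUTO_OSD) as api:
>     api.SetImageFile("ready_for_locating_text_box.png")
>     # RIL.TEXTLINE gives one box per line; RIL.WORD gives one per word
>     components = api.GetComponentImages(RIL.TEXTLINE, True)
>     for i, (im, box, _, _) in enumerate(components):
>         print(i, box['x'], box['y'], box['w'], box['h'])
>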
> In my opinion, the same preprocessing methods can only be applied to a 
> certain group of samples. It is in fact very hard to cover all the cases.  
> For example: 
>
> [image: black_background.png]
>
> I found it difficult to locate the text boxes where the text is white and
> the background is dark; black text on a white background is easy to locate
> and then OCR. I am not sure what a good method is for locating white text
> on dark background colors.
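>
> One idea I have not really tested (just a sketch, reusing OpenCV): check
> whether the image or region is mostly dark and, if so, invert it before
> thresholding, so the detector still sees dark text on a light background:
>
> import cv2
>
> gray = cv2.cvtColor(cv2.imread("black_background.png"), cv2.COLOR_BGR2GRAY)
> # If the mean intensity is low, assume light text on a dark background
> if gray.mean() < 127:
>     gray = cv2.bitwise_not(gray)
> binary = cv2.threshold(gray, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)[1]
>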
> I hope to hear your suggestions on this matter, as well as others'.
>
> Regards
> Hai
> On Wednesday, September 6, 2023 at 12:32:56 AM UTC+9 apism...@gmail.com 
> wrote:
>
>> Hai, could you please tell me what you are doing for pre-processing? 
>> Do you have any source code you can share? 
>> Are those results consistently better for images scanned at different
>> quality (resolution, angle, contrast, etc.)?
>>
>>
>> On Monday, September 4, 2023 at 2:02:27 AM UTC-6 nguyenng...@gmail.com 
>> wrote:
>>
>>> Hi, 
>>> I would like to hear others' opinions on your questions too.
>>> In my case, when I try using Tesseract on Japanese train tickets, I have
>>> to do a lot of preprocessing steps (removing background colors, noise and
>>> line removal, increasing contrast, etc.) to get satisfactory results.
>>> I am sure what you are doing (locating text boxes, extracting them, and
>>> feeding them one by one to tesseract) can give better accuracy. However,
>>> as the number of text boxes increases, it will undoubtedly affect
>>> performance.
>>> Could you share the PSM mode you use for getting those text boxes'
>>> locations?
>>> I usually use AUTO_OSD to get the boxes and expand them a bit at the
>>> edges before passing them to Tesseract, roughly as in the sketch below.
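>>>
>>> (Just a sketch; the file name, box list, and padding value are
>>> placeholders.)
>>>
>>> import cv2
>>> import pytesseract
>>>
>>> image = cv2.imread("ticket.png")  # placeholder file name
>>> boxes = [(40, 30, 320, 42)]       # (x, y, w, h) boxes from layout analysis
>>> img_h, img_w = image.shape[:2]
>>>
>>> pad = 5  # expand each box a bit at the edges, clamped to the image bounds
>>> for (x, y, w, h) in boxes:
>>>     x0, y0 = max(0, x - pad), max(0, y - pad)
>>>     x1, y1 = min(img_w, x + w + pad), min(img_h, y + h + pad)
>>>     text = pytesseract.image_to_string(image[y0:y1, x0:x1], config="--psm 7")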
>>>
>>> Regards
>>> Hai
>>>  
>>> On Saturday, September 2, 2023 at 7:03:49 AM UTC+9 apism...@gmail.com 
>>> wrote:
>>>
>>>> I'm looking into OCR for ID cards and driver's licenses, and I found
>>>> that tesseract performs relatively poorly on ID cards compared to other
>>>> OCR solutions. For this original image:
>>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png 
>>>> the results are: 
>>>>
>>>> tesseract: "4d DL 999 as = Ne allo) 2NICK © , q 12 RESTR oe } lick: 5 
>>>> DD 8888888888 <(888)%20888-8888> 1234 SZ"
>>>> easyocr:  '''9 , ARKANSAS DRIVER'S LICENSE CLAss D 4d DLN 999999999 3 
>>>> DOB 03/05/1960 ] 2 SCKPLE 123 NORTH STREET CITY AR 12345 ISS 4b EXP 
>>>> 03/05/2018 03/05/2026 15 SEX 16 HGT 18 EYES 5'-10" BRO 9a END NONE 12 
>>>> RESTR 
>>>> NONE Ylck Sorble DD 8888888888 1234 THE'''
>>>> google cloud vision: """SARKANSAS\nSAMPLE\nSTATE O\n9 CLASS D\n4d DLN 
>>>> 9999999993 DOB 03/05/1960\nNick Sample\nDRIVER'S LICENSE\n1 SAMPLE\n2 
>>>> NICK\n8 123 NORTH STREET\nCITY, AR 12345\n4a ISS\n03/05/2018\n15 SEX 16 
>>>> HGT\nM\n5'-10\"\nGREAT SE\n9a END NONE\n12 RESTR NONE\n5 DD 8888888888 
>>>> 1234\n4b EXP\n03/05/2026 MS60\n18 EYES\nBRO\nRKANSAS\n0"""
>>>>
>>>> and word accuracy is:
>>>>
>>>>              tesseract  |  easyocr  |  google
>>>> words         10.34%    |  68.97%   |  82.76%
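>>>>
>>>> (For a rough sense of what word accuracy measures: a word-level overlap
>>>> against the expected text, along the lines of the simplified sketch
>>>> below; the repo has the exact calculation.)
>>>>
>>>> def word_accuracy(expected: str, actual: str) -> float:
>>>>     # Percentage of expected words that appear in the OCR output
>>>>     expected_words = expected.split()
>>>>     actual_words = set(actual.split())
>>>>     hits = sum(1 for w in expected_words if w in actual_words)
>>>>     return 100.0 * hits / len(expected_words)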
>>>>
>>>> This is "out if the box" performance, without any preprocessing. I'm 
>>>> not surprised that google vision is that good compared to others, but 
>>>> easyocr, which is another open source solution performs much better than 
>>>> tesseract is this case. I have the whole project dedicated to this, and 
>>>> all 
>>>> other results are much better for easyocr: 
>>>> https://github.com/apismensky/ocr_id/blob/main/result.json, all input 
>>>> files are files in 
>>>> https://github.com/apismensky/ocr_id/tree/main/images/sources
>>>> After digging into it for a little bit, I suspect that bounding box 
>>>> detection is much better in google (
>>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_google/AR.png) 
>>>> and easyocr (
>>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_easy/AR.png), 
>>>> than in tesseract (
>>>> https://github.com/apismensky/ocr_id/blob/main/images/boxes_tesseract/AR.png).
>>>>  
>>>>
>>>> I'm pretty sure about this, because when I manually cut out the text
>>>> boxes and feed them to tesseract it works much better.
>>>>
>>>>
>>>> Now questions: 
>>>>
>>>> - Which part of the tesseract codebase is responsible for text
>>>> detection, and which algorithm does it use?
>>>> - What impacts bounding box detection in tesseract so that it fails on
>>>> these types of images (complex layouts, background noise, etc.)?
>>>> - Is it possible to use the same text detection procedure as easyocr, or
>>>> to improve the existing one?
>>>> - Would it be possible to switch the text detection algorithm based on
>>>> the image type, or make it pluggable so the user can configure one of
>>>> several options A, B, C...?
>>>>
>>>>
>>>> Thanks. 
>>>>
>>>
