We already use OpenCV in Python (cv2) to convert the image to grayscale and 
binarize it. I also tried erosion, but it gave no marked improvement. For this 
particular image it would be easy to remove the left side, but it is only a 
sample; in the actual application we are building, the text can occur in any 
part of the image. When you say "OCR only text areas", does that mean running 
tesseract once in a different page segmentation mode just to get bounding 
boxes, then running it again on those regions to actually get the text 
accurately?
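
To make the two-pass idea concrete, here is a minimal sketch of what I have in mind. It assumes the tesseract CLI is on PATH with its standard "tsv" config available, and that OpenCV is installed. The helper names (parse_block_boxes, run_tesseract, ocr_text_regions), the 10-pixel padding, and the choice of --psm 11 (sparse text) for detection and --psm 6 (single uniform block) for recognition are my own assumptions for illustration, not something confirmed in this thread.

```python
import csv
import io
import subprocess


def parse_block_boxes(tsv_text, pad=10):
    """Group word boxes from Tesseract TSV output into one box per text block.

    Returns a list of (x0, y0, x1, y1) tuples in order of first appearance.
    The padding value is an arbitrary assumption; tune it for your images.
    """
    boxes = {}
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t",
                            quoting=csv.QUOTE_NONE)
    for row in reader:
        # Level 5 rows are individual words; skip rows with no recognized text.
        if row["level"] != "5" or not (row["text"] or "").strip():
            continue
        left, top = int(row["left"]), int(row["top"])
        right, bottom = left + int(row["width"]), top + int(row["height"])
        key = int(row["block_num"])
        if key in boxes:
            x0, y0, x1, y1 = boxes[key]
            boxes[key] = (min(x0, left), min(y0, top),
                          max(x1, right), max(y1, bottom))
        else:
            boxes[key] = (left, top, right, bottom)
    # Pad each box, clamping the top-left corner at the image origin.
    return [(max(x0 - pad, 0), max(y0 - pad, 0), x1 + pad, y1 + pad)
            for x0, y0, x1, y1 in boxes.values()]


def run_tesseract(image_path, psm, output_format=None):
    """Run the tesseract CLI and return what it writes to stdout."""
    cmd = ["tesseract", image_path, "stdout", "--psm", str(psm)]
    if output_format:
        cmd.append(output_format)
    return subprocess.run(cmd, capture_output=True, text=True,
                          check=True).stdout


def ocr_text_regions(image_path, work_path="region.png"):
    """Pass 1: locate text blocks. Pass 2: OCR each cropped block separately."""
    import cv2  # imported here so the TSV parsing works without OpenCV

    image = cv2.imread(image_path)
    if image is None:
        raise FileNotFoundError(image_path)
    tsv = run_tesseract(image_path, psm=11, output_format="tsv")
    results = []
    for x0, y0, x1, y1 in parse_block_boxes(tsv):
        # NumPy slicing clamps x1/y1 to the image bounds automatically.
        cv2.imwrite(work_path, image[y0:y1, x0:x1])
        results.append(run_tesseract(work_path, psm=6).strip())
    return results
```

Calling ocr_text_regions("label.jpg") (a hypothetical file name) would return one string per detected block. Whether the first pass finds the warning text well enough for this to help is exactly what I am unsure about.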

On Friday, October 22, 2021 at 12:56:51 AM UTC-4 zdenop wrote:

> Generally: read and follow 
> https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md
>
> Basically: pre-process the image: remove non-text elements, or OCR only the 
> text areas (search the internet for "text detection")
>
> Zdenko
>
>
> On Thu, Oct 21, 2021 at 23:34, Schuyler Reinken <xarl...@gmail.com> wrote:
>
>> I'm using the English tessdata_best on Linux
>>
>> On Thursday, October 21, 2021 at 5:32:17 PM UTC-4 Schuyler Reinken wrote:
>>
>>> I am using tesseract 4.1.1 and the results on this image are as follows:
>>> -----------------------------------------------------
>>> roan
>>> nian
>>> Er
>>> Preferred i)
>>> PRODUCED & wa
>>> SPRINGGATES
>>> FARMS AND VINEYARD
>>> Le
>>> 1
>>> Tome Son a Woon
>>> Hui Sov vet Aoinii
>>> BEVERAGES UF
>>> a i od oR De pa 1
>>> primi ett
>>> ‘OPERATE MACHNERY, AND MAY CAUSE
>>> 375 mL 7% ALC BY VOL REATH PROBES. COMANSSUFTES
>>> Jon 2 To 5 GIP \Y » ) SIR VW, T=" Wa COO pn a TEES gemma
>>>
>>> -------------------------------------------------------------------------------------------------------------
>>> On Friday, October 15, 2021 at 10:30:10 AM UTC-4 Schuyler Reinken wrote:
>>>
>>>>
>>>> Hello! I am having trouble using Tesseract to read inconsistently 
>>>> spaced text.
>>>>
>>>> It tends to miss entire lines of text in the government warning in the 
>>>> attached image. I don't need to read the blue angled text, only the stuff 
>>>> on the white sidebar. Is there a way to improve its reading of this sort 
>>>> of image?
>>>> [image: SPRING GATE VINEYARD_a.jpg]
>>>>
>>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/123a18f9-c281-4063-b197-45a9a35e6090n%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/123a18f9-c281-4063-b197-45a9a35e6090n%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
