[tesseract-ocr] Re: Inconsistencies in detection and extraction of text using tesseract

Saanvi Bhagat Fri, 31 May 2024 05:19:36 -0700

To improve the results, I applied Canny edge detection and the Hough line 
transform to the images, then fed the binarized image to Tesseract.


text = pytesseract.image_to_string(cropped_frame, lang='eng', config='--psm 6 --oem 3')
The results have improved a bit, but are still far from perfect. Some 
negative signs are omitted, and some are misread as ~. Similarly, some 
decimal points are dropped: 22.5 was extracted as 225.
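
One thing that might be worth trying for purely numeric fields is restricting the character set and cleaning up look-alike symbols afterwards. The sketch below is only an illustration: tessedit_char_whitelist is a real Tesseract config variable, but whether it is honored depends on the Tesseract version and engine mode, and the helper names and the whitelist contents are assumptions. It will not recover a decimal point that Tesseract never detected.

```python
import re

# Assumption: the fields of interest contain only digits, '.' and '-'.
NUMERIC_WHITELIST = "0123456789.-"


def build_config(psm=6, oem=3, whitelist=None):
    """Build a Tesseract config string, optionally with a character whitelist."""
    parts = [f"--psm {psm}", f"--oem {oem}"]
    if whitelist:
        parts.append(f"-c tessedit_char_whitelist={whitelist}")
    return " ".join(parts)


def fix_minus_signs(text):
    """Replace '~', en dash, or Unicode minus with '-' when directly before a digit."""
    return re.sub(r"[~\u2013\u2212](?=\d)", "-", text)


def ocr_numeric(image):
    """Run OCR with the numeric whitelist and normalize minus signs."""
    import pytesseract  # imported lazily so the helpers above work without it
    raw = pytesseract.image_to_string(
        image, lang="eng", config=build_config(whitelist=NUMERIC_WHITELIST))
    return fix_minus_signs(raw)
```

For example, ocr_numeric(cropped_frame) would replace the call above, while fix_minus_signs can also be applied on its own to text extracted without a whitelist.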
On Friday, May 31, 2024 at 1:07:01 PM UTC+5:30 [email protected] wrote:

> It's hard to give an opinion without seeing how you set up Tesseract: which 
> PSM did you specify, etc.?
>
> On Friday 31 May 2024 at 02:34:36 UTC+12 [email protected] wrote:
>
>> I have attached the image from which I am trying to extract text using 
>> Tesseract OCR (input.jpeg), along with the text extracted from it. As the 
>> images show, the extracted text is not very accurate. Negative signs have 
>> been omitted, and some unwanted characters appear in the extracted text. (I 
>> have marked some of the incorrect results with blue boxes.)
>>
>> I have tried to improve the results by preprocessing the images and 
>> changing the model parameters. I have tried:
>>
>> 1. Binarizing the images
>>
>> 2. HDR processing of the images
>>
>> Even then, such inconsistencies remain.
>>
>> How can I improve the detection and extraction of text in Tesseract? I have 
>> also tried PaddleOCR for the same task. Even then, symbols such as the euro 
>> sign and some negative signs are not detected.
>>
>
