[tesseract-ocr] Re: Tesseract 5.x for Math recognition

Nash Kwmz Thu, 20 Jul 2023 10:55:58 -0700

I tried this method as well. I even tried some pre-processing on the image 
to give Tesseract a better idea of what's going on, but it's still not 
working.
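
One thing that can be checked independently of the image is whether every 
language named in the -l eng+equ config is actually installed. The following 
is only a minimal sketch, not a fix: it assumes the tesseract binary is on 
PATH and relies on its --list-langs option. The helper names 
(parse_list_langs, missing_languages, installed_languages) are made up for 
this illustration and are not part of pytesseract.

```python
import shutil
import subprocess


def parse_list_langs(output):
    """Extract language codes from the text printed by `tesseract --list-langs`."""
    langs = []
    for line in output.splitlines():
        line = line.strip()
        # Skip blank lines and the "List of available languages ..." header.
        if not line or line.lower().startswith("list of"):
            continue
        langs.append(line)
    return langs


def missing_languages(lang_spec, installed):
    """Return the codes in a spec such as 'eng+equ' that are not installed."""
    return [code for code in lang_spec.split("+") if code not in installed]


def installed_languages(tesseract_cmd="tesseract"):
    """Ask the local Tesseract binary which language data it can find."""
    if shutil.which(tesseract_cmd) is None:
        return []  # Assumption: an empty list means "binary not found on PATH".
    result = subprocess.run(
        [tesseract_cmd, "--list-langs"],
        capture_output=True,
        text=True,
        check=False,
    )
    # Read both streams in case the list is printed to stderr.
    return parse_list_langs(result.stdout + "\n" + result.stderr)


if __name__ == "__main__":
    installed = installed_languages()
    print("installed:", installed)
    print("missing for eng+equ:", missing_languages("eng+equ", installed))
```

If equ shows up as missing, the eng+equ config is not loading the equation 
data at all. If nothing is missing, the language setup is probably not the 
cause and the problem lies elsewhere.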


On Thursday, July 20, 2023 at 2:18:13 AM UTC-4 tomwi...@gmail.com wrote:

> The code you provided uses Tesseract OCR with a custom configuration (-l 
> eng+equ) to recognize English and mathematical equations (equ) in the 
> image. pytesseract.image_to_string() accepts both PIL (Python Imaging 
> Library) images and OpenCV images (NumPy arrays), so the call itself is 
> valid. One thing to watch for is that OpenCV loads images in BGR channel 
> order, while pytesseract assumes RGB.
>
> To rule that out, you can convert the image to RGB and then from OpenCV 
> format to PIL format before passing it to Tesseract. You can use 
> cv2.cvtColor() and PIL.Image.fromarray() to perform this conversion.
>
> Here's the updated code:
>
> import pytesseract
> import cv2
> from PIL import Image
>
> custom_config = r'-l eng+equ'
>
> img = cv2.imread("tessa.png")
>
> # OpenCV loads images as BGR; convert to RGB before handing off to PIL
> rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
>
> # Convert the image from OpenCV format (NumPy array) to PIL format
> pil_image = Image.fromarray(rgb)
>
> # Perform OCR using Tesseract and extract text from the image
> text = pytesseract.image_to_string(pil_image, config=custom_config)
> print(text)
>
> Make sure to replace "tessa.png" with the actual path to your image file.
>
> With this code, Tesseract OCR will attempt to recognize both English text 
> and mathematical equations present in the image. The custom_config 
> parameter with the value -l eng+equ instructs Tesseract to use the 
> English and mathematical equation language data for recognition.
>
> Please note that while Tesseract is a powerful OCR engine, recognizing 
> complex mathematical expressions accurately might be challenging. If you 
> encounter issues with accuracy, consider using specialized OCR libraries or 
> APIs that are designed specifically for math recognition.
> Source: ChatGPT
>
> On Wednesday, July 19, 2023 at 04:55:05 UTC+7, kwmz...@gmail.com wrote:
>
>> Hi everyone, 
>>
>> I'm trying to use Tesseract to detect both the English part and the 
>> mathematical part of the image below, and it doesn't seem to work.
>>
>> [image: tessa.png]
>>
>> The code I'm using is :
>>
>> import pytesseract
>> import cv2
>>
>> custom_config = r'-l eng+equ'
>>
>> img = cv2.imread("tessa.png")
>> text = pytesseract.image_to_string(img, config=custom_config)
>> print(text)
>>
>> The output (see below) is missing the mathematical part, even though 
>> I've used eng+equ.
>> [image: Screenshot 2023-07-18 at 5.54.13 PM.png]
>>
>> Did anyone find a workaround for this or must I retrain tesseract? 
>>
>> Regards,
>> Nash
>>
>

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/da499789-b564-4af8-8e6e-aba76ae6e0e8n%40googlegroups.com.
