Re: [tesseract-ocr] Re: Reading image from Rubber

Taresh Chaudhari Thu, 19 Dec 2024 22:08:45 -0800

Hi,
Sure, can we connect tomorrow around 11:30 am IST on Google Meet? My ID is 
"tareshchaudh...@gmail.com".



On Wednesday, 11 December 2024 at 18:53:17 UTC+5:30 mahmoud...@gmail.com 
wrote:

> Hello, I would like to work with you to generate a simple traineddata file 
> with jTessBoxEditor for Tesseract and test it. Could you let me know a time 
> to discuss the steps? Thanks
>
> On Tuesday, 26 November 2024, 5:01 PM Taresh Chaudhari <tareshc...@gmail.com> 
> wrote:
>
>> Thanks for sharing, Mahmoud. I did apply these techniques, but the results 
>> are still not good, and I am still trying to solve this problem. Let me see 
>> how it goes.
>>
>> On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 mahmoud...@gmail.com 
>> wrote:
>>
>>> To improve the accuracy of text extraction, you can preprocess the image 
>>> before passing it to the OCR engine. Preprocessing techniques like 
>>> converting the image to grayscale, enhancing contrast, or applying filters 
>>> can help reduce noise and improve readability. Additionally, tweaking the 
>>> Tesseract configuration passed through pytesseract, such as the --psm (page 
>>> segmentation mode) value, may also improve the results.
>>>
>>> Here’s an updated version of your code with some preprocessing steps:
>>> import pytesseract
>>> from PIL import Image, ImageEnhance, ImageFilter
>>>
>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>
>>> # Path to your image
>>> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>>>
>>> def extract_text_from_image(image_path):
>>>     # Open the image
>>>     img = Image.open(image_path)
>>>
>>>     # Preprocess: grayscale, then boost contrast and sharpen character edges
>>>     img = img.convert('L')  # convert to single-channel grayscale
>>>     img = ImageEnhance.Contrast(img).enhance(2)  # double the contrast
>>>     img = img.filter(ImageFilter.SHARPEN)  # sharpen the image
>>>
>>>     # Use pytesseract to extract text (PSM 6 assumes a single uniform block of text)
>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>     return extracted_text.strip()
>>>
>>> # Extract and print text
>>> text = extract_text_from_image(image_path)
>>> print(f"Text extracted from {image_path}: {text}")
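If the characters are light on a black (rubber) background, one further step that may be worth trying is inverting the image so the text becomes dark on light: Tesseract's own guidance on improving output quality says that versions 4.x and later expect dark text on a light background. The sketch below is an untested assumption for this particular crop. `preprocess_light_on_dark` and `ocr_light_on_dark` are hypothetical helper names, and the upscale factor (2), median filter size (3) and binarization threshold (128) are illustrative values to tune, not recommended settings.

```python
from PIL import Image, ImageFilter, ImageOps


def preprocess_light_on_dark(img, scale=2, threshold=128):
    """Hypothetical helper: turn light characters on a dark background into
    black text on white. `scale` and `threshold` are illustrative, untuned values."""
    gray = img.convert("L")  # single-channel grayscale
    # Upscale so each character stroke covers more pixels.
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)
    gray = gray.filter(ImageFilter.MedianFilter(3))  # remove small specks of noise
    inverted = ImageOps.invert(gray)  # light text on dark -> dark text on light
    # Global threshold: brighter than `threshold` -> white (255), otherwise black (0).
    return inverted.point(lambda p: 255 if p > threshold else 0)


def ocr_light_on_dark(image_path, psm=6):
    """Hypothetical helper: preprocess, then OCR with the given page segmentation mode."""
    # Imported here so the preprocessing can be tried without Tesseract installed;
    # relies on pytesseract.pytesseract.tesseract_cmd being set as in the code above.
    import pytesseract

    img = preprocess_light_on_dark(Image.open(image_path))
    return pytesseract.image_to_string(img, config=f"--psm {psm}").strip()
```

Usage would mirror the function above, e.g. `print(ocr_light_on_dark(image_path))`. If the result gets worse, the text is probably already dark on a lighter background, and the `ImageOps.invert` step should be dropped.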
>>>
>>> On Monday, 25 November 2024, 4:12 PM Taresh Chaudhari <
>>> tareshc...@gmail.com> wrote:
>>>
>>>> Attaching an image for reference.
>>>>
>>>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to read the characters from an image in which the 
>>>>> characters are on a black background. I am attaching the code I used to 
>>>>> extract them; currently it gives only partial output. Could you guide me 
>>>>> on how to make it more accurate?
>>>>>
>>>>>
>>>>> import pytesseract
>>>>> from PIL import Image
>>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>>> # Paths to your images
>>>>> image_paths = [
>>>>>     'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg']
>>>>>
>>>>> # Function to process an image and extract text
>>>>> def extract_text_from_image(image_path):
>>>>>     # Open the image
>>>>>     img = Image.open(image_path)
>>>>>
>>>>>     # Use pytesseract to perform OCR (PSM 6 assumes a single uniform block of text)
>>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>>     return extracted_text.strip()
>>>>>
>>>>> # Process all images and print results
>>>>> for img_path in image_paths:
>>>>>     text = extract_text_from_image(img_path)
>>>>>     print(f"Text extracted from {img_path}: {text}")
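Following the suggestion in this thread to experiment with `--psm`, one rough way to compare page segmentation modes on the same crop is to run `pytesseract.image_to_data` once per mode and keep the result with the highest mean word confidence. This is a heuristic sketch, not an established method: `mean_word_confidence` and `best_psm` are hypothetical helpers, and the candidate modes (6 = single uniform block, 7 = single text line, 11 = sparse text, 13 = raw line) are assumptions about what might suit a small crop.

```python
def mean_word_confidence(conf_values):
    """Hypothetical helper: average confidence over recognised words.

    Tesseract reports -1 for non-word rows (page, block, line), so those are skipped.
    Values may be numbers or numeric strings depending on the pytesseract version.
    """
    scores = [s for s in (float(c) for c in conf_values) if s >= 0]
    return sum(scores) / len(scores) if scores else 0.0


def best_psm(img, psm_values=(6, 7, 11, 13)):
    """Hypothetical helper: OCR once per page segmentation mode and keep the
    result with the highest mean word confidence (a rough heuristic only)."""
    import pytesseract  # needs tesseract_cmd configured as in the code above

    results = []
    for psm in psm_values:
        data = pytesseract.image_to_data(
            img, config=f"--psm {psm}", output_type=pytesseract.Output.DICT
        )
        words = [w for w in data["text"] if w.strip()]
        results.append((mean_word_confidence(data["conf"]), psm, " ".join(words)))
    return max(results)  # (confidence, psm, text) with the highest confidence
```

For example, `conf, psm, text = best_psm(Image.open(img_path))` inside the loop above. A high confidence does not guarantee correct text, so the output still needs checking by eye.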
>>>>>
>>>> -- 
>>>> You received this message because you are subscribed to the Google 
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send 
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion visit 
>>>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
>>>>  
>>>> <https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com?utm_medium=email&utm_source=footer>
>>>> .
>>>>
>

