OK thanks

On Friday, 20 December 2024 at 10:08 AM, Taresh Chaudhari <tareshchaudh...@gmail.com> wrote:
> Hi,
> Sure, can we connect tomorrow around 11:30 am IST on Google Meet? My ID
> is "tareshchaudh...@gmail.com".
>
> On Wednesday, 11 December 2024 at 18:53:17 UTC+5:30 mahmoud...@gmail.com
> wrote:
>
>> Hello, I would like to work with you on generating a simple traineddata
>> file for Tesseract using jTessBoxEditor and then test it. Can you let me
>> know a time to discuss the steps? Thanks
>>
>> On Tuesday, 26 November 2024 at 5:01 PM, Taresh Chaudhari
>> <tareshc...@gmail.com> wrote:
>>
>>> Thanks Mahmoud for sharing. I did apply these techniques, but the
>>> results are still not good and I am still trying to solve this problem.
>>> Let me see how it proceeds.
>>>
>>> On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 mahmoud...@gmail.com
>>> wrote:
>>>
>>>> To improve the accuracy of text extraction, you can preprocess the
>>>> image before passing it to the OCR engine. Preprocessing techniques like
>>>> converting the image to grayscale, enhancing contrast, or applying filters
>>>> can help reduce noise and improve readability. Additionally, tweaking the
>>>> pytesseract settings, such as changing the --psm value, may also improve
>>>> the results.
>>>>
>>>> Here's an updated version of your code with some preprocessing steps:
>>>>
>>>> import pytesseract
>>>> from PIL import Image, ImageEnhance, ImageFilter
>>>>
>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>>
>>>> # Path to your image
>>>> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>>>>
>>>> def extract_text_from_image(image_path):
>>>>     # Open the image
>>>>     img = Image.open(image_path)
>>>>
>>>>     # Preprocess to improve text-background contrast
>>>>     img = img.convert('L')  # Convert image to grayscale
>>>>     img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast
>>>>     img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image
>>>>
>>>>     # Use pytesseract to extract text
>>>>     # PSM 6 assumes a single uniform block of text
>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>     return extracted_text.strip()
>>>>
>>>> # Extract and print text
>>>> text = extract_text_from_image(image_path)
>>>> print(f"Text extracted from {image_path}: {text}")
>>>>
>>>> On Monday, 25 November 2024 at 4:12 PM, Taresh Chaudhari
>>>> <tareshc...@gmail.com> wrote:
>>>>
>>>>> Attaching an image for reference.
>>>>>
>>>>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>> I am trying to read the characters from the image, which has
>>>>>> characters with black color in the background. Attaching the code which I
>>>>>> used to extract them; currently it's giving only partial output. Can you
>>>>>> help guide me on how to make it accurate?
>>>>>>
>>>>>> import pytesseract
>>>>>> from PIL import Image
>>>>>>
>>>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>>>>
>>>>>> # Paths to your images
>>>>>> image_paths = [
>>>>>>     'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg',
>>>>>> ]
>>>>>>
>>>>>> # Function to process an image and extract text
>>>>>> def extract_text_from_image(image_path):
>>>>>>     # Open the image
>>>>>>     img = Image.open(image_path)
>>>>>>
>>>>>>     # Use pytesseract to perform OCR
>>>>>>     # PSM 6 assumes a single uniform block of text
>>>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>>>     return extracted_text.strip()
>>>>>>
>>>>>> # Process all images and print results
>>>>>> for img_path in image_paths:
>>>>>>     text = extract_text_from_image(img_path)
>>>>>>     print(f"Text extracted from {img_path}: {text}")
>>>>>>
>>>>> --
>>>>> You received this message because you are subscribed to the Google
>>>>> Groups "tesseract-ocr" group.
>>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>>> an email to tesseract-oc...@googlegroups.com.
>>>>> To view this discussion visit
>>>>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
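If the crop really is light text on a dark background (which is how the original question reads), one further step that often helps is inverting the image so that Tesseract sees dark text on a light background. The sketch below is not from the thread. The names prepare_for_ocr and ocr_light_on_dark, the fixed threshold of 128, the 2x upscale, and the "mostly dark pixels means inverted" heuristic are all illustrative assumptions; only Pillow calls and pytesseract.image_to_string (already used above) are relied on.

```python
from PIL import Image, ImageOps


def prepare_for_ocr(img, threshold=128, scale=2):
    """Return a binarized, upscaled, dark-text-on-light copy of img.

    threshold and scale are illustrative defaults, not tuned values.
    """
    # Grayscale, then stretch the histogram to use the full 0..255 range
    gray = ImageOps.autocontrast(img.convert("L"))

    # Assumed heuristic: if more than half the pixels are dark, treat the
    # image as light text on a dark background and invert it.
    hist = gray.histogram()
    dark_pixels = sum(hist[:threshold])
    if dark_pixels > (gray.width * gray.height) / 2:
        gray = ImageOps.invert(gray)

    # Upscale small crops so the glyphs are larger
    gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)

    # Binarize with a fixed threshold: every pixel becomes 0 or 255
    return gray.point(lambda p: 255 if p >= threshold else 0)


def ocr_light_on_dark(image_path, psm=6):
    """Preprocess the image at image_path and run Tesseract on it."""
    # Imported here so prepare_for_ocr can be used without Tesseract installed
    import pytesseract

    with Image.open(image_path) as img:
        prepared = prepare_for_ocr(img)
    return pytesseract.image_to_string(prepared, config=f"--psm {psm}").strip()
```

Calling ocr_light_on_dark on the crop1.jpg path from the thread would run the whole pipeline. If the output is still partial, trying a different --psm value (for example 7 for a single text line) or a different threshold is a reasonable next experiment.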