Hi, sure. Can we connect tomorrow around 11:30 AM IST on Google Meet? My ID is "tareshchaudh...@gmail.com".
On Wednesday, 11 December 2024 at 18:53:17 UTC+5:30 mahmoud...@gmail.com wrote:

> Hello, I would like to work with you on generating a simple traineddata file
> with jTessBoxEditor for Tesseract and then testing it. Could you let me know
> a time to discuss the steps? Thanks.
>
> On Tuesday, 26 November 2024, 5:01 PM, Taresh Chaudhari <tareshc...@gmail.com>
> wrote:
>
>> Thanks, Mahmoud, for sharing. I did apply these techniques, but the results
>> are still not good, and I am still trying to solve this problem. Let me see
>> how it goes.
>>
>> On Tuesday, 26 November 2024 at 00:31:29 UTC+5:30 mahmoud...@gmail.com
>> wrote:
>>
>>> To improve the accuracy of text extraction, you can preprocess the image
>>> before passing it to the OCR engine. Preprocessing techniques such as
>>> converting the image to grayscale, enhancing contrast, or applying filters
>>> can help reduce noise and improve readability. Tweaking the pytesseract
>>> settings, for example changing the --psm value, may also improve the
>>> results.
>>>
>>> Here's an updated version of your code with some preprocessing steps:
>>>
>>> import pytesseract
>>> from PIL import Image, ImageEnhance, ImageFilter
>>>
>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>
>>> # Path to your image
>>> image_path = 'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg'
>>>
>>> def extract_text_from_image(image_path):
>>>     # Open the image
>>>     img = Image.open(image_path)
>>>
>>>     # Preprocess: grayscale, stronger contrast, then sharpened edges
>>>     img = img.convert('L')  # Convert image to grayscale
>>>     img = ImageEnhance.Contrast(img).enhance(2)  # Increase contrast (factor 2.0)
>>>     img = img.filter(ImageFilter.SHARPEN)  # Sharpen the image
>>>
>>>     # Use pytesseract to extract text (PSM 6 assumes a single uniform block of text)
>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>     return extracted_text.strip()
>>>
>>> # Extract and print text
>>> text = extract_text_from_image(image_path)
>>> print(f"Text extracted from {image_path}: {text}")
>>>
>>> On Monday, 25 November 2024, 4:12 PM, Taresh Chaudhari
>>> <tareshc...@gmail.com> wrote:
>>>
>>>> Attaching an image for reference.
>>>>
>>>> On Monday, 25 November 2024 at 15:52:27 UTC+5:30 Taresh Chaudhari wrote:
>>>>
>>>>> Hi,
>>>>> I am trying to read the characters in an image that has a black
>>>>> background. Attached is the code I used to extract them; currently it
>>>>> gives only partial output. Can you guide me on how to make it accurate?
>>>>>
>>>>> import pytesseract
>>>>> from PIL import Image
>>>>>
>>>>> pytesseract.pytesseract.tesseract_cmd = 'C:\\Users\\M562765\\AppData\\Local\\Programs\\Tesseract-OCR\\tesseract.exe'
>>>>>
>>>>> # Paths to your images
>>>>> image_paths = [
>>>>>     'C:/Users/M562765/Downloads/Unable-images/Unable/crop1.jpg']
>>>>>
>>>>> # Function to process an image and extract text
>>>>> def extract_text_from_image(image_path):
>>>>>     # Open the image
>>>>>     img = Image.open(image_path)
>>>>>
>>>>>     # Use pytesseract to perform OCR (PSM 6 assumes a single uniform block of text)
>>>>>     extracted_text = pytesseract.image_to_string(img, config='--psm 6')
>>>>>     return extracted_text.strip()
>>>>>
>>>>> # Process all images and print results
>>>>> for img_path in image_paths:
>>>>>     text = extract_text_from_image(img_path)
>>>>>     print(f"Text extracted from {img_path}: {text}")
>>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "tesseract-ocr" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send
>>>> an email to tesseract-oc...@googlegroups.com.
>>>> To view this discussion visit
>>>> https://groups.google.com/d/msgid/tesseract-ocr/83985355-a349-4ed7-a2a9-c938fda1a5f4n%40googlegroups.com
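A note on the black-background crop discussed in this thread: assuming the characters are lighter than the background, the grayscale/contrast/sharpen steps in the quoted code keep the text light on dark, while the Tesseract documentation on improving output quality generally recommends dark text on a light background for 4.x. Below is a minimal sketch, not something proposed in this thread, that inverts dark-background crops, upscales them, binarizes them with Otsu's method (a swapped-in thresholding technique), and then runs a few --psm values so the outputs can be compared. It assumes Pillow and pytesseract are installed and Tesseract is on PATH. The function names, the 128 dark-background cut-off, the 2x scale and the PSM list (6, 7, 11) are illustrative assumptions, not tuned values.

```python
import sys

from PIL import Image, ImageOps


def otsu_threshold(histogram):
    """Otsu's method: return the gray level that maximizes the between-class
    variance of a 256-bin grayscale histogram (as from Image.histogram())."""
    total = sum(histogram)
    sum_all = sum(level * count for level, count in enumerate(histogram))
    weight_dark = 0  # pixels at or below the candidate level
    sum_dark = 0     # sum of the gray levels of those pixels
    best_level, best_variance = 0, -1.0
    for level, count in enumerate(histogram):
        weight_dark += count
        sum_dark += level * count
        weight_light = total - weight_dark
        if weight_dark == 0:
            continue
        if weight_light == 0:
            break
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Unnormalized between-class variance; scaling does not change the argmax.
        variance = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if variance > best_variance:
            best_level, best_variance = level, variance
    return best_level


def preprocess_for_ocr(img, scale=2, dark_cutoff=128):
    """Return a binarized 'L' image with dark text on a white background.

    Assumption (illustrative): the background covers most of the crop, so a
    mean gray level below dark_cutoff means light text on a dark background.
    """
    gray = img.convert("L")

    # Invert light-on-dark crops so the text becomes dark on light.
    hist = gray.histogram()
    mean_level = sum(level * count for level, count in enumerate(hist)) / max(sum(hist), 1)
    if mean_level < dark_cutoff:
        gray = ImageOps.invert(gray)

    # Upscale small crops; LANCZOS resampling keeps stroke edges smooth.
    if scale != 1:
        gray = gray.resize((gray.width * scale, gray.height * scale), Image.LANCZOS)

    # Global Otsu binarization: pixels above the threshold become white.
    threshold = otsu_threshold(gray.histogram())
    return gray.point(lambda p: 255 if p > threshold else 0)


def ocr_with_psm_sweep(img, psm_values=(6, 7, 11)):
    """Run Tesseract once per page segmentation mode and collect the outputs
    (6 = single uniform block, 7 = single text line, 11 = sparse text)."""
    import pytesseract  # imported lazily so preprocessing works without it

    # On Windows, set pytesseract.pytesseract.tesseract_cmd here (as in the
    # quoted scripts) if tesseract.exe is not on PATH.
    return {
        psm: pytesseract.image_to_string(img, config=f"--psm {psm}").strip()
        for psm in psm_values
    }


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python ocr_sweep.py crop1.jpg  (the script name is hypothetical)
    prepared = preprocess_for_ocr(Image.open(sys.argv[1]))
    for psm, text in ocr_with_psm_sweep(prepared).items():
        print(f"--psm {psm}: {text!r}")
```

If one --psm value is clearly better on your crops, keep only that one. Otsu picks a single global threshold, so crops with uneven lighting may need a local (adaptive) threshold instead.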