1. Run text detection on the image (EAST, YOLO, ...; see https://www.youtube.com/watch?v=ZpRNfWzuexQ, or search for "text detection python").
2. Process the detected areas so that each one contains only text, without any graphics. See the suggestions in the docs (https://github.com/tesseract-ocr/tessdoc/blob/main/ImproveQuality.md).
3. Run OCR (tesseract) on the processed area(s).
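
Steps 2 and 3 could look roughly like the sketch below. It assumes the text boxes have already been produced by a detector such as EAST, which is not shown here. The names `pad_box` and `ocr_regions`, the 10-pixel margin, the 2x upscale and the `--psm 6` page segmentation mode are illustrative choices to experiment with, not recommended settings.

```python
from typing import Iterable, List, Tuple

# (left, top, right, bottom) in pixels, as assumed to come from a text detector
Box = Tuple[int, int, int, int]


def pad_box(box: Box, margin: int, width: int, height: int) -> Box:
    """Grow a detected box by `margin` pixels, clamped to the image bounds."""
    left, top, right, bottom = box
    return (
        max(0, left - margin),
        max(0, top - margin),
        min(width, right + margin),
        min(height, bottom + margin),
    )


def ocr_regions(
    image_path: str,
    boxes: Iterable[Box],
    margin: int = 10,
    invert: bool = False,
) -> List[str]:
    """Crop each detected box, clean it up, and run Tesseract on the crop."""
    # Imported here so pad_box can be used without Pillow/pytesseract installed.
    from PIL import Image, ImageOps
    import pytesseract

    image = Image.open(image_path)
    texts = []
    for box in boxes:
        crop = image.crop(pad_box(box, margin, image.width, image.height))
        # Grayscale plus contrast stretch instead of a fixed threshold
        gray = ImageOps.autocontrast(crop.convert("L"))
        if invert:
            # Light text on a dark background: flip to dark-on-light
            gray = ImageOps.invert(gray)
        # Upscale small text; the factor of 2 is only a starting point
        gray = gray.resize((gray.width * 2, gray.height * 2), Image.LANCZOS)
        # --psm 6 tells Tesseract to assume a single uniform block of text
        text = pytesseract.image_to_string(gray, config="--psm 6")
        texts.append(text.strip())
    return texts
```

For example, `ocr_regions("tiktoktest.png", [(40, 900, 680, 1010)], invert=True)` would OCR one hand-picked caption area; the coordinates here are made up. Passing the image object straight to `image_to_string` also avoids the extra round trip through a JPEG file, which can add compression artifacts around the letters.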

Zdenko

On Fri, 9 Dec 2022 at 18:15, Anna Muller <amull...@nd.edu> wrote:

> Hello - I am very new to using the Tesseract software. I am currently
> completing a project that requires me to read text from TikTok screenshots
> - I attached a random example image I got from TikTok to this post. I am
> currently getting pretty inaccurate output. Below I pasted the code I was
> using that I got from an online tutorial.
>
> I was wondering if anybody had any suggestions or would be able to point
> me in the right direction to resources where I could better learn how to
> fine tune my image processing parameters.
>
> I am currently using Jupyter Notebooks, but if anybody suggests accessing
> Tesseract differently, please let me know.
>
> *My Code:*
> import pytesseract
> from PIL import Image
>
> column = Image.open('tiktoktest.png')
> gray = column.convert('L')
> # Binarize: pixels below 200 become black, everything else white
> blackwhite = gray.point(lambda x: 0 if x < 200 else 255, '1')
> blackwhite.save("tiktok.jpg")
>
> text_from_image = pytesseract.image_to_string(Image.open('tiktok.jpg'))
> print(text_from_image)
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/e7c6a467-e53d-442d-b7bf-1fd645cdd66an%40googlegroups.com