Thank you. Your tool is already listed in the Tesseract documentation: https://github.com/tesseract-ocr/tessdoc/blob/main/User-Projects-%E2%80%93-3rdParty.md#4-others-utilities-tools-command-line-interfaces-cli-etc
Zdenko

On Tue, 8 Apr 2025 at 6:09, Ajinkya Bobade <ajinkyabobad...@gmail.com> wrote:

> I have noticed that text cleaning is the most difficult part of an OCR
> pipeline. I have struggled a lot with this part; without properly cleaned
> text, OCR simply fails in terms of accuracy. To handle text cleaning
> separately, I created a GitHub repo that uses AI to clean up all the text
> in an image. Once the text is cleaned, we can run our own custom OCR models
> on it. I have personally seen OCR accuracy shoot up to 99% on a properly
> preprocessed and cleaned image.
>
> Here is the GitHub link: https://github.com/ajinkya933/ClearText
>
> Regards
> Ajinkya
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To view this discussion visit
> https://groups.google.com/d/msgid/tesseract-ocr/CAHy6iNOjhs7ZY7r26fGzqJOUr2e%2BF3bY%3DeDCHjM-VD7XH5M%3DTA%40mail.gmail.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion visit https://groups.google.com/d/msgid/tesseract-ocr/CAJbzG8xFD8oaOG3_TSSG8RCnPv0vF5E7BA8TtZAk_KMvz6%3DeKQ%40mail.gmail.com.