[tesseract-ocr] Re: Digits only recognized when mixed with letters

2024-02-29 Thread Iman Firouzian
Hi again, I've tested it on Windows with PyCharm. The Tesseract version is v5.0.0-alpha.20200328, and the result is roughly the same: digits are recognized correctly only when they are mixed with letters. Are any specific configurations needed? Thanks for helping. On Thursday, February 29, 2024 at 9:5

[tesseract-ocr] Re: Digits only recognized when mixed with letters

2024-02-29 Thread Philippe Argouarch
I have a similar problem with the Breton language: the library does not recognize the verbal particle "o" and replaces it with a zero ("0"). For example, "oa", which means "was" in English, becomes "0a". Philippe On Thursday, February 29, 2024 at 09:45:53 UTC+1, Iman Firouzian wrote: > Hi again, > I've tested it on windows and py

[tesseract-ocr] user patterns with tesserocr python API

2024-02-29 Thread Roman Seidel
Hi all, I am currently trying to use user patterns with the PyTessBaseAPI from tesserocr [1]. What I've done is initialize the API with: with PyTessBaseAPI(path='/usr/share/tesseract-ocr/4.00/tessdata', lang=LANGUAGE, psm=int(psm), oem=int(TOEM)) as api: setting the user patterns file with: a
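The setup described in this message could be sketched roughly as follows. This is not the poster's code: the helper names (write_user_patterns, build_init_kwargs, recognize) are hypothetical, the example pattern is invented, and the sketch assumes the installed tesserocr accepts a `variables` dict at construction time so that `user_patterns_file` is set during initialization. Whether the patterns influence recognition may also depend on the engine mode and the language model in use.

```python
import tempfile
from pathlib import Path


def write_user_patterns(patterns, directory=None):
    """Write one pattern per line and return the file path (hypothetical helper)."""
    directory = Path(directory or tempfile.mkdtemp())
    path = directory / "user.patterns"
    path.write_text("\n".join(patterns) + "\n", encoding="utf-8")
    return path


def build_init_kwargs(tessdata, lang, psm, oem, patterns_path):
    """Collect keyword arguments for PyTessBaseAPI (hypothetical helper)."""
    return {
        "path": tessdata,
        "lang": lang,
        "psm": int(psm),
        "oem": int(oem),
        # Assumption: this tesserocr version takes a `variables` dict at init,
        # so the patterns file is known before the engine is initialized.
        "variables": {"user_patterns_file": str(patterns_path)},
    }


def recognize(image_path, **init_kwargs):
    """Run OCR on one image file; needs tesserocr and a matching tessdata dir."""
    # Imported lazily so the helpers above work without tesserocr installed.
    from tesserocr import PyTessBaseAPI

    with PyTessBaseAPI(**init_kwargs) as api:
        api.SetImageFile(str(image_path))
        return api.GetUTF8Text()


# Illustrative pattern only: three digits, a hyphen, four digits.
patterns_file = write_user_patterns([r"\d\d\d-\d\d\d\d"])
init_kwargs = build_init_kwargs(
    "/usr/share/tesseract-ocr/4.00/tessdata", "eng", 6, 1, patterns_file
)
# text = recognize("page.png", **init_kwargs)  # "page.png" is a placeholder
```

Passing the file at construction time, rather than through a later SetVariable call, is a design choice made on the assumption that Tesseract reads this parameter while the engine is being initialized.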

[tesseract-ocr] Re: Digits only recognized when mixed with letters

2024-02-29 Thread Tom Morris
Thanks for the version and model information. That'll be useful for anyone trying to help. My best guess is that there's something about the Farsi training data which is causing this, but I don't know what (and I don't speak Farsi). One thing you might try is using the Arabic script model and s

[tesseract-ocr] Re: Digits only recognized when mixed with letters

2024-02-29 Thread Tom Morris
P.S. You can find the Farsi training data here: https://github.com/tesseract-ocr/langdata_lstm/blob/main/fas/fas.training_text