Guys, I was following this link:
https://tesseract-ocr.github.io/tessdoc/TrainingTesseract-4.00#introduction
It says to make a unicharset file. What the hell is that supposed to be? How
can I make it? And then it goes on to ramble about unicharsetcompressed
without explaining how. What the fuck
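For anyone landing on the same question: roughly speaking, a unicharset is the list of distinct characters the model is allowed to output. The sketch below is only an illustration of that idea, not the real file format and not what Tesseract's own tools do internally. It counts plain Unicode code points in some made-up ground-truth lines (`sample_lines` and `collect_characters` are hypothetical names), whereas the real unicharset also stores per-character properties and may group characters differently for some scripts.

```python
from collections import Counter


def collect_characters(lines):
    """Count every distinct non-whitespace character in the given text lines."""
    counts = Counter()
    for line in lines:
        counts.update(ch for ch in line if not ch.isspace())
    return counts


# Hypothetical ground-truth transcriptions, one string per training line.
sample_lines = ["Hello world", "hola"]

char_counts = collect_characters(sample_lines)
charset = sorted(char_counts)  # the distinct characters, in code point order
print(len(charset), "distinct characters:", "".join(charset))
```

Looking at the counts is also a quick way to spot characters that appear only a handful of times in the training text.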
Hello. I currently have a lot of *news domain* data for training Tesseract
on a *non-English* language. My news data contains *many English words*, and
what I'd like to know is whether I should *remove* or *add* these English
words to get *better accuracy*. (What I learned is that in
Resize your images so that text is 36 pixels high. That's what is used for
eng models.
Since you are fine tuning, limit number of iterations to 400 or so (not
1 which is default).
Use debug_level of -1 during training so that you can see the details per
iteration.
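The resize advice above comes down to simple arithmetic, sketched here under some assumptions: you already know (or have measured) the current text height in pixels, and you want to keep the aspect ratio. The function name `scaled_size` and its parameters are made up for illustration; the returned size would then be handed to whatever image library you use for the actual resize.

```python
def scaled_size(width, height, text_height, target_text_height=36):
    """Return the (width, height) that makes the text target_text_height pixels high.

    text_height is the measured height of the text in the original image;
    how to measure it is outside the scope of this sketch.
    """
    if text_height <= 0:
        raise ValueError("text_height must be positive")
    scale = target_text_height / text_height
    # Keep the aspect ratio and never return a zero-sized dimension.
    return max(1, round(width * scale)), max(1, round(height * scale))


# A 1200x90 line image whose text is 72 pixels high is scaled by 0.5.
print(scaled_size(1200, 90, 72))  # (600, 45)
```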
On Sun, Sep 20, 2020, 00: