Re: [tesseract-ocr] Make russian_with_accent traineddata file

2024-02-09 Thread Romain B. (Le Belge)
Here is all the information needed to reproduce my problem. Here is an image from my Russian learning book (French version): [image: testTes2.png] If you run it through Tesseract (using the Russian + French languages) with this command: *tesseract testTes2.png stdout -l rus+fra* You will get this resul

[tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
I am trying to fine-tune the Tesseract Arabic model from tessdata_best with a dataset of more than 20K images. I want to know the best values for parameters like MAX_ITERATIONS, EPOCHS, and LEARNING_RATE. How can I find these values? I also looked in research papers but found nothing ab

Re: [tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread La Monte H. P. Yarroll
You want to do a grid search. The best practice is to pick a range of values for each of those parameters and try all combinations. If you find that your results keep improving toward one extreme of a range rather than peaking in the middle, you'll want to pick a new range for that parameter that overlaps and extends past that edge.
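A minimal Python sketch of the grid search described above. The parameter names come from the question; the value ranges are hypothetical, and `toy_error` is only a stand-in so the example runs. In practice you would replace it with a function that performs one training run and returns the error rate on held-out data.

```python
from itertools import product


def grid_search(grid, evaluate):
    """Try every combination in `grid`; lower scores are better."""
    names = list(grid)
    results = []
    for values in product(*(grid[n] for n in names)):
        params = dict(zip(names, values))
        results.append((evaluate(params), params))
    best_score, best_params = min(results, key=lambda r: r[0])
    return best_params, best_score, results


def params_at_edge(grid, best_params):
    """Names of parameters whose best value is the smallest or largest tried."""
    return [
        name
        for name, values in grid.items()
        if len(values) > 1 and best_params[name] in (min(values), max(values))
    ]


# Hypothetical ranges, chosen only for illustration.
grid = {
    "MAX_ITERATIONS": [1000, 5000, 10000],
    "LEARNING_RATE": [0.0001, 0.001, 0.01],
}


def toy_error(params):
    # Stand-in for a real training run (assumption, not a Tesseract API):
    # replace with code that trains once and returns the measured error rate.
    lr_penalty = abs(params["LEARNING_RATE"] - 0.001) * 100
    return lr_penalty + 1000 / params["MAX_ITERATIONS"]


best, score, results = grid_search(grid, toy_error)
print("best:", best, "score:", score)
print("at edge of range:", params_at_edge(grid, best))
```

Any parameter reported by `params_at_edge` had its best value at the boundary of what was tried, so its range is the one to shift and rerun.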

Re: [tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
Thank you for that. Another question: is there any source that suggests parameters for various uses, such as training or fine-tuning a model on a specific language, or that explains the best values for each use? On Friday, February 9, 2024 at 3:44:04 PM UTC+2 piggy wrote: > You want to do grid sear

Re: [tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
How can I use grid search to train a Tesseract model? On Friday, February 9, 2024 at 5:46:58 PM UTC+2 Omar Samir wrote: > Thank you for that. Another question, is there not any source that > suggests parameters for various uses like for training/fine-tuning a model > on a specific language or exp

Re: [tesseract-ocr] Make russian_with_accent traineddata file

2024-02-09 Thread Tom Morris
Salut Romain, On Friday, February 9, 2024 at 6:03:02 AM UTC-5 Romain B. (Le Belge) wrote: I'm trying to fix this issue. From what I have read, I think I need to retrain the Russian language model in Tesseract for it to support accents. I found this