[tesseract-ocr] Fine-Tune Arabic Model

2024-04-12 Thread Omar Samir
I have created a dataset with almost 200 million words, so if each image contains 10 words there are about 20 million examples to train the model on. Is that enough to get better results? For context, we have fine-tuned a model using 20 thousand examples and it did worse than the pre-trai

[tesseract-ocr] Transfer learning

2024-03-03 Thread Omar Samir
How can I use transfer learning to fine-tune a tessdata_best model? -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.

[tesseract-ocr] Dataset used to train tessdata_best models

2024-03-03 Thread Omar Samir
What dataset was used to train the ara.traineddata model in tessdata_best?

[tesseract-ocr] Result are worse after fine-tune

2024-03-03 Thread Omar Samir
I fine-tuned ara.traineddata from tessdata_best and benchmarked the output model against the original tessdata_best model. The tessdata_best model did better than the output model. Is it normal for my output model to do worse? If not, what am I doing wrong? I am using T
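One common way to make a comparison like this concrete is the character error rate (CER) over the same held-out lines for both models. The following is a minimal sketch, assuming you already have (ground truth, OCR output) string pairs for each model; the function names are hypothetical and this is not the benchmark used in the post.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # deletion
                           cur[j - 1] + 1,             # insertion
                           prev[j - 1] + (ca != cb)))  # substitution or match
        prev = cur
    return prev[-1]


def cer(pairs) -> float:
    """Total edit distance divided by total reference length.

    `pairs` is an iterable of (reference_text, ocr_text) tuples.
    """
    pairs = list(pairs)
    errors = sum(edit_distance(ref, hyp) for ref, hyp in pairs)
    chars = sum(len(ref) for ref, _ in pairs)
    return errors / chars if chars else 0.0
```

Computing `cer` once per model on identical evaluation lines, none of which appear in the training set, gives two directly comparable numbers.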

Re: [tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
How can I use grid search to train a Tesseract model?
On Friday, February 9, 2024 at 5:46:58 PM UTC+2 Omar Samir wrote:
> Thank you for that. Another question, is there not any source that suggests parameters for various uses like for training/fine-tuning a model on a specific la
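A grid search here amounts to running one training job per combination of parameter values and comparing the resulting error rates. The sketch below only builds the command lines and does not run them. It assumes the tesstrain Makefile accepts `make training` with `MODEL_NAME`, `START_MODEL`, `MAX_ITERATIONS` and `LEARNING_RATE` variables; check these against your tesstrain checkout. The model name `ara_ft` and the grid values are hypothetical placeholders, not recommended settings.

```python
from itertools import product


def grid_commands(grid, fixed):
    """Return one `make training` argument list per parameter combination.

    grid:  dict mapping variable name -> list of candidate values (strings)
    fixed: dict of variables shared by every run; must contain MODEL_NAME
    """
    names = list(grid)
    commands = []
    for i, values in enumerate(product(*(grid[n] for n in names))):
        params = dict(fixed)
        params.update(zip(names, values))
        # Give every run its own model name so outputs do not overwrite each other.
        params["MODEL_NAME"] = f"{fixed['MODEL_NAME']}_run{i}"
        commands.append(["make", "training"] + [f"{k}={v}" for k, v in params.items()])
    return commands


# Hypothetical values for illustration only.
grid = {"MAX_ITERATIONS": ["5000", "10000"], "LEARNING_RATE": ["0.0001", "0.001"]}
fixed = {"MODEL_NAME": "ara_ft", "START_MODEL": "ara"}

for cmd in grid_commands(grid, fixed):
    print(" ".join(cmd))
```

Each printed line can then be run from the tesstrain directory, and the run with the lowest error on a held-out set kept.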

Re: [tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
margins.
> On Fri, Feb 9, 2024 at 7:49 AM Omar Samir wrote:
>> Now I am trying to fine tune Tesseract Arabic model in the tessdata_best with more than 20K images in the dataset. I want to know the best values for parameters like MAX_ITERATIONS, EPOCHS, and LEA

[tesseract-ocr] The Best way to fine tune a Tesseract model

2024-02-09 Thread Omar Samir
I am trying to fine-tune the Tesseract Arabic model from tessdata_best with a dataset of more than 20K images. I want to know the best values for parameters like MAX_ITERATIONS, EPOCHS, and LEARNING_RATE. How can I find these values? I also looked in research papers but found nothing ab

[tesseract-ocr] Failed to load list of training filenames from data/foo/list.train

2023-12-30 Thread Omar Samir
I was trying to train Tesseract-OCR on the ocrd-testset.zip in the README, and I get this error above in the subject

[tesseract-ocr] Failed to load list of training filenames from data/foo/list.train

2023-12-30 Thread Omar Samir
I am trying to train Tesseract using tesstrain on the ocrd-testset.zip from the README, and I get the error shown in the subject. I have downloaded Tesseract by following these two videos: https://youtu.be/pe80OEJkS7U?si=UEk0GB9DhlAlt1yq https://youtu.be/pe80OEJkS7U?si=AtfumDuDrMy8sXuO I cloned tesstrain re
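A quick way to narrow down an error like this is to check whether the list file exists, is non-empty, and points at files that are actually on disk. The sketch below assumes `list.train` holds one training file path per line, resolvable from the directory where the script is run; that layout is an assumption about tesstrain's output, and `check_training_list` is a hypothetical helper, not part of tesstrain.

```python
from pathlib import Path


def check_training_list(list_path):
    """Return a list of human-readable problems found with a list.train file."""
    path = Path(list_path)
    if not path.is_file():
        return [f"{path} does not exist"]
    # Keep only non-blank lines; each is assumed to be one file path.
    entries = [line.strip()
               for line in path.read_text(encoding="utf-8").splitlines()
               if line.strip()]
    if not entries:
        return [f"{path} is empty"]
    return [f"missing file listed in {path.name}: {entry}"
            for entry in entries
            if not Path(entry).is_file()]


if __name__ == "__main__":
    # Path taken from the error message in the subject line.
    for problem in check_training_list("data/foo/list.train"):
        print(problem)
```

An empty result means the list and the files it names are present, so the cause is more likely elsewhere, for example in how the ground truth was unpacked.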

[tesseract-ocr] Re: Failed to load list of training filenames from list.train

2023-12-30 Thread Omar Samir
Hello Saad, I am facing the same problem. Have you found a solution to it?
On Friday, September 15, 2023 at 2:25:59 PM UTC+3 saadah...@gmail.com wrote:
> Someone please help me with this issue
> I am trying to finetune the arabic trained data based on some arabic numerals, for which i hav