nohup make MODEL_NAME=ben START_MODEL=ben LANG_TYPE=Indic GROUND_TRUTH_DIR=data/ben-ground-truth TESSDATA=$HOME/tessdata_best DEBUG_INTERVAL=-1 training MAX_ITERATIONS=50000 >> data/ben.log &
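The command above appends its output to data/ben.log. As a rough sketch of how a CER-per-iteration series could be pulled out of such a log for graphing, the snippet below scans for progress lines. The line format in the comment is an assumption about what lstmtraining prints, not something confirmed in this thread. The names `parse_training_log` and `read_training_log` are hypothetical helpers, not part of tesstrain.

```python
import re

# Assumed shape of a progress line (not confirmed by this thread), e.g.:
#   At iteration 100/100/100, Mean rms=1.2%, delta=3.4%, char train=12.3%, word train=30.1%, ...
# Newer Tesseract builds are assumed to print "BCER train=" instead of "char train=".
ITERATION_LINE = re.compile(
    r"At iteration (\d+)/(\d+)/(\d+),.*?(?:char|BCER) train=([0-9.]+)%"
)


def parse_training_log(lines):
    """Return (training_iteration, char_error_percent) pairs found in log lines."""
    points = []
    for line in lines:
        match = ITERATION_LINE.search(line)
        if match:
            # Assumption: the second of the three counters is the training iteration.
            points.append((int(match.group(2)), float(match.group(4))))
    return points


def read_training_log(path):
    """Hypothetical helper: parse a log file such as data/ben.log."""
    with open(path, encoding="utf-8", errors="replace") as handle:
        return parse_training_log(handle)
```

The resulting pairs can be handed to any plotting tool. This covers only the training error recorded in the log. The validation curves come from separate validation logs, which this sketch does not read.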

Graphs are created from the training log file and the validation log files. Some of them require PRs that have not yet been merged into the tesstrain repo. See https://github.com/tesseract-ocr/tesstrain/pulls

For the evaluation reports, I used https://github.com/eddieantonio/ocreval

On Fri, Jan 1, 2021 at 12:09 PM Soumik Ranjan Dasgupta <ranjansou...@gmail.com> wrote:

> Hi Shreeshrii,
>
> Can you please tell me the training command used? Also, how can I create
> the graphs and these other documents?
>
> On Sat, 26 Dec 2020, 18:37 Shree Devi Kumar, <shreesh...@gmail.com> wrote:
>
>> Soumik,
>>
>> I used your ground truth and trained using ben as the START_MODEL. I got
>> the best results on the validation set of images at around 5000 iterations.
>> See the attached accuracy report and CER graph.
>>
>> On Thu, Dec 24, 2020 at 8:36 PM Soumik Ranjan Dasgupta <ranjansou...@gmail.com> wrote:
>>
>>> Hi everyone,
>>> I wanted to fine-tune the ben.traineddata model using some ancient
>>> texts that were supposedly printed with typeset. I have roughly 1k
>>> lines of text and tried the normal fine-tuning approach with around 25k
>>> iterations.
>>> What surprised me the most was that even after packing the
>>> traineddata (character error was around 4%) and testing on an unseen image,
>>> the performance was exactly the same. Not a single character was different!
>>> You can find the traineddata, training data, the logs and the source
>>> code at this link:
>>> https://github.com/srdg/unarchived_ben_tess/releases/tag/v0.0.4-alpha
>>>
>>> Can anyone tell me exactly what I am doing wrong here? Do I need to
>>> change any training parameter, increase my training data, or do something
>>> else entirely?
>>>
>>> Best regards,
>>> Soumik

--
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com. To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduWcMRdq2Ok%3Dz9aTmB68QeKQE2de64C%2BxpL3fOWX025_OA%40mail.gmail.com.