I ran a find command from the root directory to search for the tesstrain script. The only copy it found is the one I pulled from the latest tesseract git repo. My training script calls that specific tesstrain script through a relative path, so it can't be picking up an older version.
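For anyone who wants to repeat the check, here is a minimal sketch of that kind of search. The helper name find_copies and the file name tesstrain.sh are assumptions for illustration, not the exact command I ran. The last line also checks whether a copy is reachable through PATH, which is where an old version could hide.

```shell
# Hypothetical helper: list every regular file with a given name below a directory.
# $1 = directory to search, $2 = file name to look for
find_copies() {
    # Permission-denied noise from unreadable directories is discarded.
    find "$1" -type f -name "$2" 2>/dev/null
}

# Report what the shell would run if the script were called by name
# (the script name tesstrain.sh is an assumption).
command -v tesstrain.sh || echo "tesstrain.sh is not on PATH"
```

Calling `find_copies / tesstrain.sh` scans the whole filesystem and prints one path per copy found; more than one line of output would point to a stale copy.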
On Thursday, January 7, 2021 at 11:01:55 AM UTC-6 shree wrote:

> Old versions of tesstrain.sh used to limit training to 3 pages. Looks like
> you may have an old version in the path somewhere.
>
> On Thu, Jan 7, 2021 at 10:17 PM Kamui 7 <qntmm...@gmail.com> wrote:
>
>> I have a script to train tesseract and I ran it on Arch Linux, Debian,
>> and even a docker container and they all produce the same errors. I checked
>> to make sure the script is correct as well.
>>
>> Bug 1:
>> This happens when tesstrain runs text2image. The max pages parameter does
>> not work at all. It ends up only rendering 4 pages regardless of what I
>> pass in for the maxpages parameter. I even tried hardcoding it into the
>> tesstrain_utils.sh file and it still does the same thing.
>>
>> Bug 2:
>> After it finishes producing those 4 pages, I fine-tune it with
>> lstmtraining and the resulting output is full of "Encoding of string
>> failed!" errors.
>>
>> Bug 3:
>> Along with those encoding errors, it also outputs the following text:
>>
>> "Image too small to scale!! (2x48 vs min width of 3)
>> Line cannot be recognized!!
>> Image not trainable"
>>
>> I will upload my script along with the Dockerfile if anyone wants to take
>> a look.
>>
>> https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/7a9415d6-4d0c-4333-98c0-2628720661ebn%40googlegroups.com