Old versions of tesstrain.sh used to limit training to 3 pages. It looks like you may still have an old version somewhere on your PATH.
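One way to check for a stale copy is to list every tesstrain.sh that your PATH can resolve, in lookup order; the first one printed is the one that actually runs. This is only a rough sketch: `find_on_path` is a helper name made up for illustration and is not part of Tesseract.

```shell
# Print every executable file named "$1" found in a PATH-style list
# ("$2", defaulting to the current $PATH), in lookup order.
# The body runs in a subshell so the IFS and globbing changes do not leak.
find_on_path() (
  name=$1
  path_list=${2:-$PATH}
  IFS=:
  set -f                         # no glob expansion while splitting on ':'
  for dir in $path_list; do
    [ -n "$dir" ] || dir=.       # an empty PATH entry means the current directory
    if [ -f "$dir/$name" ] && [ -x "$dir/$name" ]; then
      printf '%s\n' "$dir/$name"
    fi
  done
)

# The first line printed (if any) is the copy the shell would run.
find_on_path tesstrain.sh
```

If more than one path is printed, or the first one is not the copy you expected, then an older script is probably shadowing the newer one. Note that this only finds copies that are marked executable and sit in a PATH directory; a script invoked by explicit path would not show up here.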

On Thu, Jan 7, 2021 at 10:17 PM Kamui 7 <qntmmag...@gmail.com> wrote:

> I have a script to train tesseract and I ran it on Arch Linux, Debian, and
> even a docker container and they all produce the same errors. I checked to
> make sure the script is correct as well.
>
> Bug 1:
> This happens when tesstrain runs text2image. The max pages parameter does
> not work at all. It ends up only rendering 4 pages regardless of what I
> pass in for the maxpages parameter. I even tried hardcoding it into the
> tesstrain_utils.sh file and it still does the same thing.
>
> Bug 2:
> After it finishes producing those 4 pages, i finetune it with lstmtraining
> and the resulting output is full of "Encoding of string failed!" errors.
>
> Bug 3:
> Along with those encoding errors, it also outputs the following text:
>
> "Image too small to scale!! (2x48 vs min width of 3)
> Line cannot be recognized!!
> Image not trainable"
>
> I will upload my script along with the Dockerfile if anyone wants to take
> a look.
>
> https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing

--
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduUQ_maJaMyk2akc9c0-8JquBDkw%2Bi4p6cmW8rW0BQKSdw%40mail.gmail.com.