I have a script to train Tesseract. I ran it on Arch Linux, on Debian, and even in a Docker container, and all three produce the same errors. I also checked that the script itself is correct.
Bug 1: This happens when tesstrain runs text2image. The max pages parameter has no effect: only 4 pages are rendered regardless of the value I pass for maxpages. I even tried hardcoding the value in tesstrain_utils.sh, and the result is the same.

Bug 2: After those 4 pages are produced, I fine-tune with lstmtraining, and the output is full of "Encoding of string failed!" errors.

Bug 3: Along with those encoding errors, lstmtraining also prints the following: "Image too small to scale!! (2x48 vs min width of 3) Line cannot be recognized!! Image not trainable"

I have uploaded my script along with the Dockerfile in case anyone wants to take a look: https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing