[tesseract-ocr] Numerous different bugs while training jpn

Kamui 7 Thu, 07 Jan 2021 08:47:51 -0800

I have a script to train Tesseract, and I ran it on Arch Linux, Debian, and 
even in a Docker container. All three produce the same errors. I also checked 
to make sure the script itself is correct.


Bug 1:
This happens when tesstrain runs text2image. The maxpages parameter has no 
effect: only 4 pages are rendered regardless of the value I pass in. I even 
tried hardcoding the value in the tesstrain_utils.sh file, and it still does 
the same thing.

Bug 2:
After it finishes producing those 4 pages, I fine-tune the model with 
lstmtraining, and the resulting output is full of "Encoding of string failed!" 
errors.
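
One possible way to narrow this down, assuming the encoding errors come from 
characters in the training text that are missing from the unicharset used by 
the model (an assumption, not something confirmed here), is to compare the 
two. The sketch below is only illustrative: the helper names load_unicharset 
and missing_chars are made up for this example, and it checks single 
characters only, so multi-character unicharset entries are not handled 
properly.

```python
def load_unicharset(lines):
    """Collect the unichars listed in the lines of a unicharset file.

    Assumes the usual layout: the first line is the entry count and each
    following line starts with the unichar, then a space and its properties.
    """
    chars = set()
    for line in lines[1:]:  # skip the count on the first line
        line = line.rstrip("\n")
        if not line:
            continue
        unichar = line.split(" ", 1)[0]
        # "NULL" is the placeholder entry that stands for the space character.
        chars.add(" " if unichar == "NULL" else unichar)
    return chars


def missing_chars(text, known):
    """Return the sorted non-whitespace characters of text not in known."""
    return sorted({ch for ch in text if not ch.isspace() and ch not in known})


# Tiny in-memory example; real input would be read from files instead.
sample_unicharset = ["3\n", "NULL 0 Common 0\n", "日 1 Han 1\n", "本 1 Han 2\n"]
known = load_unicharset(sample_unicharset)
print(missing_chars("日本語", known))
```

To run this against real data, the unicharset can be unpacked from the 
traineddata file (for example with combine_tessdata -u) and passed in via 
open(path, encoding="utf-8").readlines(), together with the contents of the 
training text file. Any characters it reports would be candidates for the 
strings that fail to encode.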

Bug 3:
Along with those encoding errors, it also outputs the following text:

"Image too small to scale!! (2x48 vs min width of 3)
Line cannot be recognized!!
Image not trainable"

I will upload my script along with the Dockerfile if anyone wants to take a 
look. 

https://drive.google.com/file/d/1FkW1q1cXwOxY6Yi1A1cMzInbtJa9L01M/view?usp=sharing
