Hi, I am trying to follow the TessTutorial to train Tesseract from scratch. I have some questions about the langdata that would help me understand how the training works.
The provided training text contains what look like random English words. My questions about the training text:

1 - Will using text from a specific domain improve Tesseract's performance on that domain? For example, training Tesseract on special names or vocabulary that are not English but use Latin letters and numbers (a-z A-Z 0-9 and special chars). Example: pH_scale1

2 - Will words generated from random letters work as well as English words? The provided eng.training_text has text such as:

"different New Articles page 23 a To Service ~~ a details DC that don't as 7 «« Date:"

What if I use something random like this instead:

"sqwrLwU2bo BLiRDhvAoM USyWtpBFi5 UwLgXyoz1e UqiXudhrhz dDKAdnI8Z2 YIl6T6d7m6 G2IVtTRbuu Lh6NvWNLc3 CGD2SXOoNT"

Thanks
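
P.S. For reference, random text like the sample in question 2 could be produced with something like the sketch below. This is only an illustration: the function name `random_training_lines` and its parameters are hypothetical and not part of Tesseract or its training tools, and I am assuming 10-character tokens drawn from a-z, A-Z and 0-9 only (no special characters).

```python
import random
import string

# Assumed alphabet: Latin letters and digits only, matching the sample above.
ALPHABET = string.ascii_letters + string.digits


def random_training_lines(n_lines, words_per_line=10, word_len=10, seed=0):
    """Return n_lines strings of space-separated random alphanumeric tokens.

    Hypothetical helper for illustration; not part of Tesseract's tooling.
    """
    rng = random.Random(seed)  # fixed seed so the output is reproducible
    lines = []
    for _ in range(n_lines):
        words = [
            "".join(rng.choice(ALPHABET) for _ in range(word_len))
            for _ in range(words_per_line)
        ]
        lines.append(" ".join(words))
    return lines


if __name__ == "__main__":
    # Print a few lines that could be saved as a plain-text training file.
    for line in random_training_lines(3):
        print(line)
```

Whether text generated this way trains as well as real words is exactly what I am asking; the sketch only shows how the random sample would be made.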