I would like to add one more question: were the other Latin languages, such as French, trained from scratch, or were they just fine-tuned from the English model?
On Saturday, July 24, 2021 at 11:25:12 PM UTC-4 akm wrote:
> Hi,
>
> I am trying to follow the TessTutorial to train Tesseract from scratch. I
> have some questions about the lang data, to understand how the training
> works.
>
> The provided training text has some random English words. My questions
> about the training text:
>
> 1 - Will using text from a specific domain improve the performance of
> Tesseract on that domain? For example, training Tesseract on special names
> or vocabulary that is not English but uses Latin letters and numbers
> (a-z A-Z 0-9 and special chars). Example: pH_scale1
>
> 2 - Will generating words from random letters work as well as using
> English words?
> The provided eng.trainingtext has text such as:
> "different New Articles page 23 a To Service ~~ a details DC that don't as 7 «« Date:"
>
> What if I use something random like this:
> "sqwrLwU2bo BLiRDhvAoM USyWtpBFi5 UwLgXyoz1e UqiXudhrhz dDKAdnI8Z2 YIl6T6d7m6 G2IVtTRbuu Lh6NvWNLc3 CGD2SXOoNT"
>
> Thanks
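
For reference, random text of the kind described in question 2 could be produced with something like the sketch below. It is only an illustration: the alphabet (a-z, A-Z, 0-9), the 10-character word length, and the 10 words per line are assumptions read off the example string in the question, and the function names are made up. It uses only the Python standard library, not any Tesseract tooling, and it says nothing about whether training on such text actually helps.

```python
import random
import string

# Alphabet assumed from the question: a-z, A-Z and 0-9 (no special chars).
ALPHABET = string.ascii_letters + string.digits


def random_token(rng, length=10):
    """Return one random 'word' of the given length drawn from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_training_line(rng, words=10, length=10):
    """Return one line of space-separated random tokens."""
    return " ".join(random_token(rng, length) for _ in range(words))


if __name__ == "__main__":
    # A fixed seed makes the generated text reproducible between runs.
    rng = random.Random(0)
    for _ in range(3):
        print(random_training_line(rng))
```

Each printed line has the same shape as the random example in the question. Real domain terms such as pH_scale1 could be mixed into the same output if the goal is to cover a specific vocabulary.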