Hi,

I am trying to follow the TessTutorial to train Tesseract from scratch. I 
have some questions about the lang data, to better understand how the 
training works.

The provided training text consists of seemingly random English words. My 
questions about the training text:

1 - Will using text from a specific domain improve Tesseract's performance 
on that domain? For example, training Tesseract on special names or 
vocabulary that is not English but uses Latin letters, digits, and special 
characters (a-z, A-Z, 0-9). Example: pH_scale1

2 - Will generating words from random letters work as well as using 
English words?
The provided eng.trainingtext has text such as:
"different New Articles page 23 a To Service ~~ a details DC that don't as 
7 «« Date:"

What if I use something random like this:
"sqwrLwU2bo BLiRDhvAoM USyWtpBFi5 UwLgXyoz1e UqiXudhrhz dDKAdnI8Z2 
YIl6T6d7m6 G2IVtTRbuu Lh6NvWNLc3 CGD2SXOoNT"
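
Text like this could be generated with a short script along the following 
lines. This is only a sketch: the function names, the 10-character token 
length, and the 10 tokens per line are my own illustrative choices, not 
anything taken from the tutorial.

```python
import random
import string

# Character set for the tokens: a-z, A-Z, 0-9 (illustrative choice).
ALPHABET = string.ascii_letters + string.digits


def random_token(rng, length=10):
    """Return one token of `length` characters drawn uniformly from ALPHABET."""
    return "".join(rng.choice(ALPHABET) for _ in range(length))


def random_training_line(rng, words=10, length=10):
    """Return one line of space-separated random tokens."""
    return " ".join(random_token(rng, length) for _ in range(words))


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the output is reproducible
    for _ in range(3):
        print(random_training_line(rng))
```

Note that uniformly random tokens like these carry no word or 
character-sequence statistics of any real language, which is the 
difference from the English text that I am asking about.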
 

Thanks
