I would like to add one more question: were the other Latin languages, such 
as French, trained from scratch, or were they just fine-tuned from English?

On Saturday, July 24, 2021 at 11:25:12 PM UTC-4 akm wrote:

> Hi,
>
> I am trying to follow the TessTutorial to train Tesseract from scratch. I 
> have some questions about the lang data, to understand how the training 
> works.
>
> The provided training text contains some random English words. My 
> questions about the training text:
>
> 1 - Will using text from a specific domain improve the performance of 
> Tesseract on that domain? For example, training Tesseract on special names 
> or vocabulary that is not English but uses Latin letters and digits (a-z 
> A-Z 0-9 and special chars). Example: pH_scale1
>
> 2 - Will generating words from random letters work as well as using 
> English words?
> The provided eng.trainingtext has text such as:
> "different New Articles page 23 a To Service ~~ a details DC that don't as 
> 7 «« Date:"
>
> What if I use something random like this:
> "sqwrLwU2bo BLiRDhvAoM USyWtpBFi5 UwLgXyoz1e UqiXudhrhz dDKAdnI8Z2 
> YIl6T6d7m6 G2IVtTRbuu Lh6NvWNLc3 CGD2SXOoNT"
>  
>
> Thanks
>
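
For reference, a line of random tokens like the one quoted above could be 
produced with a short script. This is only a sketch: the function name 
random_training_line, the token length of 10 and the 10 tokens per line are 
assumptions read off the quoted example, and none of it comes from the 
Tesseract training tools. Whether such text trains as well as real words is 
exactly the open question here.

```python
import random
import string

# Alphabet taken from the question: a-z, A-Z and 0-9.
# Special characters are left out in this sketch.
ALPHABET = string.ascii_letters + string.digits


def random_training_line(num_tokens=10, token_len=10, rng=None):
    """Return one line of space-separated random alphanumeric tokens.

    Hypothetical helper for illustration; not part of Tesseract.
    """
    rng = rng or random.Random()
    tokens = [
        "".join(rng.choice(ALPHABET) for _ in range(token_len))
        for _ in range(num_tokens)
    ]
    return " ".join(tokens)


if __name__ == "__main__":
    # A fixed seed makes the generated line reproducible between runs.
    print(random_training_line(rng=random.Random(0)))
```

Passing a seeded random.Random keeps the generated text reproducible, which 
helps when comparing training runs.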
