As the wiki says (new feature): "It is possible to add a few new characters to the character set and train for them by fine tuning, without a large amount of training data."

I'm trying to add some symbols to jpn.traineddata by fine tuning for a few characters, but I'm wondering how many lines of training text are appropriate for Japanese. For English, eng.training_text has only 70 lines of text, but jpn.training_text has 1670 lines. I think that if I add all 1670 lines to the training text, I may get overfitting. Do I have to put all of them into the training text, or only a few lines?
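One way to picture the "only a few lines" option is the sketch below. It is only an illustration, not something the wiki prescribes: the function name `build_finetune_text`, the default sample size of 200, and the example symbols are all assumptions chosen for the example. It takes a random subset of the existing training lines and inserts one of the new symbols into each line, so that every new symbol appears in context.

```python
import random


def build_finetune_text(base_lines, new_symbols, sample_size=200, seed=0):
    """Return a reduced training text in which each line contains a new symbol.

    base_lines  -- lines from an existing training text (e.g. a local copy
                   of jpn.training_text; the path is up to you)
    new_symbols -- the characters to be added by fine tuning
    sample_size -- hypothetical number of lines to keep, not a recommendation
    """
    if not new_symbols:
        raise ValueError("new_symbols must not be empty")

    rng = random.Random(seed)
    # Drop blank lines and surrounding whitespace.
    lines = [ln.strip() for ln in base_lines if ln.strip()]
    # Keep at most sample_size lines, chosen at random without repetition.
    sample = rng.sample(lines, min(sample_size, len(lines)))

    result = []
    for i, line in enumerate(sample):
        # Cycle through the new symbols so each one is used about equally.
        symbol = new_symbols[i % len(new_symbols)]
        # Insert the symbol at a random position inside the line.
        pos = rng.randint(0, len(line))
        result.append(line[:pos] + symbol + line[pos:])
    return result


if __name__ == "__main__":
    demo_lines = ["これはテストです", "日本語の文章", "三行目のテキスト"]
    for text_line in build_finetune_text(demo_lines, ["±", "§"], sample_size=2):
        print(text_line)
```

The open question stays the same: what value of `sample_size` is reasonable for Japanese without causing overfitting.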
https://github.com/tesseract-ocr/langdata/blob/master/eng/eng.training_text
https://github.com/tesseract-ocr/langdata/blob/master/jpn/jpn.training_text

Thanks for reading! Sorry for my bad English!