Hi, I'm Korean and I'm studying Tesseract 4.0 with this page, which is very useful: https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
However, my English reading is weak and I am new to Tesseract 4.0 training. In short, I cannot understand the following passage:

*Creating Training Data*
As with base Tesseract, there is a choice between rendering synthetic training data from fonts, or labelling some pre-existing images (like ancient manuscripts for example). In either case, the required format is still the tiff/box file pair, except that the boxes only need to cover a textline instead of individual characters. 'Newline' boxes with tab as the character must be inserted between textlines to indicate the end-of-line. Multi-word boxes require a different box format, as the space would confuse the parser.

I have no idea what this means. Could you explain this passage to me? I would also like to see an example of a box file / tiff pair made for Tesseract 4.0.

Thank you.
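
For reference, here is a minimal sketch of one possible reading of the quoted passage, not a confirmed description of what Tesseract's own tools produce. It assumes the usual box-file layout of `<symbol> <left> <bottom> <right> <top> <page>`, gives every character on a line the bounding box of the whole text line, and ends each line with a box whose symbol is a tab. The function name `line_boxes` and the exact position of the tab box (a thin box just right of the line) are assumptions made only for illustration. Lines containing spaces are rejected, since the passage says multi-word boxes need a different format.

```python
def line_boxes(lines, page=0):
    """Build box-file text from text lines.

    lines: list of (text, left, bottom, right, top) tuples, where the
    coordinates are the bounding box of the whole text line.
    """
    out = []
    for text, left, bottom, right, top in lines:
        # The quoted passage says multi-word boxes need a different
        # format, so this sketch only accepts single-word lines.
        if any(ch.isspace() for ch in text):
            raise ValueError("multi-word lines are not handled in this sketch")
        # Every character shares the bounding box of its text line.
        for ch in text:
            out.append(f"{ch} {left} {bottom} {right} {top} {page}")
        # 'Newline' box: a tab as the symbol marks the end of the line.
        # Assumption: its coordinates are a thin box right of the line.
        out.append(f"\t {right + 1} {bottom} {right + 5} {top} {page}")
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    # One line containing the word "Hi" at box (10, 100)-(60, 130).
    print(line_boxes([("Hi", 10, 100, 60, 130)]), end="")
```

The output would be saved next to the image with the same base name (for example `page.tif` and `page.box`); whether Tesseract's training tools accept exactly this output should be checked against the wiki page.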