[tesseract-ocr] I have a Question about Creating Training Data

이경준 Tue, 27 Feb 2018 22:14:36 -0800

Hi,
I'm Korean, and I'm studying Tesseract 4.0 using this page:
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
It is very useful for learning Tesseract 4.0.


However, my English reading is weak, and I'm having trouble understanding 
the Tesseract 4.0 training documentation. 
In particular, I can't understand the following passage.

*Creating Training Data*

As with base Tesseract, there is a choice between rendering synthetic 
training data from fonts, or labelling some pre-existing images (like 
ancient manuscripts for example). In either case, the required format is 
still the tiff/box file pair, except that the boxes only need to cover a 
textline instead of individual characters. 'Newline' boxes with tab as the 
character must be inserted between textlines to indicate the end-of-line. 
Multi-word boxes require a different box format, as the space would confuse 
the parser.
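
To make the quoted passage concrete, here is a minimal sketch in Python of 
one reading of it. It is not taken from the wiki. It assumes the usual box 
file row layout (symbol, left, bottom, right, top, page, with the origin at 
the bottom-left of the image), that every character row simply repeats the 
bounding box of its whole textline, and that the tab row can reuse the same 
box. The function name make_line_boxes and the coordinates are hypothetical. 
Lines containing spaces are rejected, since the passage says multi-word 
boxes need a different format.

```python
def make_line_boxes(lines, page=0):
    """Build box-file text from (text, left, bottom, right, top) tuples.

    Assumption: each character row carries the box of the whole textline,
    and a row whose symbol is a tab marks the end of that textline.
    """
    rows = []
    for text, left, bottom, right, top in lines:
        if " " in text:
            # Multi-word lines need a different format; not covered here.
            raise ValueError("this sketch does not handle spaces")
        coords = f"{left} {bottom} {right} {top} {page}"
        for ch in text:
            rows.append(f"{ch} {coords}")
        # 'Newline' box: a tab as the character, closing the textline.
        rows.append(f"\t {coords}")
    return "\n".join(rows) + "\n"


if __name__ == "__main__":
    # Two hypothetical textlines on page 0 of a tiff image.
    example = [("Hello", 10, 60, 200, 90), ("World", 10, 20, 200, 50)]
    print(make_line_boxes(example), end="")
```

Each textline produces one row per character plus one tab row, so the 
trainer can tell where a line ends without needing per-character boxes.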

I don't understand it. Could you explain this passage to me? I would also 
like to see an example of a box file and its tiff image for Tesseract 4.0.

Thank you.
