[tesseract-ocr] Line level training

2018-11-11 Thread favpdf
Dear All, Currently, Tesseract training is based on pairs of TIFF images and box files. It is not easy to make a character-level box file when we try to train on scanned document images that were not generated by a program. My question is whether there is a plan to support line-level training in the future. Thanks! Rega

Re: [tesseract-ocr] Line level training

2018-11-12 Thread favpdf
Does that mean we can label existing images with text-line boxes instead of individual character boxes in the current Tesseract 4.0? I checked the box files generated by the training process and found that the character boxes were still there. Thanks, Jun On Monday, November 12, 2018 at 5:26:48 PM UTC+8, Lorenzo Blz wrote: > > Tes

Re: [tesseract-ocr] Line level training

2018-11-12 Thread favpdf
It's clear now. Thanks for the information. Jun On Monday, November 12, 2018 at 7:38:19 PM UTC+8, Lorenzo Blz wrote: > On Mon, 12 Nov 2018 at 11:53 > > wrote: > >> That means we can label some existing images with text line boxes instead >> of individual char boxes in current tesseract 4.0? I check