Please also see http://doc-creator.labri.fr/
which makes it easy to create synthetic data similar to manuscript pages.
On Tue, Jun 12, 2018 at 9:03 PM ShreeDevi Kumar
wrote:
Please see the project https://github.com/OCR-D/ocrd-train
It has support for training tesseract if you provide line images and
matching ground truth text.
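
Before starting a training run, it can help to check that every line image really has a matching ground truth file. The following is only a minimal sketch, not part of ocrd-train: it assumes the line images are `.tif` or `.png` files and that each transcription sits next to its image under the same base name with a `.gt.txt` suffix. The function name `pair_lines` and the suffix constants are hypothetical; adjust them to whatever layout the project's README asks for.

```python
from pathlib import Path

# Assumptions for this sketch (check the project's README for the real layout):
IMAGE_SUFFIXES = {".tif", ".png"}   # accepted line-image extensions
GT_SUFFIX = ".gt.txt"               # ground truth text next to each image


def pair_lines(directory):
    """Match line images with ground truth files by base name.

    Returns (pairs, images_without_text, texts_without_image), where
    pairs is a sorted list of (image_path, text_path) tuples and the
    other two values are sorted lists of unmatched base names.
    """
    directory = Path(directory)
    images = {}
    texts = {}
    for path in directory.iterdir():
        if path.name.endswith(GT_SUFFIX):
            # "line_001.gt.txt" -> "line_001"
            texts[path.name[: -len(GT_SUFFIX)]] = path
        elif path.suffix.lower() in IMAGE_SUFFIXES:
            # "line_001.tif" -> "line_001"
            images[path.stem] = path

    matched = sorted(images.keys() & texts.keys())
    pairs = [(images[name], texts[name]) for name in matched]
    images_without_text = sorted(images.keys() - texts.keys())
    texts_without_image = sorted(texts.keys() - images.keys())
    return pairs, images_without_text, texts_without_image
```

Calling `pair_lines("path/to/lines")` on a (hypothetical) directory of line images returns the usable pairs plus the base names that still lack a transcription or an image, so gaps can be fixed before training.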
On Tue, Jun 12, 2018 at 8:19 PM wrote:
Same question here. I see that the documentation on training Tesseract 4
makes some reference to manuscripts:
As with base Tesseract, there is a choice between rendering synthetic
training data from fonts, or labeling some pre-existing images (like
ancient manuscripts for example).
So, if
> I have an image and a text file with the line content for each line of
> manuscript text. The doc says what to do, but not how.
> I first thought I'd need image/box file pairs, but it seems that was for
> Tesseract 3 and is now irrelevant...
Tesseract 4.0.0-beta.1 does not officially support LSTM training
Please try Tesseract 4.0.0-beta.1 with languages such as *enm*
(English, Middle (1100-1500)) and the Fraktur script.
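
As a rough illustration of trying an existing model, here is a small sketch that builds and runs the `tesseract` command line from Python. It assumes the `tesseract` executable is on the PATH and that the relevant traineddata files are installed; the helper names, the file names, and the `script/Fraktur` language value (which only works if that model sits under a `script` subdirectory of tessdata) are assumptions for the example.

```python
import shutil
import subprocess


def tesseract_command(image_path, output_base, lang="enm", psm=None):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG [--psm N]."""
    cmd = ["tesseract", str(image_path), str(output_base), "-l", lang]
    if psm is not None:
        # Page segmentation mode, e.g. 7 treats the image as a single text line.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image_path, output_base, lang="enm", psm=None):
    """Run tesseract and return the name of the text file it writes."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract executable not found on PATH")
    subprocess.run(tesseract_command(image_path, output_base, lang, psm), check=True)
    return f"{output_base}.txt"
```

For example, `run_ocr("line_001.png", "line_001", lang="enm", psm=7)` would recognise a single (hypothetical) line image with the Middle English model, and passing `lang="script/Fraktur"` would try the Fraktur script model instead, provided it is installed.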
Also, look at the following project from a few years back:
http://emop.tamu.edu/outcomes/Franken-Plus
ShreeDevi