Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread ShreeDevi Kumar
Please also see http://doc-creator.labri.fr/ which makes it easy to create synthetic data similar to manuscript pages.

On Tue, Jun 12, 2018 at 9:03 PM ShreeDevi Kumar wrote:
> Please see the project https://github.com/OCR-D/ocrd-train
>
> It has support for training tesseract if you provide li

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread ShreeDevi Kumar
Please see the project https://github.com/OCR-D/ocrd-train

It has support for training Tesseract if you provide line images and matching ground truth text.

On Tue, Jun 12, 2018 at 8:19 PM wrote:
> Same question here. I see that the documentation on training Tesseract 4
> makes some reference t
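
For anyone preparing such data, here is a minimal sketch of checking that every line image has a matching ground truth file before starting a training run. The folder layout and the `.tif` / `.gt.txt` extensions are assumptions for illustration (check the project's README for the naming it actually expects), and `pair_lines_with_ground_truth` is a hypothetical helper, not part of ocrd-train.

```python
from pathlib import Path


def pair_lines_with_ground_truth(folder, image_ext=".tif", text_ext=".gt.txt"):
    """Pair each line image with its transcription file.

    Assumption: a transcription has the same base name as its image,
    with image_ext replaced by text_ext (e.g. line1.tif -> line1.gt.txt).
    Returns (pairs, missing): matched (image, text) paths, and images
    that have no transcription next to them.
    """
    pairs, missing = [], []
    for image in sorted(Path(folder).glob(f"*{image_ext}")):
        base = image.name[: -len(image_ext)]
        text = image.with_name(base + text_ext)
        if text.is_file():
            pairs.append((image, text))
        else:
            missing.append(image)
    return pairs, missing


if __name__ == "__main__":
    # "data/ground-truth" is a placeholder path; point it at your own folder.
    pairs, missing = pair_lines_with_ground_truth("data/ground-truth")
    print(f"{len(pairs)} usable line(s), {len(missing)} image(s) without text")
    for image in missing:
        print("no transcription for", image.name)
```

Running a check like this first makes it easier to spot lines that were transcribed but saved under a slightly different name.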

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-06-12 Thread jbcamps
Same question here. I see that the documentation on training Tesseract 4 makes some reference to manuscripts:

"As with base Tesseract, there is a choice between rendering synthetic training data from fonts, or labeling some pre-existing images (like ancient manuscripts for example)."

So, if

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
> I have an image and a text file with the line content for each line of manuscript text. The doc says what to do, but not how.
> I first thought I'd need img/box files pairs, but it seems it was for Tesseract 3 and is now irrelevant...

Tesseract 4.0.0-beta.1 does not officially support LSTM training

Re: [tesseract-ocr] Tesseract 4 for old languages

2018-03-12 Thread ShreeDevi Kumar
Please try Tesseract 4.0.0-beta.1 with languages such as *enm* (English, Middle (1100-1500)) and Fraktur script.

Also, look at the following project from a few years back:
http://emop.tamu.edu/outcomes/Franken-Plus

ShreeDevi
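
A minimal sketch of trying the suggestion above from Python, assuming the `tesseract` executable is on the PATH and the `enm` traineddata file is installed. The file names `page.png` and `out` are placeholders, and both helper functions are hypothetical names used only for illustration.

```python
import subprocess


def build_tesseract_command(image_path, output_base, lang="enm"):
    """Build the argument list for: tesseract IMAGE OUTPUTBASE -l LANG."""
    return ["tesseract", str(image_path), str(output_base), "-l", lang]


def run_ocr(image_path, output_base, lang="enm"):
    """Run Tesseract; the recognized text is written to OUTPUTBASE.txt.

    Raises subprocess.CalledProcessError if Tesseract exits with an error,
    for example when the requested language data is not installed.
    """
    subprocess.run(build_tesseract_command(image_path, output_base, lang), check=True)


if __name__ == "__main__":
    # Placeholder file names; replace with a real scan of your own.
    run_ocr("page.png", "out", lang="enm")
```

For Fraktur material, pass a different value for `lang`, using whichever Fraktur model name your installed tessdata provides.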