Please try Tesseract 4.0.0-beta.1 with languages such as

*enm* (English, Middle (1100-1500))

and

Fraktur script
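
A minimal sketch of that first suggestion, assuming the `tesseract` 4.x binary is on `PATH` and that `enm.traineddata` and `script/Fraktur.traineddata` are installed in its tessdata directory. The image and output names are hypothetical. `tesseract_cmd` only builds the command line: it selects the LSTM engine with `--oem 1` and joins several models with `+`. `run_ocr` actually invokes it, and Tesseract writes the recognised text to `<out_base>.txt`.

```python
import shutil
import subprocess
from typing import List, Sequence


def tesseract_cmd(image: str, out_base: str, langs: Sequence[str], oem: int = 1) -> List[str]:
    """Build a tesseract command line.

    `langs` are traineddata names such as "enm" or "script/Fraktur";
    several models are combined by joining them with '+'.
    `oem=1` selects the LSTM engine in Tesseract 4.
    """
    if not langs:
        raise ValueError("at least one language is required")
    return ["tesseract", image, out_base, "-l", "+".join(langs), "--oem", str(oem)]


def run_ocr(image: str, out_base: str, langs: Sequence[str]) -> None:
    """Run tesseract; the recognised text ends up in <out_base>.txt."""
    if shutil.which("tesseract") is None:
        raise RuntimeError("tesseract binary not found on PATH")
    subprocess.run(tesseract_cmd(image, out_base, langs), check=True)


if __name__ == "__main__":
    # Hypothetical page image: try each model separately and compare the outputs.
    for langs in (["enm"], ["script/Fraktur"]):
        out_base = "out_" + langs[0].replace("/", "_")
        print(" ".join(tesseract_cmd("roland_page1.png", out_base, langs)))
```

Trying each model on its own first makes it easier to see which one copes better with the manuscript hand before combining them.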

Also, have a look at the following project from a few years back:

http://emop.tamu.edu/outcomes/Franken-Plus

ShreeDevi
____________________________________________________________
Bhajan - Kirtan - Aarti (devotional songs) @ http://bhajans.ramparivar.com

On Mon, Mar 12, 2018 at 4:32 AM, Guillaume Desforges <aceu...@gmail.com> wrote:

> Hi
>
> I want to try using Tesseract 4 for old manuscript languages ("The Song of
> Roland" and such).
>
> I have looked at
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00
> but the steps are very unclear.
>
> I have an image and a text file with the line content for each line of
> manuscript text. The doc says what to do, but not how.
>
> I first thought I'd need image/box file pairs, but it seems those were for
> Tesseract 3 and are now irrelevant...
>
> So I guess my starting point is here:
> https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#tutorial-guide-to-lstmtraining
>
>> There is no tool to create the lstm-recoder directly. Instead there is a
>> new tool, combine_lang_model, which takes as input an input_unicharset
>> and script_dir (script_dir points to the langdata directory) and
>> optional word list files. It creates the lstm-recoder from the
>> input_unicharset and creates all the dawgs, if wordlists are provided,
>> putting everything together into a traineddata file.
>
>
> I don't really get this part. How do I make the input_unicharset? What is
> langdata?
>
> Thanks
>
> Guillaume Desforges
>
> --
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to tesseract-ocr+unsubscr...@googlegroups.com.
> To post to this group, send email to tesseract-ocr@googlegroups.com.
> Visit this group at https://groups.google.com/group/tesseract-ocr.
> To view this discussion on the web visit
> https://groups.google.com/d/msgid/tesseract-ocr/fe1d68a2-76ce-4005-98ea-672710365517%40googlegroups.com.
> For more options, visit https://groups.google.com/d/optout.
>
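
On the two questions in the quoted message: the unicharset is the set of characters the model is allowed to output, so it is derived from the ground-truth transcriptions, and langdata is the separate tesseract-ocr/langdata repository that script_dir should point at. The sketch below assumes the flag names used by the tesstrain.sh script shipped with Tesseract 4 (`unicharset_extractor --output_unicharset --norm_mode`, `combine_lang_model --input_unicharset --script_dir --output_dir --lang --words`) and assumes that `unicharset_extractor` accepts plain-text files as well as box files; check `--help` on your build. All paths and the language code `fro` (ISO 639-2 for Old French) are hypothetical. `character_inventory` is only a local helper for eyeballing which characters the transcription uses. It does not write Tesseract's unicharset format.

```python
import unicodedata
from collections import Counter
from typing import Dict, List, Optional


def character_inventory(transcription: str) -> Dict[str, int]:
    """Count the distinct non-whitespace characters in a transcription.

    The text is NFC-normalised first, so 'e' + U+0301 (combining acute)
    and the precomposed 'é' are counted as the same character.
    """
    text = unicodedata.normalize("NFC", transcription)
    return dict(Counter(ch for ch in text if not ch.isspace()))


def unicharset_extractor_cmd(text_files: List[str], output: str, norm_mode: int = 1) -> List[str]:
    """Step 1 (assumed flags): build the input_unicharset from the ground-truth text."""
    return ["unicharset_extractor", "--output_unicharset", output,
            "--norm_mode", str(norm_mode), *text_files]


def combine_lang_model_cmd(input_unicharset: str, script_dir: str, output_dir: str,
                           lang: str, words: Optional[str] = None) -> List[str]:
    """Step 2 (assumed flags): build the recoder, plus a word dawg if a wordlist is given."""
    cmd = ["combine_lang_model",
           "--input_unicharset", input_unicharset,
           "--script_dir", script_dir,  # a checkout of the langdata repository
           "--output_dir", output_dir,
           "--lang", lang]
    if words is not None:  # the wordlist is optional
        cmd += ["--words", words]
    return cmd


if __name__ == "__main__":
    # Opening lines of the Chanson de Roland as a stand-in for the real transcription.
    sample = "Carles li reis, nostre emperere magnes,\nSet anz tuz pleins ad estet en Espaigne"
    for ch, n in sorted(character_inventory(sample).items()):
        print(repr(ch), n)
    # Hypothetical paths and language code.
    print(" ".join(unicharset_extractor_cmd(["training/roland.txt"], "training/fro.unicharset")))
    print(" ".join(combine_lang_model_cmd("training/fro.unicharset", "langdata",
                                          "training/out", "fro",
                                          words="training/fro.wordlist")))
```

Building the command lines as lists keeps the two steps easy to inspect before running them with `subprocess.run`.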

To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/CAG2NduVyE8w9vtpvXnDX6-KKr5Drpy9Rh1AazTHCgTLKMOFyVA%40mail.gmail.com.
