Re: Franken+ Released -- New Tool For Training Tesseract on Fonts from Page Images

Shree Devi Kumar Fri, 06 Dec 2013 17:43:18 -0800

Matthew,
I tried registering for Aletheia a few months ago and have had no response so far.
Shree


Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com


On Sat, Dec 7, 2013 at 2:57 AM, matthew christy <[email protected]> wrote:

> Hi Janusz,
>
> You're right, Aletheia is not open-source; that was a poor choice of words
> on my part. However, it is free to use after registering, and registration
> is also free. The only restriction on its use that I'm sure about concerns
> commercial products. I'll see if I can get a comment on that from someone
> at PRImA.
>
> Thanks,
> Matt
>
>
> On Friday, December 6, 2013 2:10:56 PM UTC-6, matthew christy wrote:
>>
>> Hi All,
>>
>> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at
>> Texas A&M University, as part of its Early Modern OCR Project (eMOP
>> <http://emop.tamu.edu/>),
>> has created a new tool, called Franken+, that provides a way to create font
>> training for the Tesseract OCR engine using page images. This is in
>> contrast to Tesseract's documented method
>> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of font
>> training, which involves using a word processing program with a
>> modern font. Franken+ has now been released for beta testing and we invite
>> anyone who's interested to give it a try and to please provide feedback.
>>
>> Franken+ works in conjunction with PRImA's open source Aletheia tool
>> <http://www.primaresearch.org/tools.php> and allows users to easily and
>> quickly identify one or more idealized forms
>> of each glyph found on a set of page images. These identified forms are
>> then used to generate a set of Franken-page images matching the page
>> characteristics documented in Tesseract's training instructions, but with a
>> font used in an actual early modern printed document. Franken+ allows you
>> to create Tesseract box files, but will also guide you through the entire
>> Tesseract training process, producing a .traineddata file, and even allow
>> you to identify and OCR documents using that training. In addition,
>> Franken+ makes it easy to combine training from multiple fonts into one
>> training set.
>>
>> For eMOP we are using Franken+ to create training for Tesseract from page
>> images of early modern printed works, but we also think it can be used just
>> as effectively to train Tesseract using images of any kind of font that's
>> not readily available via a word processor. For example, I've seen posts in
>> this group about wanting to train Tesseract to read the signs on the front
>> of buses.
>>
>> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and
>> http://dh-emopweb.tamu.edu/Franken+/. The code is also available open
>> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>>
>> Thanks,
>> Matt Christy
>>
>  --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>
