Re: Franken+ Released -- New Tool For Training Tesseract on Fonts from Page Images

zdenko podobny Fri, 06 Dec 2013 23:01:46 -0800

I have the same experience.

Zdenko



On Sat, Dec 7, 2013 at 2:42 AM, Shree Devi Kumar <[email protected]> wrote:

> Matthew,
> I had tried registering for Aletheia a few months ago. No response so
> far.
> Shree
>
> Shree Devi Kumar
> ____________________________________________________________
> भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com
>
>
> On Sat, Dec 7, 2013 at 2:57 AM, matthew christy <[email protected]> wrote:
>
>> Hi Janusz,
>>
>> You're right, Aletheia is not open-source. My mistake; that was a poor
>> choice of words. However, it is free to use after registering, which is
>> also free. The only restriction on its use that I'm sure about is in a
>> commercial product. I'll see if I can get a comment on that from someone
>> at PRImA.
>>
>> Thanks,
>> Matt
>>
>>
>> On Friday, December 6, 2013 2:10:56 PM UTC-6, matthew christy wrote:
>>>
>>> Hi All,
>>>
>>> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at
>>> Texas A&M University, as part of its Early Modern OCR Project
>>> (eMOP <http://emop.tamu.edu/>), has created a new tool, called Franken+,
>>> that provides a way to create font training for the Tesseract OCR engine
>>> using page images. This is in contrast to Tesseract's documented method
>>> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of font
>>> training, which involves using a word processing program with a modern
>>> font. Franken+ has now been released for beta testing, and we invite
>>> anyone who's interested to give it a try and provide feedback.
>>>
>>> Franken+ works in conjunction with PRImA's open source Aletheia tool
>>> <http://www.primaresearch.org/tools.php> and allows users to easily and
>>> quickly identify one or more idealized forms
>>> of each glyph found on a set of page images. These identified forms are
>>> then used to generate a set of Franken-page images matching the page
>>> characteristics documented in Tesseract's training instructions, but with a
>>> font used in an actual early modern printed document. Franken+ allows you
>>> to create Tesseract box files, but will also guide you through the entire
>>> Tesseract training process, producing a .traineddata file, and even allow
>>> you to identify and OCR documents using that training. In addition,
>>> Franken+ makes it easy to combine training from multiple fonts into one
>>> training set.
>>>
>>> For eMOP we are using Franken+ to create training for Tesseract from
>>> page images of early modern printed works, but we also think it can be used
>>> just as effectively to train Tesseract using images of any kind of font
>>> that's not readily available via a word processor. For example, I've seen
>>> posts in this group about wanting to train Tesseract to read the signs on
>>> the front of buses.
>>>
>>> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and
>>> http://dh-emopweb.tamu.edu/Franken+/. The code is also available open
>>> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>>>
>>> Thanks,
>>> Matt Christy
>>>
>>  --
>> --
>> You received this message because you are subscribed to the Google
>> Groups "tesseract-ocr" group.
>> To post to this group, send email to [email protected]
>> To unsubscribe from this group, send email to
>> [email protected]
>> For more options, visit this group at
>> http://groups.google.com/group/tesseract-ocr?hl=en
>>
>> ---
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to [email protected].
>> For more options, visit https://groups.google.com/groups/opt_out.
>>
>
