Hi Janusz,

You're right, Aletheia is not open source; that was a poor choice of words on my part. However, it is free to use after registering, and registration is also free. The only restriction on its use that I'm sure about applies to commercial products. I'll see if I can get a comment on that from someone at PRImA.
Thanks,
Matt

On Friday, December 6, 2013 2:10:56 PM UTC-6, matthew christy wrote:
> Hi All,
>
> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at Texas
> A&M University, as part of its Early Modern OCR Project
> (eMOP <http://emop.tamu.edu/>), has created a new tool, called Franken+,
> that provides a way to create font training for the Tesseract OCR engine
> using page images. This is in contrast to Tesseract's documented method
> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of font
> training, which involves using a word processing program with a modern
> font. Franken+ has now been released for beta testing, and we invite
> anyone who's interested to give it a try and to please provide feedback.
>
> Franken+ works in conjunction with PRImA's open source Aletheia tool
> <http://www.primaresearch.org/tools.php> and allows users to easily and
> quickly identify one or more idealized forms of each glyph found on a set
> of page images. These identified forms are then used to generate a set of
> Franken-page images matching the page characteristics documented in
> Tesseract's training instructions, but with a font used in an actual
> early modern printed document. Franken+ not only lets you create Tesseract
> box files but will also guide you through the entire Tesseract training
> process, producing a .traineddata file, and even let you identify and OCR
> documents using that training. In addition, Franken+ makes it easy to
> combine training from multiple fonts into one training set.
>
> For eMOP we are using Franken+ to create training for Tesseract from page
> images of early modern printed works, but we also think it can be used
> just as effectively to train Tesseract using images of any kind of font
> that's not readily available via a word processor. For example, I've seen
> posts in this group about wanting to train Tesseract to read the signs on
> the front of buses.
>
> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and
> http://dh-emopweb.tamu.edu/Franken+/. The code is also available as open
> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>
> Thanks,
> Matt Christy

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

