Matthew,

I had tried registering for Aletheia a few months ago. No response so far.

Shree
Shree Devi Kumar
____________________________________________________________
भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com

On Sat, Dec 7, 2013 at 2:57 AM, matthew christy <[email protected]> wrote:

> Hi Janusz,
>
> You're right, Aletheia is not open-source. My mistake on a poor choice of
> words. However, it is free to use after registering, which is also free.
> The only restriction that I'm sure about on its use is in a commercial
> product. I'll see if I can get a comment on that from someone at PRImA.
>
> Thanks,
> Matt
>
>
> On Friday, December 6, 2013 2:10:56 PM UTC-6, matthew christy wrote:
>>
>> Hi All,
>>
>> The Initiative for Digital Humanities, Media, and Culture (IDHMC) at
>> Texas A&M University, as part of its Early Modern OCR Project
>> (eMOP <http://emop.tamu.edu/>), has created a new tool, called Franken+,
>> that provides a way to create font training for the Tesseract OCR engine
>> using page images. This is in contrast to Tesseract's documented method
>> <http://code.google.com/p/tesseract-ocr/wiki/TrainingTesseract3> of
>> font training, which involves using a word processing program with a
>> modern font. Franken+ has now been released for beta testing and we invite
>> anyone who's interested to give it a try and to please provide feedback.
>>
>> Franken+ works in conjunction with PRImA's open source Aletheia tool
>> <http://www.primaresearch.org/tools.php> and allows users to easily and
>> quickly identify one or more idealized forms of each glyph found on a set
>> of page images. These identified forms are then used to generate a set of
>> Franken-page images matching the page characteristics documented in
>> Tesseract's training instructions, but with a font used in an actual early
>> modern printed document. Franken+ allows you to create Tesseract box
>> files, but will also guide you through the entire Tesseract training
>> process, producing a .traineddata file, and even allow you to identify and
>> OCR documents using that training. In addition, Franken+ makes it easy to
>> combine training from multiple fonts into one training set.
>>
>> For eMOP we are using Franken+ to create training for Tesseract from page
>> images of early modern printed works, but we also think it can be used just
>> as effectively to train Tesseract using images of any kind of font that's
>> not readily available via a word processor. For example, I've seen posts in
>> this group about wanting to train Tesseract to read the signs on the front
>> of buses.
>>
>> You can find out more about Franken+ at http://emop.tamu.edu/node/54 and
>> http://dh-emopweb.tamu.edu/Franken+/. The code is also available open
>> source at https://github.com/idhmc-tamu/eMOP/tree/master/Franken%2B.
>>
>> Thanks,
>> Matt Christy
>>
> --
> --
> You received this message because you are subscribed to the Google
> Groups "tesseract-ocr" group.
> To post to this group, send email to [email protected]
> To unsubscribe from this group, send email to
> [email protected]
> For more options, visit this group at
> http://groups.google.com/group/tesseract-ocr?hl=en
>
> ---
> You received this message because you are subscribed to the Google Groups
> "tesseract-ocr" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> For more options, visit https://groups.google.com/groups/opt_out.
>

