On Sat, Jul 21, 2007 at 08:10:27PM +0200, Bob Proulx wrote:
> Rodolfo Medina wrote:
> > Somewhere in the web I read that OCR software under Linux is very
> > poor at the moment and that it's better to use MS Windows for that:
> > unfortunately my test seems to confirm that. What do you Debian
> > listers think?
>
> I think you should check out these articles.
>
>   http://google-code-updates.blogspot.com/2006/08/announcing-tesseract-ocr.html
>   http://code.google.com/p/tesseract-ocr/
>   http://www.linux.com/articles/57222

Hey, looks pretty good. The linux.com article complains about having to manually crop out photos and about the limited file formats it accepts (TIFF only), but those are pretty minor. It should be fairly simple to put wrappers around it to clean up the images and convert file formats, to get data into the thing without having to grok OCR code. IOW, I would expect to see this get used as a backend in various other existing graphics code bases to make OCR really viable in OSS.

A
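
P.S. To make the wrapper idea concrete, here is a rough, untested-against-real-scans sketch in Python. It assumes ImageMagick's `convert` and the `tesseract` binary are both on the PATH, and that tesseract is invoked as `tesseract input.tif outputbase` (writing `outputbase.txt`). The function names `build_commands` and `ocr_image` are made up for illustration; cropping out photos is not handled at all.

```python
import shutil
import subprocess
from pathlib import Path


def build_commands(image, workdir="."):
    """Build the two command lines: convert to TIFF, then run tesseract.

    Hypothetical helper; the ImageMagick options are an assumption about
    what a reasonable cleanup step looks like, not a tuned recipe.
    """
    image = Path(image)
    tiff = Path(workdir) / (image.stem + ".tif")
    out_base = Path(workdir) / image.stem  # tesseract appends ".txt" itself
    # Grayscale, uncompressed TIFF, since TIFF is the only accepted input.
    convert_cmd = [
        "convert", str(image),
        "-colorspace", "Gray",
        "-compress", "None",
        str(tiff),
    ]
    ocr_cmd = ["tesseract", str(tiff), str(out_base)]
    return convert_cmd, ocr_cmd


def ocr_image(image, workdir="."):
    """Convert an image to TIFF, OCR it, and return the path of the text file."""
    for tool in ("convert", "tesseract"):
        if shutil.which(tool) is None:
            raise RuntimeError(f"{tool} not found on PATH")
    convert_cmd, ocr_cmd = build_commands(image, workdir)
    subprocess.run(convert_cmd, check=True)  # raises if the conversion fails
    subprocess.run(ocr_cmd, check=True)      # raises if tesseract fails
    return Path(workdir) / (Path(image).stem + ".txt")
```

Something like `ocr_image("scan.png", "out")` would then leave the recognized text in `out/scan.txt`, provided the `out` directory already exists. Keeping the command construction separate from the execution makes it easy to swap in a different converter or extra cleanup steps later.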