Hi,
Recently there was a post mentioning tesseract.
Turns out it is an award-winning open source OCR engine that works!
I tried it out:
1. apt-get install tesseract-ocr
2. apt-get install tesseract-ocr-eng
3. use xsane to scan a page at 300 dpi and save it as .tif
4. run: convert foo.tif -depth 8 foo1.tif
5. run: tesseract foo1.tif foo2 -l eng
And voilà! There is foo2.txt with the text.
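
Steps 4 and 5 could be wrapped in a small script, roughly like this. It is only a sketch: ocr_page is a made-up helper name, the "-8bit" suffix is my own naming, and it assumes ImageMagick's convert and tesseract with the English data are installed as in steps 1 and 2.

```shell
#!/bin/sh
# Sketch of steps 4 and 5 for one or more scanned pages.
# ocr_page is a hypothetical helper, not part of tesseract or ImageMagick.
ocr_page() {
    in="$1"                  # scanned page, e.g. foo.tif saved from xsane
    base="${in%.tif}"        # strip the .tif extension -> foo

    # Step 4: reduce the scan to depth 8 so tesseract can read it.
    convert "$in" -depth 8 "${base}-8bit.tif" || return 1

    # Step 5: OCR with the English data; tesseract appends .txt itself.
    tesseract "${base}-8bit.tif" "$base" -l eng || return 1

    # Print the name of the resulting text file.
    echo "${base}.txt"
}

# Process every file given on the command line (does nothing without arguments).
for f in "$@"; do
    ocr_page "$f"
done
```

Saved as, say, ocr.sh, it would be called as: sh ocr.sh foo.tif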
This is a page that I scanned:
http://www.scribd.com/doc/9267859/p13x1
This is the result:
http://www.scribd.com/doc/9269769/p13
The only errors were some punctuation marks.
[2] tesseract comes by default with the German dictionary, so the English
one has to be installed separately.
[3] don't scan at less than 300 dpi
[4] the result from xsane is depth 16, which tesseract can't handle, so
you have to convert the result to depth 8.
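
To see whether a scan needs the conversion from note [4], the depth can be checked first with ImageMagick's identify. Again just a sketch: needs_depth_fix is a made-up name, and the check only looks at the first image in the file.

```shell
#!/bin/sh
# Sketch: succeed (exit status 0) if the scan is deeper than 8 bits.
# needs_depth_fix is a hypothetical helper, not an ImageMagick command.
needs_depth_fix() {
    # %z is identify's format escape for the image depth;
    # [0] restricts the query to the first image in the file.
    depth=$(identify -format '%z' "$1[0]") || return 2
    [ "$depth" -gt 8 ]
}
```

Usage would be something like: needs_depth_fix foo.tif && convert foo.tif -depth 8 foo1.tif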
Hugo