On 28 Dec 2008, andmalc wrote:
> On Dec 28, 5:10 am, Anthony Campbell <a...@acampbell.org.uk> wrote:
> > On 21 Dec 2008, Hugo Vanwoerkom wrote:
> > > [snip]
> >
> > Yes, tesseract does work well. Here, xsane gives depth 24, but conversion
> > to depth 8 is neither possible nor necessary. Following the docs, I did
>
> There is an option at the top of the Preferences/Filetype tab to save
> in 8-bit, but glad to know this isn't needed.
>
> > export TESSDATA_PREFIX="/usr/share/tesseract-ocr/"
> >
> > There was no need for "-l eng" since I only had the English version of
> > tesseract installed. So to scan a page saved at 300 dpi I just do:
> >
> > tesseract foo.dvi foo
> >
> > The result is excellent. I got pretty good results with ocrad but
> > tesseract is definitely better.
>
> I got poor results on a plain text sample, and much better using gocr
> with the same scan saved by xsane in pnm format. I see your input
> file is a DVI. Does that format yield better results than TIFF? If so,
> how did you convert to that from the formats that xsane will save to?
>
> Took me a while to figure out that tesseract will not read a TIFF if
> its file extension is 'tiff' instead of 'tif'. Hadn't quite noticed
> that in the previous poster's instructions.
>
Sorry, that was a stupid slip; I meant tiff. And yes, you are right, the
file extension has to be tif.

I get very poor results with gocr - unusable, in fact. But ocrad is better,
though not as good as tesseract.

-- 
Anthony Campbell - a...@acampbell.org.uk
Microsoft-free zone - Using Debian GNU/Linux
http://www.acampbell.org.uk (blog, book reviews, and sceptical articles)
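For anyone finding this thread later, the steps discussed above can be put together roughly as follows. This is only a sketch under assumptions: the helper name tif_name and the file name foo.tiff are made up for illustration, and the TESSDATA_PREFIX path is the one quoted in the thread, which may differ on other installations.

```shell
# Hypothetical helper (not part of tesseract): map a ".tiff" file name to
# the ".tif" name that, per this thread, tesseract expects. Any other
# name is passed through unchanged.
tif_name() {
    case "$1" in
        *.tiff) printf '%s\n' "${1%.tiff}.tif" ;;
        *)      printf '%s\n' "$1" ;;
    esac
}

# Path as quoted in the thread; adjust for your system.
export TESSDATA_PREFIX="/usr/share/tesseract-ocr/"

# Assumed name of a scan saved by xsane at 300 dpi.
scan="foo.tiff"
fixed="$(tif_name "$scan")"

# Run the OCR step only if the scan exists and tesseract is installed.
if [ -f "$scan" ] && command -v tesseract >/dev/null 2>&1; then
    # Copy to the ".tif" name if it differs from the original.
    [ "$scan" = "$fixed" ] || cp -- "$scan" "$fixed"
    tesseract "$fixed" foo    # text output goes to foo.txt
fi
```

Copying instead of renaming leaves the original scan untouched, which helps if the same file is also being fed to ocrad or gocr for comparison.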