On 14 Nov 2001, Andras BALI wrote:
> On Wed, Nov 14, 2001 at 09:34:24AM -0800, Jeffrey W. Baker wrote:
>
> > > I've just received a grant for a project that will involve scanning and
> > > storing a substantial number (e.g., around 3000) of short documents. These
> > > documents will be analyzed as text, which means I'll have to use OCR
> > > software as well as a scanner with an automatic document feed.
> > [...]
>
> > There is an OCR package from Mentalix called Pixel!FX. It supports only
> > SCSI scanners, and I believe it is very expensive.
>
> Before spending lots of money, you may want to check `gocr' (apt-get
> install gocr) if it matches your needs. I found it suitable enough for
> scanning short (1-2p.) documents (and I assume it'd do the job for
> longer ones as well) and since it has a console interface, its usage
> can be easily automated (and even customized, thanks to the libgocr
> library).
>
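
The console automation mentioned above could look roughly like the sketch below. It assumes the scans are JPEG files and that both djpeg (from the libjpeg tools) and gocr are installed and on the PATH. The function names and the .jpg/.txt naming scheme are hypothetical, picked only for illustration; Python is used here simply to keep the error handling visible, and a shell loop would do the same job.

```python
import subprocess
from pathlib import Path


def build_commands(jpeg_path):
    """Return the argument lists for the two stages of the pipeline."""
    # Stage 1: djpeg decodes the JPEG into a greyscale PNM stream on stdout.
    decode = ["djpeg", "-pnm", "-grayscale", str(jpeg_path)]
    # Stage 2: gocr reads the image from stdin when the input file is "-".
    recognise = ["gocr", "-i", "-"]
    return decode, recognise


def ocr_jpeg(jpeg_path):
    """Run djpeg | gocr on one file and return the recognised text."""
    decode, recognise = build_commands(jpeg_path)
    djpeg = subprocess.Popen(decode, stdout=subprocess.PIPE)
    gocr = subprocess.Popen(recognise, stdin=djpeg.stdout,
                            stdout=subprocess.PIPE)
    # Close our copy of the pipe so djpeg notices if gocr exits early.
    djpeg.stdout.close()
    text, _ = gocr.communicate()
    djpeg.wait()
    if djpeg.returncode != 0 or gocr.returncode != 0:
        raise RuntimeError(f"OCR pipeline failed for {jpeg_path}")
    return text.decode("utf-8", errors="replace")


def ocr_directory(scan_dir):
    """Write a .txt file next to every .jpg in scan_dir (assumed layout)."""
    written = []
    for jpeg in sorted(Path(scan_dir).glob("*.jpg")):
        out = jpeg.with_suffix(".txt")
        out.write_text(ocr_jpeg(jpeg), encoding="utf-8")
        written.append(out)
    return written
```

Calling ocr_directory("scans") would then process a whole batch in one go. For a few thousand short documents it is probably worth checking the output of a handful of pages by hand first, since recognition quality depends heavily on the scan.
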

Thanks for the suggestion of gocr; first impressions are very good,
especially if the file is converted with djpeg first.

Anthony

-- 
Anthony Campbell - running Linux GNU/Debian (Windows-free zone)
For an electronic book (The Assassins of Alamut), skeptical essays,
and over 140 book reviews, go to: http://www.acampbell.org.uk/

Our planet is a lonely speck in the great enveloping cosmic dark. In
our obscurity, in all this vastness, there is no hint that help will
come from elsewhere to save us from ourselves. [Carl Sagan]