On Wed, Jul 14, 2004 at 12:38:04AM -0400, Mark Roach wrote:
> On Sat, 2004-07-10 at 01:14 +0200, martin f krafft wrote:
> > also sprach William Ballard <[EMAIL PROTECTED]> [2004.07.10.0041 +0200]:
> > > Search the archives for my and other's discussions about project
> > > gutenbergs tests with gocr and other open source OCR programs.
> >
> > great pointer. I guess the conclusion here is that gocr and clara
> > pretty much suck and for any serious work, I have to go with
> > OmniPage or other commercial products. Damn.
>
> At my last employer, I used Ascent Capture (on windows) to scan images
> and index them against a postgresql+debian server and used a wxPython
> application I wrote to search and view them. We used indexing info
> (date, names, etc.) instead of the text of the documents, but Ascent
> Capture can do that too. Obviously there are non-free parts to that
> solution, but that was the best I was able to come up with. If you'd
> like some more info on that setup feel free to drop me a line off-list.

I scan every piece of paper with my name on it, and every receipt, even
for bubble gum, and shred the originals. All you need is a clever
directory structure and some hacky little scripts.

I scan most things at 150dpi as .png and produce half-size (75dpi)
images for eyeballing. A script builds web pages with <img> tags for
the 75dpi images, each wrapped in an <a> link to the larger image, so
clicking a thumbnail opens the full scan. It works well enough. I
produce about 3GB of scans per year.

The hardest part is shelling around a directory structure like

  /paper/d4/BigOldBank/40713/0{1,2,3}.png
  /paper/d4/BigOldBank/Slips/Cash/40713-McDonalds.png
  /paper/d4/PowerCompany/40614/...
  /paper/d4/E-Broker/31231~30101/01.png

That's a lot of keystrokes when you're scanning. I wrote a little GUI
app, which I plan to put on SourceForge, that lets me pick the path
elements from lists and enter dates from a calendar, and then renames
the files.

I find things like PaperPort constraining. I like my hacky scripts:
like a lot of Linux things, they are a bit hacky but they make you
happy!
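
For anyone who wants to try something similar, here is a rough Python
sketch of the two pieces described above: building a scan path and
building the thumbnail page. It is not the actual script. The function
names, the "thumbs" subdirectory for the 75dpi copies, and the reading
of 40713 as "last digit of the year, then month and day" are all
assumptions made for illustration.

```python
from datetime import date
from html import escape
from pathlib import PurePosixPath


def date_stamp(d):
    """Format a date like the 40713 in the example paths.

    Assumption: the stamp is the last digit of the year followed by
    the two-digit month and two-digit day.
    """
    return f"{d.year % 10}{d.month:02d}{d.day:02d}"


def scan_path(root, group, source, d, page):
    """Build a path such as /paper/d4/BigOldBank/40713/01.png.

    The argument names are hypothetical; the layout is copied from the
    example paths.
    """
    return str(PurePosixPath(root, group, source, date_stamp(d), f"{page:02d}.png"))


def build_index(title, image_names, thumb_dir="thumbs"):
    """Return an HTML page of thumbnails, each linking to the full scan.

    Assumption: every full-size image has a half-size copy with the
    same file name inside thumb_dir.
    """
    rows = []
    for name in sorted(image_names):
        full = escape(name, quote=True)
        thumb = escape(f"{thumb_dir}/{name}", quote=True)
        rows.append(f'<p><a href="{full}"><img src="{thumb}" alt="{full}"></a></p>')
    body = "\n".join(rows)
    return (
        f"<html><head><title>{escape(title)}</title></head>\n"
        f"<body>\n{body}\n</body></html>\n"
    )


if __name__ == "__main__":
    print(scan_path("/paper", "d4", "BigOldBank", date(2004, 7, 13), 1))
    print(build_index("BigOldBank 40713", ["01.png", "02.png"]))
```

Writing the result of build_index() to an index.html next to the scans
gives a page you can browse locally. Producing the half-size images
themselves is left out; any image tool that can resize to 50% would do.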