Hello Trev, I'm using ocropus and tesseract in Knoppix with a very good detection rate. This is how it usually works (with ocropus version 0.3):

    scanimage --mode lineart --resolution 300 | \
      pnmflip -topbottom -leftright > scan.pnm

(I use pnmflip because my scanner scans from bottom to top, which produces an upside-down picture.)

    ocroscript recognize --tesslanguage=eng scan.pnm | \
      sed 's,</span>,</span><br/>,g' | \
      elinks -dump-width 79 -no-connect -force-html -no-numbering \
        -no-references -dump > scan.txt

(I use sed and elinks to produce formatted plain text whose line breaks reflect the original layout. Lines that would exceed 79 characters are also wrapped, which is convenient when reading on a 40-character braille device.)

If you get an empty page, try again with the page turned upside down, or rotated by 90 or 270 degrees to landscape. ocropus does not yet detect page orientation on its own. There are ways to autodetect this by scripting, but all of them are slower than simply retrying with the picture rendered in a different orientation.

For most printed books, I get an error/misdetection rate below 2%, even for multicolumn texts and two-page scans, which is IMHO pretty good.

Regards
-Klaus

On Tue, Jan 19, 2010 at 07:35:27AM -0500, trev.saund...@gmail.com wrote:
> Hi,
>
> Mario mentioned a while ago that he thought Ocropus was working well,
> unfortunately my experience is that it recognizes exactly nothing on a page,
> just spitting out a header and footer for a page without even an attempt at
> recognition. I am using the ocropus packages in testing, can anyone provide
> any advice?
>
> thanks!
> Trev
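
P.S. The "just retry in another orientation" idea could be scripted roughly as follows. This is only a sketch under assumptions: the function names and the temporary file rotated.pnm are made up for illustration, and it assumes that a wrongly oriented page really yields output with no letters or digits at all, which may not hold for every scan. It reuses the ocroscript/sed/elinks pipeline shown above and pnmflip's -rotate90/-rotate180/-rotate270 options.

```shell
#!/bin/sh
# Sketch only: retry OCR at 0, 180, 90 and 270 degrees until text appears.
# Function names and the temp file name are hypothetical.

# Rotate image $1 by $2 degrees into $3 using pnmflip (netpbm).
rotate_page() {
    case "$2" in
        0) cat "$1" > "$3" ;;
        *) pnmflip "-rotate$2" "$1" > "$3" ;;
    esac
}

# Run the OCR pipeline from the mail on image $1; plain text goes to stdout.
ocr_page() {
    ocroscript recognize --tesslanguage=eng "$1" |
        sed 's,</span>,</span><br/>,g' |
        elinks -dump-width 79 -no-connect -force-html -no-numbering \
               -no-references -dump
}

# Try each orientation of image $1; keep the first non-empty result in $2.
# Returns 0 if some orientation produced text, 1 otherwise.
ocr_any_orientation() {
    status=1
    for angle in 0 180 90 270; do
        rotate_page "$1" "$angle" rotated.pnm || continue
        ocr_page rotated.pnm > "$2"
        # Assumption: output without any letter or digit means "empty page".
        if grep -q '[[:alnum:]]' "$2"; then
            echo "recognized at $angle degrees" >&2
            status=0
            break
        fi
    done
    rm -f rotated.pnm
    return "$status"
}
```

After sourcing these functions, something like `ocr_any_orientation scan.pnm scan.txt` would leave the first non-empty result in scan.txt. Each failed attempt costs one full OCR run, so putting the most likely orientation first in the list keeps the common case fast.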