On Mon, 2010-12-06 at 16:55 +0000, Barry Drake wrote:
> On Mon, 2010-12-06 at 16:07 +0000, Simon Greenwood wrote:
>
> > I had a need to do some OCR recently and came across a project called
> > tesseract-ocr: http://code.google.com/p/tesseract-ocr/. It's based on
> > HP code that dates from the mid-90s. I've only used it to extract text
> > from existing graphics but it seems to be very accurate.
>
> You're right - it is accurate - and it works with the neat gui frontend
> that Danté mentioned - gscan2pdf. Makes a fantastic combination that's
> amazingly easy to use. Tesseract and gscan2pdf really ought to get into
> the normal Ubuntu release .... or at least be well promoted in the
> 'Software Centre' and Synaptic so they are easy to find. The only one
> that's really easy to find is gocr, and so far I'm not that impressed.

OCRFeeder is another option: it is in the Ubuntu repositories, uses
Tesseract as its default back-end and can be installed from the Software
Centre. I haven't used it extensively, so I have no idea how it compares
to gscan2pdf.

Cheers,

Bruno
