On Mon, 2010-12-06 at 16:07 +0000, Simon Greenwood wrote:
> I had a need to do some OCR recently and came across a project called
> tesseract-ocr: http://code.google.com/p/tesseract-ocr/. It's based on
> HP code that dates from the mid-90s. I've only used it to extract text
> from existing graphics but it seems to be very accurate.

You're right - it is accurate - and it works with the neat GUI frontend
that Danté mentioned, gscan2pdf. The two make a fantastic combination
that's amazingly easy to use.

Tesseract and gscan2pdf really ought to get into the normal Ubuntu
release, or at least be well promoted in the 'Software Centre' and
Synaptic so they are easy to find. The only one that's really easy to
find is gocr, and so far I'm not that impressed.

Thank you both. This will save me a lot of time in the future. It will
also save me having to say to my daughter or my sister 'Well, I've got
this Windows program ...'

Barry Drake.

--
Sent from my desktop using Ubuntu - the window-free environment that
gives me real fresh air.
