I have a scanned PDF to which I want to add a hidden text layer so that I can
index the document. I used Ghostscript's black-and-white TIFF output device
(tiffg4) to extract the pages as TIFF images. Here is an example of what they
look like:
<http://i.imgur.com/5sZSl.png>
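For reference, the page extraction described above can be sketched as a Ghostscript command. The helper name gs_tiff_command, the input file scan.pdf and the output pattern page-%03d.tif are hypothetical; the flags (-dNOPAUSE, -dBATCH, -dSAFER, -sDEVICE, -r, -sOutputFile) are standard Ghostscript options. The sketch only builds and prints the command. Pass it to subprocess.run(cmd, check=True) to actually render.

```python
def gs_tiff_command(pdf_path: str, out_pattern: str,
                    device: str = "tiffg4", dpi: int = 300) -> list[str]:
    """Build a Ghostscript command that renders each PDF page to a TIFF file.

    Hypothetical helper for illustration; it does not run Ghostscript itself.
    """
    return [
        "gs",
        "-dNOPAUSE",                    # don't pause between pages
        "-dBATCH",                      # exit after the last page
        "-dSAFER",                      # restrict file operations
        f"-sDEVICE={device}",           # tiffg4 = 1-bit CCITT Group 4 TIFF
        f"-r{dpi}",                     # output resolution in DPI
        f"-sOutputFile={out_pattern}",  # %03d becomes the page number
        pdf_path,
    ]


if __name__ == "__main__":
    # Print the command for a hypothetical scan.pdf, one TIFF per page.
    print(" ".join(gs_tiff_command("scan.pdf", "page-%03d.tif")))
```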

Processing this image with Tesseract does not give good results.
Comparing Ghostscript output resolutions (600, 300, 150 and 96 DPI) shows
that the 96 DPI image gives the best result from Tesseract, but it is still
not satisfactory.

I then used Ghostscript's 8-bit grayscale TIFF output instead of 1-bit black
and white, and at 150 DPI I got an even better result than with the 96 DPI
black-and-white image. However, it is still not good enough.

Can someone suggest a filter that could enhance this image so that I get
better results? I could use ImageMagick, but I can also use a general image
filter from a programming language, so just name it if you know how.


TIA

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.