I have a scanned PDF to which I want to add a hidden text layer so that I can index the document. I used ghostscript's black-and-white TIFF output device (tiffg4) to extract the pages as TIFF images, and here is an example of what they look like:
<http://i.imgur.com/5sZSl.png>

Processing this image with tesseract does not give good results. Trying different ghostscript output resolutions (600, 300, 150 and 96 DPI) shows that the 96 DPI image gives the best result from tesseract, but it is still not satisfactory. I then used ghostscript's 8-bit gray TIFF output instead of the 1-bit black-and-white output, and at 150 DPI I got an even better result than with the 96 DPI black-and-white image. However, it is still not there yet.

Can someone suggest which filter could enhance this image so that I get better results? I could use ImageMagick, but I can also use a general imaging filter from a programming language, so just name it if you know one.

TIA
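For context, one filter commonly tried in this situation is global Otsu thresholding: keep ghostscript's 8-bit gray output and choose the black/white cut-off from the page's own histogram, instead of relying on the fixed 1-bit conversion. The sketch below is a minimal pure-Python illustration, assuming the page has already been decoded into a flat list of 0-255 gray values; the function names `otsu_threshold` and `binarize` are hypothetical and not part of tesseract, ghostscript or ImageMagick, and it has not been tested against the page above.

```python
from typing import List, Sequence


def otsu_threshold(pixels: Sequence[int]) -> int:
    """Return the Otsu threshold t for 8-bit gray values (0-255).

    Values <= t are treated as ink, values > t as paper.
    Assumes a non-empty input containing only integers in 0..255.
    """
    # Build the 256-bin gray-level histogram.
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(256))

    best_t, best_var = 0, -1.0
    weight_dark = 0  # number of pixels with value <= t
    sum_dark = 0     # sum of pixel values <= t
    for t in range(256):
        weight_dark += hist[t]
        if weight_dark == 0:
            continue  # no dark pixels yet
        weight_light = total - weight_dark
        if weight_light == 0:
            break  # every pixel is already in the dark class
        sum_dark += t * hist[t]
        mean_dark = sum_dark / weight_dark
        mean_light = (sum_all - sum_dark) / weight_light
        # Between-class variance, up to a constant factor 1/total**2.
        var_between = weight_dark * weight_light * (mean_dark - mean_light) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels: Sequence[int]) -> List[int]:
    """Map each gray value to 0 (ink) or 255 (paper) using Otsu's threshold."""
    t = otsu_threshold(pixels)
    return [0 if p <= t else 255 for p in pixels]


if __name__ == "__main__":
    # Synthetic "page": dark text pixels around 40, light paper around 190.
    page = [30, 40, 50] * 10 + [180, 190, 200] * 10
    print(otsu_threshold(page))
    print(binarize(page)[:5], binarize(page)[-5:])
```

The binarized result would then be written back out as a 1-bit image before passing it to tesseract. Because the threshold adapts to each page's histogram, it may cope better with uneven scan contrast than a fixed cut-off, though heavy background noise might still need a despeckle or median filter first.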