Re: Enhancing "half-toned" image for tesseract processing

Zdenko Podobný Sat, 31 Mar 2012 11:24:03 -0700

On 31.03.2012 15:59, klo wrote:
>
> I have scanned PDF material to which I want to add a hidden text layer so
> that I can index the document. I used the ghostscript black-and-white TIFF
> output device (tiffg4) to extract pages as TIFF images, and here is an
> example of what they look like:
>
> <http://i.imgur.com/5sZSl.png>
>
> Processing this image with tesseract does not give good results.
> Changing the ghostscript output DPI (600, 300, 150, 96) shows that the image
> at 96 DPI gives the best result from tesseract, but it is still not
> satisfactory.
>
> I then used 8-bit gray TIFF output from ghostscript instead of 1-bit black
> and white, and in this case at 150 DPI I got an even better result than
> previously with 96 DPI black and white. However, it is still not there yet.
>
> Can someone suggest which filter could enhance this image so that I get
> better results? I could use imagemagick, but I can also use a general imaging
> filter from a programming language, so just name it if you know one.
>
>
> TIA
>
It is difficult to suggest the best strategy if you do not provide the
input (PDF) and the exact command you used to run the conversion. There are
several ways and tools for converting PDF to an image [1], [2]...
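
For reference, here is a minimal sketch of the kind of "exact command" that
would help when reporting such a problem. It assumes Ghostscript is installed
as `gs` and tesseract as `tesseract`; the helper names and the file names
(input.pdf, page-%03d.tif) are hypothetical and not taken from the original
post. It only shows one way to render pages with the tiffgray device at a
chosen DPI, not a recommended setting.

```python
import shutil
import subprocess


def ghostscript_command(pdf_path, out_pattern, dpi=150, device="tiffgray"):
    """Build the gs argument list that renders each PDF page to a TIFF.

    device can be "tiffgray" (8-bit gray) or "tiffg4" (1-bit black and white).
    out_pattern may contain %03d so that each page gets its own file.
    """
    return [
        "gs",
        "-dNOPAUSE",              # do not pause between pages
        "-dBATCH",                # exit when the input is processed
        f"-sDEVICE={device}",     # output device
        f"-r{dpi}",               # rendering resolution in DPI
        f"-sOutputFile={out_pattern}",
        pdf_path,
    ]


def tesseract_command(image_path, output_base):
    """Build the tesseract call; the text is written to output_base + '.txt'."""
    return ["tesseract", image_path, output_base]


def render_pages(pdf_path, out_pattern, dpi=150, device="tiffgray"):
    """Run Ghostscript; raises if gs is missing or the conversion fails."""
    if shutil.which("gs") is None:
        raise RuntimeError("Ghostscript (gs) not found on PATH")
    cmd = ghostscript_command(pdf_path, out_pattern, dpi, device)
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Print the commands so they can be pasted into a report as-is.
    print(" ".join(ghostscript_command("input.pdf", "page-%03d.tif")))
    print(" ".join(tesseract_command("page-001.tif", "page-001")))
```

Printing the command rather than hiding it inside a script makes it easy to
include the exact invocation, together with the input PDF, when asking for
help.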


[1] http://virtualvoid.posterous.com/pdf-to-image-conversion-comparing-pdf-rendere
[2] http://stackoverflow.com/questions/75500/best-way-to-convert-pdf-files-to-tiff-files#221341

