Re: Enhancing "half-toned" image for tesseract processing

2012-03-31 Thread klo
On Saturday, March 31, 2012 8:01:22 PM UTC+2, piggy wrote:
> I recommend a blur filter. I generally use the gimp motion blur. I do two blurs perpendicular to each other along the axes of the half-tone screen with a magnitude equal to the screen dot size.
I haven't used gimp for batch proc

Re: How to instruct tesseract not to use ligatures (i.e. don't use ﬁ, ﬂ... instead fi, fl...)

2012-03-31 Thread Zdenko Podobný
On 31.03.2012 16:17, klo wrote:
> In my simple testing, I find this most common problem, is there a way to instruct tesseract not to use those glyphs without limiting it to ASCII?
>
> I use tesseract 3.01 BTW
Put them on the blacklist with the variable tessedit_char_blacklist (search f
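A minimal sketch of the blacklist suggestion, assuming the variable is set through a Tesseract config file (one "name value" pair per line) whose path is passed as the last command-line argument. The helper names, the file names, and the exact set of ligature code points (U+FB00 to U+FB04) are illustrative assumptions, not something stated in the thread, and where Tesseract looks for the config file depends on the installation.

```python
# Assumption: the ligature glyphs to suppress are the Latin ligatures
# U+FB00..U+FB04 (ff, fi, fl, ffi, ffl). Adjust to what actually shows up.
LIGATURES = "\ufb00\ufb01\ufb02\ufb03\ufb04"


def blacklist_config(chars=LIGATURES):
    """Return the text of a config file that sets tessedit_char_blacklist."""
    return "tessedit_char_blacklist " + chars + "\n"


def tesseract_command(image, output_base, config_path):
    """Build the argument list: tesseract <image> <outputbase> <configfile>."""
    return ["tesseract", image, output_base, config_path]


def write_config(path, chars=LIGATURES):
    """Write the config file as UTF-8 so the ligature characters survive."""
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(blacklist_config(chars))
```

For example, write_config("noligatures") followed by running the list returned by tesseract_command("page.tif", "page", "noligatures") with subprocess.run would OCR page.tif with those glyphs blacklisted, provided the installed version honours the variable.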

Re: Enhancing "half-toned" image for tesseract processing

2012-03-31 Thread Zdenko Podobný
On 31.03.2012 15:59, klo wrote:
> I have a scanned PDF material to which I want to add hidden text layer, so I could index the document. I used ghostscript black and white tiff output device (tiffg4) to extract pages as tiff images, and here is example of what they look

Re: Enhancing "half-toned" image for tesseract processing

2012-03-31 Thread La Monte H. P. Yarroll
I recommend a blur filter. I generally use the gimp motion blur. I do two blurs perpendicular to each other along the axes of the half-tone screen with a magnitude equal to the screen dot size.
On Sat, Mar 31, 2012 at 9:59 AM, klo wrote:
> I have a scanned PDF material to which I want to add hidd
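The two-blur idea can be sketched outside gimp as well. The following is only an approximation under stated assumptions: it uses a plain box (averaging) blur instead of gimp's motion blur, it assumes the half-tone screen axes line up with the image rows and columns (real screens are often rotated, in which case the image would have to be rotated first), and it treats the image as a list of rows of grey values between 0.0 and 1.0. All function names are made up for illustration.

```python
def box_blur_1d(values, size):
    """Average each sample with its neighbours in a window of about `size`."""
    half = size // 2
    out = []
    for i in range(len(values)):
        lo = max(0, i - half)                 # clamp the window at the edges
        hi = min(len(values), i + half + 1)
        window = values[lo:hi]
        out.append(sum(window) / len(window))
    return out


def blur_rows(image, size):
    """Blur every row of the image along the horizontal axis."""
    return [box_blur_1d(row, size) for row in image]


def transpose(image):
    """Swap rows and columns so a row blur acts on the other axis."""
    return [list(column) for column in zip(*image)]


def descreen(image, dot_size):
    """Apply two perpendicular linear blurs, each about one dot wide."""
    horizontal = blur_rows(image, dot_size)
    return transpose(blur_rows(transpose(horizontal), dot_size))
```

Calling descreen(image, dot_size) with dot_size set to the screen dot size in pixels smears each dot into its neighbours along both axes. The result is greyscale, so it would still need to be thresholded back to black and white before being handed to tesseract.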

Enhancing "half-toned" image for tesseract processing

2012-03-31 Thread klo
I have a scanned PDF material to which I want to add hidden text layer, so I could index the document. I used ghostscript black and white tiff output device (tiffg4) to extract pages as tiff images, and here is example of what they look like: Processing this im
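The extraction step described here could look roughly like the sketch below. Only the tiffg4 device comes from the message; the resolution, the output file pattern, and the helper name are assumptions added for illustration, and the actual command used by the poster is not shown in the thread.

```python
def ghostscript_command(pdf_path, output_pattern="page-%03d.tif", dpi=300):
    """Build a ghostscript call that writes one G4 TIFF per PDF page."""
    return [
        "gs",
        "-dNOPAUSE",                         # do not pause between pages
        "-dBATCH",                           # exit when the file is done
        "-sDEVICE=tiffg4",                   # black and white G4 TIFF device
        "-r%d" % dpi,                        # assumed resolution in dpi
        "-sOutputFile=" + output_pattern,    # %03d becomes the page number
        pdf_path,
    ]
```

Passing the returned list to subprocess.run(..., check=True) would run the extraction. Because tiffg4 produces 1-bit output, grey areas of the scan end up dithered, which may be a source of the half-tone pattern discussed in this thread.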

How to instruct tesseract not to use ligatures (i.e. don't use ﬁ, ﬂ... instead fi, fl...)

2012-03-31 Thread klo
In my simple testing, I find this most common problem, is there a way to instruct tesseract not to use those glyphs without limiting it to ASCII? I use tesseract 3.01 BTW