Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-23 Thread Shree Devi Kumar
https://github.com/tesseract-ocr/tesseract/pull/2294 by @bertsky adds the whitelist/blacklist functionality for Tesseract 4. It has not been merged yet. On Sat, Mar 23, 2019 at 2:58 PM Lorenzo Bolzani wrote: > On Tue, Mar 19, 2019 at 06:03, Jonathan Muller <jmul...@pukogames.com> has

Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-23 Thread Lorenzo Bolzani
On Tue, Mar 19, 2019 at 06:03, Jonathan Muller <jmul...@pukogames.com> wrote: > 5 - Create a whitelist based on the zone of probable characters (this one improves accuracy a lot!) > How do you do whitelisting with Tesseract 4.x? As far as I know, it is not yet supported. I do the
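For readers following the whitelist question, here is a minimal sketch of how a whitelist is usually passed on the command line. It relies on the `tessedit_char_whitelist` config variable and the `-c` and `--oem` options of the tesseract CLI. The helper name `build_whitelist_command`, the file name `zone.png` and the digit-only whitelist are hypothetical. The `--oem 0` switch selects the legacy engine and assumes the installed traineddata includes a legacy model; whether the LSTM engine honours the whitelist depends on the installed version, as discussed in this thread.

```python
import shlex


def build_whitelist_command(image_path, whitelist, use_legacy_engine=True):
    """Build a tesseract CLI argument list that restricts output characters.

    Hypothetical helper for illustration; it only assembles the arguments
    and does not run tesseract.
    """
    # Positional arguments: input image, then "stdout" as the output base.
    cmd = ["tesseract", image_path, "stdout"]
    if use_legacy_engine:
        # Assumption: the traineddata file contains a legacy model.
        cmd += ["--oem", "0"]
    # -c sets a config variable; this one lists the allowed characters.
    cmd += ["-c", f"tessedit_char_whitelist={whitelist}"]
    return cmd


# Example: a zone that is expected to contain digits only.
cmd = build_whitelist_command("zone.png", "0123456789")
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run` on a machine where tesseract is installed. Building the list separately keeps the per-zone whitelist easy to vary.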

Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-19 Thread gl00637
Thank you for your response. My experience with OCR is limited to converting screenshots I take online; yours is far more extensive, I think. Thank you particularly for items 2 and 5: slight skewing of the image may better account for the distortions in size and/or aspect ratio that

Re: [tesseract-ocr] General strategies for dealing with problem images

2019-03-18 Thread Jonathan Muller
I don't really agree with your statement. There are a lot of things we had to consider in image processing before tesseract finally gave us accurate results. But it all makes sense. Here is our actual pipeline: 1 - Clean up the image: remove any artifacts of the camera or scan device, cut the pape

[tesseract-ocr] General strategies for dealing with problem images

2019-03-18 Thread gl00637
I would like some advice concerning the general use of tesseract, because my experience with it tends toward two extremes: either tesseract performs flawlessly, with no prior modification of the image necessary except cropping to the text and (most significantly) enlarging the image by a factor of 2
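The enlargement step mentioned in the original question can be illustrated without any imaging library. The sketch below is a hypothetical helper, `upscale_nearest`, that enlarges a grayscale image stored as a list of rows by an integer factor using nearest-neighbour replication. It is not the poster's method; in practice an image library's resize function with a smoother interpolation filter would more likely be used before passing the file to tesseract.

```python
def upscale_nearest(pixels, factor=2):
    """Enlarge a 2D grid of pixel values by an integer factor.

    Hypothetical helper for illustration: each pixel is repeated `factor`
    times horizontally and each row `factor` times vertically.
    """
    if factor < 1:
        raise ValueError("factor must be a positive integer")
    enlarged = []
    for row in pixels:
        # Repeat every value in the row `factor` times.
        wide_row = [value for value in row for _ in range(factor)]
        # Repeat the widened row `factor` times, copying it each time.
        enlarged.extend(list(wide_row) for _ in range(factor))
    return enlarged


# A 2x2 checkerboard becomes a 4x4 checkerboard of 2x2 blocks.
tiny = [[0, 255],
        [255, 0]]
big = upscale_nearest(tiny, factor=2)
```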