Thanks for your reply, Lorenzo.
I will test more samples to see whether it only happens with holes.
If so, I'll probably just apply a morphological hole filling before OCR as a
workaround for now.
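
For reference, a minimal sketch of the hole-filling workaround I have in mind. It assumes dark text on a light background and a grayscale page held as a 2-D uint8 NumPy array. The names (dilate, erode, fill_small_holes), the threshold of 128 and the 3x3 neighbourhood are my own placeholders, not anything from Tesseract. OpenCV's cv2.morphologyEx with cv2.MORPH_CLOSE should perform the same closing much faster.

```python
import numpy as np


def _shifted_stack(mask, radius, pad_value):
    """Stack every shifted view of the mask inside a (2r+1)x(2r+1) window."""
    padded = np.pad(mask, radius, mode="constant", constant_values=pad_value)
    h, w = mask.shape
    size = 2 * radius + 1
    return np.stack([padded[dy:dy + h, dx:dx + w]
                     for dy in range(size) for dx in range(size)])


def dilate(mask, radius=1):
    """A pixel becomes ink if any pixel in its window is ink."""
    return _shifted_stack(mask, radius, False).any(axis=0)


def erode(mask, radius=1):
    """A pixel stays ink only if every pixel in its window is ink."""
    # Padding with True keeps strokes that touch the image border.
    return _shifted_stack(mask, radius, True).all(axis=0)


def fill_small_holes(gray, threshold=128, radius=1):
    """Morphological closing of the ink mask (dilate, then erode).

    Returns a binarised uint8 image: 0 = ink, 255 = paper.
    Assumption: pixels darker than `threshold` are ink.
    """
    ink = gray < threshold
    closed = erode(dilate(ink, radius), radius)
    return np.where(closed, 0, 255).astype(np.uint8)
```

A larger radius fills bigger holes but also starts to merge neighbouring strokes, so it probably needs tuning per font size.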

By the way, I'm using version 3.x. Is there a chance 4.x handles this issue better?

On Friday, June 22, 2018 at 4:06:01 PM UTC+8, Lorenzo Blz wrote:
>
>
> I'd try upscaling the images so that one letter is about 40 to 50 pixels 
> tall and see if that helps.
> I'd also try a morphological open/erode operation (or a blur/resharpen) to 
> simply fill the holes.
>
> I do not know if there are any special parameters for this kind of 
> problem (which I've encountered too).
>
> In general, adding noise to training data makes the model more robust. You 
> may use custom code or something like imgaug 
> <https://github.com/aleju/imgaug> to generate random variations with 
> random white spots and other corruptions.
>
>
> Bye
>
> Lorenzo
>
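
On the noise suggestion quoted above: a rough sketch of the custom-code route, stamping random white discs onto a grayscale training image to imitate the holes. It uses plain NumPy rather than imgaug, and add_white_spots with its parameters is a hypothetical helper named for illustration only.

```python
import numpy as np


def add_white_spots(gray, n_spots=20, max_radius=2, seed=None):
    """Return a copy of a 2-D uint8 image with random white discs drawn on it.

    Assumption: 255 is the paper colour, so a disc on a stroke looks like a hole.
    """
    rng = np.random.default_rng(seed)
    out = gray.copy()  # leave the caller's image untouched
    h, w = out.shape
    yy, xx = np.mgrid[0:h, 0:w]
    for _ in range(n_spots):
        cy = rng.integers(0, h)               # disc centre, row
        cx = rng.integers(0, w)               # disc centre, column
        r = rng.integers(1, max_radius + 1)   # disc radius in pixels
        out[(yy - cy) ** 2 + (xx - cx) ** 2 <= r ** 2] = 255
    return out
```

Passing a fixed seed makes the corruption reproducible, which helps when comparing training runs.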
