Thanks for your reply, Lorenzo. I will test more samples to see whether it only happens with holes. If so, I will probably just apply a morphological hole-filling step before OCR as a workaround for now.
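For reference, here is a minimal sketch of the hole-filling workaround I have in mind. It assumes the image is already binarized, with text pixels as 1 and background as 0; the function names are my own, not from any library. It is a plain 3x3 morphological closing (dilate, then erode) in pure Python. On real images, OpenCV's `cv2.morphologyEx` with `cv2.MORPH_CLOSE` should do the same job much faster. With dark text on a white background the polarity is inverted, so the equivalent operation there is an open/erode.

```python
def _filter3x3(img, reduce_fn):
    """Apply reduce_fn (max or min) over each pixel's 3x3 neighbourhood.

    Neighbours outside the image are ignored (the window is clipped).
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = reduce_fn(
                img[ny][nx]
                for ny in range(max(0, y - 1), min(h, y + 2))
                for nx in range(max(0, x - 1), min(w, x + 2))
            )
    return out


def dilate(img):
    """Binary dilation: a pixel becomes 1 if any pixel in its window is 1."""
    return _filter3x3(img, max)


def erode(img):
    """Binary erosion: a pixel stays 1 only if every pixel in its window is 1."""
    return _filter3x3(img, min)


def close_holes(img):
    """Morphological closing: fills holes smaller than the 3x3 window."""
    return erode(dilate(img))


if __name__ == "__main__":
    # A 5x5 "stroke" with a one-pixel hole in the middle, on a 9x9 canvas.
    img = [[0] * 9 for _ in range(9)]
    for y in range(2, 7):
        for x in range(2, 7):
            img[y][x] = 1
    img[4][4] = 0
    for row in close_holes(img):
        print("".join("#" if v else "." for v in row))
```

A 3x3 window only fills very small holes. Larger holes would need a larger window or repeated passes, at the cost of merging strokes that sit close together.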
By the way, I'm using version 3.x. Is there a chance that 4.x handles this issue better?

On Friday, June 22, 2018 at 4:06:01 PM UTC+8, Lorenzo Blz wrote:
> I'd try to upscale the images so that one letter is about 40/50 pixels
> tall and see if that helps.
> I'd also try a morphological open/erode operation (or a blur/resharpen) to
> simply fill the holes.
>
> I do not know if there are any special parameters for this kind of
> problem (which I've encountered too).
>
> In general, adding noise to training data makes the model more robust. You
> may use custom code or something like imgaug
> <https://github.com/aleju/imgaug> to generate random variations with
> random white spots and other corruptions.
>
> Bye
>
> Lorenzo
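On the "custom code" route for noisy training data, a rough sketch of what random white spots could look like is below. It assumes a grayscale image stored as a list of rows with values 0 to 255, white being 255. The function name and parameters are hypothetical, and this is not how imgaug does it internally.

```python
import random


def add_white_spots(img, n_spots=5, max_size=2, white=255, seed=None):
    """Return a copy of img with up to n_spots small white squares painted in.

    Each spot is a square with a side of 1..max_size pixels, placed at a
    random position and clipped at the image border. Spots may overlap.
    """
    rng = random.Random(seed)  # seedable, so variations are reproducible
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]  # leave the input untouched
    for _ in range(n_spots):
        size = rng.randint(1, max_size)
        y0 = rng.randrange(h)
        x0 = rng.randrange(w)
        for y in range(y0, min(h, y0 + size)):
            for x in range(x0, min(w, x0 + size)):
                out[y][x] = white
    return out


if __name__ == "__main__":
    black = [[0] * 20 for _ in range(20)]
    noisy = add_white_spots(black, n_spots=5, max_size=2, seed=0)
    for row in noisy:
        print("".join("#" if v == 255 else "." for v in row))
```

Calling it several times with different seeds on each training image would give a set of corrupted variants. Whether that actually helps with this particular problem is something I would have to test.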