I suspect your problem has more to do with the tabular format and the ruling lines than with the fact that it's Korean or with the image quality. You might want to search the archive for other threads discussing handling tabular data and/or line removal. There's a Leptonica tutorial on line removal (http://www.leptonica.org/line-removal.html), but table OCR is a little specialized.
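To give a rough idea of what "line removal" means, here is a minimal pure-Python sketch. It is not the method from the Leptonica tutorial, just an illustration under simple assumptions: the image has already been binarized into a list of rows (1 = ink, 0 = background), and the table rules are straight horizontal or vertical runs much longer than any stroke in the text. The function name `remove_long_runs` and the `min_len` threshold are made up for this example.

```python
def remove_long_runs(img, min_len):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    every horizontal or vertical ink run of at least min_len pixels is erased.

    Assumption: table rules are long straight runs, text strokes are short.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    to_clear = set()

    # Scan each row for long horizontal runs of ink.
    for y in range(height):
        x = 0
        while x < width:
            if img[y][x]:
                start = x
                while x < width and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    to_clear.update((y, i) for i in range(start, x))
            else:
                x += 1

    # Scan each column for long vertical runs of ink.
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    to_clear.update((i, x) for i in range(start, y))
            else:
                y += 1

    # Build the output without modifying the input image.
    return [
        [0 if (y, x) in to_clear else img[y][x] for x in range(width)]
        for y in range(height)
    ]
```

On a real scan the lines are rarely perfectly straight or one pixel wide, and characters that touch a rule lose pixels where they cross it, so treat this only as a picture of the idea; a morphology-based approach like the one in the tutorial copes better with those cases.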
Tom

On Wednesday, January 13, 2021 at 8:12:58 AM UTC-5 Glenn wrote:

> Hello, I am currently working on this Korean dataset and was having some issues on getting the values all correctly. A few problems are the pictures being slightly wonky as well as it being in Korean.
>
> [image: ApplicationFrameHost_bxb8Ck9yTh.png]
>
> I cropped the data as well as made it greyscale to attempt to better the image, but it still looks slightly blurry. I'm not sure if this is the best way and can crop out to a larger image.
>
> The current problem is that the performance is not very good. The default settings gives me a jumble. Although I found that psm 4 is the best, it still does not look very good and it seems like tesseract just breaks halfway through.
>
> [image: Code_I1PxTycm88.png]
>
> How can I improve this? I was thinking of cutting the data into slices to read each, but still I am not sure if I can fix this. Is the image quality just not good enough?
>
> Thank you