Thank you, I will look into OCR for tabular data.

On Thursday, January 14, 2021 at 12:24:15 AM UTC+9 tfmo...@gmail.com wrote:
> I suspect your problem has more to do with the tabular format and the lines
> than with the fact that it's Korean or the image quality. You might want to
> search the archive for other threads discussing handling tabular data
> and/or line removal. There's a Leptonica tutorial on line removal
> (http://www.leptonica.org/line-removal.html), but table OCR is a little
> specialized.
>
> Tom
>
> On Wednesday, January 13, 2021 at 8:12:58 AM UTC-5 Glenn wrote:
>
>> Hello, I am currently working on this Korean dataset and am having some
>> trouble getting all the values out correctly. Two problems are that the
>> pictures are slightly wonky and that the text is in Korean.
>>
>> [image: ApplicationFrameHost_bxb8Ck9yTh.png]
>>
>> I cropped the data and converted it to greyscale to try to improve the
>> image, but it still looks slightly blurry. I'm not sure this is the best
>> approach, and I can crop to a larger image if needed.
>>
>> The current problem is that the performance is not very good. The default
>> settings give me a jumble. Although I found that psm 4 works best, the
>> result still does not look very good, and it seems like Tesseract just
>> breaks halfway through.
>>
>> [image: Code_I1PxTycm88.png]
>>
>> How can I improve this? I was thinking of cutting the data into slices and
>> reading each one separately, but I am still not sure that would fix it. Is
>> the image quality just not good enough?
>>
>> Thank you
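
For anyone finding this thread later, here is a rough illustration of the line-removal idea Tom mentions. It is not the method from the Leptonica tutorial, only a minimal pure-Python sketch under some assumptions: the image has already been binarized into a list of rows (1 = ink, 0 = background), the table rules are straight and axis-aligned, and they are much longer than any stroke of text. The function name `remove_long_runs` and the `min_len` threshold are made up for this example.

```python
def remove_long_runs(img, min_len):
    """Return a copy of a binary image (1 = ink, 0 = background) in which
    every horizontal or vertical ink run of at least min_len pixels is
    cleared. Shorter runs, which are more likely text strokes, are kept."""
    h = len(img)
    w = len(img[0]) if h else 0
    erase = [[False] * w for _ in range(h)]

    # Mark long horizontal runs, scanning each row of the original image.
    for y in range(h):
        x = 0
        while x < w:
            if img[y][x]:
                start = x
                while x < w and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        erase[y][i] = True
            else:
                x += 1

    # Mark long vertical runs, scanning each column of the original image.
    for x in range(w):
        y = 0
        while y < h:
            if img[y][x]:
                start = y
                while y < h and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for i in range(start, y):
                        erase[i][x] = True
            else:
                y += 1

    # Build the output without modifying the input image.
    return [[0 if erase[y][x] else img[y][x] for x in range(w)]
            for y in range(h)]


# Tiny example: a 7x7 grid with one horizontal rule, one vertical rule,
# and a single "text" pixel that should survive.
grid = [[0] * 7 for _ in range(7)]
for i in range(7):
    grid[3][i] = 1
    grid[i][3] = 1
grid[1][1] = 1
cleaned = remove_long_runs(grid, 5)
```

A skewed scan would need deskewing first, since a tilted rule breaks up into short runs that this simple approach will miss. It also erases any text pixel that happens to sit on a removed line. The cleaned image could then be passed to Tesseract again, for example with the `--psm 4` setting mentioned above.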