Thank you, I will look into OCR for tabular data.

On Thursday, January 14, 2021 at 12:24:15 AM UTC+9 tfmo...@gmail.com wrote:

> I suspect your problem is more to do with the tabular format and the lines 
> than the fact that it's Korean or the image quality. You might want to 
> search the archive for other threads discussing handling tabular data 
> and/or line removal. There's a Leptonica tutorial on line removal (
> http://www.leptonica.org/line-removal.html), but table OCR is a little 
> specialized.
>
> Tom
>
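
To make the line-removal suggestion above concrete, here is a minimal sketch in plain Python. It is not the method from the Leptonica tutorial, only a simplified run-length variant of the same idea: any unbroken horizontal or vertical run of dark pixels longer than a threshold is assumed to be a table rule and is erased. The image representation (a list of rows, 1 for ink and 0 for background), the function name `remove_long_runs`, and the `min_len` threshold are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # assumed format: 1 = dark (ink) pixel, 0 = background


def remove_long_runs(img: Image, min_len: int) -> Image:
    """Return a copy of img with long horizontal and vertical ink runs cleared.

    A run is an unbroken sequence of ink pixels along a row or a column.
    Runs of at least min_len pixels are treated as table rules and erased.
    """
    height = len(img)
    width = len(img[0]) if height else 0
    out = [row[:] for row in img]

    # Horizontal pass: scan each row of the original image for long runs.
    for y in range(height):
        x = 0
        while x < width:
            if img[y][x]:
                start = x
                while x < width and img[y][x]:
                    x += 1
                if x - start >= min_len:
                    for i in range(start, x):
                        out[y][i] = 0
            else:
                x += 1

    # Vertical pass: scan each column of the original image for long runs.
    for x in range(width):
        y = 0
        while y < height:
            if img[y][x]:
                start = y
                while y < height and img[y][x]:
                    y += 1
                if y - start >= min_len:
                    for j in range(start, y):
                        out[j][x] = 0
            else:
                y += 1

    return out
```

The threshold has to be longer than the longest stroke in any character, otherwise parts of the text are erased as well. Strokes that cross a rule are also cut where they cross, and a skewed scan breaks long rules into short runs, so the page would probably need deskewing first.
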
> On Wednesday, January 13, 2021 at 8:12:58 AM UTC-5 Glenn wrote:
>
>> Hello, I am currently working on this Korean dataset and am having some 
>> issues getting all the values read correctly. Two complications are that the 
>> pictures are slightly wonky and that the text is in Korean.
>>
>> [image: ApplicationFrameHost_bxb8Ck9yTh.png]
>>
>> I cropped the image and converted it to greyscale to try to improve it, but 
>> it still looks slightly blurry. I'm not sure if this is the best approach; 
>> I can also crop out a larger image if that helps.
>>
>> The current problem is that the performance is not very good. The default 
>> settings give me a jumble. Although I found that psm 4 works best, the 
>> output still does not look very good, and it seems like Tesseract just 
>> breaks halfway through.
>> [image: Code_I1PxTycm88.png]
>> How can I improve this? I was thinking of cutting the image into slices and 
>> reading each one separately, but I am still not sure that would fix it. Is 
>> the image quality just not good enough?
>>
>> Thank you
>>
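
The slicing idea above can be sketched with a horizontal projection profile: count the ink pixels in each row and treat every run of non-empty rows as one band of text. This is only an illustration in plain Python, not something taken from the thread. The image format (a list of rows, 1 for ink and 0 for background) and the names `find_row_bands`, `crop_rows` and `min_ink` are assumptions.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed format: 1 = dark (ink) pixel, 0 = background


def find_row_bands(img: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Return (top, bottom) pairs, bottom exclusive, one per band of text.

    A band is a run of consecutive rows that each contain at least
    min_ink dark pixels.
    """
    bands = []
    start = None
    for y, row in enumerate(img):
        has_ink = sum(row) >= min_ink
        if has_ink and start is None:
            start = y  # a new band begins on this row
        elif not has_ink and start is not None:
            bands.append((start, y))  # the band ended on the previous row
            start = None
    if start is not None:
        bands.append((start, len(img)))  # band runs to the bottom edge
    return bands


def crop_rows(img: Image, band: Tuple[int, int]) -> Image:
    """Return a copy of the rows covered by one (top, bottom) band."""
    top, bottom = band
    return [row[:] for row in img[top:bottom]]
```

Each cropped band could then be passed to Tesseract on its own, for example with `--psm 7`, which treats the image as a single text line. This only works once the horizontal table rules are gone and the page is reasonably straight, since a rule or a skewed line leaves ink in every row and merges the bands.
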
>
