Hello, I've been using Tesseract 4.1 for some time with the Sinhala language. I got good results for most of the images I tried, and I trained Tesseract with several different fonts. As the documentation says, though, I had to preprocess my images to obtain good results.
Then I tried training Tesseract 5 with line images as .tif files and their labels as .gt.txt files, and used the generated .traineddata file to extract text. But that didn't give me good results. I obtained the line images using image-processing segmentation in Python. Is it wrong to obtain line images with Python segmentation? Could someone please explain the possible reason? Thank you very much.
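For context, here is a minimal sketch of the kind of projection-profile line segmentation I mean (this is my own illustration, not the exact script I used — `segment_lines` and its parameters are hypothetical names):

```python
import numpy as np

def segment_lines(binary_img, min_height=3):
    """Split a binarized page (0 = background, 1 = ink) into line images
    using a horizontal projection profile: rows with no ink separate lines."""
    profile = binary_img.sum(axis=1)  # ink pixels per row
    in_line = profile > 0
    lines, start = [], None
    for y, active in enumerate(in_line):
        if active and start is None:
            start = y                  # line begins at first inked row
        elif not active and start is not None:
            if y - start >= min_height:
                lines.append(binary_img[start:y])
            start = None               # line ended at a blank row
    # flush a line that runs to the bottom of the page
    if start is not None and binary_img.shape[0] - start >= min_height:
        lines.append(binary_img[start:])
    return lines

# Synthetic page: two "text lines" of ink separated by blank rows.
page = np.zeros((20, 30), dtype=np.uint8)
page[2:6, :] = 1
page[10:15, :] = 1
lines = segment_lines(page)
print(len(lines))                    # 2
print([l.shape[0] for l in lines])   # [4, 5]
```

My understanding (please correct me if wrong) is that segmentation like this is not inherently wrong, but the crops must resemble what tesstrain expects: tightly cropped, deskewed, single-line images with consistent padding, matching what Tesseract's own line finder will produce at inference time.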