Thank you very much for the link. Can we use non-Unicode fonts as well? I have attached a Sinhala font that I'm struggling to train.
Thank you very much

On Thu, Oct 6, 2022 at 11:10 AM Saman Kurdi <saman.uk...@gmail.com> wrote:

> Hello,
>
> This might help.
>
> https://www.mdpi.com/2076-3417/11/20/9752
>
> Regards.
>
> On Thu, Oct 6, 2022 at 07:37 Umanda Dikwatta <abey.u...@gmail.com> wrote:
>
>> Hello,
>>
>> I've been using Tesseract 4.1 for some time. I am using Tesseract with
>> the Sinhala language. I got good results for most of the images I tried. I
>> trained Tesseract with different fonts. But as the documentation says, I
>> had to preprocess my images to obtain good results.
>>
>> Then I tried Tesseract 5 with line images as .tif and the labels as
>> .gt.txt. Then I used the generated .traineddata file to extract the text.
>> But that didn't give me good results. I used image processing segmentation
>> to obtain line images. Is it wrong to obtain line images using Python
>> segmentation?
>>
>> Could someone please explain to me the possible reason?
>>
>> Thank you very much
>>
>> --
>> You received this message because you are subscribed to the Google Groups
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an
>> email to tesseract-ocr+unsubscr...@googlegroups.com.
>> To view this discussion on the web visit
>> https://groups.google.com/d/msgid/tesseract-ocr/40a95c6f-b459-4937-930f-1eb103bc4f82n%40googlegroups.com
apex_a.pura-042.ttf
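For readers following the line-image question in the quoted message: below is a minimal sketch of one common way to cut a page into text lines, a horizontal projection profile. It is not the poster's actual segmentation code, which the thread does not show. The function name `segment_lines`, the parameters `min_height` and `pad`, and the input format (a row-major list of rows where 1 is ink and 0 is background) are all assumptions made for illustration. Producing that binary image from a scan is left out.

```python
from typing import List, Tuple


def segment_lines(
    binary: List[List[int]],
    min_height: int = 2,  # hypothetical guard against specks and stray marks
    pad: int = 1,         # hypothetical margin kept above and below each line
) -> List[Tuple[int, int]]:
    """Return (top, bottom) row ranges, bottom exclusive, one per text line.

    `binary` is a row-major image: 1 marks ink, 0 marks background.
    """
    height = len(binary)
    # Horizontal projection profile: how much ink each row contains.
    profile = [sum(row) for row in binary]

    lines = []
    start = None
    # The trailing 0 is a sentinel so a run touching the last row is closed.
    for y, ink in enumerate(profile + [0]):
        if ink > 0 and start is None:
            start = y  # first inked row of a new run
        elif ink == 0 and start is not None:
            if y - start >= min_height:
                # Clamp the padded range to the image bounds.
                lines.append((max(0, start - pad), min(height, y + pad)))
            start = None
    return lines


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0],
        [0, 1, 1, 0],
        [1, 1, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 0, 0],
        [0, 0, 1, 0],
        [0, 1, 1, 1],
        [0, 0, 0, 0],
    ]
    for top, bottom in segment_lines(page):
        line_image = page[top:bottom]  # rows to save as one line image
        print(top, bottom, len(line_image))
```

A profile like this treats any blank row as a line break, so marks that sit above or below the main body of a line can end up in their own thin band or be dropped by `min_height`. If crops made this way lose parts of characters, the `.gt.txt` text no longer matches the image, which is one possible (unconfirmed) cause of poor training results.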