Hello, I've been using Tesseract 4.1 for some time with the Sinhala language. I got good results for most of the images I tried, and I trained Tesseract with several different fonts. As the documentation says, though, I had to preprocess my images to obtain good results.
Then I tried training Tesseract 5 with line images as .tif files and their labels as .gt.txt files, and used the generated .traineddata file to extract text. But that didn't give me good results. I obtained the line images using image-processing segmentation in Python. Is it wrong to obtain line images with Python segmentation? Could someone please explain the possible reason? Thank you very much.
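For context, here is a minimal sketch of the kind of projection-profile line segmentation I mean (this is my own illustration, not the exact script I used — `segment_lines` and its parameters are hypothetical names):

```python
import numpy as np

def segment_lines(binary_img, min_height=3):
    """Split a binarized page (0 = background, 1 = ink) into line images
    using a horizontal projection profile: rows with no ink separate lines."""
    profile = binary_img.sum(axis=1)  # ink pixels per row
    in_line = profile > 0
    lines, start = [], None
    for y, active in enumerate(in_line):
        if active and start is None:
            start = y                  # line begins at first inked row
        elif not active and start is not None:
            if y - start >= min_height:
                lines.append(binary_img[start:y])
            start = None               # line ended at a blank row
    # flush a line that runs to the bottom of the page
    if start is not None and binary_img.shape[0] - start >= min_height:
        lines.append(binary_img[start:])
    return lines

# Synthetic page: two "text lines" of ink separated by blank rows.
page = np.zeros((20, 30), dtype=np.uint8)
page[2:6, :] = 1
page[10:15, :] = 1
lines = segment_lines(page)
print(len(lines))                    # 2
print([l.shape[0] for l in lines])   # [4, 5]
```

My understanding (please correct me if wrong) is that segmentation like this is not inherently wrong, but the crops must resemble what tesstrain expects: tightly cropped, deskewed, single-line images with consistent padding, matching what Tesseract's own line finder will produce at inference time.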