[tesseract-ocr] Fine-tuning tha.traineddata with characters that are not in the original unicharset file

2022-09-19 Thread Unnop Paripunnang
I would like to fine-tune the Tesseract traineddata for the Thai language (tha). Unfortunately, after extracting the original tha.traineddata from the official Tesseract tessdata-best, I found that some characters are missing from tha.unicharset, e.g. the Thai numerals ๐ ๑ ๒ ๓ ๕ (0 1 2 3 5) appear in tha
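A quick way to see exactly which characters of a training text are not covered is to compare the text against the extracted unicharset. The sketch below assumes the usual unicharset layout (first line is the entry count, then one entry per line whose first whitespace-separated field is the character, with `NULL` standing for the space entry). It only reads that first field, and it treats a character as covered if it occurs anywhere inside an entry, so multi-codepoint entries are approximated. The function names are hypothetical helpers, not part of Tesseract.

```python
from typing import List, Set


def unicharset_codepoints(unicharset_text: str) -> Set[str]:
    """Collect every codepoint that occurs in any unicharset entry.

    Assumes line 1 holds the entry count and each following line starts
    with the unichar itself; all other fields are ignored.
    """
    covered: Set[str] = set()
    for line in unicharset_text.splitlines()[1:]:  # skip the count line
        fields = line.split()
        if not fields:
            continue
        unichar = fields[0]
        if unichar == "NULL":  # entry 0 represents the space character
            covered.add(" ")
            continue
        # An entry may be a multi-codepoint cluster; count each codepoint.
        covered.update(unichar)
    return covered


def missing_characters(unicharset_text: str, sample_text: str) -> List[str]:
    """Return the non-whitespace characters of sample_text absent from the unicharset."""
    covered = unicharset_codepoints(unicharset_text)
    return sorted({ch for ch in sample_text if not ch.isspace() and ch not in covered})
```

Usage, assuming the unicharset file name produced by your extraction (the exact name may differ): `missing_characters(open("tha.unicharset", encoding="utf-8").read(), open("training_text.txt", encoding="utf-8").read())`. Any character it reports is one the fine-tuned model would need to learn as a new output class.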

[tesseract-ocr] Can we add a background when generating a dataset with text2image?

2022-10-12 Thread Unnop Paripunnang
Hi everyone, I'm very new here. I'm just curious: is there a simple way to modify text2image.cpp so that it adds a background to the generated .tiff file instead of the plain white background? I have an idea that we could also add a background to the .tiff file later with any image processing program before we combin
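The post-processing route can be sketched as a per-pixel "darken" blend: dark glyph pixels stay dark, and white areas take the background value. As long as the composite keeps the same width and height and does not shift any pixels, the box coordinates that text2image writes alongside the .tiff should still line up. The sketch below works on plain 8-bit grayscale pixel rows so it runs without extra libraries; with real image files, Pillow's `ImageChops.darker(text_img, background)` performs the same per-pixel minimum. The helper names and the tiling step are illustrative assumptions, not part of text2image.

```python
from typing import List

Image = List[List[int]]  # rows of 8-bit grayscale values (0 = black, 255 = white)


def tile_background(patch: Image, width: int, height: int) -> Image:
    """Repeat a small background patch until it covers width x height pixels."""
    ph, pw = len(patch), len(patch[0])
    return [[patch[y % ph][x % pw] for x in range(width)] for y in range(height)]


def darken_composite(text_img: Image, background: Image) -> Image:
    """Per-pixel minimum of two same-sized images.

    Black text stays black; white paper is replaced by the background.
    The size must match exactly so existing box coordinates stay valid.
    """
    if len(text_img) != len(background) or any(
        len(t) != len(b) for t, b in zip(text_img, background)
    ):
        raise ValueError("text image and background must have identical dimensions")
    return [
        [min(t, b) for t, b in zip(t_row, b_row)]
        for t_row, b_row in zip(text_img, background)
    ]


if __name__ == "__main__":
    text = [[255, 0, 255], [255, 0, 255]]  # a one-pixel-wide vertical stroke
    bg = tile_background([[180, 220]], width=3, height=2)
    print(darken_composite(text, bg))  # [[180, 0, 180], [180, 0, 180]]
```

A darken blend is only one choice; a multiply blend (`ImageChops.multiply` in Pillow) also keeps text dark but lets background texture show through light anti-aliased edges.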