I am trying to scan a Santali book that mixes several scripts (Ol Chiki, English and Odia) with gImageReader 3.3.1 (17fa17), which uses Tesseract 4.1.0, but I am unable to get satisfactory results.
English + Odia works fine and gives very good results. But when I use Santali + Odia, English + Santali, or Santali + Odia + English, the output text comes out as Odia, English, or Odia and English respectively, instead of showing the Ol Chiki text in place. A file is available for testing: <https://www.dropbox.com/s/xwvin9bqkwc4zol/Santali-Odia-English.tiff?dl=0>.

Also, when I use only the Santali tessdata, it transliterates the English and Odia words into Ol Chiki script. When I use "*sat.tessdata*" to scan a normal Santali image, it works well.

Note: Ol Chiki is the main writing script of the Santali people and is approved by the Government of India. I think Ol Chiki is a new script that is not well supported by many programs, so the processed text output always shows boxes. I solved this problem by copying the text into Notepad and saving it. Exporting to PDF works: I created editable text from it without any problem, and I have created many OCR-editable PDFs with gImageReader.

My question is how to get combined multi-language output in Santali, Odia and English. I also want to know why, when an image is processed, the text output is correct for English and Odia but not for Santali, or vice versa.

I have tried to train the language, but it takes a lot of time and I have little knowledge of coding. If there is a problem with sat.tessdata, I can take up learning Tesseract training.

I have used the following tessdata:
- Santali - https://github.com/indic-ocr/tessdata/tree/master/sat
- Odia - https://github.com/indic-ocr/tessdata/tree/master/ori
- English - default of gImageReader
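One way to check whether the problem lies in gImageReader or in the Santali traineddata itself is to run Tesseract directly with the same language combinations and compare the outputs. The sketch below is a minimal Python example, assuming the Tesseract 4.x `tesseract` executable is on the PATH and that the downloaded data is installed as `sat.traineddata`, `ori.traineddata` and `eng.traineddata` in the default (or a given) tessdata directory. The helper names `build_tesseract_cmd` and `run_ocr` and the local image file name are hypothetical. The `-l` (languages joined with `+`), `--tessdata-dir` and `--psm` options and the `stdout` output base are standard Tesseract command-line features.

```python
import shutil
import subprocess
import sys
from typing import List, Optional, Sequence


def build_tesseract_cmd(
    image: str,
    langs: Sequence[str],
    tessdata_dir: Optional[str] = None,
    psm: Optional[int] = None,
) -> List[str]:
    """Build a Tesseract 4.x command line that prints recognized text to stdout.

    Multiple languages are joined with '+', e.g. ['sat', 'ori', 'eng'] -> 'sat+ori+eng'.
    """
    if not langs:
        raise ValueError("at least one language code is required")
    # Using "stdout" as the output base makes tesseract print the text instead of writing a file.
    cmd = ["tesseract", image, "stdout", "-l", "+".join(langs)]
    if tessdata_dir is not None:
        # Assumed directory holding sat.traineddata, ori.traineddata and eng.traineddata.
        cmd += ["--tessdata-dir", tessdata_dir]
    if psm is not None:
        # Page segmentation mode; tesseract uses 3 (fully automatic) when this is omitted.
        cmd += ["--psm", str(psm)]
    return cmd


def run_ocr(image: str, langs: Sequence[str], tessdata_dir: Optional[str] = None) -> str:
    """Run tesseract on one image and return its UTF-8 text output."""
    if shutil.which("tesseract") is None:
        raise FileNotFoundError("tesseract executable not found on PATH")
    result = subprocess.run(
        build_tesseract_cmd(image, langs, tessdata_dir),
        capture_output=True,
        encoding="utf-8",  # Ol Chiki and Odia need UTF-8, not the console code page
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # Hypothetical local copy of the test image linked in the post.
    image_path = sys.argv[1] if len(sys.argv) > 1 else "Santali-Odia-English.tiff"
    # Compare several language orders and a Santali-only run on the same page.
    for order in (["sat", "ori", "eng"], ["eng", "ori", "sat"], ["sat"]):
        print("==", "+".join(order), "==")
        try:
            print(run_ocr(image_path, order))
        except (FileNotFoundError, subprocess.CalledProcessError) as exc:
            print("OCR failed:", exc)
```

If the command-line output shows the same behavior as gImageReader, the issue is more likely in the Santali traineddata than in the front end. Trying more than one language order is only a suggestion here; whether the order changes the result for this particular page is an assumption to verify.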

