Re: [tesseract-ocr] recognising roman with sanskrit diacritics

2018-06-30 Thread Shree Devi Kumar
I have uploaded a new version of the traineddata file at https://github.com/Shreeshrii/tessdata_shreetest/blob/master/iast-layer-18003.traineddata Attached is the OCRed output for pages 13-24 of the dark PDF, produced with it. I am still training a different variation. On Wed, Jun 27, 2018 at 6:46 PM Shree Devi

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-06-30 Thread Shree Devi Kumar
Also check that there is no tab or other unprintable character in your training text. Which version of tesseract are you using? Show the output of tesseract -v. On Sat, Jun 30, 2018 at 8:04 PM Shree Devi Kumar wrote: > Then there must be a mismatch between the unicharset you are using and the > t
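The check for tabs and unprintable characters suggested above could be scripted roughly as follows. This is a minimal sketch, not part of the Tesseract tooling: the function name `find_suspect_chars` and the sample string are made up for illustration, and the choice to flag Unicode categories Cc (control) and Cf (format) is an assumption about what counts as "unprintable" here.

```python
import unicodedata


def find_suspect_chars(text):
    """Return (line, column, 'U+XXXX', category) for control/format characters.

    Lines are split on "\n" only, so a stray carriage return or tab inside
    a line is reported rather than silently consumed.
    """
    hits = []
    for line_no, line in enumerate(text.split("\n"), start=1):
        for col, ch in enumerate(line, start=1):
            category = unicodedata.category(ch)
            # Cc = control characters (tab, CR, ...), Cf = format characters.
            if category in ("Cc", "Cf"):
                hits.append((line_no, col, f"U+{ord(ch):04X}", category))
    return hits


if __name__ == "__main__":
    # Hypothetical sample: a tab on line 1, a zero-width non-joiner on line 2.
    sample = "abc\tdef\nsal\u200cam\n"
    for hit in find_suspect_chars(sample):
        print(hit)
```

Format characters such as U+200C (zero-width non-joiner) can be legitimate in Persian text, so Cf hits are better treated as items to review than as errors.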

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-06-30 Thread Shree Devi Kumar
Then there must be a mismatch between the unicharset you are using and the training text. E.g., check whether the copyright symbol is in your unicharset. On Sat, Jun 30, 2018 at 4:48 PM john wrote: > I saw that link. This error occurred many times; how can I prevent it? > > On Saturday, June 30, 2
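One way to look for such a mismatch is to list the characters of the training text that never appear in any unicharset entry. The sketch below rests on assumptions: that the first line of the unicharset file is an entry count, that each following line starts with the symbol followed by a space, and that entries such as NULL, Joined and |Broken|0|1 are special placeholders rather than real characters. The helper names and the tiny sample unicharset are hypothetical, and the check is only approximate for entries that span several code points.

```python
# Entries assumed to be placeholders rather than real symbols.
SPECIAL_ENTRIES = {"NULL", "Joined", "|Broken|0|1"}


def load_unichars(unicharset_text):
    """Collect the first space-separated field of every entry line."""
    lines = unicharset_text.split("\n")
    # lines[0] is assumed to hold the number of entries, so skip it.
    symbols = {line.split(" ", 1)[0] for line in lines[1:] if line}
    return symbols - SPECIAL_ENTRIES


def missing_chars(training_text, unichars):
    """Return sorted non-space characters not covered by any unichar."""
    covered = set("".join(unichars))
    return sorted(
        {ch for ch in training_text if not ch.isspace() and ch not in covered}
    )


if __name__ == "__main__":
    # Hypothetical, heavily simplified unicharset content.
    sample_unicharset = "3\nNULL 0 Common 0\na 3 Latin 1\nb 3 Latin 2\n"
    unichars = load_unichars(sample_unicharset)
    print(missing_chars("ab \u00a9", unichars))
```

With real files, the two texts would be read with `open(path, encoding="utf-8")` before being passed in. Any character the script reports is a candidate cause for the encoding failure.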

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-06-30 Thread john
I saw that link. This error occurred many times; how can I prevent it? On Saturday, June 30, 2018 at 3:17:26 PM UTC+4:30, shree wrote: > > see > https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training > > On Sat, Jun 30, 2018 at 3:23 PM john > > wrote:

Re: [tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-06-30 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#error-messages-from-training On Sat, Jun 30, 2018 at 3:23 PM john wrote: > Encoding of string failed! Failure bytes: ffc2 ffa9 20 ffd8 > ffa8 ffd8 ffa7 ffd8 ffae ffd8 ffaa ffd9

[tesseract-ocr] Encoding of string failed when finetune fot adding new fonts is fas language

2018-06-30 Thread john
Encoding of string failed! Failure bytes: ffc2 ffa9 20 ffd8 ffa8 ffd8 ffa7 ffd8 ffae ffd8 ffaa ffd9 ff86 ffd8 ffa7 20 ffd9 ff84 ffd8 ffa7 ffd8 ffa4 ffd8 ffb3 20 ffdb ff8c ffd9 ff86 ffd8 f
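The failure bytes can be turned back into readable text to see which string could not be encoded. The sketch below assumes that the leading "ff" on most tokens is a sign-extension artifact of printing bytes of 0x80 and above, so only the low 8 bits of each token are kept, and that the result is UTF-8. The function name `decode_failure_bytes` is made up for illustration. Under that assumption, the first two tokens (c2 a9) form the UTF-8 encoding of the copyright sign.

```python
def decode_failure_bytes(dump):
    """Decode a space-separated hex dump such as 'ffc2 ffa9 20' as UTF-8."""
    # Keep only the low byte of each token (assumption: 'ff' is sign extension).
    values = [int(token, 16) & 0xFF for token in dump.split()]
    # errors="replace" keeps going if the dump was cut off mid-character.
    return bytes(values).decode("utf-8", errors="replace")


if __name__ == "__main__":
    # Leading portion of the failure bytes reported in the message above.
    dump = "ffc2 ffa9 20 ffd8 ffa8 ffd8 ffa7 ffd8 ffae ffd8 ffaa ffd9 ff86 ffd8 ffa7"
    text = decode_failure_bytes(dump)
    print(text)
    print([f"U+{ord(ch):04X}" for ch in text])
```

Printing the code points alongside the text makes it easier to look up each character and compare it against the unicharset.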

Re: [tesseract-ocr] wron Characters in LibreOffice Writer with German spezial Characters

2018-06-30 Thread Martin Jenniges
Hello, thank you for your answer. I have found the solution in LibreOffice: File open / filtered as txt (text encoding), then choose UTF-8. Regards, Martin On 29.06.2018 at 19:45, Zdenko Podobny wrote: this is not a tesseract problem: https://ask.libreoffice.org/en/question/97993/why-doesnt-lo-w