Re: [tesseract-ocr] Diacriticals Training

2020-11-11 Thread shreyansh dwivedi
Hello Shree, Then what is the way to train Sanskrit along with Roman diacriticals and still achieve good accuracy, or what are the alternative ways to achieve this? Regards, On Thu, Nov 5, 2020 at 8:15 PM Shree Devi Kumar wrote: > Legacy engine training won't work for Devanagari. The cube engine which >

[tesseract-ocr] Choosing background when generating output using PDF config.

2020-11-11 Thread Jonas Winkler
Hello. I've got an input document, input.pdf. It comes straight from a scanner, so I do some preprocessing to improve accuracy (i.e., unpaper, black/white, increased contrast), which yields preprocessed.png. When using the command "tesseract preprocessed.png output pdf" I receive a docu
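
For readers who want to script the step quoted in this message, here is a minimal sketch of the same invocation driven from Python. It assumes the `tesseract` binary is on PATH; the helper names `build_pdf_command` and `run_ocr_to_pdf` are hypothetical and not part of Tesseract. It only reproduces the command from the message and does not address the background question raised in the thread.

```python
import shutil
import subprocess
from pathlib import Path
from typing import List, Optional


def build_pdf_command(image: str, output_base: str) -> List[str]:
    """Return the argv for `tesseract <image> <output_base> pdf` (hypothetical helper)."""
    return ["tesseract", image, output_base, "pdf"]


def run_ocr_to_pdf(image: str, output_base: str) -> Optional[Path]:
    """Run tesseract when it is installed and the image exists.

    Returns the expected PDF path (tesseract appends ".pdf" to the output
    base), or None if the binary or the input image is missing.
    """
    if shutil.which("tesseract") is None or not Path(image).is_file():
        return None
    subprocess.run(build_pdf_command(image, output_base), check=True)
    return Path(output_base + ".pdf")


if __name__ == "__main__":
    # File names taken from the message: preprocessed.png in, output.pdf out.
    print(build_pdf_command("preprocessed.png", "output"))
```

Splitting command construction from execution keeps the sketch checkable on machines without Tesseract installed.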

[tesseract-ocr] Error: Assert failed: in file tessdatamanager.cpp

2020-11-11 Thread Teem
When executing the command "combine_tessdata -e tesseract/tessdata/eng.traineddata eng.lstm" I get the error "tesseract::TessdataManager::TessdataTypeFromFileName(filename, &type): Error: Assert failed: in file tessdatamanager.cpp, line 297 Illegal instruction (core dumped)" What coul

Re: [tesseract-ocr] Low tesseract accuracy

2020-11-11 Thread Shree Devi Kumar
Suggest you pre-process images instead of training. See https://tesseract-ocr.github.io/tessdoc/ImproveQuality.html On Tue, Nov 10, 2020 at 12:14 PM Dinesh Yakkanti wrote: > Hello Everyone, > I am trying to build a custom tesseract-ocr model. I am getting > a high error rate even if I kep