Re: [tesseract-ocr] tesseract data files

2018-03-04 Thread Simon Eigeldinger
Hm. I guess I'll just ship all 3 of them *lol* and add the text of the wiki to the readme. Greetings, Simon On 04.03.2018 at 18:43, ShreeDevi Kumar wrote: The traineddata files in tessdata_best are larger in size and OCR takes more time. They are supposedly slightly more accurate, but there are

Re: [tesseract-ocr] tesseract data files

2018-03-04 Thread ShreeDevi Kumar
The traineddata files in tessdata_best are larger in size and OCR takes more time. They are supposedly slightly more accurate, but there are no definitive results provided by Ray. tessdata_fast is what has been shipped for Debian and Ubuntu, so that seems the way to go for doing OCR. These however

Re: [tesseract-ocr] tesseract data files

2018-03-04 Thread Simon Eigeldinger
Hi ShreeDevi, I have scrapped the Cygwin builds. I am now using the builds I get from AppVeyor, which just need me to repackage the results. So tessdata_best isn't for better accuracy, as the wiki says? Greetings, Simon On 03.03.2018 at 05:12, ShreeDevi Kumar wrote: Hi S

Re: [tesseract-ocr] tesseract data files

2018-03-02 Thread ShreeDevi Kumar
> tessdata repo supports both --oem 0 and --oem 1, but the files are older and may NOT be fully compatible with current code. The results may vary depending on language and oem used. I have NOT tested this much, since newer traineddata give better accuracy for Indian languages. ShreeDevi ___
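
The pairing of data set and engine mode described above can be sketched as a small helper that builds a tesseract command line. This is only an illustration, not part of Tesseract: the names SUPPORTED_OEM and build_tesseract_command are hypothetical, and the LSTM-only entries for tessdata_fast and tessdata_best are an assumption rather than something stated in this thread. The options --tessdata-dir, -l and --oem are existing tesseract 4 command-line options.

```python
from typing import List

# Engine modes (--oem values) each data set is assumed to work with.
# "tessdata" follows the message above (both --oem 0 and --oem 1).
# The LSTM-only entries for the other two sets are an assumption.
SUPPORTED_OEM = {
    "tessdata": {0, 1},
    "tessdata_fast": {1},
    "tessdata_best": {1},
}


def build_tesseract_command(image: str, output_base: str, tessdata_dir: str,
                            repo: str, lang: str = "eng", oem: int = 1) -> List[str]:
    """Return an argument list for the tesseract CLI (hypothetical helper).

    Raises ValueError if the data set is unknown or the requested engine
    mode is not listed for it in SUPPORTED_OEM.
    """
    if repo not in SUPPORTED_OEM:
        raise ValueError(f"unknown data set: {repo}")
    if oem not in SUPPORTED_OEM[repo]:
        raise ValueError(f"--oem {oem} is not listed for {repo}")
    return [
        "tesseract", image, output_base,
        "--tessdata-dir", tessdata_dir,  # directory holding the *.traineddata files
        "-l", lang,                      # language code, e.g. "eng"
        "--oem", str(oem),               # 0 = legacy engine, 1 = LSTM engine
    ]
```

The returned list can be passed to subprocess.run on a machine where tesseract is installed; the directory and file names are placeholders to adapt when packaging.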

Re: [tesseract-ocr] tesseract data files

2018-03-02 Thread ShreeDevi Kumar
Hi Simon, If you are planning to package using 4.00alpha from the master branch, please use the traineddata files from tessdata_fast. These are the files that have been shipped for Ubuntu 18.04 and included in Debian. See https://github.com/tesseract-ocr/tesseract/wiki for some links. You can update the

[tesseract-ocr] tesseract data files

2018-03-02 Thread Simon Eigeldinger
Hi all, I just looked at the git commits for tesseract and read that there have been changes to the OCR modes. Are the 3 tessdata sets still valid? tessdata_fast and tessdata_best have been updated, so I guess those reflect the latest developments, but tessdata hasn't had an update since September. I