[tesseract-ocr] Hindi language version not working. VietOCR.NET-4.5_64

2018-03-01 Thread Sohan Shekhawat
Hello, As per the document "How to OCR Hindi text using VietOCR", I followed all the steps: 1. Added the *hin.traineddata* file to the C:\Program Files\VietOCR.NET\tessdata folder. 2. Added the *hi_IN.dic* file to the C:\Program Files\VietOCR.NET\dict folder. After that, I see the Hindi language o

[tesseract-ocr] Re: hOCR bbox viewer?

2018-03-01 Thread weishupeng
Thank you for sharing! On Thursday, October 13, 2016 at 3:59:28 PM UTC+2, Zeth Weissman wrote: > > Better late than never, but found this tool that will do what you want. > > http://www.primaresearch.org/tools/PAGEViewer > > You just need to rename your hocr or html file (depending on version of

Re: [tesseract-ocr] Hindi language version not working. VietOCR.NET-4.5_64

2018-03-01 Thread ShreeDevi Kumar
That document is for an old version of Tesseract. Please use a VietOCR version that supports Tesseract 4.00alpha. Download traineddata files for 4.00alpha from tessdata_fast. You can try OCR with both the hin and Devanagari traineddata files. On 01-Mar-2018 3:23 PM, "Sohan Shekhawat" wrote: > Hello

[tesseract-ocr] Re: Hindi language version not working. VietOCR.NET-4.5_64

2018-03-01 Thread shree
Use the latest version of VietOCR: https://sourceforge.net/projects/vietocr/files/vietocr/5.0alpha/ or https://sourceforge.net/projects/vietocr/files/vietocr.net/5.0alpha/ Use VietOCR to download the traineddata from https://github.com/tesseract-ocr/tessdata_fast On Thursday, March 1, 2018 at 3:23:4
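
For readers who would rather fetch a model by hand than through VietOCR, here is a minimal sketch that is not part of the original thread. The helper name tessdata_fast_url is hypothetical, and both the raw-download URL layout and the branch name are assumptions; check the repository page before relying on them.

```shell
# Hypothetical helper: build a raw-download URL for a tessdata_fast model.
# Assumption: language files sit at the repository root, on a branch
# named "main" unless another branch is given as the second argument.
tessdata_fast_url() {
  lang=$1
  branch=${2:-main}
  printf 'https://github.com/tesseract-ocr/tessdata_fast/raw/%s/%s.traineddata\n' "$branch" "$lang"
}

# Print the URL for the Hindi model; the result can be passed to a
# downloader such as curl -L -o hin.traineddata <url>.
tessdata_fast_url hin
```

The downloaded file would then go into the tessdata folder of the VietOCR installation.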

[tesseract-ocr] Re: bash script to help finetune training for Korean

2018-03-01 Thread 이경준
Thank you. I really appreciate your kindness. Thank you. On Thursday, March 1, 2018 at 4:37:51 PM UTC+9, shree wrote: > > The log file sent earlier was only for training steps. > > Complete log file which shows output on console during building of > training data using tesstrain.sh is attached now.

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread 이경준
No. I'm really sorry for complaining about Tesseract (4.0). I mean that Tesseract is great, but it is not perfect (100%). I think that Tesseract is fairly good. But I have a clue about customizing and using Tesseract (4.0) properly. Thank you. First, I know there are 3 types of traineddata: tessdata

[tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread 이경준
An additional question: combine_tessdata -u kor.traineddata What is "-u"? What does it mean? I cannot find that option (flag) on the wiki or the GitHub page. Could you give me an explanation? On Wednesday, February 28, 2018 at 4:21:17 PM UTC+9, 이경준 wrote: > > Hi I'm studying this passage. But I cannot understa

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
> combine_tessdata -u kor.traineddata What is that meaning ? Could you explain for me ? That command will show and unpack the components of your traineddata file, e.g. from tessdata_fast: combine_tessdata -u ./tessdata_fast/kor.traineddata ./tessdata_fast/kor. Extracting tessdata components from ./
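
The invocation quoted in this reply passes two arguments after -u: the traineddata file and an output prefix ending in a dot. A rough sketch of deriving that prefix from the file path follows. The function names unpack_prefix and unpack_traineddata are hypothetical, and the script only prints the command when the tool or the file is missing.

```shell
# Hypothetical helper: strip the trailing "traineddata" so that
# ./tessdata_fast/kor.traineddata becomes ./tessdata_fast/kor.
unpack_prefix() {
  printf '%s\n' "${1%traineddata}"
}

# Unpack the components next to the source file. If combine_tessdata is
# not installed or the file does not exist, print the command instead.
unpack_traineddata() {
  src=$1
  prefix=$(unpack_prefix "$src")
  if command -v combine_tessdata >/dev/null 2>&1 && [ -f "$src" ]; then
    combine_tessdata -u "$src" "$prefix"
  else
    echo "combine_tessdata -u $src $prefix"
  fi
}
```

Calling unpack_traineddata ./tessdata_fast/kor.traineddata would then run the same command as in the reply above.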

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
> I would to make a customized and trainned "New trainneddata" OK. But training from scratch takes a lot of time. I assume that you want to finetune. Please note that the traineddata files in tessdata and tessdata_best and tessdata_fast are NOT compatible. So, it depends on what version of tess

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread 이경준
Oh, I see. Thank you, I was really impressed by you. OK, thank you very much. Last question: I cannot understand the traineddata types. Do you mean that in Tesseract 4.0, tessdata_best is better than tessdata? And what is tessdata_fast? Fast integer versio

Re: [tesseract-ocr] Re: I'm reading Using tesstrain (tesseract 4.0) wiki passage _ I have a question

2018-03-01 Thread ShreeDevi Kumar
Tesseract 4.00alpha has two OCR engines. One is the legacy Tesseract engine, which was used in 3.0x, and the other is the neural-net-based LSTM engine available in 4.00alpha (master branch on GitHub). The traineddata files in tesseract-ocr/tessdata have language models compatible with both of these. If
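
On the tesseract command line, the engine is picked with the --oem option: 0 selects the legacy engine and 1 selects the LSTM engine. The sketch below maps a readable name to that number. The helper name oem_for_engine is hypothetical, and page.png and out in the comment are placeholder file names.

```shell
# Hypothetical helper: map an engine name to the value of --oem.
# 0 = legacy engine only, 1 = LSTM engine only.
oem_for_engine() {
  case "$1" in
    legacy) echo 0 ;;
    lstm)   echo 1 ;;
    *)      echo "unknown engine: $1" >&2; return 1 ;;
  esac
}

# Intended use (not executed here, since it needs tesseract and an image):
#   tesseract page.png out -l kor --oem "$(oem_for_engine lstm)"
oem_for_engine legacy
oem_for_engine lstm
```

The legacy setting only works with traineddata that contains the legacy model, such as the files in tesseract-ocr/tessdata mentioned in this reply.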

[tesseract-ocr] What is difference between unicharset and lstm-unicharset ?

2018-03-01 Thread 이경준
Hi. Thank you for looking at my questions. 1. What is the difference between 'unicharset' and 'lstm-unicharset'? I know how to make 'unicharset' from the command line: "$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num).box But I don't know how to make 'lstm-unicharset'. cf) .tr -> .ls

[tesseract-ocr] What is difference between "unicharset file" and "lstm-unicharset file"

2018-03-01 Thread 이경준
Hi. Thank you for looking at my questions. 1. What is the difference between 'unicharset' and 'lstm-unicharset'? I know how to make 'unicharset' from the command line: "$ tesseract (lang).(filename).exp(num).tif (lang).(filename).exp(num).box But I don't know how to make 'lstm-unicharset'. cf) .tr -> .ls

Re: [tesseract-ocr] What is difference between "unicharset file" and "lstm-unicharset file"

2018-03-01 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc#components ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Fri, Mar 2, 2018 at 6:22 AM, 이경준 wrote: > > Hi . Thank you for