[tesseract-ocr] Re: Network overfitting processing

2017-09-17 Thread robertyoung0511
On the other hand, the network contains LSTM layers. Does the LSTM in the network learn the word order? I ask because the word order in the trained_text file is chaotic. On Monday, September 18, 2017 at 2:30:33 PM UTC+8, roberty...@gmail.com wrote: > > Hello, > > I am using the finetune training to train my mo

[tesseract-ocr] Subtitle Edit 3.5.3 Japanese

2017-09-17 Thread hs
Not much else to say, this message keeps getting spammed when I start OCR. All I've done was... Downloaded a file from here: https://github.com/tesseract-ocr/tessdata and placed it here: \SubtitleEdit-3.5.3\Tesseract\tessdata\jpn.traineddata Am I missing some other files or something, and whe

[tesseract-ocr] Network overfitting processing

2017-09-17 Thread robertyoung0511
Hello, I am using the finetune training to train my model for the chi_sim language with the network [1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]. After analyzing this network, I cannot find any regularization operations in the layers, and there is only one convolution layer i
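For readers unfamiliar with the notation, the network string quoted in this post can be split into its layer tokens. The sketch below is only an illustration: `describe` and `parse_spec` are hypothetical helper names, and the labels reflect one reading of Tesseract's VGSL specification syntax, not any code from the thread.

```python
import re

# Network string copied from the post above.
SPEC = "[1,48,0,1 Ct3,3,16 Mp3,3 Lfys64 Lfx96 Lrx96 Lfx512 O1c1]"


def describe(token):
    """Return a rough label for one VGSL token (assumed reading of the syntax)."""
    m = re.fullmatch(r"(\d+),(\d+),(\d+),(\d+)", token)
    if m:
        b, h, w, d = (int(g) for g in m.groups())
        return f"input: batch={b} height={h} width={w} depth={d}"
    m = re.fullmatch(r"C([stlrm])(\d+),(\d+),(\d+)", token)
    if m:
        return (f"convolution: nonlinearity code '{m.group(1)}', "
                f"{m.group(2)}x{m.group(3)} window, {m.group(4)} outputs")
    m = re.fullmatch(r"Mp(\d+),(\d+)", token)
    if m:
        return f"maxpool: {m.group(1)}x{m.group(2)} window"
    m = re.fullmatch(r"L([frb])([xy])(s?)(\d+)", token)
    if m:
        direction = {"f": "forward", "r": "reverse", "b": "bidirectional"}[m.group(1)]
        summary = ", summarizing" if m.group(3) else ""
        return f"LSTM: {direction} along {m.group(2)}{summary}, {m.group(4)} outputs"
    m = re.fullmatch(r"O(\d)([slc])(\d+)", token)
    if m:
        return f"output: {m.group(1)}-d, type code '{m.group(2)}', {m.group(3)} classes"
    return "unrecognized token"


def parse_spec(spec):
    """Split a bracketed spec string into (token, label) pairs."""
    tokens = spec.strip().strip("[]").split()
    return [(t, describe(t)) for t in tokens]


if __name__ == "__main__":
    for token, label in parse_spec(SPEC):
        print(f"{token:10s} {label}")
```

Listing the tokens this way makes it easy to see that the string holds one convolution token and four LSTM tokens, which matches the poster's observation about the single convolution layer.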

[tesseract-ocr] Re: ERROR: /tmp/tmp.8JcoYdZI17/chi_sim/chi_sim.unicharset does not exist or is not readable

2017-09-17 Thread shree
On Saturday, September 16, 2017 at 2:22:47 PM UTC+5:30, shree wrote: > > https://github.com/tesseract-ocr/tesseract/pull/1134/files > should fix it. > > >> Sorry, that is not the correct fix. -- You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. T

Re: [tesseract-ocr] new tessdata repos on github

2017-09-17 Thread 'Simon Eigeldinger' via tesseract-ocr
Hi, Thanks for the info. Greetings, Simon On 17.09.2017 at 19:16, ShreeDevi Kumar wrote: Simon, There is a significant difference in speed. Depending on the language, the difference in accuracy may be minimal or more. You should compare both for a representative sample to see which is m

Re: [tesseract-ocr] new tessdata repos on github

2017-09-17 Thread ShreeDevi Kumar
Simon, There is a significant difference in speed. Depending on the language, the difference in accuracy may be minimal or more. You should compare both on a representative sample to see which is most suitable. On 17-Sep-2017 10:28 PM, "'Simon Eigeldinger' via tesseract-ocr" < tesseract-ocr@g
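One way to make the suggested comparison concrete is to score each model's output against a hand-checked transcription using character error rate. This is a minimal sketch, assuming the OCR text for each model has already been produced separately; `edit_distance` and `cer` are illustrative helpers, and the sample strings are made up.

```python
def edit_distance(a, b):
    """Levenshtein distance between two strings (insert, delete, substitute)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete from a
                           cur[j - 1] + 1,              # insert into a
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def cer(reference, hypothesis):
    """Character error rate: edit distance divided by reference length."""
    if not reference:
        return 0.0 if not hypothesis else 1.0
    return edit_distance(reference, hypothesis) / len(reference)


if __name__ == "__main__":
    # Hypothetical data: one ground-truth line and the text each model returned.
    truth = "The quick brown fox"
    outputs = {
        "model_a": "The quick brown fox",
        "model_b": "The qu1ck brovvn fox",
    }
    for name, text in outputs.items():
        print(name, round(cer(truth, text), 3))
```

Timing each run alongside the error rate would cover the speed side of the trade-off described above.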

Re: [tesseract-ocr] new tessdata repos on github

2017-09-17 Thread 'Simon Eigeldinger' via tesseract-ocr
Hi ShreeDevi, Thanks for the info. So it seems that blind people who need the best accuracy should use tessdata_best. Greetings, Simon On 17.09.2017 at 16:52, ShreeDevi Kumar wrote: Please see https://github.com/tesseract-ocr/tesseract/issues/995#issuecomment-329667239 ShreeDevi

Re: [tesseract-ocr] Re: tesstrain.sh: /tmp/tmp.XXXXX/xxx/xxx.Font.exp0.box does not exist or is not readable

2017-09-17 Thread Dan9er
Added that and it worked perfectly. I'm finally done. On Saturday, September 16, 2017 at 7:41:39 PM UTC-4, Dan9er wrote: > > I ditched my 500+ font fontlist for one with just 3. It runs much faster > now, and I got to Phase M before I got a ./langdata/font_properties does > not exist or is not

Re: [tesseract-ocr] new tessdata repos on github

2017-09-17 Thread ShreeDevi Kumar
Please see https://github.com/tesseract-ocr/tesseract/issues/995#issuecomment-329667239 ShreeDevi भजन - कीर्तन - आरती @ http://bhajans.ramparivar.com On Sun, Sep 17, 2017 at 2:57 AM, 'Simon Eigeldinger' via tesseract-ocr < tesseract-ocr

[tesseract-ocr] new tessdata repos on github

2017-09-17 Thread 'Simon Eigeldinger' via tesseract-ocr
Hi all, I guess I need some help understanding this. I have seen that there are now 3 repos on GitHub containing .traineddata files. Let's see if I understand them right. tessdata fast: Faster recognition but lower accuracy. tessdata best: Slower recognition but higher accuracy. and there i