Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread tu tonquang
I have tried several cases: 1. Removing the --train_listfile argument from the lstmtraining command 2. Changing the argument value, for example: --train_listfile "wrong_file.txt" in the lstmtraining command (wrong_file.txt does not exist in the file system) 3. Giving the full path of the "eng.training_files.txt" file as "/ho

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread Shree Devi Kumar
Maybe it is not finding your ./eng.training_files.txt. Try giving its full path in the lstmtraining command.
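One way to rule out a path problem before running lstmtraining is to check the list file yourself. The following is only a minimal sketch, not part of Tesseract: the helper name check_list_file is hypothetical, and it assumes the list file holds one training-file path per line.

```python
from pathlib import Path


def check_list_file(list_path):
    """Return human-readable problems found with a training list file.

    Assumption: the list file contains one file path per line.
    """
    list_file = Path(list_path)
    if not list_file.is_file():
        return [f"list file not found: {list_file}"]

    problems = []
    lines = list_file.read_text().splitlines()
    for number, line in enumerate(lines, start=1):
        entry = line.strip()
        if not entry:
            continue  # ignore blank lines
        if not Path(entry).is_file():
            problems.append(f"line {number}: missing file {entry}")
    return problems
```

An empty result means the list file and every path in it were found; relative paths are resolved against the current working directory, so run the check from the directory where lstmtraining is started.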

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread tu tonquang
I'm using a Linux (Ubuntu) system to edit this file. I also ran this shell command to check the EOL of the file, and the result is Unix EOL [image: Screenshot from 2018-10-20 10-54-18.png] [image: Screenshot from 2018-10-20 10-54-30.png] On Saturday, October 20, 2018 at 10:36:26 UTC+7, shree wrote: >

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread Shree Devi Kumar
The files need to use Unix EOL. On Fri, Oct 19, 2018 at 11:16 PM tu tonquang wrote: > Thank you > But I did same thing but I also get an error like that. It is my file: > > [image: Screenshot from 2018-10-20 09-53-37.png] > > > > It is my terminal: > > [image: Screenshot from 2018-10-20 10-14-07
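A quick way to confirm which line endings a file uses is to count them in the raw bytes. This is a rough sketch with hypothetical helper names (count_line_endings, to_unix_eol); it is not taken from the thread or from Tesseract.

```python
def count_line_endings(data: bytes) -> dict:
    """Count Windows (CRLF), Unix (LF) and old Mac (CR) line endings."""
    crlf = data.count(b"\r\n")
    lf = data.count(b"\n") - crlf   # LF not preceded by CR
    cr = data.count(b"\r") - crlf   # CR not followed by LF
    return {"crlf": crlf, "lf": lf, "cr": cr}


def to_unix_eol(data: bytes) -> bytes:
    """Convert CRLF and bare CR line endings to LF."""
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")
```

Read the file in binary mode (open(path, "rb")) before passing its contents in; a file with Unix EOL should report zero for both "crlf" and "cr".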

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread tu tonquang
Thank you. I did the same thing, but I still get the same error. This is my file: [image: Screenshot from 2018-10-20 09-53-37.png] This is my terminal: [image: Screenshot from 2018-10-20 10-14-07.png] On Saturday, October 20, 2018 at 09:19:28 UTC+7, shree wrote: > > On Fri, Oct 19, 201

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread Shree Devi Kumar
On Fri, Oct 19, 2018 at 10:02 PM Seokbong Choi wrote: > Can you share the content of "eng.training_files.txt" file? that > --train_listfile argument refers to? > Thanks. > > The contents will differ based on the fonts chosen and the output directory. See the following for a sample: /home/ubuntu/t

Re: [tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread Seokbong Choi
Can you share the content of the "eng.training_files.txt" file that the --train_listfile argument refers to? Thanks. On Fri, Oct 19, 2018 at 1:59 PM tu tonquang wrote: > I want my application able to recognize characters like: 'Φ' > > On Saturday, October 20, 2018 at 00:56:01 UTC+7, tu tonquang

[tesseract-ocr] Re: Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread tu tonquang
I want my application to be able to recognize characters like 'Φ'. On Saturday, October 20, 2018 at 00:56:01 UTC+7, tu tonquang wrote: > > Hi, > > *I have some errors when I follow this tutorial to retrain tesseract: * > > I follow this link to retrain tesseract with my image dataset (I retrain

[tesseract-ocr] Retrain tesseract 4 model from real image (not from text file and tesstrain.sh)

2018-10-19 Thread tu tonquang
Hi, *I get some errors when I follow this tutorial to retrain Tesseract:* I followed this link to retrain Tesseract with my image dataset (I am retraining Tesseract with real images, not from a text file via tesstrain.sh) https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#creating-sta

[tesseract-ocr] Tesseract misreading numbers

2018-10-19 Thread 'ilochray' via tesseract-ocr
I am using Tesseract 3.0.5 with the .NET wrapper installed from the NuGet package https://www.nuget.org/packages/Tesseract/ . I have a program which reads text from an image. I am using PSM = 6 and pre-process the image to remove noise. I select a portion of the image for processing and have

Re: [tesseract-ocr] What do iteration numbers mean in the train logging?

2018-10-19 Thread Shree Devi Kumar
See https://github.com/tesseract-ocr/tesseract/blob/3a7f5e4de459f4c64f36e08b18ce1b66b1fbc876/src/lstm/lstmtrainer.cpp#L410 On Fri, 19 Oct 2018, 09:01 , wrote: > I get the following log lines while training tesseract: > > At iteration *303839/569300/573167*, Mean rms=0.777%, delta=2.588%, char

[tesseract-ocr] Re: Server performance is 3x as slow versus local machine

2018-10-19 Thread David Tran
SSE and AVX were not found on either my local or my server machine. tesseract v4.0.0-rc3.20181014 leptonica-1.76.0 libgif 5.1.4 : libjpeg 8d (libjpeg-turbo 1.5.3) : libpng 1.6.34 : libtiff 4.0.9 : zlib 1.2.11 : libwebp 0.6.1 : libopenjp2 2.2.0 Is there a way to add those? If it will help boost

[tesseract-ocr] What do iteration numbers mean in the train logging?

2018-10-19 Thread benda . krisztian
I get the following log lines while training tesseract: At iteration *303839/569300/573167*, Mean rms=0.777%, delta=2.588%, char train=7.443%, word train=13.343%, skip ratio=0.6%, wrote checkpoint. What do the first three numbers mean? Which is the real iteration number? And what are the other
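To compare the three counters and the error rates across many log lines, the line can be split into numbers first. The sketch below is only illustrative: parse_training_log is a hypothetical helper, it assumes the log format matches the line quoted above, and it returns the counters in the order they appear without saying what each one means.

```python
import re


def parse_training_log(line):
    """Extract the three iteration counters and the percentage metrics.

    Returns (counters, metrics), or None if the line does not match.
    The optional '*' allows for the emphasis markers added by the mail client.
    """
    match = re.search(r"At iteration \*?(\d+)/(\d+)/(\d+)\*?,", line)
    if match is None:
        return None
    counters = tuple(int(group) for group in match.groups())
    metrics = {
        name.strip(): float(value)
        for name, value in re.findall(r"([A-Za-z ]+)=([\d.]+)%", line)
    }
    return counters, metrics
```

Each metric is keyed by the label printed in the log (for example "char train"), with the value kept as a percentage.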