Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread ElGato ElMago
Thanks a lot, Shree. I'll look into it. Thursday, May 30, 2019 14:39:52 UTC+9 shree: > > See https://github.com/Shreeshrii/tessdata_shreetest > > Look at the files engrestrict*.* and also > https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text > > Create training text of abo

[tesseract-ocr]

2019-05-29 Thread iqra
How do I combine “License Plates-OCR master” with “LPEX master”? I am executing LPEX master’s file Extraction.py. My project is to show the number plate’s numbers separately. Please help me. -- You received this message because you are subscribed to the Google Groups "tess

[tesseract-ocr] mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110 Illegal instruction (core dumped)

2019-05-29 Thread Jennil Thiyam
lstmtraining --model_output ~/tesstutorial/train_wa/wa \ > --continue_from ~/tesstutorial/train_wa/ben.lstm \ > --traineddata ~/tesstitorial/train_wa/ben/ben.traineddata \ > --old_traineddata tessdata/best/ben.traineddata \ > --train_listfile ~/tesstutorial/train_wa/ben.training_files.txt \ > --max

Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread Shree Devi Kumar
See https://github.com/Shreeshrii/tessdata_shreetest Look at the files engrestrict*.* and also https://github.com/Shreeshrii/tessdata_shreetest/blob/master/eng.digits.training_text Create training text of about 100 lines and finetune for 400 lines On Thu, May 30, 2019 at 9:38 AM ElGato ElMago

Re: [tesseract-ocr] mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110

2019-05-29 Thread Jennil Thiyam
I added only one character, about 30 times, to ben.training_text (and only at the end of the original training text), so I did not modify the original ben.training_text to any large extent. Still, why am I getting this "normalization failed" for many of the words which are already in the original

Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread ElGato ElMago
I had about 14 lines, as attached. How many lines would you recommend? Fine tuning gives a much better result, but it tends to pick characters other than those in E13B, which has only 14 characters: 0 through 9 and 4 symbols. I thought training from scratch would eliminate such confusion. Thursday, May 30, 2019 10
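A minimal sketch of how a larger training text for the 14-character set described above could be generated. The thread does not say which characters the font maps the four E13B symbols to, so `SYMBOLS = "ABCD"` is a placeholder assumption, and the function name, group lengths, and defaults are hypothetical. The default of 100 lines follows the "about 100 lines" suggestion made elsewhere in this thread.

```python
import random

DIGITS = "0123456789"
# ASSUMPTION: "ABCD" are stand-ins for the four E13B symbols. Replace them
# with whatever characters your font file actually maps the symbols to.
SYMBOLS = "ABCD"


def make_training_lines(n_lines=100, groups_per_line=6, seed=0,
                        charset=DIGITS + SYMBOLS):
    """Return n_lines of space-separated random groups drawn from charset."""
    rng = random.Random(seed)  # fixed seed so the text is reproducible
    lines = []
    for _ in range(n_lines):
        groups = []
        for _ in range(groups_per_line):
            length = rng.randint(3, 10)  # hypothetical group length range
            groups.append("".join(rng.choice(charset) for _ in range(length)))
        lines.append(" ".join(groups))
    return lines
```

Writing `"\n".join(make_training_lines())` to a text file would give a candidate training text. Whether uniformly random groups are the best choice for this font is an open question, not something the thread settles.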

Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread Shree Devi Kumar
For training from scratch a large training text and hundreds of thousands of iterations are recommended. If you are just fine tuning for a font try to follow instructions for training for impact, with your font. On Thu, 30 May 2019, 06:05 ElGato ElMago, wrote: > Thanks, Shree. > > Yes, I saw t

Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread ElGato ElMago
Thanks, Shree. Yes, I saw the instruction. The steps I made are as follows: Using tesstrain.sh: src/training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \ --noextract_font_properties --langdata_dir ../langdata \ --tessdata_dir ./tessdata \ --fontlist "E13Bnsd" --o

Re: [tesseract-ocr] MRZ/MRP (Machine-readable zone/passport) dataset for tesseract v4

2019-05-29 Thread Mamadou
Hello Lorenzo, We're fine tuning en.traineddata without modifications, with the charset restricted to [A-Z0-9]. We're using the default parameters and the model converges very fast. We use 1376 images from Google Images to test the accuracy. The reported accuracy is min(detector, recognize

Re: [tesseract-ocr] mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110

2019-05-29 Thread Jennil Thiyam
One simple question that confuses me every time: it is about setting the TESSDATA_PREFIX environment variable. Which path should I set? /usr/local/share/tessdata (I could not find any .traineddata there, but if this is the right path, can I just copy the .traineddata to this folder "tess
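One way to narrow down the question above is to check which candidate directories actually contain `.traineddata` files. The sketch below does only that. The helper name `dirs_with_traineddata` is hypothetical, and the assumption that TESSDATA_PREFIX should point at the directory holding the `.traineddata` files applies, as far as I know, to Tesseract 4.x and should be verified against the installed version.

```python
import os
from pathlib import Path


def dirs_with_traineddata(candidates):
    """Return the candidate directories that directly contain *.traineddata files."""
    found = []
    for candidate in candidates:
        if not candidate:  # skip an unset TESSDATA_PREFIX
            continue
        path = Path(candidate).expanduser()
        if path.is_dir() and any(path.glob("*.traineddata")):
            found.append(str(path))
    return found


# Check the path from the question plus the current value of the variable.
candidates = ["/usr/local/share/tessdata", os.environ.get("TESSDATA_PREFIX")]
print(dirs_with_traineddata(candidates))
```

An empty list means none of the candidates holds any model files, in which case the variable is probably pointing at the wrong place or the files were never installed there.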

Re: [tesseract-ocr] mgr_.Init(traineddata_path.c_str()):Error:Assert failed:in file ../../src/lstm/lstmtrainer.h, line 110

2019-05-29 Thread Shree Devi Kumar
Check that the training text you used is normalized correctly, also check the Bengali normalization/validation rules https://github.com/tesseract-ocr/tesseract/issues/1038
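As a rough first pass on the advice above, the training text can be scanned for lines that are not in Unicode NFC form. This is only a sketch: Tesseract's own normalization and validation for Bengali involve more than NFC, so a clean result here does not guarantee that the "normalization failed" messages will go away. The function name is hypothetical.

```python
import unicodedata


def non_nfc_lines(lines):
    """Return (line_number, line) pairs whose text is not in NFC form."""
    flagged = []
    for number, line in enumerate(lines, start=1):
        # A line is flagged when NFC normalization would change it.
        if unicodedata.normalize("NFC", line) != line:
            flagged.append((number, line))
    return flagged
```

It can be run on the lines of the training text file (opened with `encoding="utf-8"`) to find where any added characters differ from their normalized form.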

Re: [tesseract-ocr] Trained data for E13B font

2019-05-29 Thread Shree Devi Kumar
see https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files On Wed, May 29, 2019 at 3:18 PM ElGato ElMago wrote: > Hi, > > I wish to make a trained data for E13B font. > > I read the training tutorial and made a base_checkpoint file according to > the me
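The wiki section linked above describes converting a checkpoint into a `.traineddata` file with `lstmtraining --stop_training`. The sketch below only assembles that command line so the pieces are visible; it does not run anything. All three paths are hypothetical placeholders, since the thread does not give the actual locations.

```python
import shlex


def stop_training_command(checkpoint, traineddata, output):
    """Build the lstmtraining call that converts a checkpoint to a .traineddata file."""
    return [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,   # the checkpoint produced by training
        "--traineddata", traineddata,    # the starter traineddata used for training
        "--model_output", output,        # the final .traineddata to write
    ]


# Hypothetical paths, for illustration only.
command = stop_training_command(
    "output/base_checkpoint",
    "train/eng/eng.traineddata",
    "output/e13b.traineddata",
)
print(" ".join(shlex.quote(arg) for arg in command))
```

The printed line can be pasted into a shell once the paths are replaced with the real ones, or the list can be passed to `subprocess.run`.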

[tesseract-ocr] Trained data for E13B font

2019-05-29 Thread ElGato ElMago
Hi, I wish to make trained data for the E13B font. I read the training tutorial and made a base_checkpoint file according to the method in Training From Scratch. Now, how can I make a traineddata file from the base_checkpoint file?

AW: [tesseract-ocr] Tesseract Windows binaries on Appveyor

2019-05-29 Thread Eigeldinger Simon
Thanks for the fix. Greetings, Simon Kind regards, Simon Eigeldinger, IT Department, Annex 1, 1st floor, City of Hohenems (Stadt Hohenems), Kaiser-Franz-Josef-Straße 4, 6845 Hohenems T: +43 5576 7101-1143 | E: simon.eigeldin...@hohenems.at | www.hohenems.at

Re: [tesseract-ocr] MRZ/MRP (Machine-readable zone/passport) dataset for tesseract v4

2019-05-29 Thread Lorenzo Bolzani
Hi Mamadou, this sounds very interesting. How did you do the training and accuracy measurements? What parameters did you use for the model? Thanks, bye Lorenzo On Mon, 27 May 2019 at 07:38, Mamadou wrote: > Hello, > > We have open sourced (BSD license) MRZ/MRP (Machine-readabl

Re: [tesseract-ocr] Tesseract Windows binaries on Appveyor

2019-05-29 Thread Zdenko Podobny
Artifacts are available again [1]. The filenames are the decision of the author of the software [2] used for the build. If you do not like them, you can build tesseract yourself (or you can rename the exe files, but not the DLLs). [1] https://ci.appveyor.com/project/zdenop/tesseract/build/job/p4wb6dwx18fbhbkp/artifacts [2] htt