> I get a file named output_checkpoint that is 200 MB. I renamed it to ccy.traineddata and put it in the tessdata folder. *Is this how it is supposed to be done*?

No. A checkpoint has to be converted into a traineddata file with lstmtraining --stop_training; renaming it is not enough. Please see
https://github.com/tesseract-ocr/tesseract/wiki/TrainingTesseract-4.00#combining-the-output-files

> *Is there a way to check if a traineddata file is valid*?

combine_tessdata can list the components stored in the file. From
https://github.com/tesseract-ocr/tesseract/blob/master/doc/combine_tessdata.1.asc

    -d *.traineddata* *FILE*…: Lists directory of components from the .traineddata file.

For example:

    combine_tessdata -d tessdata/eng.traineddata

On Tue, Sep 10, 2019 at 7:40 PM Nuno Feliciano <nfelici...@gmail.com> wrote:

> Thanks for the quick reply. The first time I got the error was after the
> learning process, so I went back a step to reproduce the error.
>
> When I train the model with
>
> lstmtraining
> --traineddata D:/software/Tesseract-OCR-4.0/tessdata/ccy.traineddata
> -U D:/software/Tesseract-OCR/tessdate/Latin.unicharset
> --train_listfile D:/software/Tesseract-OCR/training/list.train
> --net_spec
> "[1,40,0,1 Ct5,5,64 Mp3,3 Lfys128 Lbx256 Lbx256 O1c1]"
> --model_output D:/software/Tesseract-OCR/training/model/output
>
> I get a file named output_checkpoint that is 200 MB. I renamed it to
> ccy.traineddata and put it in the tessdata folder. *Is this how it is
> supposed to be done*?
>
> Then, when I run the OCR, I get
>
> Error opening data file
> D:\software\Tesseract-OCR-4.0\tessdata/ccy.traineddata
> Please make sure the TESSDATA_PREFIX environment variable is set to your
> "tessdata" directory.
> Failed loading language 'ccy'
> Tesseract couldn't load any languages!
> Could not initialize tesseract.
>
> The file exists, and I can open it in a text editor.
>
> *Is there a way to check if a traineddata file is valid*?
>
> Thanks,
> Nuno
>
> On Monday, 9 September 2019 at 17:09:39 UTC+1, shree wrote:
>>
>> combine_lang_model only creates the starter traineddata. It is used as
>> part of the LSTM training process. It cannot be used for recognition.
>>
>> Training from scratch requires running the lstmtraining command.
>>
>> On Mon, Sep 9, 2019, 21:36 Nuno Feliciano <nfeli...@gmail.com> wrote:
>>
>>> Hi,
>>>
>>> I am trying to make a model from scratch.
>>> I created a language using
>>> combine_lang_model --input_unicharset
>>> D:\software\Tesseract-OCR-4.0\tessdata\Latin.unicharset --script_dir
>>> D:\software\Tesseract-OCR-4.0\tessdata --output_dir
>>> D:\software\Tesseract-OCR-4.0\training\output *--lang ccy*
>>> Then I put the generated ccy.traineddata file in the tessdata folder and
>>> ran
>>> tesseract --tessdata-dir D:\software\Tesseract-OCR-4.0\tessdata -l ccy
>>> <file> stdout, which gives me
>>> *Failed loading language 'ccy'*
>>> Tesseract couldn't load any languages!
>>> Could not initialize tesseract.
>>>
>>> tesseract --list-langs gives me
>>> ccy
>>> eng
>>> osd
>>> ...
>>>
>>> I got Latin.unicharset from
>>> https://raw.githubusercontent.com/tesseract-ocr/langdata_lstm/master/Latin.unicharset
>>>
>>> Can anyone help?
>>>
>>> Thanks,
>>> Nuno Feliciano
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesser...@googlegroups.com.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/f0157ef9-7b83-4fa3-8cf5-3697514d6de0%40googlegroups.com
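
For reference, the two steps from the reply at the top of this thread (converting the checkpoint as described in the wiki section on combining the output files, then listing the components of the result with combine_tessdata -d) could be scripted roughly as below. This is only a sketch under assumptions: the checkpoint path is inferred from the --model_output prefix in the quoted lstmtraining command, the starter and output paths are guesses that need adjusting, and the helper names (finalize_cmd, list_components_cmd, run_if_available) are made up for illustration. The tools are only invoked if they are found on PATH.

```python
import shutil
import subprocess

# Assumed paths; adjust to your setup.
# Checkpoint: the --model_output prefix from the thread plus "_checkpoint".
CHECKPOINT = "D:/software/Tesseract-OCR/training/model/output_checkpoint"
# Starter traineddata: must be the file written by combine_lang_model,
# not the renamed checkpoint. This exact location is a guess.
STARTER = "D:/software/Tesseract-OCR-4.0/training/output/ccy/ccy.traineddata"
# Where to write the final model (hypothetical location).
OUTPUT = "D:/software/Tesseract-OCR/training/model/ccy.traineddata"


def finalize_cmd(checkpoint, starter_traineddata, output_traineddata):
    """Build the lstmtraining call that turns a checkpoint into a traineddata file."""
    return [
        "lstmtraining",
        "--stop_training",
        "--continue_from", checkpoint,
        "--traineddata", starter_traineddata,
        "--model_output", output_traineddata,
    ]


def list_components_cmd(traineddata):
    """Build the combine_tessdata call that lists the components of a traineddata file."""
    return ["combine_tessdata", "-d", traineddata]


def run_if_available(cmd):
    """Run cmd if its executable is on PATH; return None when it is not installed."""
    if shutil.which(cmd[0]) is None:
        return None
    return subprocess.run(cmd, capture_output=True, text=True)


if __name__ == "__main__":
    steps = [
        finalize_cmd(CHECKPOINT, STARTER, OUTPUT),
        list_components_cmd(OUTPUT),
    ]
    for cmd in steps:
        print(" ".join(cmd))
        result = run_if_available(cmd)
        if result is None:
            print("  (skipped: %s not found on PATH)" % cmd[0])
        else:
            # Print both streams, since the tools may report on either one.
            print(result.stdout + result.stderr)
```

If the second command lists the components without an error, the resulting file can then be copied into the tessdata folder and used with -l ccy.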