foo.traineddata

Shree Devi Kumar Sun, 29 Jul 2018 09:20:02 -0700

CONTINUE_FROM should be used only when you want to train a new language based on
an existing language, or to add some characters to an existing language.

There is no existing language called 'foo', so there is no foo.traineddata for
combine_tessdata to read. You should replace it with the lang code for the
language you are training.
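
One way to confirm this before running make is to check that the model named by
CONTINUE_FROM is actually present in the tessdata folder. The sketch below is
only an illustration: have_start_model is a hypothetical helper name, and the
directory and the eng code in the usage comment are taken from the quoted
Makefile as examples, not as a recommendation for your language.

```shell
#!/bin/sh
# Hypothetical helper (not part of Tesseract or ocrd-train): succeed only if
# <tessdata_dir>/<lang>.traineddata exists, i.e. if combine_tessdata -u would
# have a file to unpack.
have_start_model() {
    tessdata_dir=$1
    lang=$2
    if [ -f "$tessdata_dir/$lang.traineddata" ]; then
        echo "found: $tessdata_dir/$lang.traineddata"
        return 0
    fi
    echo "missing: $tessdata_dir/$lang.traineddata" >&2
    return 1
}

# Example use (assumed values; adjust the directory and lang code):
# have_start_model /usr/share/tesseract-ocr/tessdata eng
```

If the check reports the file as missing, CONTINUE_FROM points at a model that
does not exist yet, which matches the "Failed to read" error quoted below.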

On Sun, Jul 29, 2018 at 9:44 PM <eng.ahmed.osama.1...@gmail.com> wrote:

> I duplicated the tessdata and am still getting this error
>
> combine_tessdata -u /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata  /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.
> Failed to read /mnt/e/projects/Training_Tesseract/ocrd-train/usr/share/tessdata/foo.traineddata
> Makefile:97: recipe for target 'data/unicharset' failed
>
> I can't find foo.traineddata in this folder.
>
> On Sunday, July 29, 2018 at 5:19:05 PM UTC+2, chandra churh chatterjee
> wrote:
>>
>> Keep foo.traineddata inside the tessdata folder and then run the
>> command.
>>
>> On Sun, Jul 29, 2018 at 5:00 AM <eng.ahmed....@gmail.com> wrote:
>>
>>> I am using a bash script to train an LSTM model. I have the images and
>>> box files.
>>>
>>>
>>> My problem is that an error is returned when the combine_tessdata command
>>> is executed. I have also checked, and no file called foo.traineddata was
>>> created.
>>>
>>>
>>> Here is the bash code.
>>> export
>>>
>>>
>>> SHELL := /bin/bash
>>> LOCAL := $(PWD)/usr
>>> PATH := $(LOCAL)/bin:$(PATH)
>>> TESSDATA =  /usr/share/tesseract-ocr/tessdata
>>> LANGDATA = $(PWD)/langdata
>>>
>>>
>>> # Name of the model to be built. Default: $(MODEL_NAME)
>>> MODEL_NAME = foo
>>>
>>>
>>> # Name of the model to continue from. Default: $(CONTINUE_FROM)
>>> CONTINUE_FROM = $(MODEL_NAME)
>>>
>>>
>>> # No of cores to use for compiling leptonica/tesseract. Default: $(CORES)
>>> CORES = 4
>>>
>>>
>>> # Leptonica version. Default: $(LEPTONICA_VERSION)
>>> LEPTONICA_VERSION := 1.75.3
>>>
>>>
>>> # Tesseract commit. Default: $(TESSERACT_VERSION)
>>> TESSERACT_VERSION := 9ae97508aed1e5508458f1181b08501f984bf4e2
>>>
>>>
>>> # Tesseract langdata version. Default: $(LANGDATA_VERSION)
>>> LANGDATA_VERSION := master
>>>
>>>
>>> # Tesseract model repo to use. Default: $(TESSDATA_REPO)
>>> TESSDATA_REPO = _fast
>>>
>>>
>>> # Train directory. Default: $(TRAIN)
>>> TRAIN := data/train
>>>
>>>
>>> # Normalization Mode - see src/training/language_specific.sh for details. Default: $(NORM_MODE)
>>> NORM_MODE = 2
>>>
>>>
>>> # Page segmentation mode. Default: $(PSM)
>>> PSM = 6
>>>
>>>
>>> # Ratio of train / eval training data. Default: $(RATIO_TRAIN)
>>> RATIO_TRAIN := 0.90
>>>
>>>
>>> # BEGIN-EVAL makefile-parser --make-help Makefile
>>>
>>>
>>> help:
>>>  @echo ""
>>>  @echo "  Targets"
>>>  @echo ""
>>>  @echo "    unicharset       Create unicharset"
>>>  @echo "    lists            Create lists of lstmf filenames for training and eval"
>>>  @echo "    training         Start training"
>>>  @echo "    proto-model      Build the proto model"
>>>  @echo "    leptonica        Build leptonica"
>>>  @echo "    tesseract        Build tesseract"
>>>  @echo "    tesseract-langs  Download tesseract-langs"
>>>  @echo "    langdata         Download langdata"
>>>  @echo "    clean            Clean all generated files"
>>>  @echo ""
>>>  @echo "  Variables"
>>>  @echo ""
>>>  @echo "    MODEL_NAME         Name of the model to be built. Default: $(MODEL_NAME)"
>>>  @echo "    CONTINUE_FROM      Name of the model to continue from. Default: $(CONTINUE_FROM)"
>>>  @echo "    CORES              No of cores to use for compiling leptonica/tesseract. Default: $(CORES)"
>>>  @echo "    LEPTONICA_VERSION  Leptonica version. Default: $(LEPTONICA_VERSION)"
>>>  @echo "    TESSERACT_VERSION  Tesseract commit. Default: $(TESSERACT_VERSION)"
>>>  @echo "    LANGDATA_VERSION   Tesseract langdata version. Default: $(LANGDATA_VERSION)"
>>>  @echo "    TESSDATA_REPO      Tesseract model repo to use. Default: $(TESSDATA_REPO)"
>>>  @echo "    TRAIN              Train directory. Default: $(TRAIN)"
>>>  @echo "    NORM_MODE          Normalization Mode - see src/training/language_specific.sh for details. Default: $(NORM_MODE)"
>>>  @echo "    PSM                Page segmentation mode. Default: $(PSM)"
>>>  @echo "    RATIO_TRAIN        Ratio of train / eval training data. Default: $(RATIO_TRAIN)"
>>>
>>>
>>> # END-EVAL
>>>
>>>
>>> ALL_BOXES = data/all-boxes
>>> ALL_LSTMF = data/all-lstmf
>>>
>>>
>>> # Create unicharset
>>> unicharset: data/unicharset
>>>
>>>
>>> # Create lists of lstmf filenames for training and eval
>>> lists: $(ALL_LSTMF) data/list.train data/list.eval
>>>
>>>
>>> data/list.train: $(ALL_LSTMF)
>>>  total=`cat $(ALL_LSTMF) | wc -l` \
>>>     no=`echo "$$total * $(RATIO_TRAIN) / 1" | bc`; \
>>>     head -n "$$no" $(ALL_LSTMF) > "$@"
>>>
>>>
>>> data/list.eval: $(ALL_LSTMF)
>>>  total=`cat $(ALL_LSTMF) | wc -l` \
>>>     no=`echo "($$total - $$total * $(RATIO_TRAIN)) / 1" | bc`; \
>>>     tail -n "+$$no" $(ALL_LSTMF) > "$@"
>>>
>>>
>>> # Start training
>>> training: data/$(MODEL_NAME).traineddata
>>>
>>>
>>> data/unicharset: $(ALL_BOXES)
>>>  combine_tessdata -u $(TESSDATA)/$(CONTINUE_FROM).traineddata  $(TESSDATA)/$(CONTINUE_FROM).
>>>  unicharset_extractor --output_unicharset "$(TRAIN)/my.unicharset" --norm_mode $(NORM_MODE) "$(ALL_BOXES)"
>>>  merge_unicharsets $(TESSDATA)/$(CONTINUE_FROM).lstm-unicharset $(TRAIN)/my.unicharset  "$@"
>>>
>>>
>>> $(ALL_BOXES): $(sort $(patsubst %.tif,%.box,$(wildcard $(TRAIN)/*.tif)))
>>>  find $(TRAIN) -name '*.box' -exec cat {} \; > "$@"
>>>
>>>
>>> $(TRAIN)/%.box: $(TRAIN)/%.tif $(TRAIN)/%.gt.txt
>>>  python3 generate_line_box.py -i "$(TRAIN)/$*.tif" -t "$(TRAIN)/$*.gt.txt" > "$@"
>>>
>>>
>>> $(ALL_LSTMF): $(sort $(patsubst %.tif,%.lstmf,$(wildcard $(TRAIN)/*.tif)))
>>>  find $(TRAIN) -name '*.lstmf' -exec echo {} \; | sort -R -o "$@"
>>>
>>>
>>> $(TRAIN)/%.lstmf: $(TRAIN)/%.box
>>>  tesseract $(TRAIN)/$*.tif $(TRAIN)/$* --psm $(PSM) lstm.train
>>>
>>>
>>> # Build the proto model
>>> proto-model: data/$(MODEL_NAME)/$(MODEL_NAME).traineddata
>>>
>>>
>>> data/$(MODEL_NAME)/$(MODEL_NAME).traineddata: $(LANGDATA) data/unicharset
>>>  combine_lang_model \
>>>    --input_unicharset data/unicharset \
>>>    --script_dir $(LANGDATA) \
>>>    --output_dir data/ \
>>>    --lang $(MODEL_NAME)
>>>
>>>
>>> data/checkpoints/$(MODEL_NAME)_checkpoint: unicharset lists proto-model
>>>  mkdir -p data/checkpoints
>>>  lstmtraining \
>>>    --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>>>    --net_spec "[1,36,0,1 Ct3,3,16 Mp3,3 Lfys48 Lfx96 Lrx96 Lfx256 O1c`head -n1 data/unicharset`]" \
>>>    --model_output data/checkpoints/$(MODEL_NAME) \
>>>    --learning_rate 20e-4 \
>>>    --train_listfile data/list.train \
>>>    --eval_listfile data/list.eval \
>>>    --max_iterations 10000
>>>
>>>
>>> data/$(MODEL_NAME).traineddata: data/checkpoints/$(MODEL_NAME)_checkpoint
>>>  lstmtraining \
>>>  --stop_training \
>>>  --continue_from $^ \
>>>  --traineddata data/$(MODEL_NAME)/$(MODEL_NAME).traineddata \
>>>  --model_output $@
>>>
>>>
>>> # Build leptonica
>>> leptonica: leptonica.built
>>>
>>>
>>> leptonica.built: leptonica-$(LEPTONICA_VERSION)
>>>  cd $< ; \
>>>  ./configure --prefix=$(LOCAL) && \
>>>  make -j$(CORES) && \
>>>  make install && \
>>>  date > "$@"
>>>
>>>
>>> leptonica-$(LEPTONICA_VERSION): leptonica-$(LEPTONICA_VERSION).tar.gz
>>>  tar xf "$<"
>>>
>>>
>>> leptonica-$(LEPTONICA_VERSION).tar.gz:
>>>  wget 'http://www.leptonica.org/source/$@'
>>>
>>>
>>> # Build tesseract
>>> tesseract: tesseract.built tesseract-langs
>>>
>>>
>>> tesseract.built: tesseract-$(TESSERACT_VERSION)
>>>  cd $< && \
>>>  sh autogen.sh && \
>>>  PKG_CONFIG_PATH="$(LOCAL)/lib/pkgconfig" \
>>>  LEPTONICA_CFLAGS="-I$(LOCAL)/include/leptonica" \
>>>  ./configure --prefix=$(LOCAL) && \
>>>  LDFLAGS="-L$(LOCAL)/lib"\
>>>  make -j$(CORES) && \
>>>  make install && \
>>>  make -j$(CORES) training-install && \
>>>  date > "$@"
>>>
>>>
>>> tesseract-$(TESSERACT_VERSION):
>>>  wget https://github.com/tesseract-ocr/tesseract/archive/$(TESSERACT_VERSION).zip
>>>  unzip $(TESSERACT_VERSION).zip
>>>
>>>
>>> # Download tesseract-langs
>>> tesseract-langs: $(TESSDATA)/eng.traineddata
>>>
>>>
>>> # Download langdata
>>> langdata: $(LANGDATA)
>>>
>>>
>>> $(LANGDATA):
>>>  #wget 'https://github.com/tesseract-ocr/langdata/archive/$(LANGDATA_VERSION).zip'
>>>  unzip $(LANGDATA_VERSION).zip
>>>
>>>
>>> $(TESSDATA)/eng.traineddata:
>>>  cd $(TESSDATA) && wget https://github.com/tesseract-ocr/tessdata$(TESSDATA_REPO)/raw/master/$(notdir $@)
>>>
>>>
>>> # Clean all generated files
>>> clean:
>>>  find data/train -name '*.box' -delete
>>>  find data/train -name '*.lstmf' -delete
>>>  rm -rf data/all-*
>>>  rm -rf data/list.*
>>>  rm -rf data/$(MODEL_NAME)
>>>  rm -rf data/unicharset
>>>  rm -rf data/checkpoints
>>>
>>>
>>> Also here is the error
>>>
>>>
>>> combine_tessdata -u /usr/share/tesseract-ocr/tessdata/foo.traineddata  /usr/share/tesseract-ocr/tessdata/foo.
>>> Failed to read /usr/share/tesseract-ocr/tessdata/foo.traineddata
>>> Makefile:97: recipe for target 'data/unicharset' failed
>>> make: *** [data/unicharset] Error 1
>>>
>>>
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "tesseract-ocr" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to tesseract-oc...@googlegroups.com.
>>> To post to this group, send email to tesser...@googlegroups.com.
>>> Visit this group at https://groups.google.com/group/tesseract-ocr.
>>> To view this discussion on the web visit
>>> https://groups.google.com/d/msgid/tesseract-ocr/964f8a60-ec0e-44d9-a6a2-1b81eb49ab2b%40googlegroups.com
>>> <https://groups.google.com/d/msgid/tesseract-ocr/964f8a60-ec0e-44d9-a6a2-1b81eb49ab2b%40googlegroups.com?utm_medium=email&utm_source=footer>
>>> .
>>> For more options, visit https://groups.google.com/d/optout.
>>>


-- 

____________________________________________________________
Bhajan - Kirtan - Aarti @ http://bhajans.ramparivar.com


Re: [tesseract-ocr] combine_tessdata. Failed to read /usr/share/tesseract-ocr/tessdata/foo.traineddata
