Hi, I'm studying the passage quoted below, but I cannot understand what the 
flag "--noextract_font_properties" means, so I looked at the file 
/tesseract/training/tesstrain.sh

But I cannot find "--noextract_font_properties" in it.
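One thing that might help is searching the whole training directory instead of a single file, in case the flag is handled by a helper script that tesstrain.sh loads. This is only a sketch: find_flag is a hypothetical helper name, and the directory in the usage line is an assumption about where the checkout lives. It searches for the name without the "no" prefix so that either spelling would match.

```shell
# Hypothetical helper: recursively search a directory for a flag name.
# $1 = text to search for, $2 = directory to search.
find_flag() {
  # -r: recurse into subdirectories, -n: print line numbers,
  # -e: treat the next argument as the pattern even if it starts with "-".
  grep -rn -e "$1" "$2"
}

# Usage (the path is an assumption, adjust it to your checkout):
#   find_flag "extract_font_properties" /tesseract/training
```

Each match is printed as file:line:text, so it shows which script mentions the flag.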

Here is the usage text from that file:

# USAGE:
#
# tesstrain.sh
#    --fontlist FONTS           # A list of fontnames to train on.
#    --fonts_dir FONTS_PATH     # Path to font files.
#    --lang LANG_CODE           # ISO 639 code.
#    --langdata_dir DATADIR     # Path to tesseract/training/langdata directory.
#    --output_dir OUTPUTDIR     # Location of output traineddata file.
#    --overwrite                # Safe to overwrite files in output_dir.
#    --linedata_only            # Only generate training data for lstmtraining.
#    --run_shape_clustering     # Run shape clustering (use for Indic langs).
#    --exposures EXPOSURES      # A list of exposure levels to use (e.g. "-1 0 1").
#
# OPTIONAL flags for input data. If unspecified we will look for them in
# the langdata_dir directory.
#    --training_text TEXTFILE   # Text to render and use for training.
#    --wordlist WORDFILE        # Word list for the language ordered by
#                               # decreasing frequency.
#
# OPTIONAL flag to specify location of existing traineddata files, required
# during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
# the current environment.
#    --tessdata_dir TESSDATADIR     # Path to tesseract/tessdata directory.
#
# NOTE:
# The font names specified in --fontlist need to be recognizable by Pango using
# fontconfig. An easy way to list the canonical names of all fonts available on
# your system is to run text2image with --list_available_fonts and the
# appropriate --fonts_dir path.
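
As I read the NOTE above, listing the font names would look roughly like the sketch below. The two flags are the ones named in the NOTE. The /usr/share/fonts default is an assumption taken from the example command further down, and FONTS_DIR is just a variable name chosen here.

```shell
# Directory that holds the font files. The default is an assumption;
# set FONTS_DIR beforehand if your fonts live somewhere else.
FONTS_DIR="${FONTS_DIR:-/usr/share/fonts}"

# Only call text2image if it is installed and on the PATH.
if command -v text2image >/dev/null 2>&1; then
  # Print the font names that can then be passed to --fontlist.
  text2image --list_available_fonts --fonts_dir "$FONTS_DIR"
else
  echo "text2image not found on PATH" >&2
fi
```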






Using tesstrain

The setup for running tesstrain.sh 
<https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh> 
is the same as for base Tesseract. Use the --linedata_only option for LSTM 
training. Note that it is beneficial to have more training text and make 
more pages though, as neural nets don't generalize as well and need to 
train on something similar to what they will be running on. If the target 
domain is severely limited, then all the dire warnings about needing a lot 
of training data may not apply, but the network specification may need to 
be changed.

Training data is created using tesstrain.sh 
<https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh> 
as follows. Note that your fonts location may vary.

training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
  --noextract_font_properties --langdata_dir ../langdata \
  --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain



Thank you very much. I would appreciate a reply from anybody.
