Hi, I'm studying the passage below, but I cannot understand what the flag "--noextract_font_properties" means, so I looked at the file /tesseract/training/tesstrain.sh.
But I cannot find "--noextract_font_properties" there. Here is the usage:

# USAGE:
#
# tesstrain.sh
#    --fontlist FONTS               # A list of fontnames to train on.
#    --fonts_dir FONTS_PATH         # Path to font files.
#    --lang LANG_CODE               # ISO 639 code.
#    --langdata_dir DATADIR         # Path to tesseract/training/langdata directory.
#    --output_dir OUTPUTDIR         # Location of output traineddata file.
#    --overwrite                    # Safe to overwrite files in output_dir.
#    --linedata_only                # Only generate training data for lstmtraining.
#    --run_shape_clustering         # Run shape clustering (use for Indic langs).
#    --exposures EXPOSURES          # A list of exposure levels to use (e.g. "-1 0 1").
#
# OPTIONAL flags for input data. If unspecified we will look for them in
# the langdata_dir directory.
#    --training_text TEXTFILE       # Text to render and use for training.
#    --wordlist WORDFILE            # Word list for the language ordered by
#                                   # decreasing frequency.
#
# OPTIONAL flag to specify location of existing traineddata files, required
# during feature extraction. If unspecified will use TESSDATA_PREFIX defined in
# the current environment.
#    --tessdata_dir TESSDATADIR     # Path to tesseract/tessdata directory.
#
# NOTE:
# The font names specified in --fontlist need to be recognizable by Pango using
# fontconfig. An easy way to list the canonical names of all fonts available on
# your system is to run text2image with --list_available_fonts and the
# appropriate --fonts_dir path.

And here is the passage I am studying:

Using tesstrain

The setup for running tesstrain.sh <https://github.com/tesseract-ocr/tesseract/wiki/Training-Tesseract-%E2%80%93-tesstrain.sh> is the same as for base Tesseract. Use the --linedata_only option for LSTM training. Note that it is beneficial to have more training text and make more pages though, as neural nets don't generalize as well and need to train on something similar to what they will be running on.
If the target domain is severely limited, then all the dire warnings about needing a lot of training data may not apply, but the network specification may need to be changed.

Training data is created using tesstrain.sh <https://github.com/tesseract-ocr/tesseract/blob/master/training/tesstrain.sh> as follows (note that your fonts location may vary):

    training/tesstrain.sh --fonts_dir /usr/share/fonts --lang eng --linedata_only \
      --noextract_font_properties --langdata_dir ../langdata \
      --tessdata_dir ./tessdata --output_dir ~/tesstutorial/engtrain

Thank you very much. I would welcome a reply from anybody.
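
Since the usage comment does not list the flag, it may be handled in another script that tesstrain.sh sources rather than in tesstrain.sh itself. A minimal sketch for searching every shell script in a directory for a flag name follows; the helper name find_flag and the example path are assumptions for illustration, not part of Tesseract:

```shell
# Hypothetical helper: print each line (with file name and line number)
# in any *.sh file under a directory that mentions the given text.
find_flag() {
  local dir="$1" flag="$2"
  # -r: recurse, -n: show line numbers, -F: treat the text as a fixed string,
  # -e: lets the search text start with "--" without being read as an option.
  grep -rnF --include='*.sh' -e "$flag" -- "$dir"
}

# Example call (the path is an assumption about where the checkout lives):
# find_flag ./tesseract/training noextract_font_properties
```

If the search prints a match in a file other than tesstrain.sh, that file is the place to read how the flag is parsed and which variable it sets.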