You create it yourself, in the same directory as your eng.traineddata. My folder structure looks like this:
<https://lh3.googleusercontent.com/-124nDg2VCOQ/WuHKV7RHx6I/AAAAAAAAAKg/iW1jX3QNI_kbcMTefRVrX-n6sy9Dm-eIgCLcBGAs/s1600/Screen%2BShot%2B2018-04-26%2Bat%2B8.46.42%2BAM.png>

On Monday, April 2, 2018 at 4:08:21 AM UTC-4, 이경준 wrote:
>
> Hi,
>
> I cited this page:
> I cannot find (lang).user-words.
> How can I find it?
>
> Tesseract config files consist of lines with variable-value pairs (space
> separated). The variables are documented as flags in the source code, like
> the following one in tesseractclass.h:
>
> STRING_VAR_H(tessedit_char_blacklist, "", "Blacklist of chars not to
> recognize");
>
> These variables may enable or disable various features of the engine, and
> may cause it to load (or not load) various data. For instance, let's
> suppose you want to OCR in English, but suppress the normal dictionary and
> load an alternative word list and an alternative list of patterns; these
> two files are the most commonly used extra data files.
>
> If your language pack is in /path/to/eng.traineddata and the hocr config
> is in /path/to/configs/hocr, then create three new files:
>
> /path/to/eng.user-words:
>
> the
> quick
> brown
> fox
> jumped
>
> /path/to/eng.user-patterns:
>
> 1-\d\d\d-GOOG-411
> www.\n\*.com
>
> /path/to/configs/bazaar:
>
> load_system_dawg F
> load_freq_dawg F
> user_words_suffix user-words
> user_patterns_suffix user-patterns
>
> Now, if you pass the word *bazaar* as a trailing command line parameter
> to Tesseract, Tesseract will not bother loading the system dictionary or
> the dictionary of frequent words, and will instead load and use the
> eng.user-words and eng.user-patterns files you provided. The former is a
> simple word list, one word per line. The format of the latter is documented
> in dict/trie.h, in the comment on read_pattern_list().
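If you would rather script this than create the files by hand, here is a minimal Python sketch that writes the three files from the quoted example into the directory holding eng.traineddata. The helper names (`create_bazaar_files`, `write_lines`) are hypothetical, my own; only the file names and the file contents come from the documentation quoted above.

```python
from pathlib import Path
from typing import Iterable, List

# Contents copied from the documentation quoted above.
USER_WORDS = ["the", "quick", "brown", "fox", "jumped"]
USER_PATTERNS = [r"1-\d\d\d-GOOG-411", r"www.\n\*.com"]
# Config files are "variable value" pairs, one per line, space separated.
BAZAAR_CONFIG = [
    ("load_system_dawg", "F"),
    ("load_freq_dawg", "F"),
    ("user_words_suffix", "user-words"),
    ("user_patterns_suffix", "user-patterns"),
]


def write_lines(path: Path, lines: Iterable[str]) -> None:
    """Write one item per line, creating parent directories if needed."""
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text("\n".join(lines) + "\n", encoding="utf-8")


def create_bazaar_files(data_dir: Path, lang: str = "eng") -> List[Path]:
    """Create <lang>.user-words, <lang>.user-patterns and configs/bazaar.

    data_dir is the directory that holds <lang>.traineddata
    (/path/to in the quoted example). Hypothetical helper.
    """
    words = data_dir / f"{lang}.user-words"
    patterns = data_dir / f"{lang}.user-patterns"
    config = data_dir / "configs" / "bazaar"
    write_lines(words, USER_WORDS)
    write_lines(patterns, USER_PATTERNS)
    write_lines(config, (f"{name} {value}" for name, value in BAZAAR_CONFIG))
    return [words, patterns, config]
```

Calling `create_bazaar_files(Path("/path/to"))` should leave you with the same layout as in the documentation. After that, something like `tesseract image.png out bazaar` should pick up the config, assuming your Tesseract is actually reading its data from /path/to (for example via `--tessdata-dir` or `TESSDATA_PREFIX`, whose exact meaning depends on your version).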
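To get a feel for what the eng.user-patterns lines mean, here is a rough Python approximation of the pattern syntax as I understand the read_pattern_list() comment in dict/trie.h: \c letter, \d digit, \n letter or digit, \p punctuation, \a lowercase, \A uppercase, and \* after a unit meaning it may repeat. This is not Tesseract's code. The character classes are approximated with Python's str methods instead of Tesseract's UNICHARSET, and treating \* as "one or more" is my assumption, so check trie.h for the exact rules.

```python
import unicodedata
from typing import Callable, List, Tuple

Pred = Callable[[str], bool]


def _is_punct(ch: str) -> bool:
    # Approximation: any Unicode "P*" (punctuation) category.
    return unicodedata.category(ch).startswith("P")


# Assumed meanings of the escapes, approximated with Python predicates.
CLASSES = {
    "c": str.isalpha,
    "d": str.isdigit,
    "n": lambda ch: ch.isalpha() or ch.isdigit(),
    "p": _is_punct,
    "a": str.islower,
    "A": str.isupper,
}


def parse_pattern(pattern: str) -> List[Tuple[Pred, bool]]:
    """Split one user-patterns line into (predicate, repeatable) units."""
    units: List[Tuple[Pred, bool]] = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "\\" and i + 1 < len(pattern):
            nxt = pattern[i + 1]
            i += 2
            if nxt == "*":
                if not units:
                    raise ValueError("\\* with nothing before it")
                units[-1] = (units[-1][0], True)  # mark previous unit repeatable
            elif nxt in CLASSES:
                units.append((CLASSES[nxt], False))
            else:
                # Any other escaped character is treated as a literal.
                units.append(((lambda c, lit=nxt: c == lit), False))
        else:
            # Plain characters (including ".") match themselves literally.
            units.append(((lambda c, lit=ch: c == lit), False))
            i += 1
    return units


def matches(pattern: str, word: str) -> bool:
    """Return True if the whole word fits the pattern (backtracking search)."""
    units = parse_pattern(pattern)

    def match_from(u: int, w: int) -> bool:
        if u == len(units):
            return w == len(word)
        pred, repeat = units[u]
        if w >= len(word) or not pred(word[w]):
            return False
        if repeat:
            # One or more: keep repeating this unit, or move on to the next.
            return match_from(u, w + 1) or match_from(u + 1, w + 1)
        return match_from(u + 1, w + 1)

    return match_from(0, 0)
```

Under this reading, `1-\d\d\d-GOOG-411` accepts 1-800-GOOG-411 but not 1-80-GOOG-411, and `www.\n\*.com` accepts www.a.com and www.abc123.com but not www..com.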