That file does not ship with Tesseract. You create it yourself, in the same directory as your eng.traineddata.

My folder structure looks like this:

<https://lh3.googleusercontent.com/-124nDg2VCOQ/WuHKV7RHx6I/AAAAAAAAAKg/iW1jX3QNI_kbcMTefRVrX-n6sy9Dm-eIgCLcBGAs/s1600/Screen%2BShot%2B2018-04-26%2Bat%2B8.46.42%2BAM.png>
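
If it helps, here is a rough Python sketch that writes the three files from the quoted instructions next to eng.traineddata. The helper name `write_user_dictionary` and the temporary directory are my own placeholders, not part of Tesseract. Point `tessdata_dir` at your real tessdata folder instead. The file contents are copied from the wiki text quoted below.

```python
import tempfile
from pathlib import Path


def write_user_dictionary(tessdata_dir, lang="eng", config_name="bazaar"):
    """Create <lang>.user-words, <lang>.user-patterns and configs/<config_name>.

    Hypothetical helper for illustration; it only writes plain text files.
    """
    tessdata = Path(tessdata_dir)
    words = tessdata / f"{lang}.user-words"
    patterns = tessdata / f"{lang}.user-patterns"
    config = tessdata / "configs" / config_name

    # The configs directory normally exists already; create it if it does not.
    config.parent.mkdir(parents=True, exist_ok=True)

    # Simple word list, one word per line.
    words.write_text(
        "\n".join(["the", "quick", "brown", "fox", "jumped"]) + "\n",
        encoding="utf-8",
    )

    # Patterns copied verbatim from the quoted wiki text (raw strings keep
    # the backslashes exactly as written).
    patterns.write_text(
        "\n".join([r"1-\d\d\d-GOOG-411", r"www.\n\\\*.com"]) + "\n",
        encoding="utf-8",
    )

    # Config file: space-separated variable/value pairs, one per line.
    config.write_text(
        "load_system_dawg     F\n"
        "load_freq_dawg       F\n"
        "user_words_suffix    user-words\n"
        "user_patterns_suffix user-patterns\n",
        encoding="utf-8",
    )
    return words, patterns, config


if __name__ == "__main__":
    # Stand-in directory for the demo; replace with your tessdata path.
    demo_dir = tempfile.mkdtemp()
    for path in write_user_dictionary(demo_dir):
        print(path)
```

Once the files are in place, a call along the lines of `tesseract image.png out -l eng bazaar` should pick them up, where `image.png` and `out` are placeholder names and `bazaar` is the trailing config name described in the quoted text.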


On Monday, April 2, 2018 at 4:08:21 AM UTC-4, 이경준 wrote:
>
> Hi,
>
> I am quoting the page below.
>
> I cannot find (lang).user-words.
>
> How can I find it?
>
>
> Tesseract config files consist of lines with variable-value pairs (space 
> separated). The variables are documented as flags in the source code like 
> the following one in tesseractclass.h:
>
> STRING_VAR_H(tessedit_char_blacklist, "", "Blacklist of chars not to recognize");
>
> These variables may enable or disable various features of the engine, and 
> may cause it to load (or not load) various data. For instance, let’s 
> suppose you want to OCR in English, but suppress the normal dictionary and 
> load an alternative word list and an alternative list of patterns — these 
> two files are the most commonly used extra data files.
>
> If your language pack is in /path/to/eng.traineddata and the hocr config 
> is in /path/to/configs/hocr then create three new files:
>
> /path/to/eng.user-words:
>
> the
> quick
> brown
> fox
> jumped
>
> /path/to/eng.user-patterns:
>
> 1-\d\d\d-GOOG-411
> www.\n\\\*.com
>
> /path/to/configs/bazaar:
>
> load_system_dawg     F
> load_freq_dawg       F
> user_words_suffix    user-words
> user_patterns_suffix user-patterns
>
> Now, if you pass the word *bazaar* as a trailing command line parameter 
> to Tesseract, Tesseract will not bother loading the system dictionary nor 
> the dictionary of frequent words and will load and use the eng.user-words 
> and eng.user-patterns files you provided. The former is a simple word list, 
> one per line. The format of the latter is documented in dict/trie.h on 
> read_pattern_list().
>
