I know there are some similar posts - I've read them all! - but they don't 
seem to provide an answer.  I'm on Windows 11 with Tesseract 
5.2.0.20220712.

I was having trouble applying a user word list instead of the dawg list, so 
I made a very simple example: an image with one word that is not correctly 
detected, plus a user-words file with one entry that is a close match.

So, here's the image, temp.png, which is a slightly blurred image of 
"testW0rd", and using this command:
"C:\Program Files\Tesseract-OCR\tesseract" temp.png output --psm 3
I get the result "testwurd" in output.txt.

OK, so following the instructions, when I now put a file called 
eng.user-words with one entry, "testWord", in 
C:\Program Files\Tesseract-OCR\tessdata and a text file called bazaar in 
C:\Program Files\Tesseract-OCR\tessdata\configs with the following lines:
load_system_dawg     F
load_freq_dawg       F
user_words_suffix    user-words
language_model_penalty_non_dict_word 1

and run again, I get the same result as before: "testwurd".  So it doesn't 
seem to be using the user-words file.  Or rather, since it errors if the 
file is not there, it is accessing it but possibly not doing anything with it?
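
To make the setup above easier to reproduce, here is a minimal Python sketch 
that writes the two files and builds the command line.  Several things in it 
are assumptions and not part of my actual run: the helper names write_setup 
and build_command are made up for illustration, the files are written under a 
temporary folder instead of the real install path, and the trailing "bazaar" 
argument assumes the config is passed by name at the end of the command (the 
command shown earlier does not include it).  The script only prints the 
command; it does not run Tesseract.

```python
import tempfile
from pathlib import Path

# The four config lines from the post, unchanged.
CONFIG_LINES = [
    "load_system_dawg     F",
    "load_freq_dawg       F",
    "user_words_suffix    user-words",
    "language_model_penalty_non_dict_word 1",
]


def write_setup(tessdata_dir, words, config_name="bazaar"):
    """Write <tessdata>/eng.user-words and <tessdata>/configs/<config_name>.

    Hypothetical helper: it mirrors the layout described in the post but
    works on whatever directory it is given.
    """
    configs_dir = tessdata_dir / "configs"
    configs_dir.mkdir(parents=True, exist_ok=True)

    # One word per line in the user-words file.
    words_path = tessdata_dir / "eng.user-words"
    words_path.write_text("\n".join(words) + "\n", encoding="utf-8")

    config_path = configs_dir / config_name
    config_path.write_text("\n".join(CONFIG_LINES) + "\n", encoding="utf-8")
    return words_path, config_path


def build_command(image, output_base, config_name="bazaar"):
    """Build the argument list; the trailing config name is an assumption."""
    return ["tesseract", image, output_base, "--psm", "3", config_name]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        words_path, config_path = write_setup(Path(tmp) / "tessdata", ["testWord"])
        print(words_path.read_text(encoding="utf-8"), end="")
        print(config_path.read_text(encoding="utf-8"), end="")
    print(" ".join(build_command("temp.png", "output")))
```

If the config name is left off the command line, I would expect the settings 
in bazaar not to be read at all, so that is one thing worth ruling out.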

Any ideas why this is not working?  I would really appreciate some help with 
this from an expert.

Attachment: eng.user-words
Description: Binary data

Attachment: bazaar
Description: Binary data
