That is very interesting. I was expecting the dictionary to have a 
significant impact on the output, but I am seeing no impact at all. Yes, my 
images are good quality: regular scanned book pages (300 dpi), and I am on 
Tesseract 5. Sure, I will dig into the forum history and also run some experiments. 

If my results are consistent, I will report back. We might need to update 
our assumptions (and the wiki). 
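
A minimal sketch of how such a comparison could be run, assuming the 
tesseract binary is on PATH and that the dictionaries are switched off 
through the load_system_dawg and load_freq_dawg config variables. The helper 
names (tesseract_cmd, run_ocr, changed_lines) and the file name page.png are 
made up for illustration, and whether these variables change LSTM output at 
all is exactly what the experiment would have to show.

```python
import difflib
import subprocess


def tesseract_cmd(image, lang="eng", use_dictionary=True):
    """Build a tesseract command line that prints recognized text to stdout."""
    # --oem 1 selects the LSTM engine; "stdout" as output base prints the text.
    cmd = ["tesseract", image, "stdout", "-l", lang, "--oem", "1"]
    if not use_dictionary:
        # Assumption: disabling these two config variables is enough to
        # take the word lists out of the picture for this test.
        cmd += ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]
    return cmd


def run_ocr(image, lang="eng", use_dictionary=True):
    """Run tesseract on one image and return the recognized text."""
    result = subprocess.run(
        tesseract_cmd(image, lang, use_dictionary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def changed_lines(text_a, text_b):
    """Return only the lines that differ between two OCR outputs."""
    diff = difflib.ndiff(text_a.splitlines(), text_b.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Example use (page.png is a hypothetical file name):
#   with_dict = run_ocr("page.png", use_dictionary=True)
#   without_dict = run_ocr("page.png", use_dictionary=False)
#   for line in changed_lines(with_dict, without_dict):
#       print(line)
```

An empty result from changed_lines for a set of pages would match what I am 
seeing so far; any "-" / "+" pairs would show where the dictionary made a 
difference.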

Thank you for the clarification, dear Zdenko.

On Sunday, November 19, 2023 at 9:15:42 PM UTC+3 zdenop wrote:

> AFAIR there were tests with the legacy engine where the improvement in 
> result quality from dictionaries was measured at 10-15% for 
> common text.
> However: adding a word to a dictionary has never ensured Tesseract's 
> accurate recognition of that word.
> For non-word inputs (e.g. serial numbers ...) it was always suggested to 
> turn off dictionaries.
> IMO results depend on the input image quality (with good image quality 
> there seems to be no effect). If you need more details or experience 
> reports, dig into the history of this forum (especially after the first 
> release of version 3).
>
> I have never heard of anybody doing such a test for the LSTM engine.
>
> Zdenko
>
>
> On Sun, Nov 19, 2023 at 18:37, Des Bw <desal...@gmail.com> wrote:
>
>> Does Tesseract actually use the dictionary (word list) included in the 
>> model (traineddata file)?
>>
>> - I am not seeing any difference from including a dictionary (word 
>> list) in the file. 
>>
>> Has anybody experimented with a dictionary setup?
>>
>> -- 
>> You received this message because you are subscribed to the Google Groups 
>> "tesseract-ocr" group.
>> To unsubscribe from this group and stop receiving emails from it, send an 
>> email to tesseract-oc...@googlegroups.com.
>> To view this discussion on the web visit 
>> https://groups.google.com/d/msgid/tesseract-ocr/381c213c-da12-482a-accf-e6847c0fc01bn%40googlegroups.com
>>  
>> <https://groups.google.com/d/msgid/tesseract-ocr/381c213c-da12-482a-accf-e6847c0fc01bn%40googlegroups.com?utm_medium=email&utm_source=footer>
>> .
>>
>
