That is very interesting. I was expecting the dictionary to have a significant impact on the output, but I am getting no impact at all. Yes, my images are pretty good: a regular scanned book (300 dpi), and I'm on Tesseract 5. Sure, I will dig into the history of this forum and also keep experimenting.
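For the experimentation, a minimal sketch of the comparison I have in mind: run the same page twice through the tesseract command line, once with the default settings and once with the dictionaries switched off, then list the words that differ. It assumes the `load_system_dawg` and `load_freq_dawg` config variables are what control dictionary loading for the model in use; the helper names (`build_command`, `run_ocr`, `changed_words`) and the file name `page.png` are only illustrative.

```python
import difflib
import subprocess

# Config variables passed with -c; assumed here to disable the word lists.
DICT_OFF = ["-c", "load_system_dawg=0", "-c", "load_freq_dawg=0"]


def build_command(image_path, lang="eng", use_dictionary=True):
    """Build a tesseract CLI call that writes the recognized text to stdout."""
    cmd = ["tesseract", image_path, "stdout", "-l", lang]
    if not use_dictionary:
        cmd += DICT_OFF
    return cmd


def run_ocr(image_path, lang="eng", use_dictionary=True):
    """Run tesseract (must be installed and on PATH) and return its text output."""
    result = subprocess.run(
        build_command(image_path, lang, use_dictionary),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


def changed_words(text_a, text_b):
    """Return (from_a, from_b) pairs for every word-level difference."""
    a, b = text_a.split(), text_b.split()
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    diffs = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            diffs.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return diffs


# Hypothetical usage on one scanned page:
# with_dict = run_ocr("page.png", use_dictionary=True)
# without_dict = run_ocr("page.png", use_dictionary=False)
# for pair in changed_words(with_dict, without_dict):
#     print(pair)
```

An empty list from `changed_words` over many pages would match what I am seeing so far, i.e. no visible effect of the dictionary on good-quality scans.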
If my results are consistent, I will report back. We might need to update our assumptions (and the wiki). Thank you for your clarification, dear Zdenko.

On Sunday, November 19, 2023 at 9:15:42 PM UTC+3 zdenop wrote:

> AFAIR there were tests with the legacy engine where the effect of dictionaries on result quality was measured as a 10-15% improvement for common text.
> However: adding a word to a dictionary has never ensured Tesseract's accurate recognition of that word.
> For non-word inputs (e.g. serial numbers ...) it was always suggested to turn off dictionaries.
> IMO results depend on the input image quality (for good image quality it seems like no effect). If you need more detail/experiences, dig into the history of this forum (especially after the release of the first version 3).
>
> I never heard that anybody would do such a test for the LSTM engine.
>
> Zdenko
>
> On Sun, 19 Nov 2023 at 18:37, Des Bw <desal...@gmail.com> wrote:
>
>> Does Tesseract actually use the dictionary (wordlist) included in the model (traineddata file)?
>>
>> - I am not getting any difference/impact by including a dictionary (word list) in the file.
>>
>> Has anybody experimented with a dictionary setup?

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group. To unsubscribe from this group and stop receiving emails from it, send an email to tesseract-ocr+unsubscr...@googlegroups.com.
To view this discussion on the web visit https://groups.google.com/d/msgid/tesseract-ocr/b3f5cb99-e183-4b80-bb4e-7db0b961c842n%40googlegroups.com.