[tesseract-ocr] [questions] what happened to `tessdata_best` in Tesseract 5?

Alessandro Griseta Sun, 06 Jul 2025 07:18:27 -0700

I tried manually adding the files I needed from 
https://github.com/tesseract-ocr/tessdata_best (`equ.traineddata`, 
`osd.traineddata`, `ita.traineddata`) to 
`/usr/share/tesseract-ocr/5/tessdata`. Unfortunately, I then found out the 
hard way that these only work on Tesseract 4 XD.


1. It seems funny though: does that really mean I'll get better results by 
downgrading so that I can actually use these files?

I understand the performance loss, but I'm particularly interested in 
getting the best out of `equ.traineddata`, which to my understanding 
interprets math characters. Those are often a challenge for OCR engines, so 
I was trying to get the best possible scan for them.

2. Also, I wasn't able to specify `-l equ`, as the error told me Tesseract 
is supposed to deal with that on its own. If that's the case, is `equ` 
installed by default with `sudo apt-get install tesseract-ocr`? I couldn't 
find it in the `tessdata` folder, and I don't know where else to look for it.

3. I also tested the Docker image: if I put `equ.traineddata` and 
`osd.traineddata` inside the `tessdata` folder, will those files (which I 
chose manually) actually be used?
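
In case it helps to reproduce this, here is a minimal Python sketch of how the contents of a `tessdata` folder could be checked. The helper names `list_traineddata` and `list_langs_command` are made up for illustration, and the path is just the one from my system. It only shows which `.traineddata` files are present and builds the `tesseract --tessdata-dir ... --list-langs` command that asks Tesseract what it sees in that folder; it does not prove which model ends up being used for a given scan.

```python
from pathlib import Path


def list_traineddata(tessdata_dir):
    """Return the sorted language codes of the .traineddata files in a folder."""
    folder = Path(tessdata_dir)
    return sorted(p.stem for p in folder.glob("*.traineddata") if p.is_file())


def list_langs_command(tessdata_dir):
    """Build (but do not run) the command asking Tesseract which languages
    it finds in the given folder."""
    return ["tesseract", "--tessdata-dir", str(tessdata_dir), "--list-langs"]


if __name__ == "__main__":
    # Assumed path, taken from the message above; adjust for Docker or other installs.
    tessdata = "/usr/share/tesseract-ocr/5/tessdata"
    print(list_traineddata(tessdata))
    print(" ".join(list_langs_command(tessdata)))
```

Running the printed command by hand and comparing its output with the file list should show whether the manually added files are at least being picked up.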

Hope this all makes sense, don't be afraid to ask :)
Alessandro
