Hello Deborah,
I hope this isn't off-topic, and I don't mean to derail your thread, but 
I wanted to chime in: I am running into very similar difficulties and 
considerations, and I hope that saying so will generate enough interest to 
yield an effective solution.
On Sunday, June 16, 2024 at 2:41:27 AM UTC-4 Deborah wrote:

> Hello, I am using Tesseract to extract some data from screenshots.
> I've noticed that characters are sometimes misread, for example '0' and 
> 'O', 'P' and 'R', or '-' and '—', in either direction. This happens within 
> the same font, and sometimes even after some preprocessing, such as 
> binarization.
> Is there a comprehensive map of visually similar characters that are 
> commonly mistaken for one another?
> I need that map in order to calculate effective string distance with 
> Levenshtein and adjust the cost for characters that are very similar. 
> Thanks.
>
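
For anyone following along, the cost-adjusted Levenshtein distance described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the confusion table holds just the three pairs mentioned in the question, the 0.2 substitution cost is an arbitrary placeholder, and the names CONFUSABLE, sub_cost and weighted_levenshtein are made up for this example rather than taken from Tesseract or any library.

```python
# Hypothetical confusion table: only the pairs named in the question.
# The 0.2 cost is an arbitrary placeholder, not a measured value.
CONFUSABLE = {
    frozenset("0O"): 0.2,
    frozenset("PR"): 0.2,
    frozenset("-\u2014"): 0.2,  # hyphen-minus vs. em dash
}


def sub_cost(a, b):
    """Substitution cost: 0 for equal characters, reduced for known
    look-alike pairs, 1 for everything else."""
    if a == b:
        return 0.0
    return CONFUSABLE.get(frozenset((a, b)), 1.0)


def weighted_levenshtein(s, t):
    """Levenshtein distance with per-pair substitution costs.

    Insertions and deletions always cost 1. Uses the standard
    dynamic-programming recurrence, keeping one previous row.
    """
    prev = [float(j) for j in range(len(t) + 1)]
    for i, a in enumerate(s, start=1):
        curr = [float(i)]
        for j, b in enumerate(t, start=1):
            curr.append(min(
                prev[j] + 1.0,               # delete a from s
                curr[j - 1] + 1.0,           # insert b into s
                prev[j - 1] + sub_cost(a, b),  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(weighted_levenshtein("R2D0", "R2DO"))  # look-alike substitution
    print(weighted_levenshtein("R2D0", "R2DX"))  # ordinary substitution
```

Because the table is keyed on unordered pairs, a confusion costs the same in both directions. The table would still need to be filled in from the errors actually observed with a given font and preprocessing pipeline.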

