[tesseract-ocr] Training tesseract for non alphanumeric symbols

RuePat07 Thu, 19 Sep 2024 05:28:32 -0700

I have around 10 symbols (icons), for example ← or →, that I would like to add
to the English model. I can generate as many images of them as I want, but
they don't conform to any font, and not every font contains all of them. For
example, symbol A might be missing from Font A but available in Font B, while
symbol B is available in Font B but not in Font A.
I looked at the following section of the documentation:
https://github.com/tesseract-ocr/tessdoc/blob/main/tess4/TrainingTesseract-4.00.md#fine-tuning-for--a-few-characters
However, I can't simply add the symbols to the training text, because if a
symbol doesn't exist in the font, the generated images show question marks
instead.
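To make the font-coverage problem concrete, here is a minimal sketch of one
way to avoid the question marks: render each training line only with fonts
that actually contain every character in it. It assumes you already know
which characters each font covers; the font names and coverage sets below are
hypothetical placeholders (in practice they could be read from the font files,
e.g. with fontTools' `TTFont(path).getBestCmap()`). Lines that no font covers
are reported separately, since those symbols would have to come from your own
generated images rather than from rendered text.

```python
from __future__ import annotations

# Hypothetical coverage data: which characters each font can render.
# Names and sets are illustrative assumptions, not real font data.
FONT_COVERAGE = {
    "Font A": set("abcdefghijklmnopqrstuvwxyz ") | {"←"},
    "Font B": set("abcdefghijklmnopqrstuvwxyz ") | {"←", "→"},
}


def fonts_for_line(line: str, coverage: dict[str, set[str]]) -> list[str]:
    """Return (sorted) the fonts that contain every character in `line`."""
    needed = set(line)
    return sorted(name for name, chars in coverage.items() if needed <= chars)


def assign_fonts(lines: list[str], coverage: dict[str, set[str]]):
    """Map each line to the fonts that can render it without missing glyphs.

    Returns (plan, uncovered): plan maps line -> usable fonts, and
    uncovered lists lines that no known font can render completely.
    """
    plan: dict[str, list[str]] = {}
    uncovered: list[str] = []
    for line in lines:
        fonts = fonts_for_line(line, coverage)
        if fonts:
            plan[line] = fonts
        else:
            uncovered.append(line)
    return plan, uncovered


if __name__ == "__main__":
    plan, uncovered = assign_fonts(
        ["turn left ←", "turn right →", "go ↑"], FONT_COVERAGE
    )
    print(plan)       # {'turn left ←': ['Font A', 'Font B'], 'turn right →': ['Font B']}
    print(uncovered)  # ['go ↑']
```

Each line in `plan` would then be rendered only with the fonts listed for it,
so a symbol like → is never drawn with a font that lacks it.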
So my questions are:
1. Which approach is best for my case? (A) fine-tuning, (B) retraining a
couple of layers, or (C) training from scratch and combining the result with
the English model?
2. What kind of data would I need for this, and are there any tools that
would help me generate it in my case?


-- 
You received this message because you are subscribed to the Google Groups 
"tesseract-ocr" group.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/tesseract-ocr/22e0473e-7860-4929-8179-7ed26c815accn%40googlegroups.com.
