Hi,

Consider an image containing a mix of English and German text. I extract WordStr boxes from it and fix the mistakes. When fine-tuning the two languages, I get encoding errors for English (eng), because it does not contain the German characters.

What is the correct approach here?

1. Ignore the encoding errors? What effect does this have on the result?
2. Create two box files, changing German words like 'Dänemark' to 'Danemark' for eng?
3. Remove the German WordStr entries from the box file when fine-tuning deu?
4. Add the German characters to the English unicharset?
5. Something else?

/Jacob
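
P.S. To make option 2 concrete, here is a minimal Python sketch of the kind of replacement I mean. The helper name `strip_diacritics` is made up for illustration and is not part of Tesseract or its training tools. It only shows the text transformation, not how to read or write the box file.

```python
import unicodedata


def strip_diacritics(text: str) -> str:
    """Decompose characters (NFD) and drop combining marks, so 'ä' becomes 'a'."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


print(strip_diacritics("Dänemark"))  # Danemark
print(strip_diacritics("Straße"))    # Straße ('ß' has no decomposition, so it is left as-is)
```

Note that this would leave 'ß' untouched, so that character would need its own rule (for example, replacing it with 'ss') if option 2 turns out to be the right way to go.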