Hi Bryan,

On Tue, Dec 10, 2013 at 07:18:57PM -0600, Bryan Tarpley wrote:
> We've found that when two letters aren't touching, Tesseract has trouble
> identifying them together as a single ligature, /especially/ given that the
> character "e" by itself looks exactly the same as the one in "ke."

Oh, I see. That's something Tesseract ought to do better, really, if it knows that some of the trained 'characters' are big enough that a combined box may make sense. I'll have a look into the code which does the boxing at some point to see if I can find a way to improve it, but that probably won't be for some time.

> In those cases, even though the printer may have combined the "k" and the
> "e" onto the same plate to form the "ligature" "ke," (what's the better word
> for plate here?), it is better to train Tesseract to recognize them as
> separate characters, from what we've found.

That sounds sensible. However, as I mentioned in an earlier email, I question the wisdom of training these characters using non-rectangular polygons, as Tesseract will be breaking the ligature into rectangles anyway. So, e.g., it'll never see the flourish of the tail of 'k' in the same box as the main part of 'k', and training for a 'k' with the full flourish can't help it.

> I feel like I'm talking in circles, so if this is making no sense, I can try
> to give example images of what I'm talking about tomorrow.

We both have a little, I suspect, but hopefully we understand each other completely now. I at least believe I do...

Nick

--
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

