Hi Bryan,

On Tue, Dec 10, 2013 at 09:13:37AM -0800, Bryan Tarpley wrote:
> I've attached an example from one of our documents. Consider the capital 'T'
> which overhangs the 'u', and the 'k' which underlies the 'e'. We've also found
> instances where, on certain fonts, almost all of the italics characters
> overlap. These are not ligatures.
Curious... Is this a title? If so, maybe they used fancier methods (e.g. custom-cutting the squares)? The T only overhangs the u a tiny bit, and as it's an italic font anyway, I suspect that could just be the ink spreading a touch. But the k certainly looks a lot like a ligature (whether custom-designed for the title or not).

I recently read the book "A View of Early Typography" by Harry Carter, who mentions that Aldus used at least 65 different ligatures for all sorts of letter joins. Granted, he was exceptional, but he was also prolific. I thoroughly recommend that book, incidentally: it's heavy going, but awesome.

IIRC there's nothing stopping you from treating things like that as a single character that outputs multiple letters when training, if it doesn't make sense to preserve the ligature (which for cases like this it probably wouldn't).

If your university has an old printing press, go visit it and find someone to show you around; it's great fun!

Nick

-- 
-- 
You received this message because you are subscribed to the Google Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to [email protected]
For more options, visit this group at http://groups.google.com/group/tesseract-ocr?hl=en

