Hi Bryan,

On Tue, Dec 10, 2013 at 09:13:37AM -0800, Bryan Tarpley wrote:
> I've attached an example from one of our documents.  Consider the capital 'T'
> which overhangs the 'u', and the 'k' which underlies the 'e'.  We've also
> found instances where, on certain fonts, almost all of the italics characters
> overlap.  These are not ligatures.

Curious... Is this a title? If so, maybe they used fancier methods
(e.g. custom cutting the squares)? The T only overhangs the u a tiny
bit, and as it's an italic font anyway I suspect that could be the
ink spreading a touch. But the k certainly looks a lot like a
ligature (whether custom designed for the title or not).

I recently read the book "A View of Early Typography" by Harry
Carter, who mentions that Aldus used at least 65 different ligatures
for all sorts of letter joins. Granted he was exceptional, but also
prolific. I thoroughly recommend that book, incidentally - it's
heavy going, but awesome.

IIRC there's nothing stopping you from treating things like that as
a single character that outputs multiple letters when training, if
it doesn't make sense to preserve the ligature (which for cases like
this it probably wouldn't).
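
As a rough sketch of what I mean, assuming the usual box file layout
of "symbol left bottom right top page" with the origin at the bottom
left: you merge the boxes of the joined letters into one box and label
it with both letters. The helper names and coordinates below are made
up for illustration; they are not part of Tesseract.

```python
def merge_boxes(boxes):
    """Merge per-letter boxes (text, left, bottom, right, top) into one
    box that covers all of them and is labelled with all their letters."""
    text = "".join(b[0] for b in boxes)
    left = min(b[1] for b in boxes)
    bottom = min(b[2] for b in boxes)
    right = max(b[3] for b in boxes)
    top = max(b[4] for b in boxes)
    return (text, left, bottom, right, top)


def box_line(text, left, bottom, right, top, page=0):
    """Format one box as a line of a box file."""
    return f"{text} {left} {bottom} {right} {top} {page}"


# Hypothetical coordinates for the overlapping 'k' and 'e'.
k_box = ("k", 120, 40, 150, 95)
e_box = ("e", 145, 40, 170, 70)

merged = merge_boxes([k_box, e_box])
line = box_line(*merged)
print(line)
```

The merged line then replaces the two separate 'k' and 'e' lines in
the box file, so the joined shape is trained as one unit that outputs
"ke".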

If your university has an old printing press, go visit it and find
someone to show you around - it's great fun!

Nick

-- 
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en