Quote - Nick White <[email protected]> (Mon 09 Dec 2013 01:29:18 PM CET):
On Sat, Dec 07, 2013 at 11:29:25AM -0800, Tom Morris wrote:
In watching Bryan Tarpley's Franken+ presentation
(http://emop.tamu.edu/node/54), it's pretty obvious from the example
that there are (at least) two clusters of glyphs for the letter 'o':
a tall skinny glyph and a round glyph.
Good point Tom. It wasn't clear to me from that presentation whether
the glyphs had all been taken from the same document. I slightly
suspect skinny o and round o were from different documents, or that
they were different fonts in the same document, because to me they
don't look close enough to have been made with the same metal
characters. Granted early printing was rather more haphazard than
today's (excluding print on demand, obviously ;-) ), but I still
find it hard to believe that they would have used characters cut so
differently interchangeably very often.
It depends on what you mean by "very".
In the dirty OCR of a 19th-century text, the digit 1 occurred in quite
strange places. I thought these were just OCR errors, but it turned out
that the printer had used it instead of capital I, probably because
there were too few of the proper types.
Best regards
Janusz
--
Prof. dr hab. Janusz S. Bień - Uniwersytet Warszawski (Katedra
Lingwistyki Formalnej)
Prof. Janusz S. Bień - University of Warsaw (Formal Linguistics Department)
[email protected], [email protected], http://fleksem.klf.uw.edu.pl/~jsbien/
--
You received this message because you are subscribed to the Google
Groups "tesseract-ocr" group.
To post to this group, send email to [email protected]
To unsubscribe from this group, send email to
[email protected]
For more options, visit this group at
http://groups.google.com/group/tesseract-ocr?hl=en