Re: [XeTeX] Latin Modern, from TFM to Unicode

Adam Twardoch (List) Wed, 12 Jun 2013 12:00:30 -0700

Doug,

if you think of the TFM slot indices as "glyph indices" rather than "character codes", then you can possibly find a 1:1 mapping of all TFM indices to glyph IDs in the OTF, but not to Unicode codepoints. If your method of drawing glyphs on screen allows you to address glyph IDs directly (e.g. using FreeType or another library that allows low-level addressing of glyph IDs within an OTF font file), then you should be able to achieve it.

However -- I personally don't know which glyphs have this correspondence or whether the ones in the OTFs have the same repertoire or metrics. You'd probably do best to contact the GUST e-foundry project members directly:

http://www.gust.org.pl/projects/e-foundry

Unfortunately, apart from http://www.gust.org.pl/contact-info I don't see any easy way to contact them using public channels.


Regards,
Adam



On 13-06-12 21:32, Doug McKenna wrote:

Thanks for all the responses.

I understand the distinction between Unicode characters (code points) and
glyphs, and that an OpenType font can have glyphs in it that do not
correspond to any Unicode code points.  I don't quite get whether or how
those non-Unicode glyphs are subject to being found via the 'cmap' table,
or whether they have glyph IDs that are known or can be determined by
some documented convention outside the OpenType font file.  Or whether
they are part of some internal ligature-like structure that only the
OpenType font has information about (which might mean that the glyph IDs
can change internally from one release to the next of the OT font).

Arthur Reutenauer responded:

These glyphs or parts of glyphs can probably be mapped one-to-one to font
slots in the original lmex10, but that does not make them characters.

Understood about not being characters.  But it's that one-to-one mapping
from each slot in TFM to an equivalent slot in OpenType (for Latin
Modern) I'm interested in pinning down (hopefully not "probably").  It
certainly appears that every glyph represented by "lmex10.tfm" can be
found in the "Latin Modern Math" font file, though I haven't gone through
all 128 trying to find where they appear in the OT font.

Khaled Hosny wrote:

[snip numerous good explanations]

Thanks.  I understand better what's going on inside the OpenType font,
and can now imagine how FontBook is figuring out which glyphs are not the
targets of the 'cmap' table's Unicode code point inputs.  And I
understand that the math extension font contains glyphs for different
sizes of the same symbol, but kept in different slots with different
glyph indices (if that's the right term) in the TFM file.

I'm not sure what you want to achieve, and you might be asking the wrong
question, so it might be better to elaborate more on your actual goal.

I have my own homebrew math layout system that determines where to place
math glyphs based on information in the lmex10.tfm and other TFM files.
For reasons peculiar to my needs, I'm not interested in creating PDF or
DVI output.  I just want to draw a math glyph on my screen using "Latin
Modern Math" at a computed position, based on where TeX would place it
using the metrics in "lmex10.tfm" or other TFM file (the extent to which
I'm accurately simulating TeX is a side-issue, but I'm trying hard).  My
assumption was that the glyphs in the OT file are visually the same,
and have the same metrics/bounding boxes, etc. as the original TFM
metrics.  Or if they don't have quite the same metrics, the differences
are not going to change over time with new versions of the OT font.

I assumed that every one of the 128 glyphs represented by slots in
lmex10.tfm would be found in the OpenType font "Latin Modern Math", along
with lots of other glyphs.  I had thought that all the glyphs in the OT
font had Unicode character designations, but have now understood that
that is not a good assumption.

Consider the radical sign.  In the TFM file, there is information that
TeX uses to determine which final glyph(s) to use, based on the height of
the box of whatever's underneath the radical.  So TeX chooses the glyph
in slot "70 for small height, or the glyph in slot "71 for medium height,
or the one in slot "72 for large height, or slot "73 for even larger
height.  If none of those fixed-height glyphs are high enough, presumably
TeX goes into a tall symbol construction algorithm based on data within
the TFM file, using glyphs representing pieces of radical signs, kept in
slots "74, "75, and "76.
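
The selection procedure described above can be sketched roughly as follows. This is only an illustration, not TeX's actual code: the slot numbers are the ones named in this message, but every size below is a made-up placeholder (real values would have to be read from lmex10.tfm), and the names VARIANTS, BOTTOM, REPEAT, TOP, and choose_radical are hypothetical.

```python
import math

# Fixed-size radicals, smallest first: (TFM slot, total height).
# Sizes are placeholder values, NOT taken from lmex10.tfm.
VARIANTS = [(0x70, 12.0), (0x71, 18.0), (0x72, 24.0), (0x73, 30.0)]

# Pieces for a built-up radical: (TFM slot, placeholder height).
BOTTOM = (0x74, 18.0)   # radical bottom
REPEAT = (0x75, 6.0)    # vertical bar, repeated as often as needed
TOP = (0x76, 6.0)       # top corner


def choose_radical(target):
    """Return the TFM slots, listed top to bottom, that cover `target`."""
    # First try each fixed-size glyph in increasing order of size.
    for slot, size in VARIANTS:
        if size >= target:
            return [slot]
    # None is tall enough: stack top + n repeated bars + bottom,
    # using the smallest n whose total height reaches the target.
    fixed = TOP[1] + BOTTOM[1]
    n = max(0, math.ceil((target - fixed) / REPEAT[1]))
    return [TOP[0]] + [REPEAT[0]] * n + [BOTTOM[0]]
```

With these placeholder sizes, a small target yields a single slot from the fixed-size chain, while a target taller than the largest variant yields the top corner, some number of vertical bars, and the radical bottom.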

Using FontBook, in the "Latin Modern" OpenType file, the glyph for the
official Unicode code point U+221A SQUARE ROOT is glyph ID #2839.  So
that's a "character" I suppose.  The 'cmap' table maps that Unicode value
to that glyph ID and it can be drawn as a character would.  But there are
also non-Unicode glyphs for partial radical signs, all of which look
identical to the glyphs shown by /fonttable for "lmex10.tfm" (which are
taken from some PFB file).  In particular, I've figured out by inspection
the following partial answer to what I'm interested in:

small radical    TFM slot "70 ==> OTF glyph #2843 (no Unicode designation)
medium radical   TFM slot "71 ==> OTF glyph #2844 (no Unicode designation)
large radical    TFM slot "72 ==> OTF glyph #2845 (no Unicode designation)
larger radical   TFM slot "73 ==> OTF glyph #2846 (no Unicode designation)

radical bottom   TFM slot "74 ==> OTF glyph #2840 (U+23B7 RADICAL SYMBOL BOTTOM)
vertical bar     TFM slot "75 ==> OTF glyph #2841 (no Unicode designation)
top corner       TFM slot "76 ==> OTF glyph #2842 (no Unicode designation)
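
For illustration, the partial table above could be written down as a lookup structure like the one below. The glyph IDs are exactly the ones found by inspection in this message, for whatever release of the font was examined; nothing here guarantees they hold for other releases. The names LMEX10_TO_LM_MATH and glyph_id_for_slot are hypothetical.

```python
# TFM slot -> (OTF glyph ID, Unicode code point or None).
# Values copied from the table above; they were found by inspecting one
# particular copy of the font and may not hold for other releases.
LMEX10_TO_LM_MATH = {
    0x70: (2843, None),    # small radical
    0x71: (2844, None),    # medium radical
    0x72: (2845, None),    # large radical
    0x73: (2846, None),    # larger radical
    0x74: (2840, 0x23B7),  # radical bottom, U+23B7 RADICAL SYMBOL BOTTOM
    0x75: (2841, None),    # vertical bar
    0x76: (2842, None),    # top corner
}


def glyph_id_for_slot(slot):
    """Return the OTF glyph ID recorded for a TFM slot, or raise KeyError."""
    if slot not in LMEX10_TO_LM_MATH:
        raise KeyError(f"no OTF glyph recorded for TFM slot {slot:#04x}")
    return LMEX10_TO_LM_MATH[slot][0]
```

Only slot "74 carries a Unicode code point, so it is the only entry that could also be reached through the 'cmap' table; the others can only be addressed by glyph ID.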

So given that there are partial glyphs useful for building very large
radical signs in "Latin Modern Math", and given that most, though not all
of them, have no official Unicode code point assigned to them, how does
an outside process that wants to use the OT font to draw a very large
radical sign tell the font what to draw?  Since there's no mapping from
Unicode, then the outside process either needs to know the absolute glyph
IDs inside the font, or it needs to cause the font to go into some
internal construction mode, like building a ligature, where the font
itself knows the sequence and position of the glyphs to use to construct
the tall symbol.  The latter seems impossible, because the font can't
know the threshold height at which to stop construction.  The former
means hard coding internal glyph IDs somewhere outside the font, which
I'm hoping is not fragile, but I worry it might be.
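
One way to limit the fragility of hard-coded glyph IDs, sketched here purely as an assumption and not as anything the font or its maintainers prescribe, is to trust the IDs only when the font file is byte-identical to the one they were read from. The function names font_fingerprint and glyph_ids_trusted are hypothetical, and the reference digest would have to be recorded from the inspected file.

```python
import hashlib


def font_fingerprint(path):
    """Return the SHA-256 hex digest of the font file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large font files need not fit in memory at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def glyph_ids_trusted(path, known_digest):
    """True only if the file matches the copy the glyph IDs were taken from."""
    return font_fingerprint(path) == known_digest
```

A check like this does not make the IDs stable; it only detects that the font file has changed, so that the table can be re-derived instead of silently drawing the wrong glyphs.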

Sorry for the reams of details, but I'm trying to explain my confusion
exactly.


Doug McKenna



--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
   http://tug.org/mailman/listinfo/xetex



--

May success attend your efforts,
-- Adam Twardoch
(Remove "list." from e-mail address to contact me directly.)


