On 2/23/11 9:27 AM, Meho R. wrote:
That page provides an algorithm that is supposed to be used to convert
glyph names into sequences of Unicode code points.  If your glyphs are
named according to it, and if the PDF software follows it too, then the
PDF software should be able to figure out which characters a glyph
represents for search purposes, even if it's not a standard ligature.  Of
course, there's no guarantee that a given package really will support the
rules properly.

Thanks for the link. However, even Adobe's OTF fonts have the same
problems with ligatures and searchability when used with XeLaTeX, so I
don't think it is a naming-convention issue. Curiously, when the OTF
fonts are used with Scribus and the ligatures are inserted manually,
they are recognized in the PDF and there is no searchability problem.
Also, when the OTF fonts are converted to TTF, the searchability issue
seems to disappear with XeLaTeX too (at least it looks that way for the
couple of fonts I just tested).

This is an old problem. Whether or not search works properly in the most
widely used PDF readers depends on the font. As noted above, the ligature
glyphs must be properly named so that the PDF software can derive the
names of the component glyphs (a rough sketch of that parsing step
follows the list below). But *in addition* it is necessary that the
ligature glyphs be

1. unencoded; or
2. one of the few ligatures with Unicode encodings (fi, fl, etc.).
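
In case it helps, here is a rough Python sketch of the name-parsing step
(simplified; AGL_EXCERPT is only a tiny excerpt of the real Adobe Glyph
List). The name is split on underscores, any suffix after the first
period is dropped, and each component is mapped either through the Adobe
Glyph List or through the uniXXXX / u+hex conventions:

# Simplified sketch of the glyph-name-to-Unicode mapping described there.
# AGL_EXCERPT is only a tiny excerpt of the real Adobe Glyph List.

HEX = set("0123456789ABCDEF")
AGL_EXCERPT = {
    "f": "f", "i": "i", "l": "l", "t": "t", "T": "T", "h": "h",
    "longs": "\u017F",                         # long s
}

def glyph_name_to_text(name):
    """Return the text a glyph name stands for ("" if nothing can be derived)."""
    name = name.split(".", 1)[0]               # drop suffixes such as ".liga", ".smcp"
    chars = []
    for part in name.split("_"):               # components are joined by underscores
        if (part.startswith("uni") and len(part) > 3
                and (len(part) - 3) % 4 == 0 and set(part[3:]) <= HEX):
            # "uni" followed by one or more 4-digit hex values
            chars.extend(chr(int(part[j:j + 4], 16))
                         for j in range(3, len(part), 4))
        elif (part.startswith("u") and 5 <= len(part) <= 7
                and set(part[1:]) <= HEX):
            chars.append(chr(int(part[1:], 16)))   # "u" + 4 to 6 hex digits
        elif part in AGL_EXCERPT:
            chars.append(AGL_EXCERPT[part])
        # unknown components contribute nothing to the derived text
    return "".join(chars)

print(glyph_name_to_text("f_i"))               # fi
print(glyph_name_to_text("T_h.liga"))          # Th
print(glyph_name_to_text("longs_t"))           # "st" with a long s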

If non-standard ligatures have been assigned code points in the font's
Private Use Area, most PDF software won't even try to analyze the glyph
name. The thinking seems to be: the whole point of such analysis is to
derive one or more Unicode values, so why bother if the glyph already
has one?
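
If you want to check a particular font, something along these lines with
fontTools will list the encoded glyphs whose names look like ligatures
and flag the ones that sit in the Private Use Area (the font path and
the "looks like a ligature" heuristic are just placeholders, not a real
test):

# Sketch: list encoded glyphs whose names look like ligatures, and flag
# any that are mapped into the BMP Private Use Area.
from fontTools.ttLib import TTFont

font = TTFont("SomeFont.otf")            # hypothetical font file
cmap = font.getBestCmap()                # {code point: glyph name}
PUA = range(0xE000, 0xF900)              # BMP Private Use Area

for codepoint, glyph_name in sorted(cmap.items()):
    if "_" in glyph_name or glyph_name.endswith((".liga", "lig")):
        where = "PUA" if codepoint in PUA else "non-PUA"
        print("U+%04X  %-8s %s" % (codepoint, where, glyph_name))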

At least one person has mentioned to me that some PDF software does better, but I haven't observed this myself. And workarounds have been discussed on this list.

Peter


