> On the PostScript side, it should be theoretically possible to
> use the `GlyphNames2Unicode' dictionary (an undocumented Adobe
> Distiller extension) so that PS->PDF software can provide
> non-standard mappings. Right now, I haven't found a full
> example code for that.

An interesting point. I've played around a bit with this, and the difficulty I've had is getting Ghostscript to actually emit a ToUnicode map. I've managed it only by hacking the Ghostscript source (making pdf_simple_font_needs_ToUnicode() always return true).

Additionally, I've added a GlyphNames2Unicode dictionary to Courier's FontInfo, like this:

  /GlyphNames2Unicode <<
    /quoteright 16#0027
    /quoteleft  16#0060
    /minus      16#002d
  >> def

(FontInfo is in the "visible" part of the font file, so no disassembly is required. grops could insert this while reencoding, but I'm against doing this unconditionally.)

With these changes, selecting the Courier text in the resulting PDF in acroread returns ASCII code points.

Another point: GlyphNames2Unicode appears to support only a single Unicode code point per glyph, so we can't map one glyph to a sequence of characters, as would be desirable for uncommon ligatures. But for copy-and-pasting command lines it should be enough.

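For what it's worth, here is a rough, untested sketch of how the same dictionary might be attached while reencoding instead of by editing the font file. It uses the usual copy-the-font-dictionary idiom; the name /Courier-U is made up for illustration, and it assumes the font actually has a FontInfo entry. Whether Ghostscript or Distiller honours GlyphNames2Unicode on a font redefined this way is an assumption on my part (I have only tried the edited font file), and the Ghostscript hack mentioned above would still be needed to get a ToUnicode map at all.

```postscript
% Sketch only: untested. /Courier-U is a hypothetical font name.
/Courier findfont
dup length dict begin
  % Copy every entry except FID, as in the usual reencoding idiom.
  { 1 index /FID ne { def } { pop pop } ifelse } forall

  % Replace FontInfo with a private, writable copy that has room
  % for one more key (assumes the font provides FontInfo).
  /FontInfo FontInfo dup length 1 add dict copy def

  % Map the three glyphs to ASCII code points.
  FontInfo /GlyphNames2Unicode <<
    /quoteright 16#0027
    /quoteleft  16#0060
    /minus      16#002d
  >> put

  currentdict
end
/Courier-U exch definefont pop
```

Copying FontInfo first avoids writing into the dictionary shared with the original Courier, which is normally read-only anyway.
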
encoding.pdf
Description: Adobe PDF document