On Sunday 06 May 2012 12:01:02 Werner LEMBERG wrote:
> Ideally, there should be a proper ToUnicode cmap in the PDF so that
> copy and paste gives good results. On the PostScript side, it should
> be theoretically possible to use the `GlyphNames2Unicode' dictionary
> (an undocumented Adobe Distiller extension) so that PS->PDF software
> can provide non-standard mappings. Right now, I haven't found a full
> example code for that.
>
> However, the gropdf driver could directly add support for that...
> Deri?
>
>
> Werner
Werner,

I've just had a cursory look at the PDF reference (1.4, the one with a proper index!) and it looks like the example given on page 371 is very close to what we would need. I think this CMap could be used with the groff encoding given in text.enc:

/CIDInit /ProcSet findresource begin
12 dict begin
begincmap
/CIDSystemInfo
<< /Registry (Adobe)
   /Ordering (UCS)
   /Supplement 0
>> def
/CMapName /Adobe-Identity-UCS def
/CMapType 2 def
1 begincodespacerange
<0000> <FFFF>
endcodespacerange
3 beginbfrange
<0020> <007f> <0020>
<008b> <008f> [<00660066> <00660069> <0066006c> <006600660069> <00660066006C>] % ligatures at 139-143
<00ad> <00ad> <002d> % change minus to hyphen
endbfrange
endcmap
CMapName currentdict /CMap defineresource pop
end
end

This is perhaps not ideal, since it is tied to 'text.enc'; in fact, this ToUnicode CMap is only embedded if the groff font file specifies 'text.enc' as its encoding. Is there a better way of doing this? Are there other codes which should also be mapped here? (Quotes?)

(NB I still intend to use code from 'dvipdfmx' to do the font subsetting at some point, which I believe includes a ToUnicode CMap, so this is a temporary solution.)

Attached is a small PDF showing this CMap in use. It was generated from:

.sp 1i
Finally we finished ffirst playing the flute.
.br
Now we test \- minus.

It should copy and paste with the ligatures expanded (NB the TR font does not have ffi or ffl defined), and searching should also work properly when viewing the PDF.

Cheers
Deri
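One way to avoid hand-maintaining the entry counts (the number before beginbfrange or beginbfchar must equal the number of entries that follow it) would be to generate the CMap from a code-to-Unicode table. The following is a minimal Python sketch, not gropdf code: the names `tounicode_cmap`, `utf16be_hex` and `MAPPING` are hypothetical, it emits one bfchar entry per code instead of ranges, and it assumes the same two-byte `<0000> <FFFF>` codespace and the text.enc positions described in this message (ASCII identity, ligatures at 139-143, code 173 mapped to U+002D).

```python
"""Sketch: build a ToUnicode CMap from a code -> Unicode table.

Assumptions (hypothetical, not taken from gropdf): source codes are written
as 2-byte hex to match the <0000> <FFFF> codespace of the hand-written CMap,
and MAPPING mirrors the text.enc positions quoted in the message.
"""


def utf16be_hex(text: str) -> str:
    """Encode a Unicode string as uppercase UTF-16BE hex, as ToUnicode expects."""
    return text.encode("utf-16-be").hex().upper()


def tounicode_cmap(mapping: dict[int, str]) -> str:
    """Return ToUnicode CMap text using bfchar entries, at most 100 per block."""
    entries = [f"<{code:04X}> <{utf16be_hex(uni)}>"
               for code, uni in sorted(mapping.items())]
    lines = [
        "/CIDInit /ProcSet findresource begin",
        "12 dict begin",
        "begincmap",
        "/CIDSystemInfo << /Registry (Adobe) /Ordering (UCS) /Supplement 0 >> def",
        "/CMapName /Adobe-Identity-UCS def",
        "/CMapType 2 def",
        "1 begincodespacerange",
        "<0000> <FFFF>",
        "endcodespacerange",
    ]
    # Each block may hold at most 100 entries, and its leading count is
    # derived from the chunk itself, so it can never disagree with the body.
    for start in range(0, len(entries), 100):
        chunk = entries[start:start + 100]
        lines.append(f"{len(chunk)} beginbfchar")
        lines.extend(chunk)
        lines.append("endbfchar")
    lines += [
        "endcmap",
        "CMapName currentdict /CMap defineresource pop",
        "end",
        "end",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical table: ASCII identity (0x20-0x7F), ligatures at 139-143,
# and code 173 (minus) mapped to U+002D.
MAPPING = {code: chr(code) for code in range(0x20, 0x80)}
MAPPING.update({0x8B: "ff", 0x8C: "fi", 0x8D: "fl", 0x8E: "ffi", 0x8F: "ffl"})
MAPPING[0xAD] = "-"

if __name__ == "__main__":
    print(tounicode_cmap(MAPPING), end="")
```

The table above has 102 entries, so the sketch splits them into a block of 100 and a block of 2. Using bfchar keeps the code simple; a real implementation might collapse consecutive codes back into bfrange entries to keep the embedded CMap short, and could build the table from whatever encoding file the font names rather than only text.enc.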
fi.pdf
Description: Adobe PDF document