Hi Jiang,
Anyone feel like trying out the following patches?
https://gist.github.com/jjgod/0d4b6339d761a5423f82
Patch 1 will fix the ToUnicode generation for all non-subst glyphs in
a non-XeTeX generated dvi. In our case, non-subst glyphs are the
glyphs that are *NOT* changed by applying OpenType vertical layout
features. To fix this I merely applied the same strategy I used to fix
XeTeX generated PDF documents. It's a small change and should be safe
to commit.
In patch 2 I tried out a more interesting approach, and it supersedes
the efforts in patch 1: this patch uses the cmap we provided in the
map file to do the CID -> Unicode lookup. In your test case the cmap is
UniSourceHanSansJP-UTF16-V.
To do this reverse lookup I had to extend the CMap structure a little
to store reverse mapping information. With that in place, it's quite
easy to generate a ToUnicode stream covering all used CIDs. The commit
message has more details.
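
As a rough illustration of the reverse-mapping idea only (this is not the
code in the patch, and every name below is hypothetical rather than part
of the dvipdfmx CMap API): a flat table indexed by CID can be filled in
while the forward Unicode -> CID entries are loaded, and then queried once
per used CID when writing the ToUnicode stream. The sketch assumes 16-bit
CIDs and keeps the first Unicode value seen for each CID.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch; names are illustrative, not dvipdfmx's. */
#define CID_MAX    65535u   /* assumption: CIDs fit in 16 bits */
#define NO_MAPPING (-1)     /* marks a CID with no known Unicode value */

typedef struct {
    int32_t *cid_to_ucs;    /* indexed by CID */
} reverse_map;

/* Allocate a table with every CID initially unmapped. */
reverse_map *reverse_map_new(void)
{
    size_t i;
    reverse_map *m = malloc(sizeof *m);

    if (m == NULL)
        return NULL;
    m->cid_to_ucs = malloc(((size_t)CID_MAX + 1) * sizeof *m->cid_to_ucs);
    if (m->cid_to_ucs == NULL) {
        free(m);
        return NULL;
    }
    for (i = 0; i <= CID_MAX; i++)
        m->cid_to_ucs[i] = NO_MAPPING;
    return m;
}

/* Record one forward entry (Unicode -> CID); the first entry wins. */
void reverse_map_add(reverse_map *m, int32_t ucs, uint16_t cid)
{
    assert(m != NULL);
    if (m->cid_to_ucs[cid] == NO_MAPPING)
        m->cid_to_ucs[cid] = ucs;
}

/* Look up the Unicode value for a CID, or NO_MAPPING if none. */
int32_t reverse_map_decode(const reverse_map *m, uint16_t cid)
{
    assert(m != NULL);
    return m->cid_to_ucs[cid];
}

void reverse_map_free(reverse_map *m)
{
    if (m != NULL) {
        free(m->cid_to_ucs);
        free(m);
    }
}
```

Keeping the first value seen is just one possible policy for CIDs that
several code points map to; a real implementation may well prefer a
different rule.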
Since these patches are relatively experimental, I'm hesitant to commit
them right away; it would be nice if any of you could try them out or
review them. In my limited testing they haven't shown any problems, and
we can produce a perfectly copyable PDF from the sample.dvi in this test
case.
Hi Jiang,
Please commit both patch 1 and patch 2!! I think they are great.
Users of W32TeX know that it is always experimental.
BTW, there is a C99-style late variable declaration (a declaration after
a statement, which C89 compilers reject) around line 1123 in tt_cmap.c:
    for (j = 0; j < 8; j++) {
      unsigned int cid;
      gid = 8 * i + j;
      if (!is_used_char2(used_glyphs, gid))
        continue;
      cid = cff_charsets_lookup_inverse(cffont, gid);
      int ch = CMap_reverse_decode(cmap_loaded, cid);
--->
    for (j = 0; j < 8; j++) {
      unsigned int cid;
      int ch;
      gid = 8 * i + j;
      if (!is_used_char2(used_glyphs, gid))
        continue;
      cid = cff_charsets_lookup_inverse(cffont, gid);
      ch = CMap_reverse_decode(cmap_loaded, cid);
Thanks,
Akira