Hi Heiko, and Akira,

On 06/11/2011, at 3:55 AM, Heiko Oberdiek wrote:
> \special{%
>   pdf:ann width 4bp height 2bp depth 2bp<<%
>     /Type/Annot%
>     /foo/ab#abc
>     /Subtype/Link%
>     /Border[0 0 1]%
>     /C[0 0 1]% blue border
>     /A<<%
>       /S/GoToR%%
>       /F(t.tex)%
>       /D<66f6f8>%
>       % Result: <66f6f8>, but ** WARNING ** Failed to convert input string to UTF16...
>       % /D<c3a46e6368c3b872>%
>       % Result: <feff00e4006e0063006800f80072>
>     >>%
>   >>%
> }%

I've verified that this is indeed what happens, with

  This is XeTeX, Version 3.1415926-2.2-0.9997.4 (TeX Live 2010)

Now, looking at the source code at:

  http://ftp.tug.org/svn/texlive/trunk/Build/source/texk/xdvipdfmx/src/spc_pdfm.c?diff_format=u&view=log&pathrev=13771

it is hard to see how those results can occur.

The warning message is only produced when the function

  maybe_reencode_utf8(pdf_obj *instring)

returns a negative value (e.g. -1), viz. lines 571--578:

function: modstrings
>>>       }
>>>       else {
>>>         r = maybe_reencode_utf8(vp);
>>>       }
>>>       if (r < 0) /* error occured... */
>>>         WARN("Failed to convert input string to UTF16...");
>>>     }
>>>     break;

or lines 1145--1150 (for pdf:dest, which is not actually used here):

>>> #ifdef ENABLE_TOUNICODE
>>>   error = maybe_reencode_utf8(name);
>>>   if (error < 0)
>>>     WARN("Failed to convert input string to UTF16...");
>>> #endif
>>>   array = parse_pdf_object(&args->curptr, args->endptr, NULL);

Now that function should find only ASCII bytes in '<66f6f8>' and '<c3a46e6368c3b872>', since the hex digits themselves are plain ASCII characters. In both cases the string should have remained silently unmodified, viz. lines 474--481:

function: maybe_reencode_utf8
>>>   /* check if the input string is strictly ASCII */
>>>   for (cp = inbuf; cp < inbuf + inlen; ++cp) {
>>>     if (*cp > 127) {
>>>       non_ascii = 1;
>>>     }
>>>   }
>>>   if (non_ascii == 0)
>>>     return 0;  /* no need to reencode ASCII strings */

What am I reading wrong, if anything?

Has there been an earlier decoding of <....> hex strings into byte values, done either by XeTeX or by xdvipdfmx? If so, then surely it is this which is unnecessary.
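To make the suspected earlier decoding concrete, here is a minimal C sketch. It is not xdvipdfmx code: the names decode_hex, is_ascii, utf8_to_utf16be and reencode_after_hex_decode are hypothetical, and it simply assumes the hex string has already been turned into raw bytes before an ASCII test like the one quoted above is applied. Under that assumption <66f6f8> becomes the bytes 66 f6 f8, and 0xF6 can never start a UTF-8 sequence, so conversion fails (the WARN case). Meanwhile <c3a46e6368c3b872> is valid UTF-8 for "änchør" and re-encodes to exactly <feff00e4006e0063006800f80072>.

```c
#include <stddef.h>

/* Hypothetical helper: decode a PDF hex string body such as "66f6f8"
 * into raw bytes. Assumes an even number of hex digits and no
 * whitespace. Returns the byte count, or -1 on a bad digit/overflow. */
static int hex_digit(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

int decode_hex(const char *hex, unsigned char *out, size_t outsize)
{
    size_t n = 0;
    while (hex[0] != '\0' && hex[1] != '\0') {
        int hi = hex_digit(hex[0]), lo = hex_digit(hex[1]);
        if (hi < 0 || lo < 0 || n >= outsize) return -1;
        out[n++] = (unsigned char)(hi * 16 + lo);
        hex += 2;
    }
    return hex[0] == '\0' ? (int)n : -1;   /* odd digit count: reject */
}

/* Same test as the quoted ASCII loop in maybe_reencode_utf8. */
static int is_ascii(const unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; ++i)
        if (buf[i] > 127) return 0;
    return 1;
}

/* Convert UTF-8 to UTF-16BE with a leading FE FF byte-order mark.
 * Returns the number of output bytes, or -1 on malformed input.
 * A sketch only: overlong forms, surrogate code points and values
 * above U+10FFFF are not rejected. */
static int utf8_to_utf16be(const unsigned char *in, size_t len,
                           unsigned char *out, size_t outsize)
{
    size_t i = 0, n = 0;
    if (outsize < 2) return -1;
    out[n++] = 0xFE;
    out[n++] = 0xFF;
    while (i < len) {
        unsigned char c = in[i++];
        unsigned long cp;
        int extra;
        if (c < 0x80)                    { cp = c;        extra = 0; }
        else if (c >= 0xC2 && c <= 0xDF) { cp = c & 0x1F; extra = 1; }
        else if (c >= 0xE0 && c <= 0xEF) { cp = c & 0x0F; extra = 2; }
        else if (c >= 0xF0 && c <= 0xF4) { cp = c & 0x07; extra = 3; }
        else return -1;                  /* e.g. the lone 0xF6 in <66f6f8> */
        for (; extra > 0; --extra) {
            if (i >= len || (in[i] & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (in[i++] & 0x3F);
        }
        if (cp >= 0x10000) {             /* needs a surrogate pair */
            unsigned long v = cp - 0x10000;
            unsigned long hi = 0xD800 | (v >> 10), lo = 0xDC00 | (v & 0x3FF);
            if (n + 4 > outsize) return -1;
            out[n++] = (unsigned char)(hi >> 8);
            out[n++] = (unsigned char)(hi & 0xFF);
            out[n++] = (unsigned char)(lo >> 8);
            out[n++] = (unsigned char)(lo & 0xFF);
        } else {
            if (n + 2 > outsize) return -1;
            out[n++] = (unsigned char)(cp >> 8);
            out[n++] = (unsigned char)(cp & 0xFF);
        }
    }
    return (int)n;
}

/* Hypothetical stand-in for maybe_reencode_utf8 applied to a string that
 * has ALREADY been hex-decoded: 0 = ASCII, left alone; >0 = bytes of
 * UTF-16BE written to out; -1 = conversion failed (the WARN case). */
int reencode_after_hex_decode(const char *hex, unsigned char *out,
                              size_t outsize)
{
    unsigned char bytes[256];
    int len = decode_hex(hex, bytes, sizeof bytes);
    if (len < 0) return -1;
    if (is_ascii(bytes, (size_t)len)) return 0;
    return utf8_to_utf16be(bytes, (size_t)len, out, outsize);
}
```

Called with "66f6f8" this returns -1; with "c3a46e6368c3b872" it returns 14 and fills out with FE FF 00 E4 00 6E 00 63 00 68 00 F8 00 72. Applied to the undecoded hex digits of either string, is_ascii returns 1, which is the silent no-op the quoted ASCII loop should give.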
It would not be XeTeX doing this, since the string is correct in the .xdv file.
For example, the function pst_string_parse_hex in pst_obj.c seems to be doing this. But that is only supposed to be used with code from cmap_read.c and t1-load.c, and those are only meant for interpreting the font data that goes into content streams. So I'm at a loss to understand this.

However, 'modstrings' is applied recursively, and part of it seems to be checking for a CMap (when appropriate?). So maybe there is an unintended decoding that precedes an encoding?

>
> It seems that *all* literal strings are affected by the
> unhappy reconversions. But the PDF specification lets no choice,
> there are various places for byte strings.
> In the example, if a file name has byte string XY and the destination Z,
> then the file name is XY and the file name Z and nothing else. Otherwise
> neither the file or the destination will be found.
>
> Thus either (XeTeX/)xdvipdfmx finds a way for specifying arbitrary
> byte strings (at least for PDF strings(/streams)) -- it is a
> requirement of the PDF specification. Or we have to conclude
> that 8-bit is not supported and that means US-ASCII.
>
> Yours sincerely
>   Heiko Oberdiek

Hope this helps --- or you can help me :-)

Cheers,

	Ross

------------------------------------------------------------------------
Ross Moore                                       ross.mo...@mq.edu.au
Mathematics Department                           office: E7A-419
Macquarie University                             tel: +61 (0)2 9850 8955
Sydney, Australia  2109                          fax: +61 (0)2 9850 8114
------------------------------------------------------------------------

--------------------------------------------------
Subscriptions, Archive, and List information, etc.:
  http://tug.org/mailman/listinfo/xetex