On 02/16/2011 11:37 PM, Andy Black wrote:
> Hussein:
>
> One of our users recently ran into a frustrating situation. He had a
> <link> element whose href attribute was set as a relative URI to a sound
> file in MP3 format. The sound file was on his hard disk in a
> subdirectory of where the main file is located. (This is for the
> XLingPaper custom configuration files we have for use with XXE, but I
> suspect that is not a factor.) The file name included accented vowels.
> As an example, one was
>
> ɔ̀kɔ́ àbɔ̀n.mp3
>
> (The name may not come through correctly in this email. The Unicode
> characters are: U+0254 U+0300 k U+0254 U+0301 (space) a U+0300 b U+0254
> U+0300 n . m p 3.)

Surprising! Why, for example, express agrave as U+0061 U+0300 (that is,
Letter 'a' followed by Combining Diacritical Mark '`') when there is a
Unicode character for agrave: U+00E0?

>
> When he used the Browse Files... option in the Attributes Editor of XXE,
> the result was
>
> %C9%94%CC%80k%C9%94%CC%81%20%C3%A0b%C9%94%CC%80n.mp3
>
> I understand that this is the same file name using percent-encoding
> (with UTF-8 encoding values) for those requiring it with the exception
> that the acute a in the file name is two Unicode characters (U+0061
> U+0300) while the acute a in the URI is one Unicode character (U+00E0).
> Apparently, this difference is crucial.
>
> The problem becomes clear when the user's XML file is converted to
> either a web page output or a PDF output and the user clicks on the
> link. The browser or PDF reader indicates that it will look for the
> correct file name (at least, it looks correct - one can see the acute a,
> for example), but these applications report that they could not find the
> file. Looking at the web page, the file name is exactly as the URI
> returned from the Attributes Editor as given above. Similarly for the
> PDF file. So why is it that everything looks good, but these
> applications say that they cannot find the file?
>
> From what we can tell, the problem is for characters like the acute
> accented a. The file system on the user's hard drive

Probably Mac OS X HFS+. See
http://stackoverflow.com/questions/3610013/file-listfiles-mangles-unicode-names-with-jdk-6-unicode-normalization-issues

> uses the decomposed
> form (NFD) of the acute a (i.e. it is U+0061 U+0300) while the result of
> the Browse Files... option in the Attributes Tool (U+00E0) uses the
> composed form (NFC). When the composed form is used by a web browser or
> PDF reader, there is a mismatch to the file name on the hard drive so
> the file cannot be found.
>
> Is this a known issue with the Browse Files... dialog box in the
> Attributes Tool?
> That is, is it known that this tool converts NFD format
> to NFC? Is there a preferences setting that can be set to control this?
> Is there some other work-around available?
>

XXE simply uses the characters of the filenames passed to it by the Java
runtime. Therefore it's a Java issue and not an XXE issue. Java does not
seem to keep the original decomposed form of the characters.

I don't know any (simple) workaround.

--
XMLmind XML Editor Support List
[email protected]
http://www.xmlmind.com/mailman/listinfo/xmleditor-support
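
To make the NFC/NFD mismatch discussed in this thread concrete, here is a minimal Java sketch. It is not part of XXE and has not been tested against it. It assumes the file system stores names in decomposed form (NFD) and uses the standard java.text.Normalizer class to rebuild that form before percent-encoding the UTF-8 bytes. The class name NfdHrefSketch and the helpers percentEncode and nfdHref are hypothetical names made up for this illustration.

```java
import java.nio.charset.StandardCharsets;
import java.text.Normalizer;

// Hypothetical illustration class; not an XXE or XLingPaper API.
class NfdHrefSketch {

    // Percent-encode the UTF-8 bytes of s, leaving only the
    // RFC 3986 "unreserved" characters (A-Z a-z 0-9 - . _ ~) as they are.
    static String percentEncode(String s) {
        StringBuilder out = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            int c = b & 0xFF;
            boolean unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
                    || (c >= '0' && c <= '9')
                    || c == '-' || c == '.' || c == '_' || c == '~';
            if (unreserved) {
                out.append((char) c);
            } else {
                out.append(String.format("%%%02X", c));
            }
        }
        return out.toString();
    }

    // Assumption: the target file system keeps names in NFD, so the
    // name is decomposed before it is encoded.
    static String nfdHref(String fileName) {
        return percentEncode(Normalizer.normalize(fileName, Normalizer.Form.NFD));
    }

    public static void main(String[] args) {
        // The file name from the thread, with the a-grave in composed form (U+00E0),
        // which is what the Browse Files... dialog was reported to return.
        String composed = "\u0254\u0300k\u0254\u0301 \u00E0b\u0254\u0300n.mp3";

        // Composed form: prints %C9%94%CC%80k%C9%94%CC%81%20%C3%A0b%C9%94%CC%80n.mp3
        System.out.println(percentEncode(composed));

        // Decomposed form: prints %C9%94%CC%80k%C9%94%CC%81%20a%CC%80b%C9%94%CC%80n.mp3
        System.out.println(nfdHref(composed));
    }
}
```

The only difference between the two results is %C3%A0 (U+00E0) versus a%CC%80 (U+0061 U+0300). Typing the second form into the href attribute by hand might serve as a workaround, but only where the file system really stores the name in NFD.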

