André,

On 5/18/2010 4:18 AM, André Warnier wrote:
> Among other bizarre things, it does mean that in a URL, one has to
> encode the hostname using one method, and the rest of the URL using
> another method. As if encoding issues were not already complicated enough.
>
> This, and a lot of other non-USASCII encoding issues all through the
> web, point to the real need to move to a fully Unicode/UTF-8 based web
> infrastructure. It rather puzzles me why this does not seem to be a
> major topic of discussion in forums such as this one.

+1

Unfortunately, you can't guarantee how the browser will interpret your URL, no matter how you encode it.

If you do a simple UTF-8 dump of the URL (that is, no %-encoding, no nothing), you run the risk of using "illegal" characters as far as the browser is concerned. Technically speaking, this is not even allowed: MIME headers must be in US-ASCII, so a raw ñ is not legal and therefore must be encoded.

The next question is how to encode it. The Unicode code point for ñ is U+00F1, so a simple encoding might be %F1, but the browser is then free to decide whether that byte means ñ (the raw Unicode code point) or something else (not sure what... ISO-8859-1 0xF1 is also ñ). If you encode it in UTF-8, you ought to get the two bytes 0xC3 0xB1, which should be encoded into the URL as %c3%b1. Both of these encodings (ISO-8859-1 and UTF-8) were observed by the OP under various circumstances.

The problem is that the rules and practice for browser behavior are ... unclear. Even in the presence of such rules, every browser I've seen has a setting for "use UTF-8 for URL encoding", and the default settings are, I'm sure, inconsistent between browsers and even between versions of the same browser.

Since the user can always choose to override what the server expects, it's probably best to restrict your URLs to US-ASCII.
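
To make the two encodings above concrete, here is a minimal Java sketch (Java 10 or later). The class name UrlEncodingDemo and its two helper methods are made up for illustration; only java.net.URLEncoder and java.net.URLDecoder are real APIs. Note that URLEncoder applies form-style rules (a space becomes '+'), so treat this as a way to see the bytes, not as a general-purpose URL path encoder.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class UrlEncodingDemo {

    // Hypothetical helper: percent-encode text using the given charset.
    public static String encode(String text, Charset charset) {
        return URLEncoder.encode(text, charset);
    }

    // Hypothetical helper: decode percent-escapes, interpreting the bytes in the given charset.
    public static String decode(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        // \u00f1 is the n-with-tilde; the escape keeps this source file pure US-ASCII.
        String word = "coru\u00f1a";

        String latin1 = encode(word, StandardCharsets.ISO_8859_1);
        String utf8 = encode(word, StandardCharsets.UTF_8);

        System.out.println(latin1); // coru%F1a
        System.out.println(utf8);   // coru%C3%B1a

        // Decoding with the same charset round-trips.
        System.out.println(decode(utf8, StandardCharsets.UTF_8).equals(word));       // true

        // Decoding with the other charset does not: the receiver has to guess correctly.
        System.out.println(decode(utf8, StandardCharsets.ISO_8859_1).equals(word));  // false
        System.out.println(decode(latin1, StandardCharsets.UTF_8).equals(word));     // false
    }
}
```

Both encoded forms are perfectly legal US-ASCII, and nothing in the escapes themselves says which charset produced them. The server has no reliable way to know which of the two the client picked.
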

For this reason, we have stopped using GET for any requests that could reasonably be expected to contain non-US-ASCII data (such as form submissions). For the URLs themselves, sticking to US-ASCII may mean that you have to "misspell" certain words (such as niño) in file names and paths.

For my money, this URL should be the one to use (at least when ignoring the "standard" for internationalized domain names mentioned elsewhere in this thread):

http://www.coru%c3%b1a.es

Good luck,
-chris