Hi, everybody. Thanks for the answers! Just to make myself clear:
1. Always setting the request charset before doing anything else fixes the bug;
2. When the bug is "on", only the input data (request) is wrong. Data that was previously encoded as UTF-8 is rendered correctly (response). At least, Firefox says that the pages are using UTF-8 as their encoding.

Andre:

On Wed, 2009-07-08 at 18:14 +0200, André Warnier wrote:
>
> > That is only one of the issues (browser inconsistencies).
Inconsistencies ? In Microsoft IE ? Never ! ;-)
>
> If you want to really tackle this complex issue, you need to be
> systematic, make sure you understand the bits and pieces, and do
> everything right.
> A short overview :
>
> 1) choose Unicode/UTF-8 as your charset/encoding, for *everything*.
> Don't try to mix and match, or you'll get in trouble. Promise.

Checked.

>
> Applying #1 above :
>
> 2) find out the available "locales" on the Linux host where you run this
> Tomcat.
> "locale -a | more"
> Pick one locale that has "utf8" in the name, note its name.
> In the system script that starts Tomcat, add
> export LC_ALL="pt_pt.u...@euro"
> (or whichever locale you have chosen)
> That sets the "system locale" for the JVM that runs Tomcat, and is a way
> to make it independent from whatever may be the system's configured
> "default locale".

I'll change every startup script to set this before Tomcat starts running. Until now I have used LANG=C or set JVM system properties directly (such as file.encoding, user.????, etc.).

>
> 3) All your html pages should have a declaration like :
> <meta http-equiv="content-type" content="text/html; charset=UTF-8" />

Checked.

>
> 4) All your html <form> tags should have an attribute :
> accept-charset="UTF-8"

I'll change the JSP files to include this.

>
> 5) a URL is in no particular charset. A URL is *bytes*.
> Any byte in a URL, that is not (generally speaking) such that it can be
> represented by an ASCII letter a-zA-Z0-9, will be encoded as %xy, where
> xy is the hexadecimal representation of this byte.
> After decoding these %xy things, the result is again bytes, and that's
> how your application sees it.

Ok. I don't think there is anything like that in this webapp.

>
> 6) In your application, you can decide to interpret this series of
> bytes, as a string in the UTF-8 encoding, and decode it as such into
> Unicode *characters*.
> Forget about any parameters to specify the charset of URLs, they only
> confuse things totally.
> The only way you know what was the underlying encoding, is when you know
> for sure that all URLs that will hit your server, come from a known
> source of which you controlled the encoding.

?

>
> 7) When submitting the values of the <input> tags of a form, browsers
> will generally respect the basic encoding of the html page in which the
> form was included, and (usually) also the "accept-charset" attribute.
> By specifying both, you almost always win, as long as the submitted form
> comes from your application, and has the right encoding.

Ok.

>
> 8) In theory, you should also make sure that all responses sent by your
> server to a browser, if they are html pages, contain the correct HTTP
> header :
> Content-type: text/html; charset=UTF-8
> That, you can check with a browser add-on such as
> - LiveHttpHeader for Firefox
> - Fiddler2 for IE
> and examine what goes out and what comes in.
> You can also use Wireshark.
> The good news is that most webservers do this correctly.
> The bad news is that IE usually ignores this header, and makes its own
> decision as to what the content is. Newer IE versions may be better.

Ok. Page properties (in Firefox) shows UTF-8 as the encoding.

>
> 9) Java's internal charset is Unicode.
> So when you do request.getParameter(), you will always get what Java
> considers to be the proper Unicode translation of how the parameter came in.
> The problem is to not let Java get confused about what it receives from
> the browser. By doing all the above, you minimise the chances that it
> will be confused.

Ok.
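
To make points 5, 6 and 9 concrete, here is a minimal sketch, not taken from the webapp itself. It assumes a hypothetical parameter value "caf%C3%A9" (the word "café" sent as UTF-8 bytes and %xy-escaped) and shows that the same bytes give different Java strings depending on the charset used to interpret them. The class and method names are made up for the illustration.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Illustration only: class and method names are hypothetical.
final class UrlBytesDemo {

    // Undo the %xy escaping, then interpret the resulting bytes in the given charset.
    static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) {
        // Assumed input: "cafe" with an acute accent on the e, as UTF-8 bytes (0xC3 0xA9).
        String raw = "caf%C3%A9";

        String asUtf8 = decode(raw, "UTF-8");        // two bytes become one character
        String asLatin1 = decode(raw, "ISO-8859-1"); // two bytes become two characters

        // Print lengths rather than the text, so the console encoding does not matter.
        System.out.println(asUtf8.length());   // 4
        System.out.println(asLatin1.length()); // 5
    }
}
```

The bytes on the wire are identical in both calls; only the interpretation differs, which matches the symptom that just the request data comes out wrong.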
>
> 10) If you want to really make sure, include in all your forms some
> hidden input value, containing a known string with "accented" characters
> (áàéèÜÖ and such).
> Then, before you process any other parameter in your webapp, check if
> that string matches one that you have defined in your servlet.
> If it does not, then something is wrong.
>

Ok.
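
The check from point 10 could look roughly like the sketch below. It is an assumption-laden illustration, not code from this webapp: the class name, the sentinel value and the simulated wire bytes are all hypothetical. In a real servlet the received value would come from request.getParameter() on the hidden field, called after request.setCharacterEncoding("UTF-8").

```java
import java.nio.charset.StandardCharsets;

// Illustration only: class name and sentinel value are hypothetical.
final class CharsetSentinel {

    // The accented test string from point 10, written with Unicode escapes
    // so that the encoding of this source file cannot corrupt it.
    static final String EXPECTED = "\u00e1\u00e0\u00e9\u00e8\u00dc\u00d6";

    // True when the hidden field arrived exactly as the server defined it.
    static boolean isIntact(String received) {
        return EXPECTED.equals(received);
    }

    public static void main(String[] args) {
        // Simulate a browser submitting the hidden field as UTF-8 bytes.
        byte[] wire = EXPECTED.getBytes(StandardCharsets.UTF_8);

        // Server decodes with the matching charset: the sentinel survives.
        String decodedRight = new String(wire, StandardCharsets.UTF_8);
        // Server decodes with the wrong charset: the sentinel is mangled.
        String decodedWrong = new String(wire, StandardCharsets.ISO_8859_1);

        System.out.println(isIntact(decodedRight)); // true
        System.out.println(isIntact(decodedWrong)); // false
    }
}
```

When the check fails, rejecting or logging the request before touching any other parameter keeps badly decoded text out of the stored data.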