André,

André Warnier wrote:
| I am sorry to butt in again, but are you *really* sure that the problem
| is not earlier in the chain than what you think ?
| I have read the article at the link given earlier :
| http://wiki.apache.org/tomcat/Tomcat/UTF-8
| and I am quite sure that what is said in that article is wrong, or at
| least incomplete. The article seems to assume that whatever the browser
| sends is always iso-8859-1, and that at the server level you can then
| just go and "decode" it into utf-8. That is wrong, I can assure you.

You're right: you can't just assume that the incoming data is UTF-8.
The problem is that browsers often do not send a charset parameter in
the Content-Type header of POST requests. They /should/, but sometimes
they do not. In these cases, the server is left to guess. Guessing is
hard, but most browsers act somewhat predictably...

| Browsers will send utf-8 if the right conditions are met, and you will
| corrupt that data if you force it through a second encoding/decoding.
| Browsers will also sometimes send iso-8859-1, if you are not careful or
| if the browser is buggy. It happens. (iso-8859-1 is the default in
| HTTP, so if you do not specify things differently, that is what you'll get).

Most browsers will send request #1 in the same encoding that was used
for response #0. That is, if a page is encoded in UTF-8, then a form
submitted from that page will (unless otherwise specified) use the same
encoding -- even if that encoding is not named in the Content-Type
header.

| In an ideal world, when a browser sends a string parameter via a POST,
| each parameter value should be enclosed in a part with a header and a
| content. The header of the part should have a line
| Content-type: text/plain; charset=xxxxx
| and the content of that part should then be in that xxxx charset encoding.

"Parting" is not required here.
You just encode the whole POST body with one encoding, and name that
encoding in the charset parameter of the standard Content-Type header.

Now, back to the server. No server should ever clobber an encoding
specified by the client. The filter example on that wiki page needs to
be fixed so that the encoding is only set if the request does not
already specify one. This is a BIG BUG in the filter shown on that
page, and someone should fix it (maybe I will... I just registered for
the Wiki).

If you /know/ that your pages are being sent in UTF-8, and you make the
reasonable assumption that requests with no charset in the Content-Type
header will use the encoding of the previous response, then the filter
listed on the aforementioned page is acceptable (again, with a check
for an existing encoding on the request).

| It is quite possible that Tomcat's innards do not do things correctly
| when they decode a POST, and just deliver the raw parameter value as
| received. But that would surprise me, and I would submit that it would
| then be a bug.

Tomcat does, in fact, decode the parameters properly. That's what the
setCharacterEncoding method does -- it sets the character encoding that
will be used to decode the request's body, including by any Reader used
to read it. Beyond making that call before the body or parameters are
first read, your code does not have to do anything special.

-chris
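
For illustration, here is a minimal sketch of the "only set an encoding
if the client did not specify one" rule discussed above. It is not the
wiki filter and does not use the Servlet API; the class CharsetFallback
and its methods resolveCharset and decodeBody are hypothetical names
made up for this example, and it works on plain strings and bytes so it
can run standalone. In a real servlet filter the equivalent check would
presumably be to call request.setCharacterEncoding(...) only when
request.getCharacterEncoding() returns null.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

/** Hypothetical helper for illustration; not part of Tomcat or the Servlet API. */
final class CharsetFallback {

    /**
     * Returns the charset named in a Content-Type header value. The fallback
     * is used only when the header is missing, carries no charset parameter,
     * or names a charset this JVM does not know (an assumption of this sketch).
     */
    static Charset resolveCharset(String contentType, Charset fallback) {
        if (contentType == null) {
            return fallback;
        }
        // Parameters follow the media type, separated by semicolons.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String param = parts[i].trim();
            if (param.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = param.substring("charset=".length()).trim();
                // The parameter value may be quoted: charset="utf-8".
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name.
                    return fallback;
                }
            }
        }
        return fallback;
    }

    /** Decodes raw body bytes, never overriding a charset the client declared. */
    static String decodeBody(byte[] body, String contentType, Charset fallback) {
        return new String(body, resolveCharset(contentType, fallback));
    }

    public static void main(String[] args) {
        // "Andr\u00e9" is "André"; the escape avoids source-encoding problems.
        byte[] utf8Body = "Andr\u00e9".getBytes(StandardCharsets.UTF_8);

        // No charset declared: fall back to the encoding the page was served in.
        System.out.println(decodeBody(utf8Body, "text/plain", StandardCharsets.UTF_8));

        // The same bytes decoded with the wrong charset come out corrupted.
        System.out.println(decodeBody(utf8Body, "text/plain; charset=ISO-8859-1",
                StandardCharsets.UTF_8));
    }
}
```

The second call in main decodes UTF-8 bytes as ISO-8859-1 on purpose:
it shows the kind of corruption you get when the charset used for
decoding does not match the one the browser actually used.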