-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

André,

On 6/26/13 11:40 AM, André Warnier wrote:
> Shanti Suresh wrote:
>> Hi Chris,
>>
>> This is such an interesting discussion. I am not sure what to
>> make of this person's comment:
>>
>> ------------------- TAXI 2012-10-09 09:03:59 PDT
>>
>> Wow, no fix since 8 years...
>>
>> And this is a real bug: If the HTTP header says the file is
>> encoded in ISO-8859-1 the common way to override this with HTML
>> is:
>>
>> <meta http-equiv="Content-Type" content="text/html; charset=utf-8">
>>
>> Firefox reads the body in UTF-8 then, which is fine, but the
>> charset used in forms is still ISO-8859-1, so you have to add
>> accept-charset="utf-8" to the form just for firefox (other
>> browser automatically use UTF-8 or send the charset with the
>> content-type).
>>
>> So: Why the hell is nobody fixing this bug?
>> ---------------
>>
>>
>> So the questions I have are:
>> (1) Firefox is not properly sending UTF-8 in the POST request
>> even if it reads the HTML page in UTF-8? And other browsers are
>> now sending "charset=utf-8" based on the HTML META tag?
>> (2) Firefox has started respecting the accept-charset="utf-8"
>> attribute in forms now such that it adds charset to the
>> Content-Type header of the POST request? I'm confused. I thought
>> Mozilla was not going to fix this issue.
>>
>> Thanks for any clarifications.
>>
>
> I think that you are still confused.. :-) (As are, in part, some of
> the people who posted on that Mozilla bug).
>
> (1) browsers, in general, are *not* sending a "charset" attribute
> in their POST submissions (whether form-url-encoded or multipart).
> This is a real pity, because it is the source of much confusion,
> and the real reason why servers have to go through loops to figure
> out (or force) the character set/encoding of the data that they are
> getting from browser POSTs.

It is a shame, but you're running into a hole in the spec: only
text/* should have a charset, and we're not talking about text/*
unfortunately.
So, the spec technically forbids fixing the problem.

> And the Mozilla people seem to say that it is that way, because
> when they tried to add this "charset" attribute, it broke a number
> of server applications at the time (8 years ago), and they see no
> reason to think that it would not still be the same today, so they
> are not trying it again.

It's more like 15 years ago at this point. But how many times have we
said "Tomcat is spec-compliant and your client is not, so you can go
#@$*(& yourself"? The same is true here for the browser: they are
being spec-compliant, even if it is rather stupid.

It would be nice to introduce a new HTTP header like
"Form-Content-Type" or whatever that is always valid (because old,
stupid software will ignore it and new software will respect it).

> (1a) what browsers *will* do, in general, is to send POST data in
> the same character set/encoding as the one of the HTML *page* which
> contains the form being posted.

It's worth pointing out that this is merely convention, and only
because it obviously makes sense to operate this way.

> But, even when sending UTF-8 encoded data according to this
> principle, they are *not* indicating that it is UTF-8 data, which
> is basically wrong, because the standard HTTP/HTML character set is
> iso-8859-1, and they *should* indicate it when that is not what
> they are sending. But that is the reality.

No, as much as it pains me to do so, I agree with the Mozilla folks
on this one: adding a charset attribute to an
application/x-www-form-urlencoded Content-Type violates the spec.
There is no good solution.

> (2) the "accept-charset" attribute of a <form> does not mean that
> this <form> will *send* data according to that charset/encoding.

It /should/. Why do you think it wouldn't?

> It indicates that any data that is entered in the form's input
> boxes will be interpreted as being in that charset.

I think that's roughly the same thing.
The browser would be stupid not to send the data using that character
encoding.

> So the fact of adding an "accept-charset" attribute to your <form>
> tags does not make it so that the browser will magically change its
> behaviour when POSTing data.

Whether it's magic or not, accept-charset should cause the form to be
sent in one of the charsets listed. If you only list one, it should
be sent using that character encoding, or the browser should give the
user an error saying it doesn't know how to encode that way. All
browsers know how to do UTF-8.

> In other words, it's a mess, and the mess is mainly due to some
> lack of precision in the original RFC's, but it is being
> perpetuated now by the fear of browser developers of breaking
> server applications by doing things right. Which is rather funny in
> a way, considering all the things that browser developers do all
> the time anyway which do break existing applications.
>
> We really need an RFC for HTTP 2.0, with UTF-8 as the default
> charset/encoding.

+1

Maybe they can clear up Tomcat logging configuration while they are
at it :)

- -chris
-----BEGIN PGP SIGNATURE-----
Version: GnuPG/MacGPG2 v2.0.17 (Darwin)
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJRy0pNAAoJEBzwKT+lPKRYcPMP/05UTIVUYcKDE8iLKDagLsxK
E5ugZLlmpPvV1seTiK7SelPp+w0gmNX6EcjKZsPjFkrHCxZWPImUl5NA741sJZyY
G6OgZnnMvVmF08RrlcO2o7bA3/LtVec7W8Umm4VPBkX67oG2ng2MavI4egaf1unn
kvFLCvUJPiwyt0DMU8w5sdQJuRJzOvFLwJcUAv70iEiqhri2urGcKvKe8OMBHU1m
vM26eG2HDyKG0ZXtUiXk93YQCqwINIK4mF7uchcj4oV31b9gl+Yh8LnqvyN0aTqa
JbE1821MmOo+EY95BdOQa2/XeFfBiMfFIOySTgpvBgHGcSpyFP+l8sNiP1kuugMa
pP2s6nmrX5PfbXtQ3YB/trFVitXrK99+zVy94M3MI+QZkZU4sb0KXRbMQI5IX82A
YhPVz0sEIBNLXWOCqGOlPltHary8Aai+V0E+/NSI1BUX9n56KuR0WInTKJ3SxOUc
xMW7pdsWLPDNbBIa9MH2LOWgk8+DEdHNKteR6yvNlll83DqOFP69DZ5S04b+Q6fF
XQ3zCF0VmKv5S2vApceEvtC4PVvcyG4M8QYu+yF4a9yPbr9iJNpEuQg0jpvS2wSN
D18ujIY+ER15rAQ1rm8wE4bmRS5DQiRaLod8yMBBZ3Pzr+DYT75lzm0KhAKuoiDj
YFAmg7S3Q646KXFcmg5y
=LcjD
-----END PGP SIGNATURE-----
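
P.S. To make the ambiguity from points (1) and (1a) above concrete,
here is a minimal Python sketch. It is not how Tomcat or any browser
actually implements this; the function names encode_field and
guess_decode are made up for illustration, and the "try UTF-8, then
fall back to ISO-8859-1" rule is just one possible server-side guess,
assumed here for the sake of the example. It shows that the same form
value produces different bytes depending on the page encoding, and
that the body alone does not say which one was used.

```python
from urllib.parse import quote_plus, unquote_to_bytes


def encode_field(value: str, page_charset: str) -> str:
    """Percent-encode a form value as a browser conventionally would
    for a page served in page_charset (hypothetical helper)."""
    return quote_plus(value, encoding=page_charset)


def guess_decode(raw: str) -> str:
    """Server-side guess (an assumption, not a spec rule): try strict
    UTF-8 first, and fall back to ISO-8859-1 if the bytes are not
    valid UTF-8."""
    data = unquote_to_bytes(raw.replace("+", " "))
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode("iso-8859-1")


# The same text, posted from pages in two different encodings.
latin1_body = encode_field("André", "iso-8859-1")
utf8_body = encode_field("André", "utf-8")

# Decoding UTF-8 bytes as ISO-8859-1 silently yields mojibake.
misread = unquote_to_bytes(utf8_body).decode("iso-8859-1")

print(latin1_body, utf8_body, misread)
print(guess_decode(latin1_body), guess_decode(utf8_body))
```

The fallback only works because most ISO-8859-1 text with accented
characters is not valid UTF-8. A byte sequence that happens to be
valid in both encodings will still be misread, which is why a guess
like this is a workaround and not a fix.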
