-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA256

André,

On 11/18/16 3:50 AM, André Warnier (tomcat) wrote:
> On 18.11.2016 05:56, Christopher Schultz wrote:
>> Since UTF-8 is supposed to be the "official" character encoding,
>
> Now where is that specified ? As far as I know, the default
> charset for everything HTTP and HTML-wise is still iso-8859-1, no ?
> (and unfortunately so).

I apologize for the sloppy language: this particular vendor's service
claims that UTF-8 is the standard *for their service*. Not for HTTP in
general.

>> The vendor has responded with (paraphrasing) "it seems we don't
>> completely follow this standard; we're considering what to do
>> next, which may include no change". This is a big vendor with
>> *lots* of software clients, so maintaining backward compatibility
>> is going to be a big deal for them. I've got some tricks up my
>> sleeve if they decide not to change anything. Hooray for specs.
>> :(
>
> What I never understood in all that, is why browsers and other
> clients never seem to respect (and servers do not seem to enforce)
> what is indicated here :
>
> https://www.ietf.org/rfc/rfc2388.txt 4.5 Charset of text in form
> data
>
> This would be a simple way to get rid of umpteen character
> set/encoding issues encountered when trying to interpret <form>
> data POSTed to web applications.

The problem is that application/x-www-form-urlencoded doesn't give a
client a natural way to specify the character encoding, and a/xwfu can
be used inside of a multipart/form-data package as well. You've just
moved the problem from the Content-Type of the request to the
Content-Type of the *part* of the multi-part request. Nothing has been
solved by using multipart/form-data.

And browsers certainly DO use multipart/form-data, but almost
exclusively for things like file upload, since files tend to be very
big already, and urlencoding a bunch of binary bytes makes the file
size increase quite a bit.

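To make the ambiguity concrete, here is a minimal sketch in Python
(chosen only for brevity; nothing here is Tomcat-specific). It uses the
standard urllib.parse module; the sample name and the 256-byte blob are
made up for illustration. It shows that the same text produces
different percent-encoded bytes depending on the charset the client
picked, that the payload carries no hint of which one was used, and
that percent-encoding binary data inflates it.

```python
from urllib.parse import quote, unquote

name = "André"  # illustrative sample value

# The same text yields different bytes on the wire depending on the
# charset the client happened to use.
as_utf8 = quote(name, encoding="utf-8")
as_latin1 = quote(name, encoding="iso-8859-1")

# Nothing in the encoded value says which charset was used, so the
# server has to guess. Guessing wrong silently produces mojibake.
wrong_guess = unquote(as_utf8, encoding="iso-8859-1")
right_guess = unquote(as_utf8, encoding="utf-8")

# Percent-encoding arbitrary binary data turns most bytes into three
# characters each, which is why file uploads avoid urlencoding.
blob = bytes(range(256))
encoded_blob = quote(blob, safe="")

print(as_utf8, as_latin1, wrong_guess, right_guess)
print(len(blob), len(encoded_blob))
```

The wrong guess does not raise an error, which is what makes this kind
of bug so easy to miss until someone with a non-ASCII name shows up.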
> It seems to me contrary to common sense that in our day and age,
> the rules for this could not be set once and for all to something
> like :
>
> 1) the default character set/encoding of HTTP and HTML is
> Unicode/UTF-8 (instead of the current really archaic iso-8859-1)
> 2) URLs (including query-strings) should be by default interpreted as
> Unicode/UTF-8, encoded as per
> https://tools.ietf.org/html/rfc3986#section-2
> 3) for POST requests :
> - for the Content-type "application/x-www-form-urlencoded",
> there SHOULD be a charset attribute indicating the charset and
> encoding. By default, this is "text/plain; charset=UTF-8"

Don't forget, charset == encoding. text/plain is a MIME type, and the
MIME type here has already been defined as
application/x-www-form-urlencoded. Somewhere it should just explicitly
say that "a/xwfu" must contain only ASCII bytes, and that it always
encodes a text blob in UTF-8. But it will never happen (see below).

> - for the Content-type "multipart/form-data", each "part" MUST have
> a Content-type header. If this Content-type is a "text" type, then
> the Content-type header SHOULD contain a charset attribute. If
> omitted, by default this is "charset=UTF-8".
>
> and be done with it once and for all.

Right: once and for all, for new clients who implement the spec. All
old clients, servers, proxies, etc. be damned. It's just not possible
due to the need to be backward-compatible with really weird stuff like
"smart" toasters and refrigerators, WebTV (remember that?) and all
manner of embedded devices that will never be updated.

What we really need is a new header that says "here's everything you
need to know about encoding for this request", so that clients and
servers that both support that header can use it. All other uses need
to fall back to the old and nasty heuristic.

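As a rough sketch of how "use the header if both sides support it,
otherwise guess" could look, here is a short Python function. Please
read it as an illustration only: the header name x-request-encoding is
invented for this example and does not exist in any spec, the function
name is hypothetical, and the fallback (try strict UTF-8, then
ISO-8859-1) is just one common guess, not a standard.

```python
from urllib.parse import unquote_to_bytes

# Hypothetical header name, invented for this sketch only.
HYPOTHETICAL_HEADER = "x-request-encoding"


def decode_form_value(raw, headers):
    """Decode one urlencoded form value.

    `headers` is assumed to be a dict with lower-cased header names.
    """
    # In a/xwfu, '+' stands for a space; the rest is percent-encoded.
    data = unquote_to_bytes(raw.replace("+", " "))

    # Case 1: the client declared the encoding via the (made-up) header.
    declared = headers.get(HYPOTHETICAL_HEADER)
    if declared:
        return data.decode(declared)

    # Case 2: no declaration, so guess. Strict UTF-8 first, because
    # arbitrary non-ASCII ISO-8859-1 text is rarely valid UTF-8.
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this cannot fail.
        return data.decode("iso-8859-1")


print(decode_form_value("Andr%C3%A9", {}))
print(decode_form_value("Andr%E9", {}))
print(decode_form_value("Andr%E9", {HYPOTHETICAL_HEADER: "iso-8859-1"}))
```

The guessing branch can still be wrong (some ISO-8859-1 byte sequences
happen to be valid UTF-8), which is exactly why an explicit declaration
would be nicer than a heuristic.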
- -chris
-----BEGIN PGP SIGNATURE-----
Comment: GPGTools - http://gpgtools.org
Comment: Using GnuPG with Thunderbird - http://www.enigmail.net/

iQIcBAEBCAAGBQJYLyYAAAoJEBzwKT+lPKRYF90QAJNOyadgrG7DDyWLSuFfKkep
VAoc5yziddaHoTKpcExGrEB+LV5gJ35XR2Q+CiOCNoTR1O3oOJyflk2s8e+lqeZ9
2rqIlauOwwWC13dfwpcOENkeC3eyHn85d3NkuuFsqvqRl+Wuv4qvqRiv/kos723i
cKmgqbAE9zRjNxuIqym3J8m6BhwzJGN3HqtiUueTYphChW81V10hc8XElJEPDbAH
eGpdunp8eu4pbi36RZV5r2nZU2yHZVDd+HJnTFG4WJ/NvHODuJsR39fB+GANI0QJ
+OHS9b7Wpcl2eCPs8geVTSqe57vDBrhymFjIUorPuQeW0SxrwDJMdTJ4zYtqnY2B
fD7u9Lvo+RT/eskIcdFGVq5xUEBr2OIfx2XO2V7VlA52x+WJ421TLFRUQq67Un40
yDsPXEBHMVar2cyG2wOJsb/t6ndlCY30b1FPOD2zrg1XFxxzjaOCwUtZXqgX7sfu
H1Dalbg4S/8vPS5Yrd7ZHk4RgYr5GGMBcK01KC07Q/TrOFkw9ssqvfQTyl30jxZ/
/x74KMRAbJVsUuhJ0i8QLM0KqPMpJ9wP9jwQF4YFUFwTDp6xBa/FRVAXCmJQxKom
JFCky4YhVvOGVOK2iwDDQRJee1ahz0V+maJii1fSHVYMCrWrzGNZ6LMeuZAsovs0
ZjotO2X+XAPpLwczn6tI
=7oxR
-----END PGP SIGNATURE-----