Shanti Suresh wrote:
Hi Chris,
This is such an interesting discussion. I am not sure what to make of this
person's comment:
-------------------
TAXI 2012-10-09 09:03:59 PDT
Wow, no fix since 8 years...
And this is a real bug: If the HTTP header says the file is encoded in
ISO-8859-1 the common way to override this with HTML is:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
Firefox reads the body in UTF-8 then, which is fine, but the charset
used in forms is still ISO-8859-1, so you have to add
accept-charset="utf-8" to the form just for firefox (other browser
automatically use UTF-8 or send the charset with the content-type).
So: Why the hell is nobody fixing this bug?
---------------
So the questions I have are:
(1) Firefox is not properly sending UTF-8 in the POST request even if it
reads the HTML page in UTF-8? And other browsers are now sending
"charset=utf-8" based on the the HTML META tag?
(2) Firefox has started respecting the accept-charset="utf-8" attribute in
forms now such that it adds charset to the Content-Type header of the POST
request? I'm confused. I thought Mozilla was not going to fix this
issue.
Thanks for any clarifications.
I think that you are still confused... :-)
(As are, in part, some of the people who posted on that Mozilla bug).
(1) browsers, in general, are *not* sending a "charset" attribute in the Content-Type
header of their POST submissions (whether application/x-www-form-urlencoded or
multipart/form-data).
This is a real pity, because it is the source of much confusion, and the real reason why
servers have to jump through hoops to figure out (or force) the character set/encoding of
the data that they are getting from browser POSTs.
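Concretely, on the Tomcat side the usual way out is to *force* the request encoding before
anything reads a parameter. Here is a minimal sketch of such a servlet Filter (the class
name is mine, purely for illustration; Tomcat also ships a similar
org.apache.catalina.filters.SetCharacterEncodingFilter that you can map in web.xml instead):

  import java.io.IOException;
  import javax.servlet.Filter;
  import javax.servlet.FilterChain;
  import javax.servlet.FilterConfig;
  import javax.servlet.ServletException;
  import javax.servlet.ServletRequest;
  import javax.servlet.ServletResponse;

  // Forces a known encoding on requests whose Content-Type carries no charset,
  // which, as per point (1), is what browsers send for form POSTs.
  public class ForceRequestEncodingFilter implements Filter {

      public void init(FilterConfig config) throws ServletException {
          // nothing to configure in this sketch
      }

      public void doFilter(ServletRequest request, ServletResponse response,
                           FilterChain chain) throws IOException, ServletException {
          // Must run before anything calls getParameter()/getReader(),
          // otherwise the container has already decoded the body with its default.
          if (request.getCharacterEncoding() == null) {
              request.setCharacterEncoding("UTF-8");
          }
          chain.doFilter(request, response);
      }

      public void destroy() {
          // nothing to clean up
      }
  }

The essential point is simply that *somebody* has to tell the container which encoding the
browser used, because the browser itself does not say.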
And the Mozilla people seem to say that it is that way because, when they tried to add
this "charset" attribute, it broke a number of server applications at the time (8 years
ago); they see no reason to think that it would not still be the same today, so they
are not trying it again.
(1a) what browsers *will* do, in general, is send POST data in the same character
set/encoding as that of the HTML *page* which contains the form being posted.
But, even when sending UTF-8-encoded data according to this principle, they are *not*
indicating that it is UTF-8 data, which is basically wrong, because the standard HTTP/HTML
character set is ISO-8859-1, and they *should* indicate it when that is not what they are
sending. But that is the reality.
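Seen from the servlet side, point (1a) looks like this (just a sketch, assuming the page
containing the form was served as UTF-8, that "request" is the HttpServletRequest inside
doPost(), and that the "name" parameter is only an example):

  // What the container actually receives for such a POST:
  //   request.getContentType()       -> "application/x-www-form-urlencoded"  (no charset)
  //   request.getCharacterEncoding() -> null
  // so, per the Servlet spec, parameters are decoded as ISO-8859-1 by default.
  // If the encoding was not forced up front (see the filter sketch above), the
  // classic after-the-fact repair is to undo the wrong decoding and redo it as UTF-8:
  String wrong = request.getParameter("name");  // UTF-8 bytes decoded as ISO-8859-1 -> mojibake
  String fixed = new String(wrong.getBytes("ISO-8859-1"), "UTF-8");
  // (the UnsupportedEncodingException these calls can throw is an IOException,
  //  so doPost()'s usual "throws IOException" already covers it)

That re-decoding trick only works because ISO-8859-1 maps every byte to a character, so the
original bytes can be recovered; forcing the encoding up front is the cleaner approach.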
(2) the "accept-charset" attribute of a <form> does not mean that this <form> will *send*
data according to that charset/encoding. It indicates that any data that is entered in
the form's input boxes will be interpreted as being in that charset.
So the fact of adding an "accept-charset" attribute to your <form> tags does not make it
so that the browser will magically change its behaviour when POSTing data.
In other words, it's a mess, and the mess is mainly due to some lack of precision in the
original RFCs, but it is being perpetuated now by browser developers' fear of breaking
server applications by doing things right.
Which is rather funny in a way, considering all the things that browser developers do all
the time anyway which do break existing applications.
We really need an RFC for HTTP 2.0, with UTF-8 as the default charset/encoding.