Mark Thomas wrote:
On 21/01/2010 06:55, André Warnier wrote:
Mark Thomas wrote:
The authorisation header is base64
encoded so it is automatically compliant with RFC2616.
Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)
I agree that the HTTP header itself is correct.
But there is still something which puzzles me in principle.
Suppose that the browser and the server know nothing particular about
one another, and that the server gets such an Authorization header from
the browser.
The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, to be translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array ?
If you use the default platform JVM encoding, you are making the
assumption that the browser knew what this encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate in which
encoding this string was, before it encoded it using Base64, or did it ?
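
To make the question above concrete, here is a minimal Java sketch. It is not Tomcat code: the class name CredentialCharsetDemo and the sample credentials are made up for illustration, and java.util.Base64 assumes Java 8 or later. It Base64-encodes a user-pass string the way a UTF-8 client might, then interprets the decoded bytes once as UTF-8 and once as ISO-8859-1:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical demo class, not part of Tomcat.
final class CredentialCharsetDemo {

    // Base64-decodes the payload and interprets the resulting bytes
    // in the charset chosen by the caller.
    static String decode(String base64, Charset charset) {
        byte[] raw = Base64.getDecoder().decode(base64);
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // "Andr\u00e9:secret" is an invented user-pass string; the client
        // is assumed to have used UTF-8 before Base64-encoding it.
        String sent = Base64.getEncoder().encodeToString(
                "Andr\u00e9:secret".getBytes(StandardCharsets.UTF_8));

        String asUtf8 = decode(sent, StandardCharsets.UTF_8);
        String asLatin1 = decode(sent, StandardCharsets.ISO_8859_1);

        // Same bytes, two different Java strings: the server cannot tell
        // which one the user meant without knowing the client's charset.
        System.out.println(asUtf8.equals(asLatin1)); // false
        System.out.println(asUtf8.length() + " vs " + asLatin1.length());
    }
}
```

The two-byte UTF-8 sequence for the accented character becomes two separate characters under ISO-8859-1, so a user name comparison on the server side would fail even though the Base64 layer worked perfectly.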
RFC2617 to the rescue...

    basic-credentials = base64-user-pass
    base64-user-pass  = <base64 [4] encoding of user-pass,
                        except not limited to 76 char/line>
    user-pass         = userid ":" password
    userid            = *<TEXT excluding ":">
    password          = *TEXT

*TEXT is defined in RFC2616

    TEXT  = <any OCTET except CTLs,
            but including LWS>

and finally

    OCTET = <any 8-bit sequence of data>
    CTL   = <any US-ASCII control character
            (octets 0 - 31) and DEL (127)>

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.
Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.
Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.
Mark
Let me be even more pernickety :
According to the HTTP 1.1 RFC 2616, HTTP header fields MAY contain *TEXT
portions representing character sets other than US-ASCII.
But then, such header field values MUST be encoded according to the
rules of RFC 2047.
RFC 2047 in turn, in "2. Syntax of encoded-words ", indicates that this
should be done using the form :
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :
Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for
the charset.)
(Note: That is something one regularly finds in email headers, but I
have never seen it used in HTTP headers until now.)
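
As an illustration of the encoded-word syntax quoted above, here is a minimal Java sketch of a decoder. The class name EncodedWord and its decode method are hypothetical, it handles only the "B" (base64) encoding and a single encoded-word, and it is not something Tomcat or the servlet API is known to do for you:

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class EncodedWord {

    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?\\s]+)\\?([BbQq])\\?([^?\\s]*)\\?=");

    // Decodes one RFC 2047 encoded-word; any other input is returned as-is.
    static String decode(String word) {
        Matcher m = ENCODED_WORD.matcher(word);
        if (!m.matches()) {
            return word; // plain text, nothing to decode
        }
        if (!m.group(2).equalsIgnoreCase("B")) {
            // The "Q" encoding is left out to keep the sketch short.
            throw new UnsupportedOperationException("only B encoding handled");
        }
        Charset charset = Charset.forName(m.group(1)); // lookup is case-insensitive
        byte[] raw = Base64.getDecoder().decode(m.group(3));
        return new String(raw, charset);
    }

    public static void main(String[] args) {
        // Build an encoded-word for an invented value and decode it again.
        Charset utf8 = Charset.forName("utf-8");
        String text = "Andr\u00e9";
        String word = "=?utf-8?B?"
                + Base64.getEncoder().encodeToString(text.getBytes(utf8)) + "?=";
        System.out.println(word);
        System.out.println(decode(word).equals(text)); // true
    }
}
```

The point of the form is that the charset travels with the value, so the receiver does not have to guess how to turn the decoded bytes into characters.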
On the other hand, regarding authentication mechanisms, RFC 2616 refers
to RFC 2617, which itself indicates the following format for an
authorization header sent by the browser to the server :
Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
When base64-decoded, the above string should look like "userid:password".
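
For illustration, a minimal Java sketch of that decoding step follows. The class BasicCredentials and its parse methods are hypothetical names, not Tomcat's implementation. The ISO-8859-1 default follows the conclusion earlier in this thread, and the optional Charset parameter is an assumption about how other encodings might be supported:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical value class, for illustration only.
final class BasicCredentials {
    final String userid;
    final String password;

    private BasicCredentials(String userid, String password) {
        this.userid = userid;
        this.password = password;
    }

    // Parses the value of an "Authorization: Basic ..." header, interpreting
    // the Base64-decoded bytes in the given charset.
    static BasicCredentials parse(String headerValue, Charset charset) {
        String prefix = "Basic ";
        if (headerValue == null
                || !headerValue.regionMatches(true, 0, prefix, 0, prefix.length())) {
            throw new IllegalArgumentException("not a Basic Authorization header");
        }
        byte[] raw = Base64.getDecoder()
                .decode(headerValue.substring(prefix.length()).trim());
        String userPass = new String(raw, charset);

        // userid excludes ":", so the first colon is the separator;
        // any later colons belong to the password.
        int colon = userPass.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' separator");
        }
        return new BasicCredentials(
                userPass.substring(0, colon), userPass.substring(colon + 1));
    }

    // Convenience overload using ISO-8859-1 as the default charset.
    static BasicCredentials parse(String headerValue) {
        return parse(headerValue, StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        BasicCredentials c = parse("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(c.userid);   // Aladdin
        System.out.println(c.password); // open sesame
    }
}
```

Since the example value contains only US-ASCII characters, the result is the same under ISO-8859-1 and UTF-8; the choice of charset only matters once the user name or password goes beyond that range.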
I did not find in RFC 2617 any specific mention of character set
encoding, but it itself refers back to RFC 2616 as being the "base
rules". And the base rules in RFC 2616 seem to be that header values are
US-ASCII unless otherwise indicated.
In other words, my contention is as follows :
- if the "userid:password" string above contains only US-ASCII characters, then
the above simple form of the header is fine.
- if the "userid:password" string above contains characters other than
US-ASCII, however, then it should be further encoded, using the rules
of RFC 2047.
This would mean that you should have something like :
Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=
(or, maybe, the other way around : it is the
"QWxhZGRpbjpvcGVuIHNlc2FtZQ==" string which, when base64-decoded, should
yield a new string of the form
"=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=", which should then be decoded
once more to give the "userid:password" string).
Now, I am not sure that, if you pass such an HTTP header, encoded as
above, from Apache to Tomcat, the Tomcat getHeader() call will
properly decode it using the indicated charset.
And I am not sure either that there exists any browser on the market
that will encode a userid:password string that way.