Mark Thomas wrote:
On 21/01/2010 06:55, André Warnier wrote:
Mark Thomas wrote:
The authorisation header is base64
encoded so it is automatically compliant with RFC2616.

Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something which puzzles me in principle.
Suppose that the browser and the server know nothing in particular about
one another, and that the server gets such an Authorization header from
the browser.
The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, i.e. translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array ?
If you use the default platform JVM encoding, you are assuming that the
browser knew what this encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate which encoding
this string was in before it encoded it using Base64, or did it ?
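To make the question concrete, here is a minimal Java sketch (the class and method names are made up for illustration). It assumes, purely for the example, a client that encoded the credentials as UTF-8, and shows that the same decoded octets give different strings depending on which charset the server picks.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class CharsetAmbiguity {
    // Base64-decodes the credentials part of a Basic header into raw octets.
    static byte[] rawOctets(String base64) {
        return Base64.getDecoder().decode(base64);
    }

    public static void main(String[] args) {
        // Hypothetical client: user name "jose" with an acute e, sent as UTF-8.
        String header = Base64.getEncoder().encodeToString(
                "jos\u00e9:secret".getBytes(StandardCharsets.UTF_8));
        byte[] octets = rawOctets(header);

        // Same 12 octets; only the server's choice of charset differs.
        String asLatin1 = new String(octets, StandardCharsets.ISO_8859_1);
        String asUtf8 = new String(octets, StandardCharsets.UTF_8);
        // Result depends on the JVM/platform, which the browser cannot know.
        String asDefault = new String(octets, Charset.defaultCharset());

        // The two UTF-8 octets of e-acute become two Latin-1 characters.
        System.out.println(asLatin1.equals("jos\u00c3\u00a9:secret")); // true
        System.out.println(asUtf8.equals("jos\u00e9:secret"));         // true
        System.out.println(asDefault);
    }
}
```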

RFC2617 to the rescue...

      basic-credentials = base64-user-pass
      base64-user-pass  = <base64 [4] encoding of user-pass,
                          except not limited to 76 char/line>
      user-pass         = userid ":" password
      userid            = *<TEXT excluding ":">
      password          = *TEXT

TEXT is defined in RFC2616 as

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

and finally

       OCTET          = <any 8-bit sequence of data>
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>
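The grammar above can be checked mechanically. The following is a rough Java sketch, not Tomcat's code: the class name is hypothetical, and it simplifies LWS by accepting HT but rejecting CR/LF folding. It shows that octets above 127 are legal TEXT, and that the first ':' splits userid from password because userid may not contain one.

```java
import java.nio.charset.StandardCharsets;

class UserPassGrammar {
    // CTL per RFC 2616: octets 0-31 and DEL (127).
    static boolean isCtl(int octet) {
        return octet <= 31 || octet == 127;
    }

    // Simplified TEXT: any octet except CTLs, with HT (9) accepted as the only
    // LWS character. CR/LF line folding is rejected in this sketch.
    static boolean isText(int octet) {
        return !isCtl(octet) || octet == '\t';
    }

    // Index of the ':' separating userid from password, or -1 if the octets
    // do not match user-pass = userid ":" password.
    static int separatorIndex(byte[] userPass) {
        int colon = -1;
        for (int i = 0; i < userPass.length; i++) {
            int octet = userPass[i] & 0xFF; // treat Java's signed byte as 0-255
            if (!isText(octet)) {
                return -1;
            }
            if (octet == ':' && colon < 0) {
                colon = i; // userid excludes ':', so the first one separates
            }
        }
        return colon;
    }

    public static void main(String[] args) {
        // A ':' inside the password is fine; only the first one splits.
        System.out.println(separatorIndex("user:pa:ss".getBytes(StandardCharsets.ISO_8859_1))); // 4
        // An 8-bit octet (0xE9) is legal TEXT; a NUL (CTL) is not.
        System.out.println(separatorIndex(new byte[] {'u', ':', (byte) 0xE9})); // 1
        System.out.println(separatorIndex(new byte[] {'u', ':', 0x00}));        // -1
    }
}
```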

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.

Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.

Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.
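One way such an enhancement could look, sketched in Java under the assumption of a hypothetical charset option (this is not an existing Tomcat setting, and the class name is made up). ISO-8859-1 stays the default, and a configured charset is applied with strict decoding, so that octets which are invalid in that charset are reported rather than silently replaced.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class ConfigurableBasicDecoder {
    private final Charset charset;

    // Hypothetical option: null means "use the spec-compliant default".
    ConfigurableBasicDecoder(Charset configured) {
        this.charset = configured != null ? configured : StandardCharsets.ISO_8859_1;
    }

    // Decodes the Base64 credentials; throws if the octets are not valid in
    // the configured charset instead of inserting replacement characters.
    String decode(String base64) throws CharacterCodingException {
        byte[] octets = Base64.getDecoder().decode(base64);
        CharsetDecoder decoder = charset.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(octets)).toString();
    }

    public static void main(String[] args) throws CharacterCodingException {
        // "jos" followed by 0xE9, which is e-acute in ISO-8859-1.
        String latin1Octets = Base64.getEncoder().encodeToString(
                new byte[] {'j', 'o', 's', (byte) 0xE9});
        System.out.println(new ConfigurableBasicDecoder(null)
                .decode(latin1Octets).equals("jos\u00e9")); // true
        try {
            new ConfigurableBasicDecoder(StandardCharsets.UTF_8).decode(latin1Octets);
        } catch (CharacterCodingException e) {
            System.out.println("rejected as UTF-8: " + e);
        }
    }
}
```

With ISO-8859-1 decoding can never fail, since every octet maps to a character; with UTF-8, a lone trailing 0xE9 is malformed and gets rejected.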

Mark

Let me be even more pernickety :

According to the HTTP 1.1 RFC 2616, HTTP header fields MAY contain *TEXT portions representing character sets other than US-ASCII. But then, such header field values MUST be encoded according to the rules of RFC 2047.

RFC 2047 in turn, in section "2. Syntax of encoded-words", indicates that this should be done using the form :
encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
(I am not quite sure here of the "utf-8" part as the correct name for the charset.)
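For reference, a minimal Java sketch of building and parsing a single RFC 2047 encoded-word with the "B" (Base64) encoding. The class name is made up, "Q" encoding and language tags are not handled, and nothing here reflects what browsers or Tomcat actually do. Java's `StandardCharsets.UTF_8.name()` yields `UTF-8`, the registered charset name; `Charset.forName` matches names case-insensitively, so `utf-8` is accepted too.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EncodedWord {
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
    private static final Pattern WORD =
            Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    // Builds a "B" (Base64) encoded-word for the given text and charset.
    static String encode(String text, Charset charset) {
        String payload = Base64.getEncoder().encodeToString(text.getBytes(charset));
        return "=?" + charset.name() + "?B?" + payload + "?=";
    }

    // Decodes a single "B" encoded-word back into a Java string.
    static String decode(String word) {
        Matcher m = WORD.matcher(word.trim());
        if (!m.matches()) {
            throw new IllegalArgumentException("not an encoded-word: " + word);
        }
        if (!m.group(2).equalsIgnoreCase("B")) {
            throw new UnsupportedOperationException("Q encoding not handled in this sketch");
        }
        Charset charset = Charset.forName(m.group(1));
        return new String(Base64.getDecoder().decode(m.group(3)), charset);
    }

    public static void main(String[] args) {
        String word = encode("Aladdin:open sesame", StandardCharsets.UTF_8);
        System.out.println(word);         // =?UTF-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=
        System.out.println(decode(word)); // Aladdin:open sesame
    }
}
```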

(Side note: this is something one regularly finds in email headers, but I had never seen it used in HTTP headers until now.)

On the other hand, regarding authentication mechanisms, RFC 2616 refers to RFC 2617, which itself indicates the following format for an authorization header sent by the browser to the server :

Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

When base64-decoded, the above string should look like "userid:password".
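Decoding that example can be sketched in Java as follows. This is an illustration, not Tomcat's implementation, and the class name is hypothetical; it assumes ISO-8859-1 for the decoded octets (the spec-compliant default mentioned earlier in the thread) and splits on the first ':' since userid cannot contain one.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

class BasicCredentials {
    final String userid;
    final String password;

    BasicCredentials(String userid, String password) {
        this.userid = userid;
        this.password = password;
    }

    // Parses an Authorization header value of the form "Basic <base64>".
    static BasicCredentials parse(String headerValue) {
        String[] parts = headerValue.trim().split("\\s+", 2);
        if (parts.length != 2 || !parts[0].equalsIgnoreCase("Basic")) {
            throw new IllegalArgumentException("not a Basic Authorization header");
        }
        String decoded = new String(Base64.getDecoder().decode(parts[1]),
                StandardCharsets.ISO_8859_1);
        int colon = decoded.indexOf(':'); // userid cannot contain ':'
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' in user-pass");
        }
        return new BasicCredentials(decoded.substring(0, colon), decoded.substring(colon + 1));
    }

    public static void main(String[] args) {
        BasicCredentials c = parse("Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==");
        System.out.println(c.userid);   // Aladdin
        System.out.println(c.password); // open sesame
    }
}
```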

I did not find in RFC 2617 any specific mention of character set encoding, but it itself refers back to RFC 2616 as being the "base rules". And the base rules in RFC 2616 seem to be that header values are US-ASCII unless otherwise indicated.

In other words, my contention is as follows :

- if the "userid:password" string above contains only US-ASCII characters, then the simple form of the header shown above is fine.
- if, however, the "userid:password" string above contains characters other than US-ASCII, then it should be further encoded, using the rules of RFC 2047.
This would mean that you should have something like :

Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=

(or, maybe, the other way around : it is the "QWxhZGRpbjpvcGVuIHNlc2FtZQ" string which, when base64-decoded, should yield a new string of the form "=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=", which should then be decoded once more to give the "userid:password" string).

Now, I am not sure that if you pass such an HTTP header, encoded as above, from Apache to Tomcat, the Tomcat getHeader() call will properly decode it using the indicated charset.

And I am not sure either that there exists any browser on the market that will encode a userid:password string that way.


---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org
