Re: Basic Authentication Failed with multibyte username

André Warnier Thu, 21 Jan 2010 06:12:00 -0800

Mark Thomas wrote:

On 21/01/2010 06:55, André Warnier wrote:

Mark Thomas wrote:

The authorisation header is base64
encoded so it is automatically compliant with RFC2616.

Yes, it sounds like you're right; my mistake.
(Also for Gabor, I admit my mistake.)

I agree that the HTTP header itself is correct.
But there is still something which puzzles me in the abstract.
Suppose that the browser and the server know nothing in particular about
one another, and that the server gets such an Authorization header from
the browser.
The Base64 decoding is done, and yields a series of bytes.
Now this series of bytes has to be interpreted, that is, translated into a
string in Java (which is Unicode). Which encoding should be chosen to
decode the byte array ?
If you use the default platform JVM encoding, you are assuming that
the browser knew what that encoding is, aren't you ?
On the other hand, the browser sent nothing to indicate which encoding
this string was in before it encoded it using Base64, or did it ?
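To make the question concrete, here is a small Java sketch showing that the very same base64 token turns into two different strings depending on which charset the server assumes. The class and method names are hypothetical, and the sketch assumes, purely for illustration, that the browser encoded the credentials as UTF-8.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public class CredentialCharsetDemo {

    /** Base64-decodes a token and interprets the resulting bytes with the given charset. */
    public static String decodeAs(String base64Token, Charset charset) {
        byte[] octets = Base64.getDecoder().decode(base64Token);
        return new String(octets, charset);
    }

    public static void main(String[] args) {
        // Hypothetical credentials containing U+00E1 (a with acute accent).
        String userPass = "G\u00E1bor:secret";

        // Assumption: the browser turned the string into bytes using UTF-8.
        String token = Base64.getEncoder()
                .encodeToString(userPass.getBytes(StandardCharsets.UTF_8));

        // The server sees only the token; nothing in it names the charset.
        String asUtf8 = decodeAs(token, StandardCharsets.UTF_8);
        String asLatin1 = decodeAs(token, StandardCharsets.ISO_8859_1);

        System.out.println(asUtf8.equals(userPass));   // true
        System.out.println(asLatin1.equals(userPass)); // false: octets C3 A1 become two characters
    }
}
```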


RFC2617 to the rescue...

      basic-credentials = base64-user-pass
      base64-user-pass  = <base64 [4] encoding of user-pass,
                          except not limited to 76 char/line>
      user-pass         = userid ":" password
      userid            = *<TEXT excluding ":">
      password          = *TEXT

TEXT is defined in RFC2616

       TEXT           = <any OCTET except CTLs,
                        but including LWS>

and finally

       OCTET          = <any 8-bit sequence of data>
       CTL            = <any US-ASCII control character
                        (octets 0 - 31) and DEL (127)>

So actually, Tomcat is correct in the current treatment of credentials.
Therefore, not a bug.
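The grammar above translates fairly directly into code. Below is a minimal Java sketch, not Tomcat's actual implementation (the class name BasicCredentials is hypothetical), that base64-decodes the token, rejects control octets, maps every octet to one character with ISO-8859-1, and splits on the first colon.

```java
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class BasicCredentials {
    public final String userid;
    public final String password;

    private BasicCredentials(String userid, String password) {
        this.userid = userid;
        this.password = password;
    }

    /** Parses an Authorization header value such as "Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==". */
    public static BasicCredentials parse(String headerValue) {
        final String scheme = "Basic ";
        if (headerValue == null
                || !headerValue.regionMatches(true, 0, scheme, 0, scheme.length())) {
            throw new IllegalArgumentException("not a Basic credential");
        }
        // basic-credentials = base64-user-pass
        byte[] octets = Base64.getDecoder().decode(headerValue.substring(scheme.length()).trim());

        // TEXT = any OCTET except CTLs (0-31 and 127). Simplification: HT (9) is
        // let through to approximate the LWS allowance; CR and LF are rejected.
        for (byte b : octets) {
            int octet = b & 0xFF;
            if ((octet < 32 && octet != 9) || octet == 127) {
                throw new IllegalArgumentException("control character in credentials");
            }
        }

        // ISO-8859-1 maps each octet to exactly one char, so no byte is lost or altered.
        String userPass = new String(octets, StandardCharsets.ISO_8859_1);

        // userid excludes ":", so the first colon is the separator; password may contain ":".
        int colon = userPass.indexOf(':');
        if (colon < 0) {
            throw new IllegalArgumentException("missing ':' separator");
        }
        return new BasicCredentials(userPass.substring(0, colon), userPass.substring(colon + 1));
    }
}
```

Splitting on the first colon rather than the last follows directly from the grammar: the userid may not contain ":", but the password may.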

Also André's comments regarding ISO-8859-1 were right if considering the
actual user name and password rather than the header.

Supporting other encodings would be a useful enhancement but the default
will have to be ISO-8859-1 to remain spec compliant. What the browsers
will do for user names and passwords in other encodings is not defined
so it will be a case of YMMV.
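One possible shape for such an enhancement, sketched in Java: a hypothetical setting (here just a constructor argument) names a preferred charset, the bytes are decoded with it strictly, and decoding falls back to ISO-8859-1 when they are not valid in that charset. This is a heuristic of my own for illustration, not anything the RFCs or Tomcat define.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class CredentialDecoder {
    // Hypothetical configuration value; null means "keep the spec default".
    private final Charset preferred;

    public CredentialDecoder(Charset preferred) {
        this.preferred = (preferred == null) ? StandardCharsets.ISO_8859_1 : preferred;
    }

    /** Decodes the base64-decoded user-pass octets into a Java string. */
    public String decode(byte[] octets) {
        // REPORT makes invalid input throw instead of being silently replaced with U+FFFD.
        CharsetDecoder strict = preferred.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            return strict.decode(ByteBuffer.wrap(octets)).toString();
        } catch (CharacterCodingException e) {
            // Not valid in the preferred charset: fall back to ISO-8859-1,
            // which maps every octet to a character and therefore never fails.
            return new String(octets, StandardCharsets.ISO_8859_1);
        }
    }
}
```

A caveat on this design: strict decoding only detects bytes that are invalid in the preferred charset. An ISO-8859-1 string such as "Ã©" (octets C3 A9) happens to be valid UTF-8 and would be read as "é", so the fallback cannot be fully reliable.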

Mark

Let me be even more pernickety :

According to the HTTP 1.1 RFC 2616, HTTP header fields MAY contain *TEXT portions representing character sets other than US-ASCII. But then, such header field values MUST be encoded according to the rules of RFC 2047.

RFC 2047 in turn, in "2. Syntax of encoded-words", indicates that this should be done using the form :

encoded-word = "=?" charset "?" encoding "?" encoded-text "?="
for example :

Header-name: =?iso-8859-1?B?some iso-8859-1 text, base-64 encoded?=
or
Header-name: =?utf-8?B?some unicode/utf-8 text, base-64 encoded?=
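For reference, producing such an encoded-word with the "B" (base64) encoding takes only a few lines of Java. This is a sketch only: EncodedWord.encodeB is a hypothetical helper, and it ignores RFC 2047's 75-character limit on a single encoded-word.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class EncodedWord {
    /** Builds "=?" charset "?B?" encoded-text "?=" (RFC 2047, section 2). */
    public static String encodeB(String text, Charset charset) {
        String encodedText = Base64.getEncoder().encodeToString(text.getBytes(charset));
        // Charset.name() gives the canonical name, e.g. "UTF-8"; RFC 2047 charset
        // names are case-insensitive, so "utf-8" would be equivalent.
        return "=?" + charset.name() + "?B?" + encodedText + "?=";
    }

    public static void main(String[] args) {
        System.out.println(encodeB("Aladdin:open sesame", StandardCharsets.UTF_8));
        // prints: =?UTF-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=
    }
}
```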

(I am not quite sure here of the "utf-8" part as the correct name for the charset.)

(Side note: that is something one does find regularly in email headers, but I have never seen it used in HTTP headers until now.)

On the other hand, regarding authentication mechanisms, RFC 2616 refers to RFC 2617, which itself indicates the following format for an authorization header sent by the browser to the server :


Authorization: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==

When base64-decoded, the above string should look like "userid:password".
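Going the other way, a client builds that header by joining userid and password with a colon and base64-encoding the resulting bytes. The Java sketch below (class and method names hypothetical) makes the charset an explicit parameter, precisely because RFC 2617 does not say which one the client should use.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

public final class BasicHeaderBuilder {
    /** Builds an Authorization header value: "Basic " + base64(userid ":" password). */
    public static String basicAuthorization(String userid, String password, Charset charset) {
        // userid = *<TEXT excluding ":">, otherwise the server cannot split it back out.
        if (userid.indexOf(':') >= 0) {
            throw new IllegalArgumentException("userid must not contain ':'");
        }
        byte[] userPass = (userid + ":" + password).getBytes(charset);
        return "Basic " + Base64.getEncoder().encodeToString(userPass);
    }

    public static void main(String[] args) {
        // The RFC 2617 example credentials.
        System.out.println(basicAuthorization("Aladdin", "open sesame", StandardCharsets.ISO_8859_1));
        // prints: Basic QWxhZGRpbjpvcGVuIHNlc2FtZQ==
    }
}
```

For pure US-ASCII credentials every charset choice gives the same bytes; the choice only matters once non-ASCII characters appear.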

I did not find in RFC 2617 any specific mention of character set encoding, but it itself refers back to RFC 2616 as being the "base rules". And the base rules in RFC 2616 seem to be that header values are US-ASCII unless otherwise indicated.


In other words, my contention is as follows :

- if the "userid:password" above contains only US-ASCII characters, then the above simple form of the header is fine.
- if the "userid:password" string above contains characters other than US-ASCII, however, then they should be further encoded, using the rules of RFC 2047.

This would mean that you should have something like :

Authorization: Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=

(or, maybe, the other way around : it is the "QWxhZGRpbjpvcGVuIHNlc2FtZQ" string which, when base64-decoded, should yield a new string of the form "=?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=", which should then be decoded once more to give the "userid:password" string).
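If a client did send the first form, the server would need an RFC 2047 decoder. Here is a minimal Java sketch of one; the class name is hypothetical, only the "B" encoding is handled, and matching a single whole encoded-word with a regex is a simplification of the full RFC 2047 rules.

```java
import java.nio.charset.Charset;
import java.util.Base64;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public final class EncodedWordDecoder {
    // encoded-word = "=?" charset "?" encoding "?" encoded-text "?="  (RFC 2047, section 2)
    private static final Pattern ENCODED_WORD =
            Pattern.compile("=\\?([^?]+)\\?([BbQq])\\?([^?]*)\\?=");

    /** Returns the decoded text, or the input unchanged if it is not a single encoded-word. */
    public static String decode(String value) {
        Matcher m = ENCODED_WORD.matcher(value.trim());
        if (!m.matches()) {
            return value;
        }
        if (!m.group(2).equalsIgnoreCase("B")) {
            throw new UnsupportedOperationException("Q encoding is not handled in this sketch");
        }
        Charset charset = Charset.forName(m.group(1)); // throws if the JVM does not know it
        byte[] bytes = Base64.getDecoder().decode(m.group(3));
        return new String(bytes, charset);
    }

    public static void main(String[] args) {
        String header = "Basic =?utf-8?B?QWxhZGRpbjpvcGVuIHNlc2FtZQ==?=";
        System.out.println(decode(header.substring("Basic ".length())));
        // prints: Aladdin:open sesame
    }
}
```

Under the second interpretation, the same decode call would instead be applied to the string obtained by base64-decoding the Basic token.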

Now, I am not sure that if you pass such an HTTP header, encoded as above, from Apache to Tomcat, the Tomcat getHeader() call will properly decode it using the indicated charset.

And I am not sure either that there exists any browser on the market that will encode a userid:password string that way.


