On Sun, 13 May 2001, kevin seguin wrote:
> in the java side of ajp13, it is pretty much assumed that strings are
> iso-8859-1 encoded. (i'm not sure how things that deal with
> MessageBytes that come out of ajp13 deal with encoding...).
MessageBytes is supposed to delay the conversion from bytes to strings
until an encoding is known (which can be as late as servlet execution
time in Servlet 2.3).
The default is ISO-8859-1, as required by the servlet spec (if no other
encoding is specified).
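To make the idea concrete, here is a minimal sketch of delayed conversion. It is not the actual MessageBytes code: the class name LazyBytes and its methods are hypothetical, and it only assumes that the bytes are kept raw until someone asks for a String, with ISO-8859-1 as the fallback.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is NOT Tomcat's MessageBytes.
final class LazyBytes {
    private final byte[] bytes;
    // Fallback used when no other encoding has been specified.
    private Charset charset = StandardCharsets.ISO_8859_1;
    // Stays null until the first toString() call.
    private String cached;

    LazyBytes(byte[] bytes) {
        this.bytes = bytes.clone();
    }

    // May be called late, e.g. once the servlet has declared an encoding.
    void setCharset(Charset cs) {
        this.charset = cs;
        this.cached = null; // drop any String built with the old charset
    }

    @Override
    public String toString() {
        if (cached == null) {
            cached = new String(bytes, charset); // conversion happens here
        }
        return cached;
    }
}
```

Anything that calls toString() early (before the encoding is known) gets the ISO-8859-1 interpretation, which is why forcing an early conversion is a problem.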
I think the "design" is right - but there are many details that need
to be resolved before this will work as expected. A few modules (like
the mapper) force a conversion to String for the URI, and very little
testing has been done.
BTW, there are a few major problems I don't know how to resolve, like
the (stupid) behavior of MSIE in the case of UTF8 in javascript (it
sends %XXXX instead of %XX%XX, as EcmaScript requires).
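A rough sketch of what a tolerant decoder could look like, under two assumptions that are not confirmed here: that the non-standard form is the %uXXXX escape produced by the legacy javascript escape() function (one UTF-16 code unit per escape), and that the standard %XX form carries UTF-8 bytes. The class name LenientPercentDecoder is made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; assumes %uXXXX for the non-standard form
// and UTF-8 bytes for the standard %XX form.
final class LenientPercentDecoder {

    static String decode(String s) {
        StringBuilder out = new StringBuilder();
        // Collects bytes from consecutive %XX escapes so that multi-byte
        // UTF-8 sequences are decoded together.
        ByteArrayOutputStream pending = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int v;
            if (c == '%' && i + 1 < s.length()
                    && (s.charAt(i + 1) == 'u' || s.charAt(i + 1) == 'U')
                    && (v = hex(s, i + 2, i + 6)) >= 0) {
                flush(pending, out);
                out.append((char) v); // one UTF-16 code unit
                i += 6;
            } else if (c == '%' && (v = hex(s, i + 1, i + 3)) >= 0) {
                pending.write(v);     // one raw byte
                i += 3;
            } else {
                flush(pending, out);
                out.append(c);        // literal char, or a malformed '%'
                i++;
            }
        }
        flush(pending, out);
        return out.toString();
    }

    // Parses s[from, to) as hex; returns -1 if out of range or not hex.
    private static int hex(String s, int from, int to) {
        if (to > s.length()) return -1;
        int v = 0;
        for (int i = from; i < to; i++) {
            int d = Character.digit(s.charAt(i), 16);
            if (d < 0) return -1;
            v = (v << 4) | d;
        }
        return v;
    }

    private static void flush(ByteArrayOutputStream pending, StringBuilder out) {
        if (pending.size() > 0) {
            out.append(new String(pending.toByteArray(), StandardCharsets.UTF_8));
            pending.reset();
        }
    }
}
```

This does not solve the real problem (the container still has to guess which convention the client used), it only shows that the two forms can be told apart syntactically if the "u" marker is present.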
I spent a lot of time reading and thinking about how to resolve the
i18n issues, but it's a nightmare - and I'm not sure I have the energy
to do it.
> is this a potential problem? i realize that for things like standard
> header names this will generally not be a problem. but would it be
> worthwhile to send an encoding across from the webserver to the
> container in ajp14? or, can iso-8859-1 be assumed, and if a
> content-type header is present and specifies an encoding, it can be
> pulled out of there?
The webserver doesn't know the encoding (unless it reads the
Content-Type header); it works with byte[]. It would be great if it
could send the encoding in advance (as parsing Content-Type is very
expensive), but most browsers do not send it...
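For reference, pulling the charset out of Content-Type looks roughly like the sketch below. The class and method names are hypothetical, the parsing is deliberately simplistic (it does not handle every quoting rule for header parameters), and it falls back to ISO-8859-1 when nothing usable is found.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, simplified on purpose.
final class ContentTypeCharset {

    static Charset charsetOf(String contentType) {
        if (contentType != null) {
            // parts[0] is the media type; the rest are parameters.
            String[] parts = contentType.split(";");
            for (int i = 1; i < parts.length; i++) {
                String p = parts[i].trim();
                if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                    String name = p.substring("charset=".length()).trim();
                    // Strip optional surrounding double quotes.
                    if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                        name = name.substring(1, name.length() - 1);
                    }
                    try {
                        return Charset.forName(name);
                    } catch (IllegalArgumentException e) {
                        break; // malformed or unsupported name: use the default
                    }
                }
            }
        }
        return StandardCharsets.ISO_8859_1; // default when nothing is specified
    }
}
```

Even this small version shows the cost: a split, a scan and a charset lookup per request, for a header that most browsers leave without a charset anyway.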
Costin