On Sun, 13 May 2001, kevin seguin wrote:

> in the java side of ajp13, it is pretty much assumed that strings are
> iso-8859-1 encoded.  (i'm not sure how things that deal with
> MessageBytes that come out of ajp13 deal with encoding...).

MessageBytes is supposed to delay the conversion from bytes to strings
until an encoding is known (which can be as late as servlet execution time
in Servlet 2.3).

The default is ISO-8859-1, as required by the servlet spec (if no other
encoding is specified).
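
A minimal sketch of the delayed-conversion idea, assuming nothing about the
real MessageBytes API: LazyText and its methods are hypothetical names made
up for illustration. The bytes are kept as-is, and a String is only built
when someone asks for one, using ISO-8859-1 if no encoding has been set.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration only; this is NOT the real MessageBytes class.
final class LazyText {
    private final byte[] raw;   // bytes exactly as received from the connector
    private Charset charset;    // null until an encoding is specified
    private String cached;      // built on first toString(), then reused

    LazyText(byte[] raw) {
        this.raw = raw.clone();
    }

    // May be called late, e.g. once the servlet has decided on an encoding.
    void setCharset(Charset cs) {
        this.charset = cs;
        this.cached = null;     // drop any String built with the old encoding
    }

    @Override
    public String toString() {
        if (cached == null) {
            // Fall back to ISO-8859-1 when no other encoding is specified.
            Charset cs = (charset != null) ? charset : StandardCharsets.ISO_8859_1;
            cached = new String(raw, cs);
        }
        return cached;
    }
}
```

In this sketch, code that calls toString() before setCharset() gets the
ISO-8859-1 interpretation, which is the kind of early forced conversion
that defeats the delay.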

I think the "design" is right, but there are many details that need
to be resolved before this will work as expected. A few modules (like the
mapper) force a conversion to String for the URI, and very little
testing has been done.

BTW, there are a few major problems I don't know how to resolve, like
the (stupid) behavior of MSIE in the case of UTF-8 in JavaScript (it
sends %XXXX instead of the %XX%XX that ECMAScript requires).
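
To make the %XX%XX form concrete, here is a small sketch using the standard
java.net.URLDecoder. PercentDecodeDemo is a made-up helper name. It shows
that the same escaped bytes give different strings depending on the charset
assumed. It does not attempt to handle the non-standard %XXXX form, which
would need special-casing.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical helper, for illustration only.
final class PercentDecodeDemo {
    // Decodes %XX escapes, interpreting the resulting bytes in the given charset.
    static String decode(String escaped, String charsetName) {
        try {
            return URLDecoder.decode(escaped, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("unknown charset: " + charsetName, e);
        }
    }
}
```

For example, decode("%C3%A9", "UTF-8") yields the single character U+00E9,
while decode("%C3%A9", "ISO-8859-1") yields the two characters U+00C3 U+00A9.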

I spent a lot of time reading and thinking about how to resolve the
i18n issues, but it's a nightmare, and I'm not sure I have the energy
to do it.


> is this a potential problem?  i realize that for things like standard
> header names this will generally not be a problem.  but would it be
> worthwhile to send an encoding across from the webserver to the
> container in ajp14?  or, can iso-8859-1 be assumed, and if a
> content-type header is present and specifies an encoding, it can be
> pulled out of there?

The webserver doesn't know the encoding (unless it reads the Content-Type
header); it works with byte[]. It would be great if it could send the
encoding in advance (as parsing Content-Type is very expensive),
but most browsers do not send it...
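
For reference, pulling the encoding out of Content-Type could look roughly
like the sketch below. ContentTypeCharset and charsetOf are hypothetical
names, not existing Tomcat code, and the parsing is deliberately naive (it
splits on ';' and does not cope with quoted parameter values that contain
a ';'). It falls back to ISO-8859-1 when no usable charset is present.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, for illustration only.
final class ContentTypeCharset {
    static Charset charsetOf(String contentType) {
        if (contentType == null) {
            return StandardCharsets.ISO_8859_1;
        }
        // parts[0] is the media type; the rest are parameters.
        String[] parts = contentType.split(";");
        for (int i = 1; i < parts.length; i++) {
            String p = parts[i].trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) {
                    // Illegal or unsupported charset name: use the default.
                    return StandardCharsets.ISO_8859_1;
                }
            }
        }
        return StandardCharsets.ISO_8859_1;
    }
}
```

With this sketch, "text/html; charset=UTF-8" gives UTF-8, and a bare
"text/html" (or a missing header) gives ISO-8859-1.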

Costin
