Re: Character encoding

Christopher Schultz Sat, 07 Jul 2007 13:41:34 -0700

Lightbulb,

lightbulb432 wrote:
> Why is the URIEncoding attribute specified on the connector rather than on a
> host, for example?


Because the host doesn't handle connections... the connectors do.

> Does this mean that the number of virtual hosts that can
> listen on the same port on the same box are limited by whether they all use
> the same encodings in their URIs?

Yes, all virtual hosts listening on the same port will have to have the
same encoding. Fortunately, UTF-8 works for all languages that I know of.
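
As a rough illustration of why the connector has to pick one encoding: the same percent-encoded bytes decode to different text depending on the charset. This is a minimal sketch using only the JDK's URLDecoder, not Tomcat's actual decoding code; the class name UriDecodingDemo and the sample value are made up for the example.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class; it only mimics the effect of URIEncoding.
final class UriDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String raw, String charsetName) {
        try {
            return URLDecoder.decode(raw, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "caf\u00e9" percent-encoded by a client that used UTF-8.
        String raw = "caf%C3%A9";
        // Decoded with the charset the client used: the original text.
        System.out.println(decode(raw, "UTF-8"));
        // Decoded with a different charset: two wrong characters instead of one.
        System.out.println(decode(raw, "ISO-8859-1"));
    }
}
```

The bytes on the wire carry no hint of which charset produced them, so whoever decodes the URI has to be told up front.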

> Now that I think about it, wouldn't it be
> at the context level, not even at the host level?

If you had a connector-per-context, yes, but that's not the case.

> In Tomcat 6, should the useBodyEncodingForURI be used if not needing
> compatibility with 4.1, as the documentation mentions? 

I would strongly advise following the documentation's recommendation.

> To see if I have things straight, is HttpServletRequest's
> get/setCharacterEncoding used for both the request parameters from a GET
> request AND the contents of the POST?

No. GET requests have request parameters encoded as part of the URL,
which is affected by the <Connector>'s URIEncoding attribute. POST
requests always use the request's "body" encoding, which is specified in
the HTTP Content-Type header (and can be overridden by calling
request.setCharacterEncoding before reading any parameters). Some broken
clients don't provide the character encoding of the request, which makes
things difficult sometimes.
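
For what it's worth, here is a minimal sketch of pulling the charset out of a Content-Type header value and falling back when a client leaves it out. It is not how Tomcat parses the header; the class ContentTypeCharset, the method charsetOf, and the choice of fallback are assumptions for the example.

```java
import java.util.Locale;

// Hypothetical helper, not part of the servlet API.
final class ContentTypeCharset {

    // Returns the charset parameter of a Content-Type value,
    // or the fallback when the header or the parameter is missing.
    static String charsetOf(String contentType, String fallback) {
        if (contentType == null) {
            return fallback;
        }
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // The value may be quoted, e.g. charset="UTF-8".
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? fallback : value;
            }
        }
        return fallback;
    }

    public static void main(String[] args) {
        String fallback = "ISO-8859-1"; // assumed default for this sketch
        System.out.println(charsetOf("application/x-www-form-urlencoded; charset=UTF-8", fallback));
        System.out.println(charsetOf("application/x-www-form-urlencoded", fallback));
    }
}
```

In a servlet you would normally just check request.getCharacterEncoding() for null and call request.setCharacterEncoding with your fallback before touching the parameters.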

> How are multipart POST requests dealt with?

Typically, each part of a multipart request contains its own character
encoding, so a multipart POST would follow the encoding for the part
you're reading at the time.

> And HttpServletResponse's get/setCharacterEncoding is used for the contents
> of the response header and the meta tags?

Only for the header field, not META tags. If you want to emit META tags,
you'll have to do them yourself.

> Does it also encode the page content itself? 

Nope. If you change the character encoding for a response after the
response has already had some data written to it, I think you'll send an
incorrect header. For instance:

response.setCharacterEncoding("ISO-8859-1");
PrintWriter out = response.getWriter();

response.setCharacterEncoding("Big5");

out.print("abcdef");
out.flush();

Your client will not receive the Big5 response you might expect. Setting
the character encoding sets the charset in the HTTP Content-Type response
header and configures the response's Writer, if used, but it only takes
effect /before/ getWriter is called for the first time.
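
To show what a header/body mismatch looks like on the client side, here is a minimal sketch with plain JDK classes and no servlet container: text is written through a Writer in one charset and then read back under another, as a client would if the declared charset disagreed with the Writer. The class HeaderMismatchDemo and the method roundTrip are made up for the example.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it stands in for a response body and a client.
final class HeaderMismatchDemo {

    // Writes text with writerCharset, then decodes the bytes with declaredCharset.
    static String roundTrip(String text, Charset writerCharset, Charset declaredCharset) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        PrintWriter out = new PrintWriter(new OutputStreamWriter(body, writerCharset));
        out.print(text);
        out.flush();
        return new String(body.toByteArray(), declaredCharset);
    }

    public static void main(String[] args) {
        String text = "\u00e9"; // a single non-ASCII character
        // Writer and declared charset agree: the client sees the original text.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.UTF_8));
        // They disagree: the client sees two wrong characters.
        System.out.println(roundTrip(text, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1));
    }
}
```

Pure ASCII text such as "abcdef" comes out the same under most common encodings, so a mismatch like this tends to go unnoticed until the first non-ASCII character shows up.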

> What about the encoding of cookies for both incoming requests and outgoing
> responses?

See the HTTP spec, section 4.2 ("Message Headers"). It references RFC
822 (ARPA Internet text messages) which does not actually specify a
character encoding. From what I can see, low ASCII is the encoding used.
You shouldn't have to worry about cookie encoding, since you can always
call request.getCookies() and get them "correctly" interpreted for you.
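
If you want to check for yourself whether a value is safe to put in a cookie under that reading, a minimal sketch is to test whether it is representable in US-ASCII. The class CookieAsciiCheck and the method isPlainAscii are made up for the example, and this only checks the character range, not the other restrictions on cookie values.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper; it only checks the character range of a value.
final class CookieAsciiCheck {

    // True when every character of the value can be encoded as US-ASCII.
    static boolean isPlainAscii(String value) {
        return StandardCharsets.US_ASCII.newEncoder().canEncode(value);
    }

    public static void main(String[] args) {
        System.out.println(isPlainAscii("session42"));  // ASCII only
        System.out.println(isPlainAscii("caf\u00e9"));  // contains a non-ASCII character
    }
}
```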

-chris
