Lightbulb,

lightbulb432 wrote:
> Why is the URIEncoding attribute specified on the connector rather than on a
> host, for example?
Because the host doesn't handle connections... the connectors do.

> Does this mean that the number of virtual hosts that can
> listen on the same port on the same box are limited by whether they all use
> the same encodings in their URIs?

Yes, all virtual hosts listening on the same port will have to have the
same encoding. Fortunately, UTF-8 works for all languages that I know of.

> Now that I think about it, wouldn't it be
> at the context level, not even at the host level?

If you had a connector-per-context, yes, but that's not the case.

> In Tomcat 6, should the useBodyEncodingForURI be used if not needing
> compatibility with 4.1, as the documentation mentions?

I would highly recommend following that recommendation.

> To see if I have things straight, is HttpServletRequest's
> get/setCharacterEncoding used for both the request parameters from a GET
> request AND the contents of the POST?

No. GET requests have request parameters encoded as part of the URL,
which is affected by the <Connector>'s URIEncoding attribute. POST
requests always use the request's "body" encoding, which is specified in
the HTTP Content-Type header (and can be overridden by calling
request.setCharacterEncoding before any parameters are read). Some broken
clients don't provide the character encoding of the request, which makes
things difficult sometimes.

> How are multipart POST requests dealt with?

Typically, each part of a multipart request contains its own character
encoding, so a multipart POST would follow the encoding for the part
you're reading at the time.

> And HttpServletResponse's get/setCharacterEncoding is used for the contents
> of the response header and the meta tags?

Only for the Content-Type header field, not META tags. If you want to
emit META tags, you'll have to do them yourself.

> Does it also encode the page content itself?

Nope. If you change the character encoding for a response after the
response has already had some data written to it, I think you'll send an
incorrect header.
For instance:

  response.setCharacterEncoding("ISO-8859-1");
  PrintWriter out = response.getWriter();
  response.setCharacterEncoding("Big5");
  out.print("abcdef");
  out.flush();

Your client will not receive a sane response. Setting the character
encoding only sets the HTTP response header and configures the response's
Writer, if used, but only /before/ calling getWriter the first time.

> What about the encoding of cookies for both incoming requests and outgoing
> responses?

See the HTTP spec, section 4.2 ("Message Headers"). It references RFC 822
(ARPA Internet text messages), which does not actually specify a character
encoding. From what I can see, low ASCII is the encoding used. You
shouldn't have to worry about cookie encoding, since you can always call
request.getCookies() and get them "correctly" interpreted for you.

-chris
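
P.S. To make the URIEncoding point above concrete, here is a small
standalone sketch of what goes wrong when the bytes in a URL are decoded
with the wrong charset. It uses only java.net.URLDecoder and URLEncoder
from the JDK, not Tomcat itself; the class name UriEncodingDemo and the
decode helper are made up for illustration, and the real connector code
may well work differently.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

public class UriEncodingDemo {

    // Hypothetical helper: percent-decodes a query value with the given
    // charset, standing in for what a connector does with URL bytes.
    public static String decode(String raw, String charset) {
        try {
            return URLDecoder.decode(raw, charset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charset, e);
        }
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // A browser sending UTF-8 turns U+00E9 (e with acute accent)
        // into the two bytes C3 A9, percent-encoded in the URL.
        String original = "caf\u00e9";
        String raw = URLEncoder.encode(original, "UTF-8");

        String asUtf8 = decode(raw, "UTF-8");
        String asLatin1 = decode(raw, "ISO-8859-1");

        System.out.println("raw query value:        " + raw);
        System.out.println("UTF-8 round-trips:      " + asUtf8.equals(original));
        System.out.println("ISO-8859-1 round-trips: " + asLatin1.equals(original));
        // Decoding as ISO-8859-1 yields one character per byte,
        // so the string comes out one character longer.
        System.out.println("lengths: " + asUtf8.length() + " vs " + asLatin1.length());
    }
}
```

The same raw value gives two different strings depending on the charset
chosen, which is why everything behind one connector has to agree on a
single URIEncoding.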