It's not a bug, it's a feature ;). Seriously, if you open a bug report for this, it will be closed quickly as either INVALID or as a DUPLICATE of a bug that was closed as INVALID.
The HTTP spec specifies that header information is encoded in iso-latin-1, so this is what Tomcat uses by default when parsing the query string. If you want the non-default behavior, simply set useBodyEncodingForURI="true" in the <Connector ... /> element of server.xml.

"Lajos Papp" <[EMAIL PROTECTED]> wrote in message news:[EMAIL PROTECTED]
> hi,
>
> i think there is a bug in the handling of utf-8 encoded request parameters
> sent by an html form with the "get" method.
> i created a simple jsp page:
> === encTest.jsp ===
> <%@ page contentType="text/html" pageEncoding="UTF-8"%>
>
> <%
>     String query = request.getQueryString();
>     String queryDecoded = "-";
>     if (query != null) {
>         queryDecoded = java.net.URLDecoder.decode(query,"utf-8");
>     }
>
>     request.setCharacterEncoding("UTF-8");
>     String reqParam = request.getParameter("param");
> %>
>
> <br> query = <%= query %>
> <br> queryDecoded = <%= queryDecoded %>
> <br> reqParam = <%= reqParam %>
>
>
> <form action="encTest.jsp" method="get">
>     <input name="param" />
>     <input type="submit" value="send" />
> </form>
> === end of jsp ===
>
> When i fill out the form with some non-US characters (in this case with a
> hungarian name), the browser urlencodes it correctly, which i can see
> from the url:
> http://localhost:8080/struts/encTest.jsp?param=b%C3%A9la
>
> when i decode the query string by hand:
>     queryDecoded = java.net.URLDecoder.decode(query,"utf-8");
> i get the correct string, but when i call the getParameter() method on
> the request:
>     request.setCharacterEncoding("UTF-8");
>     String reqParam = request.getParameter("param");
> i get a miscoded string, as if the request.setCharacterEncoding("UTF-8")
> weren't there.
>
> i checked the source code of tomcat 6.0.16 and found that
> Parameters.handleQueryParameters() does the real job; it is called by
> Request.parseParameters().
> the request has the correct encoding (utf-8), but the parameters object has
> 2 different properties which store information about encoding: encoding and
> queryStringEncoding. in case of a "GET" the useBodyEncodingForURI is
> false, and therefore only parameters.setEncoding("utf-8") is called
> but parameters.setQueryStringEncoding("utf-8") isn't.
> so when request.parseParameters() calls parameters.handleQueryParameters(),
> queryStringEncoding is still null, and of course it will return a
> miscoded parameter.
>
> Do you agree that it's a bug, or am i missing something?
> cheers,
> lajos
>
> === org.apache.catalina.connector.Request ===
>
> protected void parseParameters() {
>
>     ...
>     String enc = getCharacterEncoding();
>
>     boolean useBodyEncodingForURI = connector.getUseBodyEncodingForURI();
>     if (enc != null) {
>         parameters.setEncoding(enc);
>         if (useBodyEncodingForURI) {
>             parameters.setQueryStringEncoding(enc);
>         }
>     }
>     ...
>     parameters.handleQueryParameters();
>
>     ...
>     if (!getMethod().equalsIgnoreCase("POST"))
>         return;
>
>
> === org.apache.tomcat.util.http.Parameters ===
> public void handleQueryParameters() {
>     ...
>     handleQueryParameters(decodedQuery, queryStringEncoding);
> }
>
> ---------------------------------------------------------------------
> To start a new topic, e-mail: users@tomcat.apache.org
> To unsubscribe, e-mail: [EMAIL PROTECTED]
> For additional commands, e-mail: [EMAIL PROTECTED]
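
To make the effect discussed in this thread concrete, here is a minimal standalone sketch (not Tomcat code) that decodes the same percent-encoded value from the example URL twice, once as ISO-8859-1 and once as UTF-8. The class name QueryDecodingDemo and the helpers decode() and recoverUtf8() are made up for illustration, and the Charset overload of URLDecoder.decode() assumes Java 10 or later. The recoverUtf8() helper shows a commonly used application-side workaround; it is an assumption on my part that this fits your setup, and it only makes sense if the container really decoded the query string as ISO-8859-1.

```java
import java.net.URLDecoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // Percent-decodes a raw query value, interpreting the bytes in the given charset.
    static String decode(String raw, Charset charset) {
        return URLDecoder.decode(raw, charset);
    }

    // Hypothetical workaround: take a string that was decoded as ISO-8859-1,
    // get the original bytes back, and re-interpret them as UTF-8.
    static String recoverUtf8(String misdecoded) {
        byte[] originalBytes = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String raw = "b%C3%A9la"; // the value from the example URL in this thread

        // Each of the two bytes C3 A9 becomes its own character: 5 chars in total.
        String asLatin1 = decode(raw, StandardCharsets.ISO_8859_1);

        // The two bytes C3 A9 form one character (e with acute accent): 4 chars in total.
        String asUtf8 = decode(raw, StandardCharsets.UTF_8);

        System.out.println("ISO-8859-1 length: " + asLatin1.length());
        System.out.println("UTF-8 length:      " + asUtf8.length());
        System.out.println("recovered equals UTF-8 result: "
                + recoverUtf8(asLatin1).equals(asUtf8));
    }
}
```

The ISO-8859-1 result corresponds to the "miscoded" value that getParameter() returns with the default connector settings, and the UTF-8 result corresponds to what the manual URLDecoder call in the JSP produces. Setting useBodyEncodingForURI="true" on the connector is still the cleaner fix, since it avoids patching every parameter read in application code.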