Re: Encoding Issue on POST

vwu98034 Sun, 02 Jan 2011 22:50:43 -0800

Thanks very much, André, for your detailed information.

I installed HttpFox for Firefox today. For some reason, its POST data section
is always blank (with a message like "request doesn't contain any POST data").
I couldn't figure out why, so I downloaded Charles Proxy. It shows the form
data as follows:

^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
POST /vsm/categories/12/edit HTTP/1.1
Host    localhost:8080
User-Agent      Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.13) 
Gecko/20101203 Firefox/3.6.13 ( .NET CLR 3.5.30729)
Accept  text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8
Accept-Language en-us,en;q=0.5
Accept-Encoding gzip,deflate
Accept-Charset  ISO-8859-1,utf-8;q=0.7,*;q=0.7
Keep-Alive      115
Referer http://localhost:8080/vsm/categories/12/edit
Cookie  JSESSIONID=706C1F28E75C8BCB3F1DEE8E2EDFB67B
Content-Type    application/x-www-form-urlencoded
Content-Length  95

After I copied the request data into a Unicode converter
(http://rishida.net/tools/conversion/), the converted data matched the original
input. So the data is sent correctly. Now I need to dig into the framework code
to figure out where it messes up the encoding.
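
To illustrate what "messing up the encoding" can look like on the server side,
here is a minimal Java sketch. It decodes the same percent-encoded value once
as UTF-8 and once as ISO-8859-1. The class name FormBodyDecoding and the sample
value %C3%A1bc are made up for illustration; they are not taken from the
captured request.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

final class FormBodyDecoding {

    // Decode one percent-encoded form value using the given charset name.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // Hypothetical wire value: U+00E1 (a with acute) sent as UTF-8, then "bc".
        String wire = "%C3%A1bc";

        String asUtf8 = decode(wire, "UTF-8");        // 3 chars: U+00E1, 'b', 'c'
        String asLatin1 = decode(wire, "ISO-8859-1"); // 4 chars: U+00C3, U+00A1, 'b', 'c'

        System.out.println("UTF-8 decode length:      " + asUtf8.length());
        System.out.println("ISO-8859-1 decode length: " + asLatin1.length());
    }
}
```

If the framework hands over a value of the second kind, something upstream
probably decoded the request body as ISO-8859-1.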

----- Original Message -----
From: "André Warnier" <a...@ice-sa.com>
To: "Tomcat Users List" <users@tomcat.apache.org>
Sent: Sunday, January 2, 2011 4:38:29 AM GMT -08:00 US/Canada Pacific
Subject: Re: Encoding Issue on POST

Below.

vwu98...@lycos.com wrote:
> Please see below.
> 
> 
> ----- Original Message -----
> From: "André Warnier" <a...@ice-sa.com>
> To: "Tomcat Users List" <users@tomcat.apache.org>
> Sent: Saturday, January 1, 2011 11:46:40 AM GMT -08:00 US/Canada Pacific
> Subject: Re: Encoding Issue on POST
> 
> vwu98...@lycos.com wrote:
>> Thanks very much, André. Please see below.
>>
>> ----- Original Message -----
>> From: "André Warnier" <a...@ice-sa.com>
>> To: "Tomcat Users List" <users@tomcat.apache.org>
>> Sent: Saturday, January 1, 2011 4:26:28 AM GMT -08:00 US/Canada Pacific
>> Subject: Re: Encoding Issue on POST
>>
>> vwu98...@lycos.com wrote:
>>> The following is what I have done in a project regarding encoding.
>>>
>>> * <%...@page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%> 
>>> on the first line of JSP files
>>> * <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
>> That looks fine.
>>
>>> * Set URIEncoding="UTF-8" on your <Connector> in server.xml
>> That matters for a GET, but plays no role for a POST.
>>
>>
>>> * An encoding filter for utf-8. 
>> Can you expand on that ?
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> The code can be found on the following page: 
>>
>> http://www.devdaily.com/java/jwarehouse/spring-framework-2.5.3/src/org/springframework/web/filter/CharacterEncodingFilter.java.shtml
>>
>> and the configuration in web.xml is the following:
>>
>> <filter>
>> <filter-name>CharacterEncodingFilter</filter-name>
>> <filter-class>org.springframework.web.filter.CharacterEncodingFilter</filter-class>
>> <init-param>
>> <param-name>encoding</param-name>
>> <param-value>UTF-8</param-value>
>> </init-param>
>> <init-param>
>> <param-name>forceEncoding</param-name>
>> <param-value>true</param-value>
>> </init-param>
>> </filter>
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>>   
>>> While the form GET method works correctly for encoding, the form POST 
>>> doesn't. I need the following code to get the right encoding for data 
>>> coming from a form POST method:
>>>
>>>             try{
>>>
>>>                     tmp = new String(str.getBytes("ISO-8859-1"), "UTF-8");
>>>             }catch(Exception ex){}
>>>
>> How exactly are you obtaining "str" in the above ?
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>> The str is the input string which carries data from the form.
>>
>> +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
>>
> 
> I believe that you do not need that filter, and that it is even
> counter-productive.
> I believe that the effect of the filter is to result in a double-encoding of
> the data coming from the form, which is why you are having those problems.
> 
> According to what you already set with
> 
> <%...@page contentType="text/html;charset=UTF-8" pageEncoding="UTF-8"%> on
> the first line of JSP files
> <meta http-equiv="Content-Type" content="text/html;charset=UTF-8">
> 
> the POST data from your form should already come to the server as UTF-8.
> 
> I do not know how Spring works, but assuming it does something like
> request.getParameter() to get the form input values, these values should
> already be properly read as UTF-8, and your internal String should receive
> the correct Unicode values.
> 
> I have not looked at the filter code, but normally this type of filter will
> act on the "content", or "body" of a request, not on the URL.
> So in this case, the filter only runs when the request has a body, which is
> only in the case of a POST.  For a GET (where the form parameters arrive in
> the query-string part of the URL, and there is no body), the filter is not in
> the picture.
> That is why you are seeing this problem with POST, but not with GET.
> 
> So, take out the filter and let's see what happens.
> (And also take out your added code above, it should not be there and will
> corrupt the data, when the data is ok.)
> 
> If you want a more detailed explanation, come back here.
> 
> ©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©
>  
> This TC document (http://wiki.apache.org/tomcat/FAQ/CharacterEncoding) says
> such a filter is needed. I didn't think the problem would be solved by
> removing the encoding filter. The cause of the problem is that the string is
> in ISO-8859-1 encoding instead of UTF-8. The filter is used to make the UTF-8
> encoding happen.
> 
> I removed the filter and nothing changed.
> 
Well, I would say that if removing the filter changes nothing, then that proves
at least that the filter is useless, doesn't it ?
And then, according to Occam's principle, you should remove it.

...
Actually, maybe not.  From what I understand of the filter code, it only sets
the request's character encoding if it is not already set. (It does not do any
character translation itself).
And, if your form and your browser do their job, the request encoding should
already be set, to "UTF-8".

So I think that in order to do the following check, you should remove the
filter, otherwise it will be confusing.  You can also just set the "ignore"
Init-Parameter of the filter to "true", for it to do nothing.
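
As a rough illustration of the decision described above, here is a small,
dependency-free Java sketch. It is not the Spring filter code: the class
RequestEncodingSketch and its methods are hypothetical, and the logic only
models "use the configured encoding when the request declares none, or always
when forced", matching the encoding and forceEncoding init-params quoted
earlier in this thread. The fallback to ISO-8859-1 reflects the Servlet
specification's default for request bodies that declare no charset.

```java
import java.util.Locale;

final class RequestEncodingSketch {

    // Extract the charset parameter from a Content-Type value, or null if absent.
    static String charsetOf(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String value = p.substring("charset=".length()).trim();
                // Strip optional surrounding double quotes.
                if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                    value = value.substring(1, value.length() - 1);
                }
                return value.isEmpty() ? null : value;
            }
        }
        return null;
    }

    // Assumption: models a filter with "encoding" and "forceEncoding" settings.
    // Returns the charset that would be used to decode the request body.
    static String effectiveEncoding(String contentType, String filterEncoding, boolean forceEncoding) {
        String declared = charsetOf(contentType);
        if (filterEncoding != null && (forceEncoding || declared == null)) {
            return filterEncoding;
        }
        return declared != null ? declared : "ISO-8859-1";
    }

    public static void main(String[] args) {
        String plain = "application/x-www-form-urlencoded";
        System.out.println(effectiveEncoding(plain, null, false));    // no filter configured
        System.out.println(effectiveEncoding(plain, "UTF-8", false)); // filter fills the gap
    }
}
```

Under these assumptions, a Content-Type of plain
application/x-www-form-urlencoded (no charset parameter) ends up as ISO-8859-1
unless a filter encoding is configured, so it is worth checking whether the
captured request actually carries a charset parameter.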

> ©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©©
>  

But let's take the problem from the other side.

These character set issues are complex, and what you see is not necessarily
what you really get (*).  So you have to be very systematic.

The very first thing to verify is what the browser actually sends to the
server.
Not just guess, but be sure.

For that, you can use a browser plugin, which acts as a kind of "protocol
analyser" and shows the contents of the packets that the browser sends to the
server.
If you are using Firefox, there is the HttpFox plugin.
If you are using IE, there is "Fiddler2".
Install one of these, start it up (easy to figure out), and call up the form
from your system.  In the plugin window, you can see the request to call up
the form, and the response.
You can see the data at any level of detail : request/response line, HTTP
headers, contents.
Look at it carefully to see if it has the type and charset which you expect.
(In some browsers, there is also a "page information" menu item which shows
what the browser thinks of the current page's character set).

Now fill in the form (with some "accented" characters) and press submit.
Again, you can see what the browser sends, HTTP headers and contents.
That will tell you /for sure/ if the browser is sending the POST parameters as
UTF-8.
For example, if you started one parameter with a letter such as "á" or "à" or
"ä", then you should see this letter being sent as 2 bytes, of which the first
one is always \xC3 (which always looks like "capital A with tilde" if seen as
an iso-8859-1 character).
Also, assuming that you do not use an "enctype" attribute in the <form> tag,
there should be a request HTTP header : Content-type: indicating the character
set.
(And if it is not, then there is something wrong with your form, or with your
browser).
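
The two-byte claim above can be checked with a few lines of Java. This is only
a sketch; the class name Utf8LeadByte is made up, and the letters are written
as \u escapes so that the source file's own encoding cannot interfere.

```java
import java.nio.charset.StandardCharsets;

final class Utf8LeadByte {

    // Return the UTF-8 bytes of s as space-separated uppercase hex.
    static String utf8Hex(String s) {
        StringBuilder sb = new StringBuilder();
        for (byte b : s.getBytes(StandardCharsets.UTF_8)) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // U+00E1 (a acute), U+00E0 (a grave), U+00E4 (a diaeresis)
        String[] letters = {"\u00e1", "\u00e0", "\u00e4"};
        for (String letter : letters) {
            System.out.println(utf8Hex(letter)); // C3 A1, C3 A0, C3 A4
        }
    }
}
```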

If it is, then the server, at the deepest level, also receives the parameters 
as UTF-8.
(By "at the deepest level" in this case I mean a request.getParameter() call).

And then, there should be no filter needed to make it UTF-8, since it already
is.
If a filter is needed, then it means that something else in-between has
already converted it from UTF-8 to something else, not so ?
Or can you find a flaw in that logic ?
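
The same reasoning can be tried on the
new String(str.getBytes("ISO-8859-1"), "UTF-8") workaround quoted earlier in
this thread. A minimal sketch (class and method names are hypothetical): the
round trip repairs a value that was wrongly decoded as ISO-8859-1, but damages
a value that was already correct.

```java
import java.nio.charset.StandardCharsets;

final class Latin1RoundTrip {

    // Re-read the chars of s as ISO-8859-1 bytes, then decode those bytes as UTF-8.
    static String reinterpret(String s) {
        return new String(s.getBytes(StandardCharsets.ISO_8859_1), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // UTF-8 bytes C3 A1 that were mistakenly decoded as ISO-8859-1.
        String misdecoded = "\u00c3\u00a1";
        // The same letter, already decoded correctly.
        String correct = "\u00e1";

        // The mis-decoded value is repaired to U+00E1.
        System.out.println(reinterpret(misdecoded).equals(correct));
        // The correct value becomes the single byte E1, which is not valid
        // UTF-8 on its own, so the result no longer equals the input.
        System.out.println(reinterpret(correct).equals(correct));
    }
}
```

So the workaround only helps when some earlier layer has already decoded the
bytes with the wrong charset, which is exactly what needs to be found.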

(*) for example, if you are looking at a logfile, you have to ask yourself the
questions :
- how (with which charset) is the logfile being written ?
- how am I viewing it ? with a Unicode-capable editor which detects the charset
  automatically ?
- between what my editor edits, and what I see on the screen, is there another
  layer of translation happening ?
Gives you some idea..
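
One way to sidestep all three questions is to log the code points of a value
instead of the characters themselves, since that output is plain ASCII. A
minimal sketch, with a made-up class name CodePointDump:

```java
final class CodePointDump {

    // Render each code point of s as U+XXXX, separated by spaces.
    static String dump(String s) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < s.length(); ) {
            int cp = s.codePointAt(i);
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
            i += Character.charCount(cp); // step over surrogate pairs correctly
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\u00e1b"));       // correctly decoded value
        System.out.println(dump("\u00c3\u00a1b")); // UTF-8 bytes read as ISO-8859-1
    }
}
```

A correctly decoded "á" shows up as U+00E1, while the mis-decoded form shows up
as U+00C3 U+00A1, whatever charset the logfile or the viewer uses.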

---------------------------------------------------------------------
To unsubscribe, e-mail: users-unsubscr...@tomcat.apache.org
For additional commands, e-mail: users-h...@tomcat.apache.org

