This is, unfortunately, a complex topic. There is an FAQ on this topic:
https://cwiki.apache.org/confluence/display/TOMCAT/Character+Encoding

but even that could probably do with some updates to reflect the
addition of #setRequestCharacterEncoding() and
#setResponseCharacterEncoding().

I'll work through your test case and see if I can figure out where
things are going wrong.


On 12/05/2019 05:51, Tomoki Sato wrote:
> Hello,
> 
> The reader that HttpServletRequest#getReader returns
> seems to decode characters not using the character encoding
> set by ServletContext#setRequestCharacterEncoding(since Servlet 4.0).
> 
> My questions are:
> 1. Is this behavior intentional(e.g. for backward compatibility)?
> 2. If this behavior is intentional, is there any specification
> describing such ServletContext#setRequestCharacterEncoding
> and HttpServletRequest#getReader behaviors?

<snip/>

> Case 1:
> When I submit the form with a parameter 'hello',

There is no form in the example code. Ah. The StackOverflow question has
a complete example.

On to the tests...

> the value of 'hello' is successfully decoded as follows.
> #####################################################################
> requestCharacterEncoding : UTF-8
> req.getCharacterEncoding() : UTF-8
> hello : あ
> #####################################################################

Yes, this works for me too.

> Case 2:
> When I click 'post' and send text content,
> the request body cannot be successfully decoded as follows.
> #####################################################################
> requestCharacterEncoding : UTF-8
> req.getCharacterEncoding() : UTF-8
> body : ???
> #####################################################################

I see corruption too.

Before I dig into what is going on, I do want to point out that writing
to stdout can itself be problematic if your platform default encoding is
not UTF-8 (I don't believe it is for Windows). I'm testing on a platform
where UTF-8 is the default, so I'm going to ignore this for now.
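
To illustrate the stdout point, here is a minimal sketch using only the
JDK. The class and method names are hypothetical, not anything from
Tomcat or from the test case. It shows that the bytes written for 'あ'
depend on the charset the PrintStream was created with, and that
wrapping System.out in an explicit UTF-8 PrintStream avoids relying on
the platform default.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.nio.charset.Charset;

public class StdoutEncodingSketch {

    // Returns the bytes a PrintStream using the given charset writes for text.
    // (Hypothetical helper, for illustration only.)
    static byte[] printed(String text, Charset charset) {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (PrintStream out = new PrintStream(sink, true, charset.name())) {
            out.print(text);
        } catch (UnsupportedEncodingException e) {
            // Cannot happen for a Charset instance that already exists.
            throw new IllegalStateException(e);
        }
        return sink.toByteArray();
    }

    public static void main(String[] args) throws Exception {
        // What System.out would use if nothing else is configured.
        System.out.println("default charset: " + Charset.defaultCharset());

        // Wrap System.out so the encoding no longer depends on the platform.
        PrintStream utf8Out = new PrintStream(System.out, true, "UTF-8");
        utf8Out.println("hello : \u3042");
    }
}
```

This only controls the bytes the JVM writes. If the console itself does
not interpret them as UTF-8, the output can still be displayed
incorrectly.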

OK. The Reader used to read the request body is created using ISO-8859-1
rather than UTF-8. Hmm. Need to dig into that some more. Ah. I think we
have a bug. The way the Reader is created bypasses the check for a value
set by ServletContext#setRequestCharacterEncoding.
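
The effect of building the Reader with the wrong charset can be
reproduced outside the container. The following is a rough sketch using
only the JDK. It is not Tomcat's code, and the class and method names
are hypothetical. It decodes the UTF-8 bytes of 'あ' once with UTF-8
and once with ISO-8859-1, which is the mismatch described above.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class ReaderCharsetSketch {

    // Reads the whole body through a Reader created with the given charset.
    // (Hypothetical helper; stands in for the Reader the container creates.)
    static String readBody(byte[] body, Charset charset) {
        StringBuilder result = new StringBuilder();
        try (Reader reader = new InputStreamReader(
                new ByteArrayInputStream(body), charset)) {
            int c;
            while ((c = reader.read()) != -1) {
                result.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return result.toString();
    }

    public static void main(String[] args) {
        // The request body as the client sent it: 'あ' encoded as UTF-8.
        byte[] body = "\u3042".getBytes(StandardCharsets.UTF_8);

        String expected = readBody(body, StandardCharsets.UTF_8);
        String corrupted = readBody(body, StandardCharsets.ISO_8859_1);

        // One character when decoded correctly, one per byte when not.
        System.out.println("UTF-8 length      : " + expected.length());
        System.out.println("ISO-8859-1 length : " + corrupted.length());
    }
}
```

With ISO-8859-1 every byte becomes its own character, so the three
UTF-8 bytes of 'あ' come back as three unrelated characters.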

I'll look into a fix.

Mark

