Thorsten,

On 11/26/18 08:45, Thorsten Schöning wrote:
> Hi all,
> 
> I'm currently testing migration of a legacy web app from Tomcat 7
> to 8 to 8.5 and ran into problems regarding character encoding in
> 8.5 only. That app uses JSP pages and declares all of those to be
> stored in UTF-8, does really do so :-), and declares a HTTP-Content
> type of "text/html; charset=UTF-8" as well. Textual content at
> HTML-level is properly encoded using UTF-8 and looks properly in
> the browser etc.
> 
> In Tomcat 8.5 the following is introducing encoding problems,
> though:
> 
>> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>>   <jsp:param name="chooseSearchInputTitle" value="Benutzer wählen" />
>> </jsp:include>
> 
> "search.jsp" simply outputs the value of the param as the "title" 
> attribute of some HTML-link and the character "ä" is replaced 
> somewhere with the Unicode character REPLACEMENT CHARACTER 0xFFFD.
> But really only in Tomcat 8.5, not in 8 and not in 7.

Have you been able to determine if the problem is on input or output?

> I can fix that problem using either "SetCharacterEncodingFilter"
> or the following line, which simply results in the same I guess:
> 
>> <% request.setCharacterEncoding("UTF-8"); %>

FYI, the SetCharacterEncodingFilter only sets the request encoding,
not the response encoding. It also only affects the encoding of the
request *body* (e.g. PUT/POST), not the encoding used to decode the
URI. That is configured with the <Connector>'s URIEncoding attribute.
There is also useBodyEncodingForURI, which makes the URI inherit the
request body's encoding if one is present. I recommend setting
useBodyEncodingForURI="true".
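
To make the URI-decoding point concrete, here is a minimal sketch using
only java.net.URLDecoder. It is not Tomcat's code: the class and method
names are made up for illustration, and the charset argument merely
stands in for whatever URIEncoding ends up being in effect. It shows
that the same percent-encoded bytes yield different text depending on
the charset used to decode them.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;

// Hypothetical demo class, not part of Tomcat.
class UriDecodingDemo {

    // Decodes a percent-encoded query value using the named charset.
    static String decode(String encoded, String charsetName) {
        try {
            return URLDecoder.decode(encoded, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset: " + charsetName, e);
        }
    }

    public static void main(String[] args) {
        // "Benutzer wählen" with the "ä" sent as the UTF-8 bytes C3 A4.
        String onTheWire = "Benutzer+w%C3%A4hlen";

        // Decoder agrees with the sender: the "ä" comes back intact.
        System.out.println(decode(onTheWire, "UTF-8"));

        // Decoder assumes ISO-8859-1: each byte becomes its own character.
        System.out.println(decode(onTheWire, "ISO-8859-1"));
    }
}
```

With a matching charset the value survives; with ISO-8859-1 the two
bytes of the "ä" turn into two separate characters.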

I recommend *always* using SetCharacterEncodingFilter, since web
browsers both habitually fail to send a correct Content-Type and
often use UTF-8 in URLs in violation of the HTTP spec. The result is
essentially that everything works the way you *want* it to work,
except that you just have to "hope" it works instead of being able to
prove that it will.

> Looking at the generated Java code for the JSP I get the
> following:
> 
>> org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
>>     "/WEB-INF/jsp/includes/search.jsp" + "?"
>>     + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode(
>>         "chooseSearchInputTitle", request.getCharacterEncoding())
>>     + "="
>>     + org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode(
>>         "Benutzer wählen", request.getCharacterEncoding()),
>>     out, false);
> 
> The "ä" is properly encoded using UTF-8 in all versions of Tomcat
> and the generated code seems to be the same in all versions as
> well, especially regarding "request.getCharacterEncoding()".
> 
> "getCharacterEncoding" in Tomcat 8.5 has changed, the former 
> implementation didn't take the context into account:
> 
>> @Override
>> public String getCharacterEncoding() {
>>     String characterEncoding = coyoteRequest.getCharacterEncoding();
>>     if (characterEncoding != null) {
>>         return characterEncoding;
>>     }
>>
>>     Context context = getContext();
>>     if (context != null) {
>>         return context.getRequestCharacterEncoding();
>>     }
>>
>>     return null;
>> }

This is just a fall-back for when there is no character encoding
defined in the request (because the browser didn't send one).

> My connector in server.xml is configured to use "URIEncoding" as
> UTF-8 in all versions of Tomcat, but that doesn't make a difference
> to 8.5. So I understand that using "setCharacterEncoding", I set
> the value actually used in the generated Java now, even though the
> following is documented for character encoding filter:
> 
>> Note that the encoding for GET requests is not set here, but on a
>> Connector
> 
> https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction
> 
> Now I'm wondering about multiple things...
> 
> 1. Doesn't "getCharacterEncoding" provide the encoding of the 
> HTTP-body?

Yes, but it comes directly from the browser, which often doesn't
provide it. There is no encoding-detection going on, so it's often
"null" or ISO-8859-1, which is the spec-defined default.
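
As a rough illustration of how a mismatch like that can produce
U+FFFD, here is a minimal sketch using only java.net.URLEncoder and
java.net.URLDecoder. It is not Tomcat's or Jasper's code: the class and
method names are hypothetical, and the assumption that one side encodes
with ISO-8859-1 while the other decodes with UTF-8 is only one possible
explanation of the symptom, not a confirmed diagnosis.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical demo class, not part of Tomcat or Jasper.
class EncodingMismatchDemo {

    // URL-encodes the value with one charset and decodes it with another.
    static String roundTrip(String value, String encodeCharset, String decodeCharset) {
        try {
            String encoded = URLEncoder.encode(value, encodeCharset);
            return URLDecoder.decode(encoded, decodeCharset);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException("Unknown charset", e);
        }
    }

    public static void main(String[] args) {
        String title = "Benutzer w\u00E4hlen"; // "Benutzer wählen"

        // Both sides agree on UTF-8: the value survives unchanged.
        String ok = roundTrip(title, "UTF-8", "UTF-8");
        System.out.println(ok.equals(title));

        // Encoded as ISO-8859-1 (%E4), decoded as UTF-8: the lone byte E4
        // is not valid UTF-8, so the decoder substitutes U+FFFD.
        String broken = roundTrip(title, "ISO-8859-1", "UTF-8");
        System.out.println(broken.indexOf('\uFFFD') >= 0);
    }
}
```

If both sides use the same charset the value round-trips cleanly, which
is why pinning the request encoding to UTF-8 makes the symptom go away.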

> My JSP is called using GET and the Java quoted above seems to build
> a query string as well. So why does it depend on some body encoding
> instead of e.g. URIEncoding of the connector?

Good question. Might be a bug here.

> 2. Is my former approach wrong or did changes in Tomcat 8.5
> introduce some regression? There is some conversion somewhere which
> was not present in the past.

Tomcat 8.5 follows the servlet spec, which in v4.0 added the
<web-app><request-character-encoding> to make things even more fun.
Actually, this can replace the use of the SetCharacterEncodingFilter.
Thanks for pointing this out; I wasn't aware of this feature of the
4.0 spec.

> 3. What is the correct fix I need now? The character encoding
> filter, even though it only applies to bodies per documentation?

Try setting <request-character-encoding> in your <web-app> like this:

web.xml
-------
<web-app>
  <request-character-encoding>UTF-8</request-character-encoding>
</web-app>

-chris