Hi all,

I'm currently testing the migration of a legacy web app from Tomcat 7 via 8
to 8.5 and ran into character encoding problems that occur in 8.5 only.
The app uses JSP pages, declares all of them to be stored in UTF-8 (they
really are :-)), and declares an HTTP Content-Type of
"text/html; charset=UTF-8" as well. Textual content at the HTML level is
properly encoded using UTF-8 and displays correctly in the browser.

In Tomcat 8.5, however, the following introduces encoding problems:

> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>       <jsp:param      name="chooseSearchInputTitle"
>                       value="Benutzer wählen"
>       />
> </jsp:include>

"search.jsp" simply outputs the value of the param as the "title"
attribute of some HTML link, and somewhere along the way the character "ä"
is replaced with the Unicode REPLACEMENT CHARACTER (U+FFFD). This really
happens only in Tomcat 8.5, not in 8 and not in 7.

I can fix the problem using either "SetCharacterEncodingFilter" or
the following line, which I guess has the same effect:

> <% request.setCharacterEncoding("UTF-8"); %>

Looking at the generated Java code for the JSP I get the following:

> org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response, 
> "/WEB-INF/jsp/includes/search.jsp" + "?" + 
> org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle",
>  request.getCharacterEncoding())+ "=" + 
> org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen", 
> request.getCharacterEncoding()), out, false);

The "ä" is properly encoded using UTF-8 in all versions of Tomcat and
the generated code seems to be the same in all versions as well,
especially regarding "request.getCharacterEncoding()".
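
To illustrate the kind of mismatch I suspect, here is a minimal standalone
sketch. It is not Tomcat code: it uses java.net.URLEncoder and URLDecoder as
stand-ins, the class and method names are made up, and it assumes (unverified)
that one side of the include encodes with ISO-8859-1 while the other side
decodes as UTF-8.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Standalone illustration only. EncodingMismatchDemo, encode and roundTrip are
// hypothetical names; the JDK classes stand in for whatever Tomcat uses internally.
class EncodingMismatchDemo {

    // Percent-encode a value with the given charset name.
    static String encode(String value, String charsetName) {
        try {
            return URLEncoder.encode(value, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    // Encode with one charset and decode with another, like a query string
    // that is built by one component and parsed by a different one.
    static String roundTrip(String value, String encodeWith, String decodeWith) {
        try {
            return URLDecoder.decode(encode(value, encodeWith), decodeWith);
        } catch (UnsupportedEncodingException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // \u00e4 is "ä"; the escape keeps the example independent of the source file encoding.
        String title = "Benutzer w\u00e4hlen";

        System.out.println(encode(title, "UTF-8"));      // Benutzer+w%C3%A4hlen
        System.out.println(encode(title, "ISO-8859-1")); // Benutzer+w%E4hlen

        // Matching charsets: the value survives the round trip.
        System.out.println(roundTrip(title, "UTF-8", "UTF-8").equals(title));

        // Mismatch: the lone byte 0xE4 is not valid UTF-8 and becomes U+FFFD.
        System.out.println(roundTrip(title, "ISO-8859-1", "UTF-8").indexOf('\uFFFD') >= 0);
    }
}
```

Both checks print "true" here, so a mismatch of this kind would at least
produce exactly the symptom I see.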

"getCharacterEncoding" has changed in Tomcat 8.5. The former
implementation didn't take the context into account, while the current
one does:

>    @Override
>    public String getCharacterEncoding() {
>        String characterEncoding = coyoteRequest.getCharacterEncoding();
>        if (characterEncoding != null) {
>            return characterEncoding;
>        }
>
>        Context context = getContext();
>        if (context != null) {
>            return context.getRequestCharacterEncoding();
>        }
>
>        return null;
>    }

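To make the lookup order in that method concrete, here is a plain-Java model
of it. It uses no Tomcat classes, the names are made up, and it folds "no
context" and "context without an encoding" into a single null argument.

```java
// Plain-Java model of the lookup order in the quoted method.
// EncodingLookup and resolve are hypothetical names, not Tomcat API.
class EncodingLookup {

    // requestLevel: encoding known for the request itself, or null
    // contextLevel: encoding configured on the context, or null
    //               (also null here if there is no context at all)
    static String resolve(String requestLevel, String contextLevel) {
        if (requestLevel != null) {
            return requestLevel; // the request-level value always wins
        }
        return contextLevel;     // may still be null
    }

    public static void main(String[] args) {
        System.out.println(resolve(null, null));            // null
        System.out.println(resolve("UTF-8", null));         // UTF-8
        System.out.println(resolve(null, "UTF-8"));         // UTF-8
        System.out.println(resolve("ISO-8859-1", "UTF-8")); // ISO-8859-1
    }
}
```

The first case is the one I would expect for a plain GET when nothing has
been set at either level: the caller gets null and has to pick a default of
its own.
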
My connector in server.xml is configured with "URIEncoding" set to UTF-8
in all versions of Tomcat, but that makes no difference in 8.5.
So as I understand it, calling "setCharacterEncoding" sets the value that
the generated Java code actually uses, even though the following is
documented for the character encoding filter:

> Note that the encoding for GET requests is not set here, but on a Connector

https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction

Now I'm wondering about multiple things...

1. Doesn't "getCharacterEncoding" provide the encoding of the
   HTTP-body? My JSP is called using GET and the Java quoted above
   seems to build a query string as well. So why does it depend on
   some body encoding instead of e.g. URIEncoding of the connector?

2. Is my former approach wrong, or did changes in Tomcat 8.5 introduce
   a regression? Some conversion seems to happen somewhere that did not
   happen in the past.

3. What is the correct fix I need now? The character encoding filter,
   even though it only applies to bodies per documentation?

Thanks!

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Telefon...........05151-  9468- 55
Fax...............05151-  9468- 88
Mobil..............0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Geschäftsführer: Andreas Muchow

