Hi all,

I'm currently testing the migration of a legacy web app from Tomcat 7 to 8 to 8.5 and ran into character encoding problems that occur in 8.5 only. The app uses JSP pages, declares all of them to be stored in UTF-8 (and really does store them that way :-)), and declares an HTTP Content-Type of "text/html; charset=UTF-8" as well. Textual content at the HTML level is properly encoded as UTF-8 and displays correctly in the browser etc.
In Tomcat 8.5 the following introduces encoding problems, though:

> <jsp:include page="/WEB-INF/jsp/includes/search.jsp">
>   <jsp:param name="chooseSearchInputTitle"
>     value="Benutzer wählen"
>   />
> </jsp:include>

"search.jsp" simply outputs the value of the param as the "title" attribute of an HTML link, and somewhere along the way the character "ä" is replaced with the Unicode character REPLACEMENT CHARACTER (U+FFFD). But really only in Tomcat 8.5, not in 8 and not in 7.

I can fix the problem using either "SetCharacterEncodingFilter" or the following line, which I guess simply amounts to the same thing:

> <% request.setCharacterEncoding("UTF-8"); %>

Looking at the generated Java code for the JSP, I get the following:

> org.apache.jasper.runtime.JspRuntimeLibrary.include(request, response,
>     "/WEB-INF/jsp/includes/search.jsp" + "?" +
>     org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("chooseSearchInputTitle",
>         request.getCharacterEncoding()) + "=" +
>     org.apache.jasper.runtime.JspRuntimeLibrary.URLEncode("Benutzer wählen",
>         request.getCharacterEncoding()), out, false);

The "ä" itself is properly encoded as UTF-8 in all versions of Tomcat, and the generated code seems to be the same in all versions as well, especially regarding "request.getCharacterEncoding()". What did change in Tomcat 8.5 is "getCharacterEncoding" itself: the former implementation didn't take the context into account, while the current one looks like this:

> @Override
> public String getCharacterEncoding() {
>     String characterEncoding = coyoteRequest.getCharacterEncoding();
>     if (characterEncoding != null) {
>         return characterEncoding;
>     }
>
>     Context context = getContext();
>     if (context != null) {
>         return context.getRequestCharacterEncoding();
>     }
>
>     return null;
> }

My connector in server.xml is configured with "URIEncoding" set to UTF-8 in all versions of Tomcat, but that doesn't make a difference for 8.5.
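To illustrate where a U+FFFD could come from, here is a minimal standalone Java sketch that reproduces the symptom outside Tomcat. It rests on two assumptions that I have not verified against the Tomcat sources: that JspRuntimeLibrary.URLEncode falls back to ISO-8859-1 when request.getCharacterEncoding() returns null, and that the included page's query string is then decoded as UTF-8 (the connector's URIEncoding). It only uses java.net.URLEncoder/URLDecoder; the class and method names are hypothetical.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

/** Hypothetical demo: encode a jsp:param value with one charset, decode it with another. */
public class EncodingMismatchDemo {

    // "Benutzer wählen", written as a Unicode escape so this file's own encoding doesn't matter.
    static final String VALUE = "Benutzer w\u00e4hlen";

    /**
     * Encodes the value with requestEncoding (falling back to ISO-8859-1 when it is null,
     * which is an ASSUMED fallback) and decodes it again with uriEncoding, standing in for
     * the connector's URIEncoding.
     */
    static String roundTrip(String value, String requestEncoding, String uriEncoding)
            throws UnsupportedEncodingException {
        String enc = (requestEncoding != null) ? requestEncoding : "ISO-8859-1";
        String encoded = URLEncoder.encode(value, enc);
        return URLDecoder.decode(encoded, uriEncoding);
    }

    public static void main(String[] args) throws UnsupportedEncodingException {
        // No request encoding set: "ä" becomes %E4, a lone byte that is not valid UTF-8.
        String broken = roundTrip(VALUE, null, "UTF-8");
        // Same as calling request.setCharacterEncoding("UTF-8") first: "ä" becomes %C3%A4.
        String fixed = roundTrip(VALUE, "UTF-8", "UTF-8");

        System.out.println(broken.equals("Benutzer w\uFFFDhlen")); // prints "true"
        System.out.println(fixed.equals(VALUE));                   // prints "true"
    }
}
```

If those assumptions hold, it would explain why setting the request encoding helps even for a GET request: it doesn't change how the query string is decoded, only which charset the generated code uses to encode the jsp:param value.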
So, as I understand it, by calling "setCharacterEncoding" I now set the value that the generated Java code actually uses, even though the documentation of the character encoding filter says:

> Note that the encoding for GET requests is not set here, but on a Connector

https://tomcat.apache.org/tomcat-8.5-doc/config/filter.html#Set_Character_Encoding_Filter/Introduction

Now I'm wondering about several things:

1. Doesn't "getCharacterEncoding" provide the encoding of the HTTP body? My JSP is requested via GET, and the Java code quoted above builds a query string as well. So why does it depend on the body encoding instead of, e.g., the connector's URIEncoding?

2. Was my former approach wrong, or did changes in Tomcat 8.5 introduce a regression? Some conversion happens somewhere that didn't happen in the past.

3. What is the correct fix now? The character encoding filter, even though according to the documentation it only applies to request bodies?

Thanks!

Kind regards,

Thorsten Schöning

-- 
Thorsten Schöning       E-Mail: thorsten.schoen...@am-soft.de
AM-SoFT IT-Systeme      http://www.AM-SoFT.de/

Phone: 05151- 9468- 55
Fax: 05151- 9468- 88
Mobile: 0178-8 9468- 04

AM-SoFT GmbH IT-Systeme, Brandenburger Str. 7c, 31789 Hameln
AG Hannover HRB 207 694 - Managing Director: Andreas Muchow