Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Mark Juszczec Thu, 20 Oct 2016 07:00:39 -0700

On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) <a...@ice-sa.com>
wrote:


>
> Can you tell us (or remind us) exactly how the browser is sending this
> request for the parameter "JOEL" (with diaeresis on the E) to the server ?
> Is it a part of the query-string of the URL, or is it in the body of a
> POST request ?
>
> The following on-line documentation describes precisely how this should
> work :
> http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
> (See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
> provided to the same attributes in the HTTP Connector :
> http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
> )
>
> So check exactly what you are doing, and if that matches these rules
> somehow.
>
> Personal rant :
> Unfortunately, this is still a big mess in the HTTP protocol.
> And the people in charge of the design of the protocol missed a golden
> opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
> default, instead of clinging to iso-8859-1. Thus condemning all web
> programmers worldwide to another 20 years of obscure bugs and clunky
> work-arounds.
>
> (s) Andr%C3%A9
>
>
The data is being returned by Shibboleth and passed to Tomcat in the body
of an HTTP GET request.

This is by design of the application and there's nothing I can do about it.

As such, my only options for enforcing UTF-8 are by using "URIEncoding"
and/or "useBodyEncodingForURI" as described in the links.

I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

My interpretation (and PLEASE tell me if I'm wrong) is, since at least
2007, headers have been locked in to the ISO-8859-1 charset due to specs
that govern how the world wide web is going to work.

This:

https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess

goes on to reiterate what the first link says and proposes a workaround (see
the Java link at the end of the page):

"Shibboleth attributes are by default UTF-8 encoded. However, depending on
the servlet container configuration they are interpreted as ISO-8859-1
values. This causes problems with non-ASCII characters. The solution is to
re-encode attributes, e.g. with:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");"


Although MY data is delivered as attributes (so I have to use
request.getAttribute("FirstName")), this works.

ISO-8859-1 is the default charset used by ByteChunk, and I've verified that
it is not reset/changed to UTF-8 even though I specified UTF-8 in server.xml
per the Tomcat documentation.
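
To illustrate why the re-encoding can recover the text at all: ISO-8859-1
maps every byte value to the code point with the same number, so decoding
UTF-8 bytes as ISO-8859-1 loses nothing and can be undone. Below is a minimal
standalone sketch of that round trip. The class name, method names and sample
value are made up for illustration; nothing here is Tomcat or Shibboleth code.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat or Shibboleth.
final class Latin1RoundTrip {

    // Simulates a container that receives UTF-8 bytes but decodes them
    // as ISO-8859-1 (each byte becomes one char with the same value).
    static String misdecode(String original) {
        byte[] utf8 = original.getBytes(StandardCharsets.UTF_8);
        return new String(utf8, StandardCharsets.ISO_8859_1);
    }

    // The workaround from the Shibboleth wiki: turn the chars back into
    // the original bytes, then decode those bytes as UTF-8.
    static String reencode(String garbled) {
        byte[] raw = garbled.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String original = "JO\u00CBL";          // "JOEL" with a diaeresis on the E
        String garbled = misdecode(original);   // 5 chars: the E became 2 bytes
        String repaired = reencode(garbled);

        System.out.println(garbled.length());          // prints 5
        System.out.println(repaired.equals(original)); // prints true
    }
}
```

The round trip is only safe when the string really was produced by decoding
UTF-8 bytes as ISO-8859-1. Applied to a string that was decoded correctly, it
would corrupt the non-ASCII characters instead.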

I found this:

https://issues.shibboleth.net/jira/browse/SSPCPP-2

which says this problem has been around since at least 2007

Then I found this:

https://wiki.shibboleth.net/confluence/plugins/servlet/mobile#content/view/4358180

which suggests the following solution:

String value= request.getHeader("givenName");
value= new String( value.getBytes("ISO-8859-1"), "UTF-8");

I have to get my data via request.getAttribute("key")

Is the solution appropriate for data delivered as attributes?
I have read that it's a dangerous hack, which is the main reason I have not
implemented it.

However, the Shibboleth forum posts and what I've discovered about
ByteChunk seem to cast this in a different light.
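
For discussion, here is a sketch of how the re-encoding might be applied to
attribute values with some of the danger removed. It assumes the attribute
comes back from request.getAttribute() as a String (or something whose
toString() is the value). The class and method names are hypothetical. The
idea is to re-encode only when the chars all fit in one byte AND those bytes
form strictly valid UTF-8; otherwise the value is returned untouched.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or Shibboleth.
final class AttributeDecoder {

    static String decode(Object attribute) {
        if (attribute == null) {
            return null;
        }
        String s = attribute.toString();

        // A char above 0xFF cannot come from an ISO-8859-1 decode, so the
        // value was already decoded some other way: leave it alone.
        for (int i = 0; i < s.length(); i++) {
            if (s.charAt(i) > 0xFF) {
                return s;
            }
        }

        // Recover the raw bytes and try a strict UTF-8 decode.
        byte[] raw = s.getBytes(StandardCharsets.ISO_8859_1);
        try {
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // Not valid UTF-8, so it was probably genuine ISO-8859-1 text.
            return s;
        }
    }
}
```

In a servlet it would be called along the lines of
AttributeDecoder.decode(request.getAttribute("FirstName")). This is still a
heuristic: a genuine ISO-8859-1 value whose bytes happen to form valid UTF-8
would be altered, so it only reduces the risk rather than removing it.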

Any thoughts, comments would be greatly appreciated.
