On Thu, Oct 20, 2016 at 4:21 AM, André Warnier (tomcat) <a...@ice-sa.com> wrote:
> Can you tell us (or remind us) exactly how the browser is sending this
> request for the parameter "JOEL" (with diaeresis on the E) to the server?
> Is it a part of the query-string of the URL, or is it in the body of a
> POST request?
>
> The following on-line documentation describes precisely how this should
> work:
> http://tomcat.apache.org/tomcat-8.0-doc/config/ajp.html#Attributes
> (See "URIEncoding", but also "useBodyEncodingForURI", and follow the link
> provided to the same attributes in the HTTP Connector:
> http://tomcat.apache.org/tomcat-8.0-doc/config/http.html#Common_Attributes
> )
>
> So check exactly what you are doing, and if that matches these rules
> somehow.
>
> Personal rant:
> Unfortunately, this is still a big mess in the HTTP protocol.
> And the people in charge of the design of the protocol missed a golden
> opportunity of cleaning this up in HTTP 2.x and making Unicode/UTF-8 the
> default, instead of clinging to iso-8859-1. Thus condemning all web
> programmers worldwide to another 20 years of obscure bugs and clunky
> work-arounds.
>
> (s) Andr%C3%A9

The data is being returned by Shibboleth and passed to Tomcat in the body of an HTTP GET request. This is by design of the application and there's nothing I can do about it. As such, my only options for enforcing UTF-8 are "URIEncoding" and/or "useBodyEncodingForURI" as described in the links. I've done this and it has not had any impact on the problem.

Last night, I found these bits of information:
https://issues.shibboleth.net/jira/browse/SSPCPP-2

My interpretation (and PLEASE tell me if I'm wrong) is that, since at least 2007, headers have been locked in to the ISO-8859-1 charset by the specs that govern how the world wide web works.

This:
https://wiki.shibboleth.net/confluence/display/SHIB2/NativeSPAttributeAccess
goes on to reiterate what the first link says and proposes a workaround (see the Java link at the end of the page):

"Shibboleth attributes are by default UTF-8 encoded. However, depending on the servlet container configuration they are interpreted as ISO-8859-1 values. This causes problems with non-ASCII characters. The solution is to re-encode attributes, e.g. with:

    String value = request.getHeader("givenName");
    value = new String(value.getBytes("ISO-8859-1"), "UTF-8");"

Although MY data is delivered as attributes (so I have to use request.getAttribute("FirstName")), this works.

ISO-8859-1 is the default used by ByteChunk, and I've verified that it is not reset/changed to UTF-8 despite my having specified UTF-8 in server.xml per the Tomcat documentation.

I found this:
https://issues.shibboleth.net/jira/browse/SSPCPP-2
which says this problem has been around since at least 2007.

Then I found this:
https://wiki.shibboleth.net/confluence/plugins/servlet/mobile#content/view/4358180
which suggests the following solution:

    String value = request.getHeader("givenName");
    value = new String(value.getBytes("ISO-8859-1"), "UTF-8");

I have to get my data via request.getAttribute("key"). Is the solution appropriate for data delivered as attributes?

I have read the information that says it's a dangerous hack, which is the main reason I have not implemented it. However, the Shibboleth forum posts and what I've discovered about ByteChunk seem to cast this in a different light.

Any thoughts or comments would be greatly appreciated.
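
For reference, here is a minimal sketch of how the re-encoding from the Shibboleth wiki might be adapted to attribute values. It is only an illustration under assumptions: the class name AttributeReencoder and the method reencode are hypothetical, and it assumes the attribute is a String that was decoded as ISO-8859-1 from bytes that were really UTF-8. Since request.getAttribute() returns an Object, the helper accepts an Object. To reduce the "dangerous hack" risk, it leaves the value untouched when it does not look like mis-decoded UTF-8.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

class AttributeReencoder {

    // Hypothetical helper: pass in whatever request.getAttribute("FirstName") returned.
    static String reencode(Object attribute) {
        if (attribute == null) {
            return null;
        }
        String value = attribute.toString();

        // A char above 0xFF cannot result from an ISO-8859-1 decode,
        // so the value was not mis-decoded that way; leave it alone.
        for (int i = 0; i < value.length(); i++) {
            if (value.charAt(i) > 0xFF) {
                return value;
            }
        }

        // Recover the original bytes (ISO-8859-1 maps chars 0..255 to bytes 1:1).
        byte[] raw = value.getBytes(StandardCharsets.ISO_8859_1);
        try {
            // Strict decode: fail instead of silently substituting characters.
            return StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(raw))
                    .toString();
        } catch (CharacterCodingException e) {
            // The bytes are not valid UTF-8, so assume the value was already correct.
            return value;
        }
    }

    public static void main(String[] args) {
        // "JOËL" in UTF-8 is 4A 4F C3 8B 4C; read as ISO-8859-1 it becomes J O U+00C3 U+008B L.
        String garbled = "JO\u00C3\u008BL";
        System.out.println(reencode(garbled).equals("JO\u00CBL")); // prints "true"
    }
}
```

The strict decoder is a design choice in this sketch: a value that was already decoded correctly, such as "JOËL" with a real U+00CB, is not valid UTF-8 when turned back into bytes, so it is returned unchanged rather than corrupted.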