On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec <mark.juszc...@gmail.com> wrote:
>
> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec <mark.juszc...@gmail.com>
> wrote:
>>
>> Some questions (if these are not relevant, please disregard):
>>
>> I'm loading a whole bunch of modules. Could some of them be incompatible?
>>
>> DocumentRoot refers to a directory that does not exist. Is that a
>> problem?
>>
>> What does AddLanguage do?
>>
>> Is AddDefaultCharset redundant?
>>
>> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
>> +ForwardURIEscaped does?
>>
>> I have verified the data coming out of Shibboleth is what we expect.
>>
>
> I think I've found where the byte data is coming in.
>
> AjpAprProcessor.java's method:
>
> protected boolean read(byte[] buf, int pos, int n, boolean block) throws
> IOException
>
> This ultimately gives me a great big buffer of bytes. Spring Tool Suite
> shows me the relevant ones:
>
> 74 79 -61 -117 76
>

I think I have found where these bytes are interpreted improperly and my
problems start.

In AbstractAjpProcessor.java there is a method named

    protected void prepareRequest()

which contains:

    // Decode extra attributes
    boolean secret = false;
    byte attributeCode;
    while ((attributeCode = requestHeaderMessage.getByte())
            != Constants.SC_A_ARE_DONE) {

        switch (attributeCode) {

        case Constants.SC_A_REQ_ATTRIBUTE :
            requestHeaderMessage.getBytes(tmpMB);
            String n = tmpMB.toString();
            requestHeaderMessage.getBytes(tmpMB);
            String v = tmpMB.toString();

I have debugged to the point where n = "FirstName", the bit of data
giving me fits.

After the second requestHeaderMessage.getBytes(tmpMB) call (the one after
String n = ....), tmpMB shows "JOËL".

tmpMB is a MessageBytes. It contains a ByteChunk, which is the array of
bytes I posted yesterday.

The ByteChunk has a start=1049 and an end=1054. That is bytes:

    1049:    5
    1050:   74   J
    1051:   79   O
    1052:  -61   0xF....C3
    1053: -117   0xF....8B
    1054:   76   L

The ByteChunk has a charset and it is set to ISO-8859-1.

So, that explains - at least to me - where things go wrong.

Now, the question is why.
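
As a sanity check on those five bytes, here is a minimal standalone sketch that decodes them both ways. It uses only java.nio.charset.StandardCharsets and does not touch any Tomcat classes; the class name AjpAttributeDecodeDemo and its helper methods are made up for this illustration.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Tomcat.
public class AjpAttributeDecodeDemo {

    // The five bytes the debugger shows for the FirstName value.
    public static final byte[] FIRST_NAME_BYTES = { 74, 79, -61, -117, 76 };

    // One char per byte: this is what an ISO-8859-1 ByteChunk would produce.
    public static String decodeLatin1(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    // 0xC3 0x8B is read as a single two-byte UTF-8 sequence (U+00CB).
    public static String decodeUtf8(byte[] raw) {
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String latin1 = decodeLatin1(FIRST_NAME_BYTES);
        String utf8 = decodeUtf8(FIRST_NAME_BYTES);

        // ISO-8859-1 yields 5 chars: J, O, U+00C3, U+008B, L
        System.out.println("ISO-8859-1 length: " + latin1.length());
        // UTF-8 yields 4 chars: J, O, U+00CB, L
        System.out.println("UTF-8 length:      " + utf8.length());
        System.out.println("UTF-8 is J O U+00CB L: " + utf8.equals("JO\u00CBL"));
    }
}
```

So the bytes on the wire look like valid UTF-8 for the expected name, and it is only the ISO-8859-1 decoding step that turns the two-byte sequence into two separate characters.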
Looking at ByteChunk.java, I see it has the following:

    /**
     * Default encoding used to convert to strings. It should be UTF8,
     * as most standards seem to converge, but the servlet API requires
     * 8859_1, and this object is used mostly for servlets.
     */
    public static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    private Charset charset;

    public void setCharset(Charset charset) {
        this.charset = charset;
    }

    public Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
executed.

ByteChunk.getCharset() is called from MessageBytes.toBytes(), which is
called from AjpMessage.appendBytes(MessageBytes).

So, I think this explains why my data is being interpreted incorrectly.

Now, the question becomes why isn't this line in server.xml:

    <Connector port="XXXX"
               emptySessionPath="true"
               enableLookups="false"
               redirectPort="YYYY"
               protocol="AJP/1.3"
               maxThreads="300"
               URIEncoding="UTF-8"
               connectionTimeout="600000" />

enough to cause ByteChunk.charset to be set to "UTF-8"?

Does anyone have any thoughts as to how to proceed?
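
In the meantime, one workaround I could try on the application side is to undo the ISO-8859-1 decoding and redo it as UTF-8. This is only a sketch under assumptions: it assumes the attribute value reaching my code is exactly what the ISO-8859-1 decoding produced (one char per original byte), and the class and method names below are hypothetical helpers of mine, not Tomcat or Shibboleth API. It also does not answer the question above. As far as I can tell from the Connector documentation, URIEncoding is described as the encoding used to decode the URI bytes, so it may simply not apply to AJP request attributes, but I have not confirmed that.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of Tomcat or Shibboleth.
public class Latin1ToUtf8Recovery {

    // ISO-8859-1 maps every byte value to exactly one char (U+0000 to U+00FF),
    // so encoding the mis-decoded string back with ISO-8859-1 recovers the
    // original bytes, which can then be decoded as UTF-8.
    public static String reinterpretAsUtf8(String decodedAsLatin1) {
        byte[] originalBytes = decodedAsLatin1.getBytes(StandardCharsets.ISO_8859_1);
        return new String(originalBytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate the value as decoded with the ISO-8859-1 default.
        byte[] wire = { 74, 79, -61, -117, 76 };
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);

        String repaired = reinterpretAsUtf8(misdecoded);
        System.out.println("repaired length: " + repaired.length());
        System.out.println("repaired is J O U+00CB L: " + repaired.equals("JO\u00CBL"));
    }
}
```

This only makes sense for values that really were UTF-8 on the wire; applying it to a string that was already decoded correctly would damage any non-ASCII characters in it.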