Re: Tomcat 8, AJP 1.3 UTF-8/ISO-8859-1 conversion problem

Mark Juszczec Wed, 19 Oct 2016 11:43:05 -0700

On Tue, Oct 18, 2016 at 4:45 PM, Mark Juszczec <mark.juszc...@gmail.com>
wrote:


>
>
> On Tue, Oct 18, 2016 at 2:58 PM, Mark Juszczec <mark.juszc...@gmail.com>
> wrote:
>>
>>
>> Some questions (if these are not relevant, please disregard):
>>
>> I'm loading a whole bunch of modules.  Could some of them be incompatible?
>>
>> DocumentRoot refers to a directory that does not exist.  Is that a
>> problem?
>>
>> What does AddLanguage do?
>>
>> Is AddDefaultCharset redundant?
>>
>> Are +ForwardKeySize and -ForwardDirectories somehow disabling what
>> +ForwardURIEscaped does?
>>
>> I have verified the data coming out of Shibboleth is what we expect.
>>
>
> I think I've found where the byte data is coming in.
>
> AjpAprProcessor.java's method:
>
> protected boolean read(byte[] buf, int pos, int n, boolean block) throws
> IOException
>
> This ultimately gives me a great big buffer of bytes. Spring Tool Suite
> shows me the relevant ones:
>
> 74 79 -61 -117 76
>
>
I think I have found where these bytes are interpreted improperly and my
problems start.

In AbstractAjpProcessor.java there is a method, protected void
prepareRequest(), which contains:

        // Decode extra attributes
        boolean secret = false;
        byte attributeCode;
        while ((attributeCode = requestHeaderMessage.getByte())
                != Constants.SC_A_ARE_DONE) {

            switch (attributeCode) {

            case Constants.SC_A_REQ_ATTRIBUTE :
                requestHeaderMessage.getBytes(tmpMB);
                String n = tmpMB.toString();
                requestHeaderMessage.getBytes(tmpMB);
                String v = tmpMB.toString();

I have stepped through in the debugger to the point where
n="FirstName", the attribute that is giving me fits.

After the second requestHeaderMessage.getBytes(tmpMB); (the one after
String n = ....), tmpMB shows "JOÃ‹L".

tmpMB is a MessageBytes.  It contains a ByteChunk, which wraps the array
of bytes I posted yesterday.

The ByteChunk has start=1049 and end=1054.  Those bytes are:

1049:    5
1050:   74      J
1051:   79      O
1052:  -61      0xF....C3
1053: -117      0xF....8B
1054:   76      L

The ByteChunk has a charset, and it is set to ISO-8859-1.

So, that explains - at least to me - where things go wrong.
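
To make the difference concrete, here is a minimal sketch (my own
illustration, not Tomcat code; the class name DecodeDemo is made up)
that decodes the five payload bytes from the dump above with both
charsets.  Under ISO-8859-1 every byte becomes its own character, so
-61 and -117 turn into two separate characters; under UTF-8 the same
two bytes form the single character U+00CB.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class DecodeDemo {
    // The five payload bytes from the dump: J, O, 0xC3, 0x8B, L.
    static final byte[] FIRST_NAME = {74, 79, -61, -117, 76};

    // Decode the same bytes with whichever charset the caller picks.
    static String decode(Charset cs) {
        return new String(FIRST_NAME, cs);
    }

    public static void main(String[] args) {
        String latin1 = decode(StandardCharsets.ISO_8859_1);
        String utf8 = decode(StandardCharsets.UTF_8);
        // One char per byte under ISO-8859-1; 0xC3 0x8B collapse to one char under UTF-8.
        System.out.println("ISO-8859-1: " + latin1.length() + " chars");
        System.out.println("UTF-8:      " + utf8.length() + " chars");
    }
}
```

The UTF-8 result is the four-character name we expect; the ISO-8859-1
result is the five-character string I see in tmpMB.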

Now, the question is why.

Looking at ByteChunk.java, I see it has the following:

    /** Default encoding used to convert to strings. It should be UTF8,
        as most standards seem to converge, but the servlet API requires
        8859_1, and this object is used mostly for servlets.
    */
    public static final Charset DEFAULT_CHARSET =
            StandardCharsets.ISO_8859_1;

    private Charset charset;

    public void setCharset(Charset charset) {
        this.charset = charset;
    }

    public Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }

I set a breakpoint on ByteChunk.setCharset(Charset) and it is never
executed.

ByteChunk.getCharset() is called from MessageBytes.toBytes(), which is
called from AjpMessage.appendBytes(MessageBytes).

So, I think this explains why my data is being interpreted incorrectly.
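
The effect of that lazy default can be reproduced outside Tomcat.  The
sketch below uses a hypothetical stand-in class, ChunkSketch, which
copies only the getCharset()/setCharset() logic quoted above plus an
assumed toString() that decodes with getCharset().  It is not the real
ByteChunk, just enough to show what happens when setCharset() is never
called.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical stand-in for illustration only; this is NOT Tomcat's ByteChunk.
class ChunkSketch {
    static final Charset DEFAULT_CHARSET = StandardCharsets.ISO_8859_1;

    private final byte[] buf;
    private Charset charset; // stays null until setCharset() is called

    ChunkSketch(byte[] buf) {
        this.buf = buf;
    }

    void setCharset(Charset charset) {
        this.charset = charset;
    }

    // Same lazy fallback as the quoted getCharset().
    Charset getCharset() {
        if (charset == null) {
            charset = DEFAULT_CHARSET;
        }
        return charset;
    }

    // Assumption: string conversion decodes with whatever getCharset() returns.
    @Override
    public String toString() {
        return new String(buf, getCharset());
    }

    public static void main(String[] args) {
        byte[] bytes = {74, 79, -61, -117, 76};
        ChunkSketch untouched = new ChunkSketch(bytes);
        ChunkSketch configured = new ChunkSketch(bytes);
        configured.setCharset(StandardCharsets.UTF_8);
        System.out.println(untouched.getCharset() + " -> "
                + untouched.toString().length() + " chars");
        System.out.println(configured.getCharset() + " -> "
                + configured.toString().length() + " chars");
    }
}
```

With no setCharset() call the stand-in silently decodes as ISO-8859-1,
which matches what I see in the debugger.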

Now the question becomes: why isn't this Connector definition in
server.xml:

 <Connector port="XXXX"
                  emptySessionPath="true"
                  enableLookups="false"
                  redirectPort="YYYY"
                  protocol="AJP/1.3"
                  maxThreads="300"
                  URIEncoding="UTF-8"
                  connectionTimeout="600000" />

enough to cause ByteChunk.charset to be set to "UTF-8"?

Does anyone have any thoughts as to how to proceed?
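
In the meantime, one possible stopgap on the application side is to undo
the wrong decoding after the fact.  This is only a sketch under the
assumption that the attribute reaches the webapp as a String that was
decoded as ISO-8859-1 from bytes that were really UTF-8; the class and
method names (Redecode, latin1ToUtf8) are made up for illustration.  It
works because ISO-8859-1 maps every byte to exactly one character, so
getBytes(ISO_8859_1) gives back the original bytes unchanged.  As far as
I can tell from the connector documentation, URIEncoding is about
decoding the URI, so it may simply not apply to AJP request attributes.

```java
import java.nio.charset.StandardCharsets;

class Redecode {
    // Recover the original bytes from a Latin-1-decoded string,
    // then decode those bytes again as UTF-8.
    static String latin1ToUtf8(String misdecoded) {
        byte[] raw = misdecoded.getBytes(StandardCharsets.ISO_8859_1);
        return new String(raw, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Simulate what arrives: UTF-8 bytes decoded as ISO-8859-1.
        byte[] wire = {74, 79, -61, -117, 76};
        String misdecoded = new String(wire, StandardCharsets.ISO_8859_1);
        String repaired = latin1ToUtf8(misdecoded);
        System.out.println(misdecoded.length() + " chars before, "
                + repaired.length() + " chars after");
    }
}
```

This should only be applied to values known to be mis-decoded: a string
that already contains characters above U+00FF cannot be encoded in
ISO-8859-1 and would come out with '?' substitutions.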
