I'm tracking down a deployment issue in which a series of bytes is incorrectly classified as a correctly parsed HTTP request.

Walking through http_parser_parse_req(), it seems that to be marked as a correctly formatted request, the input only needs to match:

<bytes excluding white space>+ <white space>+ <bytes excluding white space>+\n

In my case, this takes a mix of control characters and a few ASCII characters as the method, and a run of ASCII characters as the URI. Instead of failing during parsing, the request fails at the DNS lookup, since my URI isn't a valid domain name. The resulting error talks about DNS resolution, which is misleading.
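As a rough model of the pattern above (not the actual ATS code), the sketch below shows why such input is accepted. The names is_ws() and matches_lax_pattern() are made up for illustration, and I am assuming "white space" means space and tab only.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: the model treats only space and tab as white space. */
static bool is_ws(char c)
{
    return c == ' ' || c == '\t';
}

/* Hypothetical model of: <non-ws>+ <ws>+ <non-ws>+ '\n'
   Returns true if buf[0..len) matches that shape exactly. */
static bool matches_lax_pattern(const char *buf, size_t len)
{
    size_t i = 0;
    size_t start;

    /* First field (the "method"): any bytes except white space or newline. */
    start = i;
    while (i < len && !is_ws(buf[i]) && buf[i] != '\n') i++;
    if (i == start) return false;

    /* At least one white space byte between the fields. */
    start = i;
    while (i < len && is_ws(buf[i])) i++;
    if (i == start) return false;

    /* Second field (the "URI"): same rule as the first field. */
    start = i;
    while (i < len && !is_ws(buf[i]) && buf[i] != '\n') i++;
    if (i == start) return false;

    /* The line must end with a single newline as the last byte. */
    return i + 1 == len && buf[i] == '\n';
}
```

Under this model, a line whose first field starts with control bytes such as 0x01 and 0x02 is accepted just like an ordinary "GET /index.html" line, because nothing inspects the bytes of the first field beyond checking for white space.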

Looking at the W3 specs, HTTP 1.1 appears to have the most lax rules for which characters can form a method token. From my reading, a method can be any token (http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.1), and a token may contain any US-ASCII character except control characters and separators, where the separators include space and tab along with punctuation such as parentheses, quotes, and slashes (http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2).

I'd like to change the parsing of the method token in http_parser_parse_req() so that it rejects control characters in the method token as well as white space characters. Since this seems like a rather key part of the ATS processing and has been this way for quite some time, I wanted to get confirmation from folks that this is a reasonable change.
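To make the proposal concrete, here is a minimal sketch of the kind of check I have in mind. It is not a patch against ATS; is_method_char() and is_valid_method() are hypothetical names, and where exactly such a check would hook into http_parser_parse_req() is left open.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not part of ATS: true if byte c is allowed in a
   method under the proposed rule (no white space, no control characters). */
static bool is_method_char(unsigned char c)
{
    /* 0x00-0x1F and 0x7F are the US-ASCII control characters (this range
       includes tab, CR and LF); 0x20 is the space character. */
    return c > 0x20 && c != 0x7F;
}

/* Hypothetical helper: validate the method bytes in [start, start + len). */
static bool is_valid_method(const char *start, size_t len)
{
    if (len == 0) {
        return false; /* an empty method is never valid */
    }
    for (size_t i = 0; i < len; i++) {
        if (!is_method_char((unsigned char)start[i])) {
            return false;
        }
    }
    return true;
}
```

With this check, is_valid_method("GET", 3) passes while a method containing a byte such as 0x01 is rejected at parse time. Note that the sketch still accepts bytes above 0x7F and the RFC 2616 separator characters, so it is looser than a strict token check; tightening it further would be a separate decision.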

Thanks,
Susan
