I'm tracking down a deployment issue in which a series of bytes is
incorrectly classified as a correctly parsed HTTP request.
Walking through http_parser_parse_req(), it seems that to be marked as a
correctly formatted request, the input only needs to match
<bytes excluding white space>+ <white space>+ <bytes excluding white space>+\n
In my case, a run of control characters and some ASCII characters matches
as the method, and a set of ASCII characters matches as the URI. Instead
of failing in parsing, the request fails in the DNS lookup since my
URI isn't a valid domain name. The resulting error sent back talks
about DNS resolution, which is misleading.
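To make the problem concrete, here is a minimal sketch of the shape described above. It is not the ATS code: the function names are hypothetical, and treating only space and tab as white space is my assumption for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not ATS code: the bytes this sketch treats as
 * white space (space and tab only, which is an assumption). */
static bool sketch_is_ws(unsigned char c)
{
    return c == ' ' || c == '\t';
}

/* Returns true if buf[0..len) starts with the shape
 *   <non-ws>+ <ws>+ <non-ws>+ '\n'
 * No check is made on which bytes appear in the first field. */
static bool sketch_lax_request_line(const char *buf, size_t len)
{
    size_t i = 0;
    size_t start;

    /* First field: one or more non-white-space bytes (the "method"). */
    start = i;
    while (i < len && !sketch_is_ws((unsigned char)buf[i]) && buf[i] != '\n')
        i++;
    if (i == start)
        return false;

    /* One or more white space bytes. */
    start = i;
    while (i < len && sketch_is_ws((unsigned char)buf[i]))
        i++;
    if (i == start)
        return false;

    /* Second field: one or more non-white-space bytes (the "URI"). */
    start = i;
    while (i < len && !sketch_is_ws((unsigned char)buf[i]) && buf[i] != '\n')
        i++;
    if (i == start)
        return false;

    /* The line must end with a newline. */
    return i < len && buf[i] == '\n';
}
```

With this shape, a "method" made of the bytes 0x01 0x02 followed by "GE" is accepted as long as white space, a second field, and a newline follow it.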
Looking at the W3 specs, it looks like HTTP 1.1 has the most lax rules
for what characters can form a method token. From my reading, a method
can be any token
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec5.html#sec5.1.1), and
a token can contain any CHAR except control characters and separators.
The separators include space and tab as well as the punctuation
( ) < > @ , ; : \ " / [ ] ? = { }
(http://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html#sec2.2).
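For reference, RFC 2616 section 2.2 defines a token as one or more CHARs (octets 0-127) excluding CTLs (octets 0-31 and 127) and separators. A check along those lines might look roughly like the sketch below. The function names are hypothetical and this is not existing ATS code.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: true if c may appear in an RFC 2616 token. */
static bool sketch_is_token_char(unsigned char c)
{
    /* Reject CTLs (0-31 and 127) and anything outside US-ASCII. */
    if (c <= 31 || c >= 127)
        return false;
    /* Reject the RFC 2616 separators, including space and tab. */
    return strchr("()<>@,;:\\\"/[]?={} \t", c) == NULL;
}

/* Hypothetical helper: true if s[0..len) is a non-empty token. */
static bool sketch_is_valid_method(const char *s, size_t len)
{
    if (len == 0)
        return false;
    for (size_t i = 0; i < len; i++) {
        if (!sketch_is_token_char((unsigned char)s[i]))
            return false;
    }
    return true;
}
```

Under this check "GET" and "M-SEARCH" pass, while a method containing control characters or white space is rejected before any DNS lookup would be attempted.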
I'd like to change the parsing of the method token in
http_parser_parse_req() to reject control characters in the method
token as well as the white space characters. Since this seems like a
rather key part of the ATS processing and has been this way for quite
some time, I wanted to get confirmation from folks that this is a
reasonable change.
Thanks,
Susan