Nobody wrote:
> Bryan wrote:
> > this is a case where we might want to be better
> > than correct. BaseHTTPRequestHandler in the Python standard library
> > accommodates clients that incorrectly omit the '\r' and end header lines
> > with just '\n'. Such apps have been seen in the wild. Since bare '\n'
> > never appears in correctly formed HTTP headers, interpreting it as
> > equivalent to '\r\n' doesn't break anything.
>
> Yes it does. It breaks upstream filtering rules which are intended to
> prohibit, remove or modify certain headers.
>
> This class of attack is known as "HTTP request smuggling". By
> appending a header preceded by a bare '\r' or '\n' to the end of
> another header, the header can be "smuggled" past a filter which
> parses headers using the correct syntax,

How does a bare '\r' or '\n' get past a filter which parses headers
using the correct syntax? I don't see where the correct syntax of the
HTTP protocol allows that.

> but will still be treated as a
> header by software which incorrectly parses headers using bare '\r' or
> '\n' as separators.

Why blame software that incorrectly accepts '\n' as a line break, and
not the filter that incorrectly accepted '\n' in the middle of a
header? Both are accepting incorrect syntax, but only the former has
good reason to do so.

> The safest solution would be to simply reject any request (or response)
> which contains bare '\r' or '\n' characters within headers, at least by
> default. Force the programmer to read the documentation (where the risks
> would be described) if they want the "fault tolerant" behaviour.

The Internet has a tradition of protocols above the transport level
being readable by eye and writable by hand. The result has been quick
development, but many mistakes that can induce unforeseen
consequences.

This case is somewhat subtle. Within a text entity-body, HTTP allows
any one of the three end-of-line delimiters. That's just the body; the
header portion is more rigid. In HTTP 1.0:

  "This flexibility regarding line breaks applies only to text
  media in the Entity-Body; a bare CR or LF should not be
  substituted for CRLF within any of the HTTP control structures
  (such as header fields and multipart boundaries)."
  -- RFC 1945

While in HTTP 1.1:

  "This flexibility regarding line breaks applies only to text
  media in the entity-body; a bare CR or LF MUST NOT be
  substituted for CRLF within any of the HTTP control structures
  (such as header fields and multipart boundaries)."
  -- RFC 2616

Note the change from "should not" to "MUST NOT". In reality our code
might be called upon to work with apps that botch the technically
correct HTTP end-of-line marker.

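To make the disagreement concrete, here is a minimal sketch in Python.
It is not how BaseHTTPRequestHandler or any real filter is written; the
function names and the sample header block are made up for
illustration. It assumes a filter that splits header lines on CRLF only
and does not separately check field values for stray CR or LF, which is
exactly the behaviour in dispute above.

```python
import re

def split_strict(head: bytes) -> list[bytes]:
    """Split a header block on CRLF only (hypothetical strict filter)."""
    return head.split(b"\r\n")

def split_lenient(head: bytes) -> list[bytes]:
    """Split on CRLF or bare LF, tolerating clients that omit the CR."""
    return re.split(rb"\r?\n", head)

def has_bare_cr_or_lf(head: bytes) -> bool:
    """Return True if a CR or LF appears outside a CRLF pair."""
    return re.search(rb"\r(?!\n)|(?<!\r)\n", head) is not None

# Hypothetical header block: "X-Admin" hides behind a bare LF.
head = b"Host: example.com\r\nX-Note: hello\nX-Admin: yes"

print(split_strict(head))       # two lines; X-Admin is buried in X-Note's value
print(split_lenient(head))      # three lines; X-Admin is a header of its own
print(has_bare_cr_or_lf(head))  # True: a parser could reject the block here
```

The two splitters disagree about how many headers there are. A check
like has_bare_cr_or_lf() is one way either side could notice the
malformed input before deciding whether to reject it or tolerate it.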
Rejecting bare '\n' may be safe from a technical security perspective,
but if our safe code breaks a previously working system, then it will
appear in a bug database and not in production.

'Nobody' makes a fair point. I'd love to see Internet protocols
defined with mechanical rigor. Our discipline commonly specifies
programming language syntax formally, and Internet protocols are
syntactically simpler than programming languages. For now, HTTP is a
bit of a mess, so write it absolutely correctly but read it a bit
flexibly.

--
http://mail.python.org/mailman/listinfo/python-list