Senthil Kumaran <sent...@uthcode.com> added the comment:

The concern here is if the request line had something like this.

    Method SP Request-URI SP HTTP-Version <ANY_\r_\n_\r\n_Combination>\r\n

The previous behavior would have resulted in 

    Method SP Request-URI SP HTTP-Version <ANY_\r_\n_\r\n_Combination>

That is, it removed only the final \r\n, whereas the current change would make it

    Method SP Request-URI SP HTTP-Version 

That is, it removes all the trailing \r\n combinations.
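
To make the difference concrete, here is a minimal sketch of the two
behaviors described above. The function names and the sample request line are
hypothetical, chosen for illustration only; this is not the code from
http/server.py.

```python
def strip_final_crlf(line):
    """Remove only one final CRLF (the previous behavior as described above)."""
    if line.endswith("\r\n"):
        return line[:-2]
    return line


def strip_all_trailing_crlf(line):
    """Remove every trailing CR and LF (the behavior after the change)."""
    return line.rstrip("\r\n")


# Hypothetical request line followed by a mix of extra CR/LF characters.
raw = "GET /index.html HTTP/1.1\r\n\n\r\n"

old = strip_final_crlf(raw)         # stray "\r\n\n" is still attached
new = strip_all_trailing_crlf(raw)  # only the three request-line elements remain

print(repr(old))
print(repr(new))
```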

BTW, note that this applies only to the request line and not to the header
lines.  For the request line, both the HTTP 1.0 and HTTP 1.1 specs have this in
section 5.1:

    5.1  Request-Line

       The Request-Line begins with a method token, followed by the
       Request-URI and the protocol version, and ending with CRLF. The
       elements are separated by SP characters. No CR or LF are allowed
       except in the final CRLF sequence.

       Request-Line = Method SP Request-URI SP HTTP-Version CRLF

This leads me to believe that removing all the trailing \r\n is a fine thing
to do and should not be harmful.
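
As a rough sketch of what the section 5.1 grammar implies for a parser: once
the trailing CR/LF characters are stripped, the line should contain no other
CR or LF and should split into exactly three SP-separated elements. The helper
name parse_request_line is an assumption made for this example, and it is
stricter than a real server needs to be.

```python
def parse_request_line(raw):
    """Split a Request-Line into (method, request_uri, http_version).

    Hypothetical helper; follows
    Request-Line = Method SP Request-URI SP HTTP-Version CRLF
    """
    line = raw.rstrip("\r\n")  # drop all trailing CR/LF
    if "\r" in line or "\n" in line:
        # No CR or LF is allowed except in the final CRLF sequence.
        raise ValueError("CR or LF inside the Request-Line")
    parts = line.split(" ")
    if len(parts) != 3:
        raise ValueError("expected Method SP Request-URI SP HTTP-Version")
    method, request_uri, http_version = parts
    return method, request_uri, http_version


print(parse_request_line("GET /index.html HTTP/1.1\r\n\r\n"))
```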

To augment this with a few other things I found while (re-)reading the spec:
this advice differs from the handling of leading and trailing whitespace in
headers, which is called linear white space (LWS).  Consider a Host header that
looks like "Host: www.foo.com \r\n" (notice the trailing white space).

According to RFC 2616 (HTTP 1.1), section 4.2 Message Headers:

   The field-content does not include any leading or trailing LWS:
   linear white space occurring before the first non-whitespace
   character of the field-value or after the last non-whitespace
   character of the field-value. Such leading or trailing LWS MAY be
   removed without changing the semantics of the field value.

RFC 1945 (HTTP 1.0), section 4.2 Message Headers does not make such an explicit
statement.
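
For comparison with the request-line case, here is a small sketch of removing
leading and trailing LWS from a header value, which RFC 2616 permits. The
function name split_header is hypothetical, and the sketch handles only a
single unfolded header line, with LWS limited to spaces and tabs.

```python
def split_header(raw):
    """Return (field_name, field_value) with surrounding LWS removed.

    Hypothetical helper; folded (multi-line) header values are not handled.
    """
    line = raw.rstrip("\r\n")             # drop the line terminator
    name, _, value = line.partition(":")  # split at the first colon only
    return name, value.strip(" \t")       # leading/trailing SP and HT may be removed


name, value = split_header("Host: www.foo.com \r\n")
print(repr(name), repr(value))
```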

My guess about the former behavior in http/server.py is that the Request-Line
was thought to follow something like section 4.2 of the HTTP 1.0 spec, so only
the last two characters were removed. But per the spec, the request line should
end with exactly one CRLF. The docstring of the BaseHTTPServer class mentions
preserving the trailing white space, but it does not point to any authoritative
reference, so I am not sure that taking the docstring as the reference for
preserving the behavior is a good idea.

Before digging into the reason, I was wondering whether reverting the patch in
2.7 and 3.1 would be a good idea.  But given that the change has support from
the older specs as well as the new ones, I am inclined to think that leaving
the change in place (without reverting) should be fine as well.

Only if we find a stronger backwards-compatibility argument for leaving the
trailing \r\n in the request line should we revert the change in 2.7 and 3.2;
otherwise we can leave it as it is.

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue13294>
_______________________________________