[issue1555570] email parser incorrectly breaks headers with a CRLF at 8192

Karen Tracey Wed, 02 Apr 2008 08:38:05 -0700

Karen Tracey <[EMAIL PROTECTED]> added the comment:

Opening the file in universal newline mode doesn't work for cases where
the 'file' contains multipart MIME data (e.g. multipart/form-data) where
one of the included parts is binary data (e.g. application/octet-stream).
In that case, blind translation of CRLF to LF may corrupt the binary
data.  (Thanks to Thomas Guettler for pointing that out to me.)
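
To make the corruption concrete, here is a minimal sketch. The payload
bytes and the boundary name are made up for illustration; the replace()
call only stands in for what a blanket CRLF-to-LF translation would do to
the stream before the parser sees it.

```python
# Hypothetical binary part: the byte pair 0x0D 0x0A occurs inside the data
# itself, where it is NOT a line break.
binary_part = bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A])

# Hypothetical multipart body wrapping that part ("XYZ" is an invented boundary).
body = (
    b"--XYZ\r\n"
    b"Content-Type: application/octet-stream\r\n"
    b"\r\n" + binary_part + b"\r\n"
    b"--XYZ--\r\n"
)

# Stand-in for a blind newline translation applied by the caller.
translated = body.replace(b"\r\n", b"\n")

# The binary part no longer appears intact in the translated stream.
payload_survived = binary_part in translated
damaged_part = binary_part.replace(b"\r\n", b"\n")
```

The translation cannot tell the CRLF inside the binary part from the
CRLFs that end the MIME header lines, so the part loses a byte.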

FeedParser goes to considerable trouble to split on any conceivable line
boundary but retain whatever line boundary existed in the stream when
putting things back together.  (Look at BufferedSubFile's push() code in
feedparser.py.)  It was not written on the assumption that it would be
getting LFs only.  

The only code that knows enough to know which CRLFs are really line
breaks is the code that is breaking the stream up based on the boundary
markers -- that is the FeedParser code.  It isn't safe for the caller to
do any CRLF conversions before calling the Parser.  Therefore I believe
the fix needs to be made to the parser.py code, not the docs.
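
As a rough illustration of the failure condition, the sketch below feeds a
message to FeedParser in two chunks, with the chunk boundary falling
between the CR and the LF of a header line. This is the situation that
arises when a CRLF straddles a read boundary such as 8192 bytes. The
message text and the helper name parse_in_two_chunks are invented for the
example, and the code is written against the current Python 3 email
package, where I would expect both parses to agree. On an affected version
the split parse is the one that would break the headers.

```python
from email.parser import FeedParser

# Invented sample message using CRLF line endings throughout.
RAW = "Subject: hello\r\nTo: someone@example.invalid\r\n\r\nbody line\r\n"


def parse_in_two_chunks(text, split_at):
    """Feed text to a FeedParser as two chunks cut at index split_at."""
    parser = FeedParser()
    parser.feed(text[:split_at])
    parser.feed(text[split_at:])
    return parser.close()


# Reference parse: the whole message in a single feed() call.
reference_parser = FeedParser()
reference_parser.feed(RAW)
whole = reference_parser.close()

# Cut exactly between the CR and the LF that end the first header line.
cut = RAW.index("\r\n") + 1
split = parse_in_two_chunks(RAW, cut)

same_headers = whole.items() == split.items()
same_payload = whole.get_payload() == split.get_payload()
```

Comparing the split parse against the single-feed parse is a cheap
regression check: the caller does no newline conversion at all, and the
parser alone decides which CRLFs are line breaks.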

Two people that I know of independently rediscovered this bug in the
last couple of weeks (running Django).  I rediscovered it about three
months ago, after Jeremy Dunck rediscovered it a year earlier, three
months after it was originally opened.  It may be a corner case, but it
is quite difficult for people to track down and the fix is trivial, so
it would be nice if the fix could be put in.

_____________________________________
Tracker <[EMAIL PROTECTED]>
<http://bugs.python.org/issue1555570>
_____________________________________