[ python-Bugs-1467619 ] Header.decode_header eats up spaces

SourceForge.net Wed, 16 May 2007 06:08:44 -0700

Bugs item #1467619, was opened at 2006-04-10 06:33
Message generated for change (Comment added) made by bwarsaw
You can respond by visiting: 
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1467619&group_id=5470


Category: Python Library
Group: Python 2.3
Status: Open
Resolution: None
Priority: 6
Private: No
Submitted By: Mathieu Goutelle (mgoutell)
Assigned to: Barry A. Warsaw (bwarsaw)
Summary: Header.decode_header eats up spaces

Initial Comment:
The Header.decode_header function eats up spaces in
the non-encoded part of a header.

See the following source:
# -*- coding: iso-8859-1 -*-
from email.Header import Header, decode_header
h = Header('Essai ', None)
h.append('éè', 'iso-8859-1')
print h
print decode_header(h)

This prints:
Essai =?iso-8859-1?q?=E9=E8?=
[('Essai', None), ('\xe9\xe8', 'iso-8859-1')]

This should print:
Essai =?iso-8859-1?q?=E9=E8?=
[('Essai ', None), ('\xe9\xe8', 'iso-8859-1')]
        ^ This space disappears

This appears in Python 2.3 but the source code of the
function didn't change in 2.4 so the same problem
should still exist. Bug "[ 1372770 ] email.Header
should preserve original FWS" may be linked to that one
although I'm not sure this is exactly the same.

This patch (not extensively tested though) seems to
solve this problem:

--- /usr/lib/python2.3/email/Header.py  2005-09-05 00:20:03.000000000 +0200
+++ Header.py   2006-04-10 12:27:27.000000000 +0200
@@ -90,7 +90,7 @@
             continue
         parts = ecre.split(line)
         while parts:
-            unenc = parts.pop(0).strip()
+            unenc = parts.pop(0).rstrip()
             if unenc:
                 # Should we continue a long line?
                 if decoded and decoded[-1][1] is None:


----------------------------------------------------------------------

>Comment By: Barry A. Warsaw (bwarsaw)
Date: 2007-05-16 09:08

Message:
Logged In: YES 
user_id=12800
Originator: NO

IIRC, I tried the OP's patch and it broke too many tests in the email
package's test suite.  I then attempted a fix that was much more RFC
compliant, but couldn't get the test suite to pass completely.  This points
to a much deeper problem with the email package's header management.  I
don't think the problem is a bug; I think it's a design flaw.


----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2007-05-16 08:51

Message:
Logged In: YES 
user_id=849994
Originator: NO

I propose the attached patch. RFC 2047 says that whitespace between
adjacent encoded-words is to be ignored, but IMHO that does not apply to
whitespace between ordinary text and encoded-words.
File Added: emailheader.diff
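
To illustrate the distinction drawn above, here is a minimal Python sketch.
It is not the contents of emailheader.diff (which is not reproduced in this
message) and not the email package's code. The pattern ENCODED_WORD and the
helper split_preserving_text_whitespace are hypothetical names introduced
only for this example, and the regular expression is a simplified stand-in
for a real encoded-word matcher. The sketch drops whitespace that sits
between two encoded-words and leaves whitespace next to ordinary text alone.

```python
import re

# Simplified encoded-word pattern: =?charset?encoding?text?=
# (an illustrative assumption, not the pattern used by the email package).
ENCODED_WORD = re.compile(r'(=\?[^?\s]+\?[bBqQ]\?[^?\s]*\?=)')


def split_preserving_text_whitespace(header):
    """Split `header` into (chunk, is_encoded) pairs.

    Whitespace-only runs between two encoded-words are dropped;
    whitespace in or around ordinary text is kept untouched.
    """
    pieces = ENCODED_WORD.split(header)
    # With one capturing group, odd indexes hold the encoded-words and
    # even indexes hold the ordinary text around them.
    last = len(pieces) - 1
    result = []
    for index, piece in enumerate(pieces):
        if index % 2 == 1:
            result.append((piece, True))
        elif piece == '':
            # Nothing between (or around) the encoded-words here.
            continue
        elif piece.strip() == '' and 0 < index < last:
            # Only whitespace, sitting between two encoded-words: ignore it.
            continue
        else:
            result.append((piece, False))
    return result


# The header from the initial report: the space after "Essai" is kept.
report_case = split_preserving_text_whitespace(
    'Essai =?iso-8859-1?q?=E9=E8?=')

# Two adjacent encoded-words followed by text: the space between the
# encoded-words is dropped, the space before "fin" is kept.
adjacent_case = split_preserving_text_whitespace(
    '=?iso-8859-1?q?=E9?= =?iso-8859-1?q?=E8?= fin')
```

The sketch only separates the chunks; decoding each encoded-word and
handling folded (multi-line) headers are left out on purpose.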

----------------------------------------------------------------------

Comment By: Mathieu Goutelle (mgoutell)
Date: 2007-05-16 05:25

Message:
Logged In: YES 
user_id=719862
Originator: YES

Hello,
Any news about this bug? It still seems to be there in 2.5, more than a
year after it was reported...
Regards,

----------------------------------------------------------------------

Comment By: Alexander Schremmer (alexanderweb)
Date: 2006-05-12 18:28

Message:
Logged In: YES 
user_id=254738

I can confirm this bug and have been bitten by it as well.

----------------------------------------------------------------------
