[issue35547] email.parser / email.policy does not correctly handle multiple RFC2047 encoded-word tokens across RFC5322 folded headers

R. David Murray Sat, 22 Dec 2018 05:12:16 -0800


R. David Murray <[email protected]> added the comment:


Here's a patch that makes the example work correctly.  This is not a fix; a 
real fix will be more complicated.  This just demonstrates the kind of thing 
that needs fixing and where.

The existing parser produces a sub-optimal parse tree: it is hard to inspect 
and manipulate because there are so many special cases.  A good fix here would 
add a function that takes an existing TokenList and the new token to add to 
that list, checks all the special cases, and does the EWWhiteSpaceTerminal 
substitution when and as appropriate.  This could then be used in the 
unstructured parser as well as in Phrase, and some thought should be given to 
where else it might be needed.  It has been long enough since I've held the 
RFCs in my head that I don't remember whether there is anywhere else.
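
A rough sketch of what such a helper might look like follows.  It uses toy 
stand-ins rather than the real classes in email._header_value_parser: the 
names Terminal, EWWhiteSpace, TokenList, leaves, and append_token are all 
hypothetical, and the single rule it applies (whitespace directly between two 
encoded words is kept in the tree but rendered as nothing) is only one of the 
special cases a real fix would have to cover.

```python
# Toy model only: hypothetical stand-ins, not the real parser classes.

class Terminal(str):
    """A leaf token: a string tagged with a token_type."""
    def __new__(cls, value, token_type):
        self = super().__new__(cls, value)
        self.token_type = token_type
        return self


class EWWhiteSpace(Terminal):
    """Whitespace between two encoded words: kept in the tree, rendered as ''."""
    def __str__(self):
        return ''


class TokenList(list):
    """An inner node: a list of terminals and nested TokenLists."""
    def __init__(self, items=(), token_type=None):
        super().__init__(items)
        self.token_type = token_type

    def __str__(self):
        return ''.join(str(t) for t in self)


def leaves(node):
    """Yield (parent, index) for every terminal under node, left to right."""
    for i, child in enumerate(node):
        if isinstance(child, TokenList):
            yield from leaves(child)
        else:
            yield node, i


def append_token(token_list, new_token):
    """Append new_token, hiding whitespace that sits between encoded words."""
    tail = list(leaves(token_list))[-2:]
    # Wrap new_token so terminals and nested lists are handled the same way.
    first = next(leaves(TokenList([new_token])), None)
    if len(tail) == 2 and first is not None:
        (ew_parent, ew_i), (ws_parent, ws_i) = tail
        if (ew_parent[ew_i].token_type == 'encoded-word'
                and ws_parent[ws_i].token_type == 'fws'
                and first[0][first[1]].token_type == 'encoded-word'):
            ws_parent[ws_i] = EWWhiteSpace(ws_parent[ws_i], 'fws')
    token_list.append(new_token)


# Phrase-like nesting: atom -> cfws -> fws, as in the asserts in the patch.
phrase = TokenList(token_type='phrase')
append_token(phrase, TokenList(
    [Terminal('Hello', 'encoded-word'),
     TokenList([Terminal(' ', 'fws')], 'cfws')], 'atom'))
append_token(phrase, TokenList([Terminal('World', 'encoded-word')], 'atom'))
print(str(phrase))  # HelloWorld
```

Because the helper looks only at the right-most leaves of the existing list 
and the left-most leaf of the new token, the same call would work on a flat 
unstructured-style list as well as on nested atoms.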

I haven't looked at the actual character string, so I don't know if we need to 
also be detecting and posting a defect about a split character or not, but we 
don't *have* to answer that question to fix this.
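
To make the "split character" question concrete, here is a small sketch.  It 
assumes the term means a multibyte character whose bytes straddle two adjacent 
encoded words; the function name decode_payloads and the sample bytes are made 
up for illustration and are not taken from the reported header.

```python
def decode_payloads(payloads, charset):
    """Decode already-unwrapped encoded-word payloads (raw bytes).

    Returns (text, every_word_decodes_alone).  A False flag is the
    situation a parser might want to report as a defect.
    """
    every_word_decodes_alone = True
    for payload in payloads:
        try:
            payload.decode(charset)
        except UnicodeDecodeError:
            every_word_decodes_alone = False
    # Joining the bytes before decoding recovers a split character.
    text = b''.join(payloads).decode(charset, errors='replace')
    return text, every_word_decodes_alone


# The UTF-8 bytes of 'é' are 0xC3 0xA9; here they land in different words.
print(decode_payloads([b'caf\xc3', b'\xa9'], 'utf-8'))  # ('café', False)
```

Whether to report such input as a defect is a separate decision from the 
whitespace handling above, which is why the two can be fixed independently.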

diff --git a/Lib/email/_header_value_parser.py b/Lib/email/_header_value_parser.py
index e805a75..d5d5986 100644
--- a/Lib/email/_header_value_parser.py
+++ b/Lib/email/_header_value_parser.py
@@ -199,6 +199,10 @@ class CFWSList(WhiteSpaceTokenList):
 
 class Atom(TokenList):
 
+    @property
+    def has_encoded_word(self):
+        return any(t.token_type=='encoded-word' for t in self)
+
     token_type = 'atom'
 
 
@@ -1382,6 +1386,12 @@ def get_phrase(value):
                         "comment found without atom"))
                 else:
                     raise
+            if token.has_encoded_word:
+                assert phrase[-1].token_type == 'atom', phrase[-1]
+                assert phrase[-1][-1].token_type == 'cfws'
+                assert phrase[-1][-1][-1].token_type == 'fws'
+                if phrase[-1].has_encoded_word:
+                    phrase[-1][-1] = EWWhiteSpaceTerminal(phrase[-1][-1][-1], 'fws')
             phrase.append(token)
     return phrase, value

----------

______________________________________________
Python tracker <[email protected]>
<https://bugs.python.org/issue35547>
______________________________________________