Ashwin Ramaswami <aramaswa...@gmail.com> added the comment:

Oh, both the Travis links I sent actually ended up reproducing the bug.

I've made a PR that fixes it, along with an even smaller test case:

get_unstructured('=?utf-8?q?somevalue?=aa')
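To reproduce this without hanging the calling process, one option is to run the call in a fresh interpreter under a timeout. This is a sketch, not part of the PR: `finishes_within` and `REPRO` are hypothetical names, and `email._header_value_parser` is a private module, so importing it directly is only a debugging aid.

```python
import subprocess
import sys

# The call from the report. get_unstructured lives in a private module,
# so this is a reproduction aid, not a supported API.
REPRO = (
    "from email._header_value_parser import get_unstructured\n"
    "print(str(get_unstructured('=?utf-8?q?somevalue?=aa')))\n"
)


def finishes_within(code: str, seconds: float) -> bool:
    """Run *code* in a fresh interpreter.

    Returns False if it is still running after *seconds*; subprocess.run
    kills the child before raising TimeoutExpired, so nothing is left behind.
    """
    try:
        subprocess.run([sys.executable, "-c", code],
                       capture_output=True, timeout=seconds)
    except subprocess.TimeoutExpired:
        return False
    return True
```

On an interpreter affected by this bug, `finishes_within(REPRO, 5)` should come back False; the 5-second limit is an arbitrary choice.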

It looks like this happens because "aa" is mistaken for an encoded-word
escape in
https://github.com/python/cpython/blob/fd5a82a7685d1599aab12e722a383cb0a2adfd8a/Lib/email/_header_value_parser.py#L1042
As a result, get_encoded_word fails, which then sends get_unstructured into
an infinite loop.
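A rough, self-contained model of that failure mode follows. It is not the stdlib code. It assumes the check at the linked line rejects the encoded word when the two characters right after "?=" are hex digits (as in a Q-encoding "=XX" escape), and it replaces the real encoded-word regex with a plain "=?" substring test. `model_encoded_word`, `model_unstructured` and the `guard` flag are hypothetical names. With `guard=False` the fallback splits the text at "=?" and gets back exactly the string it started with, which is where the real loop would spin; the model raises instead. `guard=True` sketches one possible way to keep the text literal by skipping that split after a failed encoded word, not necessarily what the PR does.

```python
import re
from string import hexdigits

# Hypothetical whitespace pattern; the capture group keeps the whitespace
# in the split output, roughly like the parser's own WSP splitter.
_WSP = re.compile(r'([ \t]+)')


def model_encoded_word(value):
    """Return (text, rest) for '=?charset?q?text?=rest', or None on failure.

    Simplified stand-in for get_encoded_word: no real Q-decoding is done.
    """
    body, sep, rest = value[2:].partition('?=')
    if not sep:
        return None
    # Assumed behaviour of the linked check: two hex digits right after
    # '?=' are read as an '=XX' escape, so the word is rejected.
    if len(rest) > 1 and rest[0] in hexdigits and rest[1] in hexdigits:
        return None
    parts = body.split('?')
    if len(parts) != 3:
        return None
    return parts[2], rest


def model_unstructured(value, guard=False):
    """Simplified stand-in for get_unstructured's main loop."""
    tokens = []
    while value:
        m = _WSP.match(value)
        if m:
            tokens.append(m.group())
            value = value[m.end():]
            continue
        ew_ok = True
        if value.startswith('=?'):
            parsed = model_encoded_word(value)
            if parsed is not None:
                text, value = parsed
                tokens.append(text)
                continue
            ew_ok = False
        tok, *remainder = _WSP.split(value, 1)
        # Fallback: split before an embedded '=?' so it can be retried as an
        # encoded word. With guard=True the split is skipped after a failed
        # encoded word (one possible fix, not necessarily the PR's).
        if '=?' in tok and not (guard and not ew_ok):
            tok, *remainder = value.partition('=?')
        new_value = ''.join(remainder)
        if new_value == value:
            # The real loop would spin here forever; the model stops instead.
            raise RuntimeError('no progress on %r' % value)
        tokens.append(tok)
        value = new_value
    return tokens
```

In this model, `model_unstructured('=?utf-8?q?somevalue?=nowhitespace')` returns `['somevalue', 'nowhitespace']` because "n" is not a hex digit, the "aa" input raises RuntimeError, and `model_unstructured('=?utf-8?q?somevalue?=aa', guard=True)` returns `['=?utf-8?q?somevalue?=aa']`.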

My PR makes the parser leave "=?utf-8?q?somevalue?=aa" alone, i.e. it parses
it as the literal text "=?utf-8?q?somevalue?=aa". However, the existing test
cases check that "=?utf-8?q?somevalue?=nowhitespace" is parsed as
"somevaluenowhitespace". I'm not too familiar with RFC 2047, but why are "aa"
and "nowhitespace" treated differently? Should they be?

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue37764>
_______________________________________
_______________________________________________
Python-bugs-list mailing list