[issue34222] Email message serialization enters an infinite loop when folding non-ASCII headers with long words
New submission from Grigory Statsenko:

(Discovered together with https://bugs.python.org/msg322348)

Email message serialization (in function _fold_as_ew) enters an infinite loop when folding non-ASCII headers whose words (after encoding) are longer than the given maxlen. Besides being stuck in an infinite loop, it keeps appending to the `lines` list, so its memory usage also grows without bound.

The code keeps appending encoded empty strings to the list like this:

    lines: [
        'Subject: =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' =?utf-8?q??=',
        ' '
    ]

(and it keeps on growing)

Here is my code that can reproduce this issue (as a unittest):

    import email.generator
    import email.policy
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from unittest import TestCase


    def create_message(subject, sender, recipients, body):
        msg = MIMEMultipart()
        msg.set_charset('utf-8')
        msg.policy = email.policy.SMTP
        msg.attach(MIMEText(body, 'html'))
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = ';'.join(recipients)
        return msg


    class TestEmailMessage(TestCase):
        def _make_message(self, subject):
            return create_message(
                subject=subject,
                sender='m...@site.com',
                recipients=['m...@site.com'],
                body='Some text',
            )

        def test_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Q' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

        def test_non_ascii_message_with_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Ц' * 100
            msg = self._make_message(subject)
            self.assertTrue(msg.as_string(maxheaderlen=76))

The ASCII test passes, but the non-ASCII one never finishes.
From what I can tell, the problem is in line 2728 of email/_header_value_parser.py:

    first_part = first_part[:-excess]

where `excess` is calculated from the encoded string (which is several times longer than the original one), but it is used to truncate the original (non-encoded) string. The problem arises when `excess` is actually greater than the length of `first_part`. The slice then yields an empty string, so the code attempts to encode the exact same part of the header and fails in every iteration, each time appending an empty string to the list and encoding it as ' =?utf-8?q??='.

What this amounts to is that it's now practically impossible to send emails with non-ASCII subjects without either disregarding the RFC recommendations and requirements for line length or risking hangs and memory leaks.

Just like in https://bugs.python.org/msg322348, this behavior is new in Python 3.6. It also fails in 3.7 and 3.8.

--
components: email
messages: 322351
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Email message serialization enters an infinite loop when folding non-ASCII headers with long words
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker <https://bugs.python.org/issue34222>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
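The slicing problem described in this report can be reproduced in isolation. What follows is only a minimal sketch, not the code from email/_header_value_parser.py: `q_encoded_len`, `naive_truncate` and `fit_prefix` are hypothetical names, and the length estimate assumes UTF-8 Q-encoding in which every byte other than an ASCII letter or digit becomes `=XX`.

```python
CHROME = len("=?utf-8?q??=")  # 12 characters of encoded-word wrapper


def q_encoded_len(text):
    """Approximate length of `text` as a UTF-8 Q-encoded word (assumption)."""
    body = 0
    for byte in text.encode("utf-8"):
        # ASCII letters and digits stay as-is; every other byte becomes "=XX".
        body += 1 if byte < 128 and chr(byte).isalnum() else 3
    return CHROME + body


def naive_truncate(first_part, space):
    """Mimic the reported pattern: cut the raw text by the *encoded* excess."""
    excess = q_encoded_len(first_part) - space
    if excess <= 0:
        return first_part
    # If excess exceeds len(first_part), this slice is the empty string.
    return first_part[:-excess]


def fit_prefix(first_part, space):
    """One possible alternative: drop one character at a time until it fits."""
    while first_part and q_encoded_len(first_part) > space:
        first_part = first_part[:-1]
    return first_part
```

With twenty 'Ц' characters and 50 columns of space, the encoded form is 132 characters long, so `excess` is 82 and slicing a 20-character string with `[:-82]` leaves nothing to encode. Shrinking the raw text character by character always makes progress, at the cost of re-measuring the encoded length on each step.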
[issue34220] Serialization of email message without header line length limit and a non-ASCII subject fails with TypeError
New submission from Grigory Statsenko:

I have the following code that creates a simple email message with a) a pure-ASCII subject, b) a non-ASCII subject (I made it into a unittest):

    import email.generator
    import email.policy
    from email.mime.multipart import MIMEMultipart
    from email.mime.text import MIMEText
    from unittest import TestCase


    def create_message(subject, sender, recipients, body):
        msg = MIMEMultipart()
        msg.set_charset('utf-8')
        msg.policy = email.policy.SMTP
        msg.attach(MIMEText(body, 'html'))
        msg['Subject'] = subject
        msg['From'] = sender
        msg['To'] = ';'.join(recipients)
        return msg


    class TestEmailMessage(TestCase):
        def _make_message(self, subject):
            return create_message(
                subject=subject,
                sender='m...@site.com',
                recipients=['m...@site.com'],
                body='Some text',
            )

        def test_ascii_message_no_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Q' * 100
            msg = self._make_message(subject)
            self.assertTrue(str(msg))

        def test_non_ascii_message_no_len_limit(self):
            # very long subject consisting of a single word
            subject = 'Ц' * 100
            msg = self._make_message(subject)
            self.assertTrue(str(msg))

The ASCII one passes, while the non-ASCII version fails with the following exception:

    Traceback (most recent call last):
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
        yield
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 605, in run
        testMethod()
      File "/home/grigory/PycharmProjects/smtptest/test_message.py", line 36, in test_non_ascii_message_no_len_limit
        self.assertTrue(str(msg))
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 135, in __str__
        return self.as_string()
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 158, in as_string
        g.flatten(self, unixfrom=unixfrom)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 116, in flatten
        self._write(msg)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 195, in _write
        self._write_headers(msg)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 222, in _write_headers
        self.write(self.policy.fold(h, v))
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 183, in fold
        return self._fold(name, value, refold_binary=True)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 205, in _fold
        return value.fold(policy=self)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/headerregistry.py", line 258, in fold
        return header.fold(policy=policy)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 144, in fold
        return _refold_parse_tree(self, policy=policy)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2645, in _refold_parse_tree
        part.ew_combine_allowed, charset)
      File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2722, in _fold_as_ew
        first_part = to_encode[:text_space]
    TypeError: slice indices must be integers or None or have an __index__ method

The problem is that _fold_as_ew treats maxlen as an integer, but it can also have inf and None as valid values. In my case it's inf, but None can also get there if the HTTP email policy is used and its max_line_length value is not overridden when serializing.

I suppose the correct behavior in both of these cases should be no wrapping at all. And/or maybe one of these (inf & None) should be converted to the other at some point, so that only one special case has to be handled in the low-level code.

This behavior is new in Python 3.6. It works in 3.5. It also fails in 3.7 and 3.8.

--
components: email
messages: 322348
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Serialization of email message without header line length limit and a non-ASCII subject fails with TypeError
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker <https://bugs.python.org/issue34220>
___
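The TypeError is easy to reproduce outside the email package, because a float (including inf) cannot be used as a slice index. Below is a small sketch of the "convert one special case to the other" idea from this report. `normalize_maxlen` and `slice_raises_type_error` are hypothetical helpers, not functions in the standard library, and treating None, 0 and inf all as "no limit" is an assumption.

```python
import math
import sys


def normalize_maxlen(max_line_length):
    """Hypothetical helper: return an int that is always safe in a slice."""
    # Assumption: None, 0 and inf all mean "do not wrap".
    if not max_line_length or math.isinf(max_line_length):
        return sys.maxsize
    return int(max_line_length)


def slice_raises_type_error(limit):
    """Report whether text[:limit] fails the way the traceback shows."""
    try:
        "some header text"[:limit]
    except TypeError:
        return True
    return False
```

Normalizing once, at the point where the policy value is read, would leave the low-level folding code with a single integer case to handle.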
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
New submission from Grigory Statsenko:

JSONEncoder.iterencode doesn't work correctly with empty iterators.

Steps:
1. Define an iterator that is recognized by json as a list (inherit from list and define a nonzero __len__).
2. Use json.dump with data containing an empty iterator defined as described in step 1 (one that doesn't generate any items).

Expected result: it should be rendered as an empty list: '[]'

Actual result: it is rendered as ']' (only the closing bracket)

Interestingly enough, this behavior is not reproduced when using the dumps function. I tried other alternatives to the standard json module: simplejson, ujson, hjson. All of them work as expected in this case (both brackets are rendered).

Here is an example of the code that demonstrates this error (it compares the results of the dump and dumps functions):

    import json
    import io


    class EmptyIterator(list):
        def __iter__(self):
            while False:
                yield 1

        def __len__(self):
            return 1


    def dump_to_str(data):
        return json.dumps(data)


    def dump_to_file(data):
        stream = io.StringIO()
        json.dump(data, stream)
        return stream.getvalue()


    data = {'it': EmptyIterator()}
    print('to str: {0}'.format(dump_to_str(data)))
    print('to file: {0}'.format(dump_to_file(data)))

This prints:

    to str: {"it": []}
    to file: {"it": ]}

--
messages: 271249
nosy: altvod
priority: normal
severity: normal
status: open
title: Empty iterator is rendered as a single bracket ] when using json's iterencode
type: behavior
versions: Python 3.5

___
Python tracker <http://bugs.python.org/issue27613>
___
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
Grigory Statsenko added the comment:

I can't do that if I don't know how many entries there will be ahead of time. In my real-life situation I'm fetching the data from a database, not knowing how many entries I'll get before I actually get them (in the iterator). In most cases there are huge amounts of entries that take up too much memory - that's why I need to stream it. But sometimes the result set is empty - and that's when everything fails.
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
Grigory Statsenko added the comment:

Actually, it does work with len = 0 even if the iterator is not empty. So, I guess that is a solution. But still, I think the more correct way would be to make it work with len > 0.
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
Grigory Statsenko added the comment:

My bad - it doesn't work with non-empty iterators if you set len to 0, so not a solution.
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
Grigory Statsenko added the comment:

If __len__ is not defined, then the iterator is considered empty and is always rendered as [] even if it really isn't empty.
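This is consistent with ordinary truthiness rules: a list subclass that only overrides __iter__ still reports the length of its (empty) underlying storage. A small illustration, with `Rows` as a made-up class name. That the encoder decides emptiness from the object's truthiness is an assumption based on the behavior described in this thread.

```python
class Rows(list):
    """List subclass that streams values but never fills the underlying list."""

    def __iter__(self):
        yield from (1, 2, 3)


rows = Rows()

# Iteration goes through the overridden __iter__ ...
streamed = [value for value in rows]

# ... but len() and bool() still look at the empty underlying list.
looks_empty = not rows
```

Any code that checks `if not rows:` before iterating will therefore treat the object as empty and never reach the streamed values.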
[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode
Grigory Statsenko added the comment:

With streaming you never know the real length before you're done iterating. Anyway, the fix really shouldn't be that complicated: in _iterencode_list, just yield the '[' instead of saving it to buf.
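The suggested change can be sketched outside the json module. The two generators below are simplified stand-ins, not the actual _iterencode_list source: `buffered_list_chunks` models holding '[' back until the first item arrives, and `eager_list_chunks` yields it immediately. Both names are hypothetical, and items are encoded with json.dumps for brevity.

```python
import json


def buffered_list_chunks(items):
    """Model of the reported behavior: '[' is only emitted with the first item."""
    buf = "["
    for item in items:
        yield buf + json.dumps(item)
        buf = ", "
    yield "]"  # emitted even if the loop body never ran


def eager_list_chunks(items):
    """Proposed variant: emit '[' up front so an empty stream still gives '[]'."""
    yield "["
    separator = ""
    for item in items:
        yield separator + json.dumps(item)
        separator = ", "
    yield "]"
```

For example, `"".join(eager_list_chunks(iter([])))` produces a complete empty list, while the buffered version produces only the closing bracket. Neither variant needs to know the length of the stream in advance.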