from:"Grigory Statsenko"

[issue34222] Email message serialization enters an infinite loop when folding non-ASCII headers with long words

2018-07-25 Thread Grigory Statsenko


New submission from Grigory Statsenko :

(Discovered together with https://bugs.python.org/msg322348)

Email message serialization (in function _fold_as_ew) enters an infinite loop 
when folding non-ASCII headers whose words (after encoding) are longer than the 
given maxlen.

Besides looping forever, it keeps appending to the `lines` list, so its memory 
usage also grows without bound.
The code keeps appending encoded empty strings to the list, like this:

lines: [
'Subject: =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' =?utf-8?q??=',
' '
]
(and it keeps on growing)

Here is my code that can reproduce this issue (as a unittest):


import email.generator
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from unittest import TestCase


def create_message(subject, sender, recipients, body):
    # Build a multipart message that uses the SMTP policy
    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText(body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ';'.join(recipients)
    return msg


class TestEmailMessage(TestCase):
    def _make_message(self, subject):
        return create_message(
            subject=subject, sender='m...@site.com',
            recipients=['m...@site.com'], body='Some text',
        )

    def test_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Q' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))

    def test_non_ascii_message_with_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Ц' * 100
        msg = self._make_message(subject)
        self.assertTrue(msg.as_string(maxheaderlen=76))


The ASCII test passes, but the non-ASCII one never finishes.

From what I can tell, the problem is in line 2728 of 
email/_header_value_parser.py:

first_part = first_part[:-excess]

where `excess` is calculated from the encoded string
(which is several times longer than the original one),
but the slice truncates the original (non-encoded) string.
The problem arises when `excess` is greater than the length of `first_part`: 
the slice then leaves nothing to encode.
So it attempts to encode the exact same part of the header and fails in every 
iteration,
appending an empty string to the list instead and encoding it as ' =?utf-8?q??='
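
To make the arithmetic concrete, here is a minimal sketch of the mismatch. It is not the stdlib code: it only uses `email.quoprimime.header_encode` to measure the encoded length, and `MAXLEN`, `budget`, and the chunk size of 60 characters are assumptions picked for illustration.

```python
from email.quoprimime import header_encode

MAXLEN = 76                          # assumed line-length limit
budget = MAXLEN - len('Subject: ')   # assumed room left on the first line

first_part = 'Ц' * 60                # chunk of the original, unencoded text
encoded = header_encode(first_part.encode('utf-8'), charset='utf-8')

# Each 'Ц' is two UTF-8 bytes and each byte becomes '=XX' in Q encoding,
# so the payload is six times longer than the text, plus the '=?utf-8?q?'
# and '?=' wrapper.
excess = len(encoded) - budget

# `excess` counts encoded characters but is applied to the unencoded text.
truncated = first_part[:-excess]

print(len(first_part), len(encoded), excess, repr(truncated))
```

With these numbers `excess` is larger than the chunk itself, so the slice comes back empty, which matches the empty encoded words in the `lines` dump above.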

What this amounts to is that it's now practically impossible to send emails 
with non-ASCII subjects without either disregarding the RFC recommendations and 
requirements for line length or risking hangs and memory leaks.

Just like in https://bugs.python.org/msg322348, this behavior is new in Python 
3.6. It also fails in 3.7 and 3.8.

--
components: email
messages: 322351
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Email message serialization enters an infinite loop when folding 
non-ASCII headers with long words
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue34222>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com

[issue34220] Serialization of email message without header line length limit and a non-ASCII subject fails with TypeError

2018-07-25 Thread Grigory Statsenko


New submission from Grigory Statsenko :

I have the following code that creates a simple email message with a) a 
pure-ASCII subject and b) a non-ASCII subject
(I made it into a unittest):


import email.generator
import email.policy
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from unittest import TestCase


def create_message(subject, sender, recipients, body):
    # Build a multipart message that uses the SMTP policy
    msg = MIMEMultipart()
    msg.set_charset('utf-8')
    msg.policy = email.policy.SMTP
    msg.attach(MIMEText(body, 'html'))
    msg['Subject'] = subject
    msg['From'] = sender
    msg['To'] = ';'.join(recipients)
    return msg


class TestEmailMessage(TestCase):
    def _make_message(self, subject):
        return create_message(
            subject=subject, sender='m...@site.com',
            recipients=['m...@site.com'], body='Some text',
        )

    def test_ascii_message_no_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Q' * 100
        msg = self._make_message(subject)
        self.assertTrue(str(msg))

    def test_non_ascii_message_no_len_limit(self):
        # very long subject consisting of a single word
        subject = 'Ц' * 100
        msg = self._make_message(subject)
        self.assertTrue(str(msg))


The ASCII one passes, while the non-ASCII version fails with the following 
exception:

Traceback (most recent call last):
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 59, in testPartExecutor
    yield
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/unittest/case.py", line 605, in run
    testMethod()
  File "/home/grigory/PycharmProjects/smtptest/test_message.py", line 36, in test_non_ascii_message_no_len_limit
    self.assertTrue(str(msg))
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 135, in __str__
    return self.as_string()
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/message.py", line 158, in as_string
    g.flatten(self, unixfrom=unixfrom)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 116, in flatten
    self._write(msg)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 195, in _write
    self._write_headers(msg)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/generator.py", line 222, in _write_headers
    self.write(self.policy.fold(h, v))
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 183, in fold
    return self._fold(name, value, refold_binary=True)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/policy.py", line 205, in _fold
    return value.fold(policy=self)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/headerregistry.py", line 258, in fold
    return header.fold(policy=policy)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 144, in fold
    return _refold_parse_tree(self, policy=policy)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2645, in _refold_parse_tree
    part.ew_combine_allowed, charset)
  File "/home/grigory/.pyenv/versions/3.6.4/lib/python3.6/email/_header_value_parser.py", line 2722, in _fold_as_ew
    first_part = to_encode[:text_space]
TypeError: slice indices must be integers or None or have an __index__ method


The problem is that _fold_as_ew treats maxlen as an integer, but it can also 
have inf and None as valid values. In my case it's inf, but None can also get 
there if the HTTP email policy is used and its max_line_length value is not 
overridden when serializing.
I suppose the correct behavior in both of these cases is no wrapping at all. 
And/or maybe one of these (inf & None) should be converted to the other at 
some point, so that only one special case has to be handled in the low-level 
code.
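
As a rough illustration of the "convert one to the other" idea, the sketch below normalizes every "no limit" value to None before slicing. `normalize_maxlen` is a hypothetical helper, not part of the email package, and treating 0 as "no limit" is an assumption taken from the documented meaning of max_line_length.

```python
import math


def normalize_maxlen(maxlen):
    """Return None for 'no limit', otherwise an int (hypothetical helper)."""
    if maxlen is None or maxlen == 0 or maxlen == math.inf:
        return None
    return int(maxlen)


text = 'Ц' * 100

# Slicing with a float fails, even when the float is infinite.
try:
    text[:float('inf')]
except TypeError as exc:
    error = str(exc)

# Slicing with None means "up to the end", i.e. no wrapping at all.
limit = normalize_maxlen(float('inf'))
unwrapped = text[:limit]
```

Low-level code that receives the normalized value then only has to check for None.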

This behavior is new in Python 3.6. It works in 3.5.
It also fails in 3.7 and 3.8.

--
components: email
messages: 322348
nosy: altvod, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Serialization of email message without header line length limit and a 
non-ASCII subject fails with TypeError
type: behavior
versions: Python 3.6, Python 3.7, Python 3.8

___
Python tracker 
<https://bugs.python.org/issue34220>
___

[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


New submission from Grigory Statsenko:

JSONEncoder.iterencode doesn't handle empty iterators correctly.
Steps:
1. Define an iterator that json recognizes as a list (inherit from list 
and define a nonzero __len__).
2. Call json.dump with data containing an iterator defined as described in 
step 1 that doesn't generate any items.

Expected result: it should be rendered as an empty list: '[]'

Actual result: it is rendered as ']' (only the closing bracket).
Interestingly enough, this behavior is not reproduced when using the dumps 
function.
I tried other alternatives to the standard json module: simplejson, ujson, and 
hjson.
All of them work as expected in this case (both brackets are rendered).

Here is an example of the code that demonstrates this error (it compares the 
results of the dump and dumps functions):


import io
import json


class EmptyIterator(list):
    def __iter__(self):
        # generator that never yields anything
        while False:
            yield 1

    def __len__(self):
        # nonzero length, so json does not treat the list as empty
        return 1


def dump_to_str(data):
    return json.dumps(data)


def dump_to_file(data):
    stream = io.StringIO()
    json.dump(data, stream)
    return stream.getvalue()


data = {'it': EmptyIterator()}
print('to str: {0}'.format(dump_to_str(data)))
print('to file: {0}'.format(dump_to_file(data)))



This prints:
to str: {"it": []}
to file: {"it": ]}

--
messages: 271249
nosy: altvod
priority: normal
severity: normal
status: open
title: Empty iterator is rendered as a single bracket ] when using json's 
iterencode
type: behavior
versions: Python 3.5

___
Python tracker 
<http://bugs.python.org/issue27613>
___

[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


Grigory Statsenko added the comment:

I can't do that if I don't know how many entries there will be ahead of time. 
In my real-life situation I'm fetching the data from a database not knowing how 
many entries I'll get before I actually get them (in the iterator). In most 
cases there are huge amounts of entries that take up too much memory - that's 
why I need to stream it. But sometimes the result set is empty - and that's 
when everything fails.
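
One possible workaround that does not need the total count is to peek at the first item only and report the length as 0 or 1. The sketch below is a hedged illustration: `StreamList` is a hypothetical name, and it relies on json.dump checking the truthiness of the list before iterating over it, which is what the behavior described in this thread suggests. json.dumps may ignore `__iter__` on a list subclass, so this is for dump only.

```python
import io
import itertools
import json


class StreamList(list):
    """List subclass that streams from an iterator (hypothetical helper)."""

    def __init__(self, iterable):
        super().__init__()
        iterator = iter(iterable)
        try:
            first = next(iterator)          # peek at one item only
        except StopIteration:
            self._items = iter(())
            self._nonempty = False
        else:
            self._items = itertools.chain([first], iterator)
            self._nonempty = True

    def __iter__(self):
        return self._items

    def __len__(self):
        # Only emptiness matters here, not the real length.
        return 1 if self._nonempty else 0


def dump_to_file(data):
    stream = io.StringIO()
    json.dump(data, stream)
    return stream.getvalue()


print(dump_to_file({'it': StreamList(iter([]))}))
print(dump_to_file({'it': StreamList(iter([1, 2, 3]))}))
```

The wrapper can be consumed only once, which is acceptable for a single streaming dump but not for repeated serialization.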


[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


Grigory Statsenko added the comment:

Actually, it does work with len = 0, even if the iterator is not empty. So, I 
guess that is a solution.
But still, I think the more correct way would be to make it work with len > 0.


[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


Grigory Statsenko added the comment:

My bad - it doesn't work with non-empty iterators if you set len to 0, so not a 
solution


[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


Grigory Statsenko added the comment:

If __len__ is not defined, then the iterator is considered empty and is always 
rendered as [] even if it really isn't empty


[issue27613] Empty iterator is rendered as a single bracket ] when using json's iterencode

2016-07-25 Thread Grigory Statsenko


Grigory Statsenko added the comment:

With streaming you never know the real length before you're done iterating.

Anyway, the fix really shouldn't be that complicated:
in _iterencode_list, just yield the '[' right away instead of saving it to buf.
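
A simplified stand-in for that idea, not the actual json.encoder code: the generator below emits the opening bracket before asking the iterable for its first item, so an empty stream still produces a balanced pair. `iterencode_list` is a hypothetical name, and using json.dumps for each item is an assumption made to keep the sketch short.

```python
import json


def iterencode_list(items):
    """Yield the JSON chunks for a list, one piece at a time."""
    yield '['                      # emitted before the first item is requested
    first = True
    for item in items:
        if not first:
            yield ', '
        first = False
        yield json.dumps(item)
    yield ']'


print(''.join(iterencode_list(iter([]))))
print(''.join(iterencode_list(iter([1, 'a']))))
```

Because nothing is buffered, the closing bracket is always preceded by an opening one, whatever the iterable yields.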

