[issue31820] Calling email.message.set_payload twice produces an invalid eml

2017-10-19 Thread Zirak

New submission from Zirak :

Example:

In [52]: import email.message

In [53]: m = email.message.Message()

In [54]: m.set_payload('abc', 'utf8')

In [55]: m.get_payload() # correctly encoded
Out[55]: 'YWJj\n'

In [56]: m.set_payload('abc', 'utf8')

In [57]: m.get_payload() # no more encoding?
Out[57]: 'abc'

In [58]: m.get_payload(decode=True) # wut?
Out[58]: b'i\xb7'

In [59]: print(str(m))
MIME-Version: 1.0
Content-Type: text/plain; charset="utf8"
Content-Transfer-Encoding: base64

abc


While the first `set_payload` correctly encodes and sets the message's
Content-Transfer-Encoding, the second call doesn't properly encode the
payload according to its existing Content-Transfer-Encoding.
Tested on 3.6, 3.5 and 2.7.

`email.message.set_payload` does not directly encode the payload,
instead `email.message.set_charset` does, around line 353:
https://github.com/python/cpython/blob/b067c8fdd1e205bd0411417b6d5e4b832c3773fc/Lib/email/message.py#L353-L368

In both invocations of `set_payload`, the payload is not encoded
according to the encoding. On the first invocation, the `CTE` header
is correctly set according to `charset.get_body_encoding` (#354 and
#368) and the payload is encoded (#356 or #367, the latter in this
case).

On the second invocation, the `CTE` header is already set, so the
payload is never encoded.

This is especially dangerous when passing `decode=True` to
`get_payload` after the 2nd `set_payload`, as that may throw an error
in some cases (trying to base64 decode a string which makes no sense
to it. that's how I arrived on this bug, but I can't for the life of
me replicate an exception).

This is a bit finicky to fix. If we change `set_charset` to always
encode the current payload, we risk double-encoding when `set_charset`
is not called through `set_payload`. However if `set_charset` tries to
decode the payload, it will produce incorrect results when *it is*
called through `set_payload`. urgh.

We can move the actual encoding code away from `set_charset`, either
into `set_payload` or a third function, but that'll break any code
calling `set_payload` without a charset and then calling
`set_charset`. urgh.

One possible solution is for both `set_charset` and `set_payload` to
call a third function, e.g. `_encode_payload`. Perhaps something like
(pseudocode):


def set_payload(self, payload, charset):
# ...
if 'Content-Transfer-Encoding' in self:
self._payload = self._encode_payload(payload)
self.set_charset(charset)
# ...

def set_charset(self, charset):
# ...
if 'Content-Transfer-Encoding' not in self:
self._payload = self._encode_payload()
self.add_header(...)

def _encode_payload(self, payload):
# code in lines 353-366



This way, `set_charset` handles the cases where CTE was never defined,
and `set_payload` makes sure to encode the payload when a CTE is
present. It may work, but something about this gives me unrest. For
example, if you `set_charset` once and it encodes your payload as
base64, and you `set_charset` again with a charset whose
`get_body_encoding` returns qp, the payload would still be base64 even
though it *should be* qp. urgh.

Is this a big enough concern? Is there a superior approach I'm
missing?

Thanks in advance!

--
components: email
messages: 304628
nosy: Zirak Ertan, barry, r.david.murray
priority: normal
severity: normal
status: open
title: Calling email.message.set_payload twice produces an invalid eml
type: behavior
versions: Python 2.7, Python 3.5, Python 3.6

___
Python tracker 
<https://bugs.python.org/issue31820>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue31820] Calling email.message.set_payload twice produces an invalid eml

2017-10-19 Thread Zirak

Zirak  added the comment:

On irc, bitmancer suggested that this problem is already solved by the
email.message.EmailMessage class, as it is:


In [119]: m = email.message.EmailMessage()

In [120]: m.set_content('abc', 'utf8', cte='base64')

In [121]: m.get_payload()
Out[121]: 'YWJjCg==\n'

In [122]: m.set_content('abc', 'utf8', cte='base64')

In [123]: m.get_payload()
Out[123]: 'YWJjCg==\n'

In [124]: m.get_payload(decode=True)
Out[124]: b'abc\n'

In [125]: print(m)
MIME-Version: 1.0
Content-Type: text/utf8; charset="utf-8"
Content-Transfer-Encoding: base64

YWJjCg==


Because this isn't a critical bug and `email.message.Message` is quite
deprecated, and this is solved by a newer API, this bug may not need
addressing.

--
resolution:  -> wont fix
stage:  -> resolved
status: open -> closed

___
Python tracker 
<https://bugs.python.org/issue31820>
___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com