New submission from Alexander Kruppa:

This is a follow-up to #16564. In that issue, BytesGenerator was changed to accept a bytes payload; however, processing binary data that way leads to data corruption.
Repost of the update I posted in #16564:

***********************************************************

~/build/Python-3.3.2$ ./python --version
Python 3.3.2

When modifying the test case in Lib/test/test_email/test_email.py like this:

--- Lib/test/test_email/test_email.py 2013-05-15 18:32:55.000000000 +0200
+++ Lib/test/test_email/test_email_mine.py 2013-09-10 14:22:08.160089440 +0200
@@ -1461,17 +1461,17 @@
         # Issue 16564: This does not produce an RFC valid message, since to be
         # valid it should have a CTE of binary. But the below works in
         # Python2, and is documented as working this way.
-        bytesdata = b'\xfa\xfb\xfc\xfd\xfe\xff'
+        bytesdata = b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'
         msg = MIMEApplication(bytesdata, _encoder=encoders.encode_noop)
         # Treated as a string, this will be invalid code points.
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg.get_payload(decode=True), bytesdata)
         s = BytesIO()
         g = BytesGenerator(s)
         g.flatten(msg)
         wireform = s.getvalue()
         msg2 = email.message_from_bytes(wireform)
-        self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
+        # self.assertEqual(msg.get_payload(), '\uFFFD' * len(bytesdata))
         self.assertEqual(msg2.get_payload(decode=True), bytesdata)

then running:

./python ./Tools/scripts/run_tests.py test_email

results in:

======================================================================
FAIL: test_binary_body_with_encode_noop (test_email_mine.TestMIMEApplication)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/localdisk/kruppaal/build/Python-3.3.2/Lib/test/test_email/test_email_mine.py", line 1475, in test_binary_body_with_encode_noop
    self.assertEqual(msg2.get_payload(decode=True), bytesdata)
AssertionError: b'\x0b\n\xfa\xfb\xfc\xfd\xfe\xff' != b'\x0b\xfa\xfb\xfc\xfd\xfe\xff'

The '\x0b' byte is incorrectly translated to '\x0b\n', i.e., a newline character is inserted.

Encoding the byte array bytes(range(256)) results in the following output data (MIME header stripped):

0000000: 0001 0203 0405 0607 0809 0a0b 0a0c 0a0a  ................
0000010: 0e0f 1011 1213 1415 1617 1819 1a1b 1c0a  ................
0000020: 1d0a 1e0a 1f20 2122 2324 2526 2728 292a  ..... !"#$%&'()*
0000030: 2b2c 2d2e 2f30 3132 3334 3536 3738 393a  +,-./0123456789:
0000040: 3b3c 3d3e 3f40 4142 4344 4546 4748 494a  ;<=>?@ABCDEFGHIJ
0000050: 4b4c 4d4e 4f50 5152 5354 5556 5758 595a  KLMNOPQRSTUVWXYZ
0000060: 5b5c 5d5e 5f60 6162 6364 6566 6768 696a  [\]^_`abcdefghij
0000070: 6b6c 6d6e 6f70 7172 7374 7576 7778 797a  klmnopqrstuvwxyz
0000080: 7b7c 7d7e 7f80 8182 8384 8586 8788 898a  {|}~............
0000090: 8b8c 8d8e 8f90 9192 9394 9596 9798 999a  ................
00000a0: 9b9c 9d9e 9fa0 a1a2 a3a4 a5a6 a7a8 a9aa  ................
00000b0: abac adae afb0 b1b2 b3b4 b5b6 b7b8 b9ba  ................
00000c0: bbbc bdbe bfc0 c1c2 c3c4 c5c6 c7c8 c9ca  ................
00000d0: cbcc cdce cfd0 d1d2 d3d4 d5d6 d7d8 d9da  ................
00000e0: dbdc ddde dfe0 e1e2 e3e4 e5e6 e7e8 e9ea  ................
00000f0: ebec edee eff0 f1f2 f3f4 f5f6 f7f8 f9fa  ................
0000100: fbfc fdfe ff                             .....

That is, a '\n' is inserted after '\x0b', '\x1c', '\x1d', and '\x1e', and '\x0d' is replaced by '\n\n'.

***********************************************************

I suspect this is due to the use of self._write_lines(msg._payload) in BytesGenerator._handle_text(), since _write_lines() mangles line endings.
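
The suspected mechanism can be reproduced outside the email package. The sketch below is not the library's code: rewrite_line_endings is a hypothetical helper, and it assumes the payload is decoded as ASCII with surrogateescape, split with str.splitlines(), and rejoined with '\n'. Under those assumptions it produces the same bytes as the failing test and the hexdump above, because str.splitlines() also treats '\x0b', '\x0c', '\x1c', '\x1d' and '\x1e' as line boundaries.

```python
def rewrite_line_endings(payload: bytes, newline: str = "\n") -> bytes:
    """Hypothetical model of a line-ending rewrite applied to binary data.

    Assumption: the bytes are decoded with surrogateescape, split into
    lines with str.splitlines(), and each line is re-terminated with
    `newline`. This is an illustration, not the email package's code.
    """
    text = payload.decode("ascii", "surrogateescape")
    lines = text.splitlines(keepends=True)
    out = []
    for index, line in enumerate(lines):
        stripped = line.rstrip("\r\n")
        out.append(stripped)
        is_last = index == len(lines) - 1
        # Every line but the last gets a newline, whatever ended it;
        # the last one only if it ended in '\r' or '\n'.
        if not is_last or stripped != line:
            out.append(newline)
    return "".join(out).encode("ascii", "surrogateescape")


# Payload from the modified test case: a '\n' appears after '\x0b'.
corrupted = rewrite_line_endings(b"\x0b\xfa\xfb\xfc\xfd\xfe\xff")
print(corrupted)

# All 256 byte values: five extra bytes, as in the hexdump.
full = rewrite_line_endings(bytes(range(256)))
print(len(full), full[:16].hex())
```

If this model matches what _write_lines() does, any fix would need to write a bytes payload without passing it through str.splitlines().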
----------
components: email
messages: 197476
nosy: Alexander.Kruppa, barry, r.david.murray
priority: normal
severity: normal
status: open
title: email.generator.BytesGenerator corrupts data by changing line endings
type: behavior
versions: Python 3.2, Python 3.3

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue19003>
_______________________________________