New submission from robin <robin.eriks...@f-secure.com>:
Attached please find a script which takes on the order of 1 minute to parse even though the embedded message is reasonably trivial. The main flaw is that the Content-Type: header has a long string of redundant ;;;; which is something some spammers apparently use to bypass some content filters or analyzers. The script is short enough to inline here for what I hope is convenience for most visitors. from email import message_from_bytes #from email.parser import BytesParser from email.policy import default ##from cProfile import run as cprun b = (b'''From: me <s...@example.com> To: you <recipi...@example.net> Subject: sample with ;;;;;;;;; stuffing in MIME Content-Type: header Content-type: text/plain''' + b';' * 54 + b'\n' + 36 * (b' ' + b';' * 990 + b'\n') + b'''\ Content-transfer-encoding: 7bit MIME-Version: 1.0 Hello. ''') ## cprun('message_from_bytes(b, policy=default)', sort=1) m = message_from_bytes(b, policy=default) #m = BytesParser(policy=default).parsebytes(b) print(m.as_string()) I have commented out two sets of changes; the ones with a single # demonstrate that the same error happens with BytesParser, and the ones with ## are for profiling the code. For what it's worth, profiling consistently indicates that it gets stuck in builtins.sum while attempting to parse the message. Here is a partial cProfile result from Python 3.7.2: 2148205 function calls (2004560 primitive calls) in 34.533 seconds Ordered by: internal time ncalls tottime percall cumtime percall filename:lineno(function) 160/4 32.356 0.202 32.728 8.182 {built-in method builtins.sum} 4 0.560 0.140 1.661 0.415 _header_value_parser.py:2380(parse_mime_parameters) 142772 0.230 0.000 0.230 0.000 {method 'format' of 'str' objects} 142932 0.203 0.000 0.203 0.000 _header_value_parser.py:866(all_defects) 142772 0.197 0.000 0.288 0.000 errors.py:84(__init__) 143248/24 0.167 0.000 32.727 1.364 _header_value_parser.py:126(<genexpr>) 142772 0.151 0.000 0.456 0.000 _header_value_parser.py:2126(get_attribute) 285848 0.146 0.000 0.146 0.000 _header_value_parser.py:109(__init__) 142772 0.102 0.000 0.631 0.000 _header_value_parser.py:2241(get_parameter) 142772 0.091 0.000 0.091 0.000 errors.py:36(__init__) 143076 0.088 0.000 0.153 0.000 _header_value_parser.py:854(__new__) 143080 0.065 0.000 0.065 0.000 {built-in method __new__ of type object at 0x101c20640} 12 0.044 0.004 0.086 0.007 _header_value_parser.py:716(params) 285839 0.042 0.000 0.042 0.000 {method 'endswith' of 'str' objects} 3 0.030 0.010 27.512 9.171 message.py:588(get_content_maintype) 286482 0.029 0.000 0.029 0.000 {method 'append' of 'list' objects} 2 0.014 0.007 0.014 0.007 <frozen importlib._bootstrap_external>:914(get_data) 6 0.008 0.001 25.369 4.228 feedparser.py:218(_parsegen) 4 0.001 0.000 0.001 0.000 {method 'split' of 're.Pattern' objects} 160/4 0.001 0.000 32.728 8.182 _header_value_parser.py:124(all_defects) 288 0.001 0.000 0.002 0.000 _header_value_parser.py:1000(get_fws) 288 0.001 0.000 0.003 0.000 _header_value_parser.py:1217(get_cfws) Starting the code and doing a KeyboardInterrupt after a few, or many, seconds tends to get a traceback like this, which also points to roughly the same culprit: ^CTraceback (most recent call last): File "repro.py", line 18, in <module> m = message_from_bytes(b, policy=default) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/__init__.py", line 46, in message_from_bytes return BytesParser(*args, **kws).parsebytes(s) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/parser.py", line 124, in parsebytes return self.parser.parsestr(text, headersonly) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/parser.py", line 68, in parsestr return self.parse(StringIO(text), headersonly=headersonly) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/parser.py", line 57, in parse feedparser.feed(data) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/feedparser.py", line 176, in feed self._call_parse() File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/feedparser.py", line 180, in _call_parse self._parse() File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/feedparser.py", line 256, in _parsegen if self._cur.get_content_type() == 'message/delivery-status': File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/message.py", line 578, in get_content_type value = self.get('content-type', missing) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/message.py", line 471, in get return self.policy.header_fetch_parse(k, v) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/policy.py", line 162, in header_fetch_parse return self.header_factory(name, value) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/headerregistry.py", line 589, in __call__ return self[name](name, value) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/headerregistry.py", line 197, in __new__ cls.parse(value, kwds) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/headerregistry.py", line 448, in parse kwds['defects'].extend(parse_tree.all_defects) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/_header_value_parser.py", line 126, in all_defects return sum((x.all_defects for x in self), self.defects) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/_header_value_parser.py", line 126, in <genexpr> return sum((x.all_defects for x in self), self.defects) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/_header_value_parser.py", line 126, in all_defects return sum((x.all_defects for x in self), self.defects) File "/Users/eriker/.pyenv/versions/3.7.2/lib/python3.7/email/_header_value_parser.py", line 126, in <genexpr> return sum((x.all_defects for x in self), self.defects) KeyboardInterrupt I have been able to repro on Python 3.9 and 3.10rc as well by way of the official Docker images rc-buster and 3.9-slim. ---------- components: email files: repro.py messages: 384931 nosy: barry, eriker, r.david.murray priority: normal severity: normal status: open title: Email header with ;;;; stuffing takes very long to parse type: resource usage versions: Python 3.10, Python 3.6, Python 3.7, Python 3.8, Python 3.9 Added file: https://bugs.python.org/file49736/repro.py _______________________________________ Python tracker <rep...@bugs.python.org> <https://bugs.python.org/issue42909> _______________________________________ _______________________________________________ Python-bugs-list mailing list Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com