[issue5803] email/quoprimime: encode and decode are very slow on large messages
New submission from Dave Baggett: The implementations of encode and decode are slow, and their running time scales nonlinearly with the size of the message being encoded or decoded. A simple fix is to use an array.array('c') and append to it, rather than using string concatenation. This change makes the code more than an order of magnitude faster.
--
components: Library (Lib)
messages: 86203
nosy: dmbaggett
severity: normal
status: open
title: email/quoprimime: encode and decode are very slow on large messages
versions: Python 2.6
___
Python tracker <http://bugs.python.org/issue5803>
___
Python-bugs-list mailing list
Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue5803] email/quoprimime: encode and decode are very slow on large messages
Changes by Dave Baggett: -- type: -> performance
[issue5803] email/quoprimime: encode and decode are very slow on large messages
Dave Baggett added the comment: I can certainly generate a patch for you. What form would you like it in, and against what source tree? Also, do you have a preference between the use of array.array vs. standard arrays? (I don't know whether it's good or bad to depend on "import array" in quoprimime.)
[issue5803] email/quoprimime: encode and decode are very slow on large messages
Dave Baggett added the comment: Yes, sorry, I meant "built-in list type" not "array". Your point about using lists this way is valid, and is why I used array.array('c'). I will do as you suggest and try all three methods. I did time the array.array approach vs. the one currently in the code and it was about 30 times faster.
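For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of the three accumulation strategies discussed in this thread: repeated concatenation, a built-in list joined at the end, and array.array. It is not the quoprimime code. The function names and the sample chunks are made up for illustration, and it targets Python 3, where the 'c' typecode no longer exists, so array.array('B') and bytes stand in for array.array('c') and str.

```python
import array
import timeit


def build_concat(chunks):
    """Accumulate by repeated concatenation (the pattern reported as slow)."""
    out = b""
    for chunk in chunks:
        out = out + chunk  # each step may copy everything accumulated so far
    return out


def build_list_join(chunks):
    """Accumulate pieces in a built-in list and join once at the end."""
    parts = []
    for chunk in chunks:
        parts.append(chunk)
    return b"".join(parts)


def build_array(chunks):
    """Accumulate in an array.array of unsigned bytes."""
    buf = array.array("B")
    for chunk in chunks:
        buf.frombytes(chunk)
    return buf.tobytes()


def time_builders(n_chunks=20000, repeat=3):
    """Return the best wall-clock time in seconds for each builder."""
    chunks = [b"=3D"] * n_chunks  # hypothetical encoded fragments
    results = {}
    for fn in (build_concat, build_list_join, build_array):
        timings = timeit.repeat(lambda: fn(chunks), number=1, repeat=repeat)
        results[fn.__name__] = min(timings)
    return results
```

Calling time_builders() with increasing n_chunks gives a rough picture of how each approach scales. The absolute numbers depend on the interpreter version and machine, so they should not be expected to match the factor of 30 reported above.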
[issue917120] imaplib: incorrect quoting in commands
Dave Baggett added the comment: I'm not sure this causes the behavior reported here, but I believe there really is a bug in imaplib. In particular, it seems wrong to me that this line:

mustquote = re.compile(r"[^\w!#$%&'*+,.:;<=>?^`|~-]")

has \w in it. Should that be \s? I found this when I noticed that SELECT commands on mailboxes with spaces in their names failed.
-- nosy: +dmbaggett
___ Python tracker <http://bugs.python.org/issue917120> ___
[issue917120] imaplib: incorrect quoting in commands
Dave Baggett added the comment: OK, I missed the initial caret in the regex. The mustquote regex lists everything that needn't be quoted and then negates it. I still think it's wrong, though. According to the BNF given in the Formal Syntax section of RFC 3501, you must quote atom-specials, which are defined thus:

atom-specials = "(" / ")" / "{" / SP / CTL / list-wildcards / quoted-specials / resp-specials
list-wildcards = "%" / "*"
quoted-specials = DQUOTE / "\"
resp-specials = "]"

So I think this regex should do it:

mustquote = re.compile(r'[()\s%*"]|"{"|"\\"|"\]"')

Changing status to bug.
-- type: feature request -> behavior
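A quick way to check both patterns against the atom-specials listed in this comment is to run them over a handful of sample strings. The sketch below is Python 3 and tests the regexes in isolation, not how imaplib applies them; the helper name flags and the sample mailbox names are invented for this illustration.

```python
import re

# Pattern quoted from imaplib earlier in this thread.
current = re.compile(r"[^\w!#$%&'*+,.:;<=>?^`|~-]")
# Replacement suggested in this comment, copied verbatim.
suggested = re.compile(r'[()\s%*"]|"{"|"\\"|"\]"')


def flags(pattern, text):
    """Return True if the pattern says `text` needs quoting."""
    return pattern.search(text) is not None


# Hypothetical mailbox names, each exercising one atom-special.
samples = ["INBOX", "My Folder", "a%b", "a*b", "a{b", "a\\b", "a]b", "a\x00b"]

if __name__ == "__main__":
    for s in samples:
        print(repr(s), flags(current, s), flags(suggested, s))
```

Run this way, the current pattern flags the space, "{", "\", "]" and NUL, but not the list-wildcards "%" and "*". The suggested pattern flags the space, "%" and "*", but not a bare "{", "\" or "]": its quoted alternatives match the three-character sequences `"{"`, `"\"` and `"]"`, quotes included, rather than the single characters. It also does not flag NUL, since \s covers whitespace only.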
[issue917120] imaplib: incorrect quoting in commands
Dave Baggett added the comment: Piers Lauder, author of imaplib, emailed me the following comment about this bug:

The regex for "mustquote_cre" looks bizarre, and I regret to say I can no longer remember its genesis. Note however, that the term CTL in the RFC definition for "atom-specials" means "any ASCII control character and DEL, 0x00 - 0x1f, 0x7f", and so maybe defining what is NOT an atom-special was considered easier. The suggested replacement regex may not match these...?

It seems like we need to enumerate the control characters in the regex to be absolutely correct here.
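One way to enumerate the control characters is a single character class with escaped ranges. The pattern below is only a sketch built from the atom-specials definition quoted earlier in this thread, not a tested patch for imaplib. The names atom_specials and needs_quoting are hypothetical, and SP is taken to mean the space character (0x20) only.

```python
import re

# atom-specials as quoted from the RFC 3501 grammar in this thread:
#   "(" ")" "{" SP CTL "%" "*" DQUOTE "\" "]"
# CTL is 0x00-0x1f plus DEL (0x7f), per the comment above.
atom_specials = re.compile(r'[(){ %*"\\\]\x00-\x1f\x7f]')


def needs_quoting(arg):
    """Return True if `arg` contains any atom-special character."""
    return atom_specials.search(arg) is not None


if __name__ == "__main__":
    for name in ["INBOX", "My Folder", "a{b", "a]b", "tab\there", "del\x7f"]:
        print(repr(name), needs_quoting(name))
```

Listing the specials positively keeps the pattern easy to compare against the grammar, at the cost of having to revisit it if the set of specials is extended. Whether a positive or a negated class is the better fit for imaplib is left open here.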