Feature Requests item #1285086, was opened at 2005-09-08 11:37
Message generated for change (Comment added) made by rhettinger
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=355470&aid=1285086&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

>Category: Python Library
>Group: None
Status: Open
Resolution: None
>Priority: 2
Submitted By: Tres Seaver (tseaver)
Assigned to: Nobody/Anonymous (nobody)
Summary: urllib.quote is too slow

Initial Comment:
'urllib.quote' delegates to '_fast_quote' for the common case that the user has passed no 'safe' argument. However, '_fast_quote' isn't really very fast, especially for the case that it doesn't need to quote anything.

Zope (and presumably other web frameworks) can end up calling 'quote' dozens, hundreds, even thousands of times to render a page, which makes this a potentially big win for them.

I will attach a speed test script which demonstrates the speed penalty, along with a patch which implements the speedup.

----------------------------------------------------------------------

>Comment By: Raymond Hettinger (rhettinger)
Date: 2005-09-09 22:45

Message:
Logged In: YES
user_id=80475

Checked in a speed-up for Py2.5. See Lib/urllib.py 1.169.

The check-in provides fast-quoting for all cases (not just for the default safe argument). Even the fast path is quicker. With translation for both safe and unsafe characters, it saves len(s) trips through the eval loop, computes the non-safe replacements just once, and eliminates the if-logic. The new table is collision free and has no failed lookups, so each lookup requires exactly one probe. On my machine, timings improved by a factor of two to three depending on the length of input and number of escaped characters.

The check-in also simplifies and speeds up quote_plus() by using str.replace() instead of a split and join.

Leaving this SF report open because the OP's idea may possibly provide further improvement -- the check-in itself was done because it is a clear win over the existing version.

The OP's patch uses regexps to short-circuit when no changes are needed.
Unless the regexp is cheap and short-circuits often, the cost of testing will likely exceed the average amount saved. Determining whether the regexp is cheaper than the checked-in version just requires a few timings. But determining the short-circuit percentage requires collecting statistics from real programs with real data. For the idea to be a winner, regexps have to be much faster than the map/lookup/join step AND the short-circuit case must occur frequently.

Am lowering the priority until a better patch is received along with timings and statistical evidence demonstrating a significant improvement.

Also, reclassifying as a Feature Request because the existing code is functioning as documented and passing tests.

----------------------------------------------------------------------

Comment By: Tres Seaver (tseaver)
Date: 2005-09-08 21:35

Message:
Logged In: YES
user_id=127625

Note that the speed test script shows equivalent speedups for both 2.3 and 2.4, ranging from 90% (for the empty string) down to 73% (for a string with a single character). The more "normal" cases range from 82% to 89% speedups.

----------------------------------------------------------------------

Comment By: Tres Seaver (tseaver)
Date: 2005-09-08 21:30

Message:
Logged In: YES
user_id=127625

I'm attaching a patch against 2.4's version.

----------------------------------------------------------------------

Comment By: Jeff Epler (jepler)
Date: 2005-09-08 20:01

Message:
Logged In: YES
user_id=2772

Tested on Python 2.4.0. The patch fails on the first chunk because the list of imports doesn't match. The urllib_fast_quote_speed_test.py doesn't run once urllib has been patched. I reverted the patch to urllib.py and re-ran. I got "faster" values from 0.758 to 0.964.
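
For readers following the thread, the two approaches discussed above (a precomputed per-character lookup table versus a regexp test that short-circuits when nothing needs escaping) can be sketched roughly as follows. This is an illustration only, written in Python 3 syntax: it is neither the code checked in as Lib/urllib.py 1.169 nor the OP's patch. Every name in it (table_quote, table_quote_plus, short_circuit_quote, compare, _table_for) is hypothetical, the safe-character set is an assumption, and only characters in the 0-255 range are handled.

```python
# Sketch only: hypothetical names, Python 3 syntax; this is neither the
# code checked in as Lib/urllib.py 1.169 nor the OP's patch.
import re
import string
import timeit

# Assumed set of characters that never need escaping.
ALWAYS_SAFE = string.ascii_letters + string.digits + "_.-"
_tables = {}  # one lookup table per distinct 'safe' argument


def _table_for(safe):
    """Map each 8-bit character to itself (if safe) or to its %XX escape."""
    table = _tables.get(safe)
    if table is None:
        keep = set(ALWAYS_SAFE + safe)
        table = {chr(i): chr(i) if chr(i) in keep else "%%%02X" % i
                 for i in range(256)}
        _tables[safe] = table
    return table


def table_quote(s, safe="/"):
    """One dictionary lookup per character, no per-character if-logic."""
    return "".join(map(_table_for(safe).__getitem__, s))


def table_quote_plus(s, safe=""):
    """Spaces become '+' through str.replace() rather than split and join."""
    if " " in s:
        return table_quote(s, safe + " ").replace(" ", "+")
    return table_quote(s, safe)


# Assumed pattern covering the default safe='/' case only.
_NEEDS_QUOTING = re.compile(r"[^A-Za-z0-9_.\-/]")


def short_circuit_quote(s):
    """Return s untouched when the regexp finds nothing to escape."""
    if _NEEDS_QUOTING.search(s) is None:
        return s
    return table_quote(s)


def compare(sample, number=20000):
    """Seconds taken by each variant over `number` calls on one sample."""
    always = timeit.timeit(lambda: table_quote(sample), number=number)
    short = timeit.timeit(lambda: short_circuit_quote(sample), number=number)
    return always, short


if __name__ == "__main__":
    for sample in ("", "abc/def", "needs quoting & more"):
        always, short = compare(sample)
        print("%-24r table=%.4fs short-circuit=%.4fs" % (sample, always, short))
```

Timings from compare() address only the first criterion named above (whether the regexp test is cheaper than the map/lookup/join step). How often the short-circuit case occurs still has to be measured on real data.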