Bugs item #1613130, was opened at 2006-12-11 14:03
Message generated for change (Settings changed) made by pitrou
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470

Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update.

>Category: Performance
Group: Python 2.5
Status: Open
Resolution: None
Priority: 5
Private: No
Submitted By: Antoine Pitrou (pitrou)
Assigned to: Nobody/Anonymous (nobody)
Summary: str.split creates new string even if pattern not found

Initial Comment:
Hello,

Several string methods avoid allocating a new string when the operation result is trivially the same as one of the parameters (e.g. replacing a non-existing substring). However, split() does not exhibit this optimization: it always constructs a new string, even if no splitting occurs.

$ python
Python 2.5 (r25:51908, Oct 6 2006, 15:22:41)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "abcde" * 2
>>> id(s)
3084139400L
>>> id(str(s))
3084139400L
>>> id("" + s)
3084139400L
>>> id(s.strip())
3084139400L
>>> id(s.replace("g", "h"))
3084139400L
>>> [id(x) for x in s.partition("h")]
[3084139400L, 3084271768L, 3084271768L]
>>> [id(x) for x in s.split("h")]
[3084139360L]

----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2006-12-12 12:35

Message:
Logged In: YES
user_id=133955
Originator: YES

Ok, I did a patch which partially adds the optimization (the patch is at home, I can't post it right now). I have a few questions though:

- there is a USE_FAST flag which can bring some speedups when a multicharacter separator is used; however, it is not enabled by default. Is there a reason for this?
- where is stringbench.py maintained, and by whom, so that I can propose additional tests for it (namely, tests for unmatched split())?
- the split() implementation is duplicated between str and unicode (the unicode versions having fewer optimizations). Would it be useful to "stringlib'ify" split()?
- rsplit() does much the same thing as split(). Has anyone tried to factor out the common parts? Do you see any caveats in doing so?

----------------------------------------------------------------------
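
The identity checks from the initial comment can be collected into one small script. This is only a sketch for re-running the experiment on a given interpreter: the helper name same_object_report and its missing parameter are made up for illustration. Whether a method hands back the original object is a CPython implementation detail, so the output may differ between versions and implementations.

```python
def same_object_report(s, missing="h"):
    """Map each operation name to True if its result is the very same
    object as s (i.e. no new string was allocated).

    'missing' is a hypothetical parameter: a substring that must not
    occur in s, so that every operation below is a no-op on the value.
    """
    if missing in s:
        raise ValueError("'missing' must not occur in s")
    results = {
        "str": str(s),
        "strip": s.strip(),
        "replace": s.replace(missing, "x"),
        "partition": s.partition(missing)[0],
        "split": s.split(missing)[0],
    }
    # 'is' compares object identity, like comparing id() values.
    return {name: value is s for name, value in results.items()}


if __name__ == "__main__":
    s = "abcde" * 2
    for name, shared in same_object_report(s).items():
        print(name, "reuses the original" if shared else "allocates a copy")
```

On the Python 2.5 build shown in the initial comment, only the split entry would be expected to report a copy.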