Bugs item #1613130, was opened at 2006-12-11 14:03
Message generated for change (Comment added) made by pitrou
You can respond by visiting:
https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470

Please note that this message will contain a full copy of the comment
thread, including the initial issue submission, for this request,
not just the latest update.

Category: Performance
Group: Python 2.5
Status: Open
Resolution: None
Priority: 1
Private: No
Submitted By: Antoine Pitrou (pitrou)
Assigned to: Fredrik Lundh (effbot)
Summary: str.split creates new string even if pattern not found

Initial Comment:

Hello,

Several string methods avoid allocating a new string when the operation
result is trivially the same as one of the parameters (e.g. replacing a
non-existing substring). However, split() does not exhibit this
optimization, it always constructs a new string even if no splitting
occurs:

$ python
Python 2.5 (r25:51908, Oct 6 2006, 15:22:41)
[GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> s = "abcde" * 2
>>> id(s)
3084139400L
>>> id(str(s))
3084139400L
>>> id("" + s)
3084139400L
>>> id(s.strip())
3084139400L
>>> id(s.replace("g", "h"))
3084139400L
>>> [id(x) for x in s.partition("h")]
[3084139400L, 3084271768L, 3084271768L]
>>> [id(x) for x in s.split("h")]
[3084139360L]

----------------------------------------------------------------------

>Comment By: Antoine Pitrou (pitrou)
Date: 2007-04-12 11:19

Message:
Logged In: YES
user_id=133955
Originator: YES

Hi,

> Dropping the priority. This pay-off is near zero and likely not worth the
> cost of making the code more complex than it already is.

No problem!
The more interesting question actually was whether it made any sense to
factor out the split() implementation in "stringlib" so as to share the
implementation between str and unicode.

Also, as for the USE_FAST question you asked on python-dev, I may have an
answer: if you try to enable USE_FAST you'll see that some operations are
indeed faster on large strings (say 100s or 1000s of characters), but they
become slower on small strings because of the larger overhead of the
search algorithm. Thus USE_FAST could negatively impact Python programs
which process a lot of small strings.

----------------------------------------------------------------------

Comment By: Raymond Hettinger (rhettinger)
Date: 2007-04-11 19:09

Message:
Logged In: YES
user_id=80475
Originator: NO

Dropping the priority. This pay-off is near zero and likely not worth the
cost of making the code more complex than it already is.

----------------------------------------------------------------------

Comment By: Georg Brandl (gbrandl)
Date: 2006-12-12 17:21

Message:
Logged In: YES
user_id=849994
Originator: NO

Sounds like this is best assigned to Fredrik.

----------------------------------------------------------------------

Comment By: Antoine Pitrou (pitrou)
Date: 2006-12-12 12:35

Message:
Logged In: YES
user_id=133955
Originator: YES

Ok, I did a patch which partially adds the optimization (the patch is at
home, I can't post it right now).
I have a few questions though:
- there is a USE_FAST flag which can bring some speedups when a
multicharacter separator is used; however, it is not enabled by default,
is there a reason for this?
- where and by whom is maintained stringbench.py, so that I can propose
additional tests for it (namely, tests for unmatched split())?
- split() implementation is duplicated between str and unicode (the
unicode versions having less optimizations), would it be useful to
"stringlib'ify" split()?
- rsplit() does quite similar things as split(), has anyone tried to
factor similar parts? do you see any caveats doing so?

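As an illustration of the last two questions above, here is a minimal pure-Python sketch of what "factoring similar parts" of split() and rsplit() could look like, including the proposed shortcut of returning the original string when the separator is not found. This is not the C code from the patch or from stringlib; the names _split_core, split_shared and rsplit_shared are hypothetical and exist only for this example.

```python
def _split_core(s, sep, maxsplit=-1, from_right=False):
    """Hypothetical shared loop behind split() and rsplit() (illustration only).

    A negative maxsplit means "no limit", as with the built-in methods.
    """
    if not sep:
        raise ValueError("empty separator")
    parts = []
    if from_right:
        end = len(s)
        while maxsplit != 0:
            i = s.rfind(sep, 0, end)  # last match lying entirely before `end`
            if i < 0:
                break
            parts.append(s[i + len(sep):end])
            end = i
            maxsplit -= 1
        if not parts:
            return [s]  # no split happened: reuse the original object
        parts.append(s[:end])
        parts.reverse()  # pieces were collected right to left
    else:
        start = 0
        while maxsplit != 0:
            i = s.find(sep, start)  # first match at or after `start`
            if i < 0:
                break
            parts.append(s[start:i])
            start = i + len(sep)
            maxsplit -= 1
        if not parts:
            return [s]  # no split happened: reuse the original object
        parts.append(s[start:])
    return parts


def split_shared(s, sep, maxsplit=-1):
    """split() expressed through the shared core."""
    return _split_core(s, sep, maxsplit, from_right=False)


def rsplit_shared(s, sep, maxsplit=-1):
    """rsplit() expressed through the shared core."""
    return _split_core(s, sep, maxsplit, from_right=True)
```

The sketch only covers an explicit, non-empty separator; whitespace splitting (sep=None) is left out. One caveat it makes visible: the two directions still need separate scanning loops, so the shared part is mostly argument checking and the no-match shortcut.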
----------------------------------------------------------------------

Python-bugs-list mailing list
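The interpreter session in the initial comment compares id() values by eye. The same check can be written with the "is" operator, as in the sketch below. The helper name identity_report is hypothetical, and the sketch assumes the test string contains neither "g" nor "h". Whether an operation returns the very same object is an implementation detail that differs between interpreter versions, so no particular output is claimed here.

```python
def identity_report(s):
    """Map each operation to True if its result is the same object as s.

    Assumes s contains neither "g" nor "h", so every operation below is a
    no-op in terms of string content.
    """
    return {
        "str(s)": str(s) is s,
        '"" + s': ("" + s) is s,
        "s.strip()": s.strip() is s,
        's.replace("g", "h")': s.replace("g", "h") is s,
        's.partition("h")[0]': s.partition("h")[0] is s,
        's.split("h")[0]': s.split("h")[0] is s,
    }


if __name__ == "__main__":
    s = "abcde" * 2
    for operation, same_object in identity_report(s).items():
        print("%-22s same object: %s" % (operation, same_object))
```

Running the script on the interpreter of interest shows which of these operations reuse the original string there; a False entry for split() corresponds to the behaviour reported in this item.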