[ python-Bugs-1613130 ] str.split creates new string even if pattern not found
Bugs item #1613130, was opened at 2006-12-11 14:03 Message generated for change (Comment added) made by pitrou You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Antoine Pitrou (pitrou) Assigned to: Nobody/Anonymous (nobody) Summary: str.split creates new string even if pattern not found Initial Comment: Hello, Several string methods avoid allocating a new string when the operation result is trivially the same as one of the parameters (e.g. replacing a non-existing substring). However, split() does not exhibit this optimization, it always constructs a new string even if no splitting occurs: $ python Python 2.5 (r25:51908, Oct 6 2006, 15:22:41) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = "abcde" * 2 >>> id(s) 3084139400L >>> id(str(s)) 3084139400L >>> id("" + s) 3084139400L >>> id(s.strip()) 3084139400L >>> id(s.replace("g", "h")) 3084139400L >>> [id(x) for x in s.partition("h")] [3084139400L, 3084271768L, 3084271768L] >>> [id(x) for x in s.split("h")] [3084139360L] -- >Comment By: Antoine Pitrou (pitrou) Date: 2006-12-12 12:35 Message: Logged In: YES user_id=133955 Originator: YES Ok, I did a patch which partially adds the optimization (the patch is at home, I can't post it right now). I have a few questions though: - there is a USE_FAST flag which can bring some speedups when a multicharacter separator is used; however, it is not enabled by default, is there a reason for this? - where and by whom is maintained stringbench.py, so that I can propose additional tests for it (namely, tests for unmatched split())? - split() implementation is duplicated between str and unicode (the unicode versions having less optimizations), would it be useful to "stringlib'ify" split()? - rsplit() does quite similar things as split(), has anyone tried to factor similar parts? do you see any caveats doing so? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1613130 ] str.split creates new string even if pattern not found
Bugs item #1613130, was opened at 2006-12-11 14:03 Message generated for change (Settings changed) made by pitrou You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. >Category: Performance Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Antoine Pitrou (pitrou) Assigned to: Nobody/Anonymous (nobody) Summary: str.split creates new string even if pattern not found Initial Comment: Hello, Several string methods avoid allocating a new string when the operation result is trivially the same as one of the parameters (e.g. replacing a non-existing substring). However, split() does not exhibit this optimization, it always constructs a new string even if no splitting occurs: $ python Python 2.5 (r25:51908, Oct 6 2006, 15:22:41) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = "abcde" * 2 >>> id(s) 3084139400L >>> id(str(s)) 3084139400L >>> id("" + s) 3084139400L >>> id(s.strip()) 3084139400L >>> id(s.replace("g", "h")) 3084139400L >>> [id(x) for x in s.partition("h")] [3084139400L, 3084271768L, 3084271768L] >>> [id(x) for x in s.split("h")] [3084139360L] -- Comment By: Antoine Pitrou (pitrou) Date: 2006-12-12 12:35 Message: Logged In: YES user_id=133955 Originator: YES Ok, I did a patch which partially adds the optimization (the patch is at home, I can't post it right now). I have a few questions though: - there is a USE_FAST flag which can bring some speedups when a multicharacter separator is used; however, it is not enabled by default, is there a reason for this? - where and by whom is maintained stringbench.py, so that I can propose additional tests for it (namely, tests for unmatched split())? - split() implementation is duplicated between str and unicode (the unicode versions having less optimizations), would it be useful to "stringlib'ify" split()? - rsplit() does quite similar things as split(), has anyone tried to factor similar parts? do you see any caveats doing so? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1613130 ] str.split creates new string even if pattern not found
Bugs item #1613130, was opened at 2006-12-11 13:03 Message generated for change (Comment added) made by gbrandl You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Performance Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Antoine Pitrou (pitrou) >Assigned to: Fredrik Lundh (effbot) Summary: str.split creates new string even if pattern not found Initial Comment: Hello, Several string methods avoid allocating a new string when the operation result is trivially the same as one of the parameters (e.g. replacing a non-existing substring). However, split() does not exhibit this optimization, it always constructs a new string even if no splitting occurs: $ python Python 2.5 (r25:51908, Oct 6 2006, 15:22:41) [GCC 4.1.2 20060928 (prerelease) (Ubuntu 4.1.1-13ubuntu4)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> s = "abcde" * 2 >>> id(s) 3084139400L >>> id(str(s)) 3084139400L >>> id("" + s) 3084139400L >>> id(s.strip()) 3084139400L >>> id(s.replace("g", "h")) 3084139400L >>> [id(x) for x in s.partition("h")] [3084139400L, 3084271768L, 3084271768L] >>> [id(x) for x in s.split("h")] [3084139360L] -- >Comment By: Georg Brandl (gbrandl) Date: 2006-12-12 16:21 Message: Logged In: YES user_id=849994 Originator: NO Sounds like this is best assigned to Fredrik. -- Comment By: Antoine Pitrou (pitrou) Date: 2006-12-12 11:35 Message: Logged In: YES user_id=133955 Originator: YES Ok, I did a patch which partially adds the optimization (the patch is at home, I can't post it right now). I have a few questions though: - there is a USE_FAST flag which can bring some speedups when a multicharacter separator is used; however, it is not enabled by default, is there a reason for this? - where and by whom is maintained stringbench.py, so that I can propose additional tests for it (namely, tests for unmatched split())? - split() implementation is duplicated between str and unicode (the unicode versions having less optimizations), would it be useful to "stringlib'ify" split()? - rsplit() does quite similar things as split(), has anyone tried to factor similar parts? do you see any caveats doing so? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613130&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1607951 ] mailbox.Maildir re-reads directory too often
Bugs item #1607951, was opened at 2006-12-03 12:28 Message generated for change (Settings changed) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1607951&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matthias Klose (doko) >Assigned to: A.M. Kuchling (akuchling) Summary: mailbox.Maildir re-reads directory too often Initial Comment: [forwarded from http://bugs.debian.org/401395] Various functions in mailbox.Maildir call self._refresh, which always re-reads the cur and new directories with os.listdir. _refresh should stat each of the two directories first to see if they changed. This cuts processing time of a series of lookups down by a factor of the number of messages in the folder, a potentially large number. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1607951&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1600152 ] mailbox: Maildir.get_folder does not inherit factory
Bugs item #1600152, was opened at 2006-11-20 22:33 Message generated for change (Settings changed) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1600152&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Tetsuya Takatsuru (taka2ru) >Assigned to: A.M. Kuchling (akuchling) Summary: mailbox: Maildir.get_folder does not inherit factory Initial Comment: mailbox.Maildir.get_folder does not inherit _factory. >>> import mailbox >>> >>> mbox = mailbox.Maildir('/home/xxx/Maildir', mailbox.MaildirMessage) >>> >>> subfolder = mbox.get_folder(mbox.list_folders()[0]) >>> >>> for key, mail in subfolder.iteritems(): >>> print mail.__class__ >>> break from this example, i got the following output: rfc822.Message 'mailbox.MaildirMessage' should be gotten instead. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1600152&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1599254 ] mailbox: other programs' messages can vanish without trace
Bugs item #1599254, was opened at 2006-11-19 11:03 Message generated for change (Settings changed) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1599254&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: David Watson (baikie) >Assigned to: A.M. Kuchling (akuchling) Summary: mailbox: other programs' messages can vanish without trace Initial Comment: The mailbox classes based on _singlefileMailbox (mbox, MMDF, Babyl) implement the flush() method by writing the new mailbox contents into a temporary file which is then renamed over the original. Unfortunately, if another program tries to deliver messages while mailbox.py is working, and uses only fcntl() locking, it will have the old file open and be blocked waiting for the lock to become available. Once mailbox.py has replaced the old file and closed it, making the lock available, the other program will write its messages into the now-deleted "old" file, consigning them to oblivion. I've caused Postfix on Linux to lose mail this way (although I did have to turn off its use of dot-locking to do so). A possible fix is attached. Instead of new_file being renamed, its contents are copied back to the original file. If file.truncate() is available, the mailbox is then truncated to size. Otherwise, if truncation is required, it's truncated to zero length beforehand by reopening self._path with mode wb+. In the latter case, there's a check to see if the mailbox was replaced while we weren't looking, but there's still a race condition. Any alternative ideas? Incidentally, this fixes a problem whereby Postfix wouldn't deliver to the replacement file as it had the execute bit set. -- Comment By: David Watson (baikie) Date: 2006-11-19 15:44 Message: Logged In: YES user_id=1504904 Originator: YES This is a bug. The point is that the code is subverting the protection of its own fcntl locking. I should have pointed out that Postfix was still using fcntl locking, and that should have been sufficient. (In fact, it was due to its use of fcntl locking that it chose precisely the wrong moment to deliver mail.) Dot-locking does protect against this, but not every program uses it - which is precisely the reason that the code implements fcntl locking in the first place. -- Comment By: Martin v. Löwis (loewis) Date: 2006-11-19 15:02 Message: Logged In: YES user_id=21627 Originator: NO Mailbox locking was invented precisely to support this kind of operation. Why do you complain that things break if you deliberately turn off the mechanism preventing breakage? I fail to see a bug here. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1599254&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1613573 ] xmlrpclib ServerProxy uses old httplib interface
Bugs item #1613573, was opened at 2006-12-11 18:17 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613573&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Matt Brown (mattgbrown) Assigned to: Nobody/Anonymous (nobody) Summary: xmlrpclib ServerProxy uses old httplib interface Initial Comment: The ServerProxy class of the xmlrpclib module uses the old style HTTP / HTTPS classes of the httplib module rather than the newer HTTPConnection and HTTPConnection classes. The practical result of this is that xmlrpc connections are not able to make use of HTTP/1.1 functionality. Please update the xmlrpclib module to use the newer API provided by httplib so that the advanced functionality of HTTP/1.1 is available. -- >Comment By: A.M. Kuchling (akuchling) Date: 2006-12-12 15:58 Message: Logged In: YES user_id=11375 Originator: NO Can you provide a patch to make this change? -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1613573&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1599254 ] mailbox: other programs' messages can vanish without trace
Bugs item #1599254, was opened at 2006-11-19 11:03 Message generated for change (Comment added) made by akuchling You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1599254&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.5 Status: Open Resolution: None Priority: 5 Private: No Submitted By: David Watson (baikie) Assigned to: A.M. Kuchling (akuchling) Summary: mailbox: other programs' messages can vanish without trace Initial Comment: The mailbox classes based on _singlefileMailbox (mbox, MMDF, Babyl) implement the flush() method by writing the new mailbox contents into a temporary file which is then renamed over the original. Unfortunately, if another program tries to deliver messages while mailbox.py is working, and uses only fcntl() locking, it will have the old file open and be blocked waiting for the lock to become available. Once mailbox.py has replaced the old file and closed it, making the lock available, the other program will write its messages into the now-deleted "old" file, consigning them to oblivion. I've caused Postfix on Linux to lose mail this way (although I did have to turn off its use of dot-locking to do so). A possible fix is attached. Instead of new_file being renamed, its contents are copied back to the original file. If file.truncate() is available, the mailbox is then truncated to size. Otherwise, if truncation is required, it's truncated to zero length beforehand by reopening self._path with mode wb+. In the latter case, there's a check to see if the mailbox was replaced while we weren't looking, but there's still a race condition. Any alternative ideas? Incidentally, this fixes a problem whereby Postfix wouldn't deliver to the replacement file as it had the execute bit set. -- >Comment By: A.M. Kuchling (akuchling) Date: 2006-12-12 16:04 Message: Logged In: YES user_id=11375 Originator: NO I agree with David's analysis; this is in fact a bug. I'll try to look at the patch. -- Comment By: David Watson (baikie) Date: 2006-11-19 15:44 Message: Logged In: YES user_id=1504904 Originator: YES This is a bug. The point is that the code is subverting the protection of its own fcntl locking. I should have pointed out that Postfix was still using fcntl locking, and that should have been sufficient. (In fact, it was due to its use of fcntl locking that it chose precisely the wrong moment to deliver mail.) Dot-locking does protect against this, but not every program uses it - which is precisely the reason that the code implements fcntl locking in the first place. -- Comment By: Martin v. Löwis (loewis) Date: 2006-11-19 15:02 Message: Logged In: YES user_id=21627 Originator: NO Mailbox locking was invented precisely to support this kind of operation. Why do you complain that things break if you deliberately turn off the mechanism preventing breakage? I fail to see a bug here. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1599254&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1614387 ] AttributesImpl does not implement __contains__ on Linux
Bugs item #1614387, was opened at 2006-12-13 00:24 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614387&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: XML Group: Platform-specific Status: Open Resolution: None Priority: 5 Private: No Submitted By: Jason Briggs (jrbriggs) Assigned to: Nobody/Anonymous (nobody) Summary: AttributesImpl does not implement __contains__ on Linux Initial Comment: Hi there Had an odd error trying to run a utility called SVGMath: File "/home/jason/downloads/SVGMath-0.3.1/svgmath/mathconfig.py", line 54, in startElement elif u"afm" in attributes: File "/usr/lib/python2.5/site-packages/_xmlplus/sax/xmlreader.py", line 316, in __getitem__ return self._attrs[name] KeyError: 0 It appears that AttributesImpl in the sax package (xmlreader.py) doesn't implement __contains__, so the 'in' operator throws an error. This is on Kubuntu/Linux, so I'm not sure if it's distro-specific or all Linux versions of Python. In any case, if you add: def __contains__(self, name): return self._attrs.has_key(name) to AttributesImpl in xmlreader.py (as per the Windows version of Python), the problem goes away. Kind regards Jason -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614387&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1614429 ] dict throwing inaccurate KeyError on small tuple keys
Bugs item #1614429, was opened at 2006-12-13 02:48 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614429&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None Status: Open Resolution: None Priority: 5 Private: No Submitted By: toidinamai (toidinamai) Assigned to: Nobody/Anonymous (nobody) Summary: dict throwing inaccurate KeyError on small tuple keys Initial Comment: When using tuples of length one as keys for the builtin dictionary the Python runtime raises an inaccurate KeyError exception that makes some errors hard to find: Python 2.5 (r25:51908, Dec 12 2006, 18:39:30) [GCC 4.1.1 (Gentoo 4.1.1-r1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> {"foo":"bar"}[("foo",)] Traceback (most recent call last): File "", line 1, in KeyError: 'foo' Also the error messages for the empty tuple and None are indistinguishable: Python 2.5 (r25:51908, Dec 12 2006, 18:39:30) [GCC 4.1.1 (Gentoo 4.1.1-r1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> {None:"bar"}[()] Traceback (most recent call last): File "", line 1, in KeyError >>> {():"bar"}[None] Traceback (most recent call last): File "", line 1, in KeyError This also seems to be the case for earlier versions of Python. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614429&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1614429 ] dict throwing inaccurate KeyError on small tuple keys
Bugs item #1614429, was opened at 2006-12-12 20:48 Message generated for change (Comment added) made by rhettinger You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614429&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: None Group: None >Status: Closed >Resolution: Duplicate Priority: 5 Private: No Submitted By: toidinamai (toidinamai) Assigned to: Nobody/Anonymous (nobody) Summary: dict throwing inaccurate KeyError on small tuple keys Initial Comment: When using tuples of length one as keys for the builtin dictionary the Python runtime raises an inaccurate KeyError exception that makes some errors hard to find: Python 2.5 (r25:51908, Dec 12 2006, 18:39:30) [GCC 4.1.1 (Gentoo 4.1.1-r1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> {"foo":"bar"}[("foo",)] Traceback (most recent call last): File "", line 1, in KeyError: 'foo' Also the error messages for the empty tuple and None are indistinguishable: Python 2.5 (r25:51908, Dec 12 2006, 18:39:30) [GCC 4.1.1 (Gentoo 4.1.1-r1)] on linux2 Type "help", "copyright", "credits" or "license" for more information. >>> {None:"bar"}[()] Traceback (most recent call last): File "", line 1, in KeyError >>> {():"bar"}[None] Traceback (most recent call last): File "", line 1, in KeyError This also seems to be the case for earlier versions of Python. -- >Comment By: Raymond Hettinger (rhettinger) Date: 2006-12-12 21:46 Message: Logged In: YES user_id=80475 Originator: NO This is a duplicate of bug #1576657 when Georg just fixed. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614429&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[ python-Bugs-1614460 ] python-logging compatability with Zope.
Bugs item #1614460, was opened at 2006-12-13 14:02 Message generated for change (Tracker Item Submitted) made by Item Submitter You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614460&group_id=5470 Please note that this message will contain a full copy of the comment thread, including the initial issue submission, for this request, not just the latest update. Category: Python Library Group: Python 2.4 Status: Open Resolution: None Priority: 5 Private: No Submitted By: Simon Hookway (shookway) Assigned to: Nobody/Anonymous (nobody) Summary: python-logging compatability with Zope. Initial Comment: I'm using Zope2.8.x and python2.4. On shutdown removing the handlers causes a KeyError because the new _handlersList is not correctly updated and thus has a now non-existant handler in it. This double list/dict thing is a little cumbersome. I'm not sure either why it's a dict but i have replaced it with an OrderedDict so that old 2.3 logging behaviour works without modification. See the included patch. -- You can respond by visiting: https://sourceforge.net/tracker/?func=detail&atid=105470&aid=1614460&group_id=5470 ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com