[issue7782] new test for test_iter.py
New submission from showell : I would like to submit the following test to be part of the test_iter.py test suite: def test_extends(self): # This test would break on an incomplete patch to listobject.c def gen(): for i in range(500): yield i lst = [0] * 500 for i in range(240): lst.pop(0) lst.extend(gen()) The history behind it is that I made a patch to listobject.c that obviously broke listextend(), but the tests did not catch it. This was my failing test to improve my patch. Regardless of what happens to the patch, I think it's a good idea to hammer on listextend() when it accepts an iterator, as it's a fairly tricky problem to extend a list when you do not know in advance how long it will be until the iterator gets exhausted. -- components: Tests messages: 98320 nosy: Steve Howell severity: normal status: open title: new test for test_iter.py type: behavior versions: Python 3.2 ___ Python tracker <http://bugs.python.org/issue7782> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
New submission from showell : I am attaching a patch to improve the performance of list operations that insert or delete elements at the front of the list. The patch has had some discussion on python-dev in the thread entitled "patch to make list.pop(0) work in O(1) time." So far the verdict is not to touch list internals to achieve this performance gain, but I am looking to improve the patch regardless and possibly include it in a PEP that documents the decision not to incorporate the patch, with the hope that it either prevents future duplication of effort or eventually gets accepted. At a bare minimum, the patch needs extensive code review, as it touches a lot of internals, and it needs stylistic cleanup. But it does pass all the tests, and it has been benchmarked to show 100X speed improvement on a small test case. -- components: Interpreter Core files: DIFF messages: 98324 nosy: Steve Howell severity: normal status: open title: patch for making list/insert at the top of the list avoid memmoves type: performance versions: Python 3.2 Added file: http://bugs.python.org/file16008/DIFF ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: --- On Tue, 1/26/10, Florent Xicluna wrote: > From: Florent Xicluna > Subject: [issue7784] patch for making list/insert at the top of the list > avoid memmoves > To: showel...@yahoo.com > Date: Tuesday, January 26, 2010, 3:53 AM > > Florent Xicluna > added the comment: > > Steve, thank you for your patch. > > IMHO, it's better to provide a patch against trunk (Py2) > when the optimization is not specific to 3.x. > > I merged it manually on my working copy, and all tests pass > without leaking. Patch for Py2 attached. > > However, I am not an expert. I just cleaned the tab/space > mix to conform with what is used in the listobject.c file. > Thank you! My only rationale for applying the patch to 3.x was that it would minimize backward compatibility concerns, but I do not have a strong opinion either way. You probably noticed a few stylistic mistakes, such as these lines: a->orphans += (-1 * d); a->ob_item += (-1 * d); All feedback on ways to improve this code would be greatly welcome. Thanks again. -- ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: The next stage for the patch is that I need to switch from using orphans count to using ob_allocated pointer. -- ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: I will attempt to fix insert(0, pop) in the next patch I submit (using pointer vs. orphans count). Just so we are clear on the performance that I'm promising, I expect benchmarks to prove out these facts: 1) New-list will always perform comparably to old-list, particularly for access. 2) New-list will outperform old-list for pre-pop. 3) New-list will perform comparably to deque for pre-pop. 4) Although not a super common use case, prepends that follow prepops should reclaim space and perform better than old list and comparably to deque. 5) For prepends that exceed prepops, I expect deque to greatly outperform new lists, just as it did old lists. As I explained on python-dev, the new patch should give lists the same performance profile as a pencil/paper todo list. Prepop works in O(1) with occasional compacting. Prepend generally works in O(N) unless you can reclaim space from prior prepops. To measure memory wastage, the two worst case scenarios are a bunch of tiny lists or a list from you which you prepop just under 50% of the elements (although that 50% can be tuned later if that's the only showstopper). You can certainly construct a benchmark that brings those negative aspects to light, but I think it would be more convincing to find an actual real-world program that performs worse or exhausts memory under the limitation. In general, it would be nice to find a Python program that makes extensive use of lists to prove out that the new list implementation at least does not degrade performance. To the extent that test suite all passes, we at least have some empirical evidence that "correctness" has not been jeopardized, but the more we can hammer on this, the better, of course. Does anybody have a suggestion for a real-world list benchmark program? -- ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: I am attaching a new patch that does not add a new element to PyListObject, roughly following a technique that Antoine Pitrou suggested on python-dev. When I want to lazily avoid a memmove under the new patch, I set the MSB on allocated and store the original ob_item pointer in the new ob_item[-1]. On the advice of Daniel, I ran the new patch against the Unladen benchmark suite. The results were pretty neutral--never more than a 1% penalty but no significant gains either. I did not expect to see gains, for the obvious reason that I am improving performance on an operation that folks have been encouraged to work around. The new patch continues to do well on microbenchmarks. The new patch fails one test in test_sys related to the 12 byte garbage collection header. The failure is definitely introduced by my patch, but I am not sure what it's doing wrong. All other tests pass. Because the new code piggybacks on top of of allocated instead of creating a new variable in PyListObject, the new code is a bit more complex than the original patch, which is unfortunate. There are probably some opportunities for making the new code simpler. -- Added file: http://bugs.python.org/file16033/DIFF_NO_EXTRA_MEM ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: Ok, found the offending line, now all tests pass. The use case where this patch will pay off the most is slicing your way through a list of tasks. The toy program below gets about a 50x speedup. import time n = 80 lst = [] for i in range(n): lst.append(i) t = time.time() for i in range(n): x = lst[:10] del lst[:10] print('time = ' + str(time.time() - t)) print(len(lst)) -- Added file: http://bugs.python.org/file16034/no_mem_penalty.diff ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
Changes by showell : Removed file: http://bugs.python.org/file16033/list_top_no_extra_mem.diff ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7782] new test for test_iter.py
showell added the comment: Per Ezio's suggestions, I added clearer comments and an assert, and now the attached diff applies to trunk. -- keywords: +patch Added file: http://bugs.python.org/file16037/extend_test.diff ___ Python tracker <http://bugs.python.org/issue7782> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7784] patch for making list/insert at the top of the list avoid memmoves
showell added the comment: I am closing this due to mostly unanimous rejection on python-dev. If folks reopen this in the future, the last patch that I submitted has been reasonably well tested, but it has not been code reviewed. The 1% speed penalty could probably be driven down without too much effort, but not totally eliminated. The constraint not to add a member to PyListObject complicates the code considerably. In diving deep into listobject.c, I noticed a couple things unrelated to my patch: 1) There might be a refactoring available for calls to list_resize, where callers just pass in the delta, instead of the new list size. Lots of the callers do addition that could be pushed into list_resize. Not sure it would lead to speed ups, but it would probably reduce code size. 2) The optimistic realloc scheme is probably needlessly wasteful for really large lists. If you have a million elements, I am not sure you need to allocate 12% extra space, and I think that's what the current algorithm does (untested). 3) There might be some merit in splitting out list_resize into list_resize_bigger and list_resize_smaller. I think the callers generally know when they are shrinking/expanding the list, so callers that are shrinking the list could call a leaner method that just shrinks the list without having to execute other instructions. -- status: open -> closed ___ Python tracker <http://bugs.python.org/issue7784> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7770] sin/cos function in decimal-docs
showell added the comment: +1 on showing off remainder_near. I recently wrote a program where I reinvented the logic (on a unit circle too), not knowing it was already implemented in Decimal. -- nosy: +Steve Howell ___ Python tracker <http://bugs.python.org/issue7770> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
[issue7782] new test for test_iter.py
showell added the comment: My proposed test is final. Please either accept or reject it as is. I assume it runs pretty quickly, so I am not sure what the cost fear is. The benefit of accepting the test is that it would potentially catch bugs on changes to the implementation of list. -- ___ Python tracker <http://bugs.python.org/issue7782> ___ ___ Python-bugs-list mailing list Unsubscribe: http://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com