[issue46302] IndexError inside list comprehension + workaround
Stefan Pochmann added the comment:

The error occurs when you do code.strip()[0] when code is " ", not "u2".

----------
nosy: +Stefan Pochmann
[issue46747] bisect.bisect/insort don't document key parameter
New submission from Stefan Pochmann:

The signatures for the versions without the "_right" suffix are missing the key parameter:

    bisect.bisect_right(a, x, lo=0, hi=len(a), *, key=None)
    bisect.bisect(a, x, lo=0, hi=len(a))

    bisect.insort_right(a, x, lo=0, hi=len(a), *, key=None)
    bisect.insort(a, x, lo=0, hi=len(a))

https://docs.python.org/3/library/bisect.html#bisect.bisect_right
https://docs.python.org/3/library/bisect.html#bisect.insort_right

----------
assignee: docs@python
components: Documentation
messages: 413213
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: bisect.bisect/insort don't document key parameter
versions: Python 3.10, Python 3.11
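[Editor's note: a quick interactive check, not from the report, that the undocumented-alias signatures do accept key= in Python 3.10+, since the aliases are the same functions:]

    import bisect

    pairs = [(1, "a"), (3, "b")]
    # insort (the alias without "_right") accepts key= just like insort_right:
    bisect.insort(pairs, (2, "c"), key=lambda p: p[0])
    print(pairs)  # [(1, 'a'), (2, 'c'), (3, 'b')]
    # likewise for bisect (note: key is applied to list elements, not to x):
    print(bisect.bisect(pairs, 2, key=lambda p: p[0]))  # 2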
[issue45346] Some "truthy" crept in
New submission from Stefan Pochmann:

Python consistently says true/false, not truthy/falsy. But a few "truthy" have crept in now:

https://github.com/python/cpython/search?q=truthy

- Two in Doc/reference/compound_stmts.rst
- One in Doc/library/ast.rst
- One in Lib/test/test_builtin.py

Evidence for consistent true/false usage, if needed:

https://docs.python.org/3/library/stdtypes.html#truth-value-testing
https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not
https://docs.python.org/3/reference/compound_stmts.html#the-if-statement
https://docs.python.org/3/reference/compound_stmts.html#the-while-statement
https://docs.python.org/3/reference/expressions.html#boolean-operations

----------
assignee: docs@python
components: Documentation
messages: 403056
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: Some "truthy" crept in
versions: Python 3.10, Python 3.11
[issue45346] Some "truthy" crept in
Change by Stefan Pochmann:

----------
type:  -> enhancement
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

That's how Elliot did it in the original proposal: https://bugs.python.org/issue28685

Back then you pointed out that "Else we can assume u[0] == v[0]" isn't true for float('nan') values:

https://github.com/python/cpython/pull/582#issuecomment-285838656
https://github.com/python/cpython/pull/582#issuecomment-285841789

If you sort objects that always return True for both `<` and `==`, this would also have the opposite problem, considering tuple u smaller than v when it shouldn't. That said, maybe the function could be optimized for known "well-behaving" types?

----------
nosy: +Stefan Pochmann
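[Editor's note: to make both failure modes concrete, a small demonstration of our own, not from the thread — NaN answers False to every comparison, and a deliberately pathological object answers True to both `<` and `==`:]

    # NaN: "not (u0 < v0)" does not imply u0 == v0
    x = float("nan")
    print(x < 1.0, x == 1.0, x > 1.0)  # False False False

    # The opposite problem: an object claiming both "less" and "equal"
    class Always:
        def __lt__(self, other): return True
        def __eq__(self, other): return True

    a, b = Always(), Always()
    print(a < b, a == b)  # True True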
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

> What justifies "shouldn't"?

I based that on your old comments. It looked like tuple comparison results during sorting shouldn't differ from tuple comparison results elsewhere.

I had actually proposed this exact method in a comment under your StackOverflow answer, but deleted it when I found that it had been discussed and discarded back then. But if it feels ok now, then good :-)
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

I misinterpreted you then, sorry. I guess also because Elliot shortly after retreated from the approach, saying he "rewrote unsafe_tuple_compare to move the less-than after the equality testing, to make sure it's 100% consistent".

I'd say the inconsistency the program showed was different. Yes, the xs got sorted differently than the ys, but the xs differed from the ys. The proposed method would change comparisons of *the same* tuples. For example, you'd no longer have "sorted(tuples) == [min(tuples), max(tuples)]" for a list of two tuples.

Also a curious factoid: I recently saw a case where despite the "only <" promise, sorting caused an "x > y" comparison (because "y < x" wasn't implemented).

(tiny note: tupsort.py does "range(2", should be "range(1" I think)
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

Ok, I agree, the change only possibly "breaks" expectations that were already broken (including my naive sorted-vs-minmax).

Yes, I know sort itself only asks `<` (I've actually read most of its code and all its documentation). That's why I was irritated for a moment when I saw a `>`.
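[Editor's note: a tiny demonstration, ours rather than from the thread, that the sorted-vs-minmax expectation is already broken today once NaNs are involved, since no comparison between the two tuples ever returns True:]

    a = (float("nan"), 1)   # two distinct NaN objects, so the identity
    b = (float("nan"), 0)   # shortcut for == never kicks in
    ts = [a, b]
    print(sorted(ts) == [min(ts), max(ts)])  # False
    # sorted keeps [a, b] since "b < a" is False; max stays at a since "b > a" is False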
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

Hmm. What do you think about using "<" inside the loop for the remaining tuple elements as well? It's even less likely that both the first and the second elements are equal.

When the second elements differ, this saves 0.5 PyObject_RichCompareBool calls (on average 1.5 "<" instead of one "==" and one "<"). When the second elements are equal, this costs 1 PyObject_RichCompareBool call more (two "<" instead of one "=="). That's fewer PyObject_RichCompareBool calls whenever they're equal less than 1/3 of the time.
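[Editor's note: a back-of-the-envelope check of that 1/3 break-even point, assuming as a simplification that unequal elements compare either way with probability 1/2:]

    def expected_calls(p_equal):
        # old scheme: one "==" always, plus one "<" when unequal
        old = 1 + (1 - p_equal)
        # new scheme: "u<v", and if that's False also "v<u";
        # unequal: 1 or 2 calls (avg 1.5), equal: always 2 calls
        new = 2 * p_equal + 1.5 * (1 - p_equal)
        return old, new

    for p in (0.0, 1/3, 0.5):
        print(p, expected_calls(p))
    # at p = 1/3 both schemes average 5/3 calls; below that, "<"-only wins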
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

I see you mentioned that PyObject_RichCompareBool(..., Py_EQ) might be faster, for example because it checks identity. Why not do an identity check before the ms->tuple_elem_compare calls? Too expensive and rarely successful?

> Extending the idea to positions beyond the first is dubious on those grounds:
> if the first tuple positions in fact often match, then the optimization has
> already backfired. Time to admit defeat then, not double down on failed
> arrogance ;-)

I don't see that. First and second positions are quite different. For example, I sorted a million tuples where both the first and second element are randomly chosen from a population of 10,000, so their amounts of duplication were the same. But these are the statistics from sorting:

- First position: 18,603,981 equality comparisons, 29.87% equal
- Second position: 5,556,203 equality comparisons, 0.09% equal

Many first positions matched, almost no second positions matched.

One more idea: What do you think of sorting lists of tuples (or tuple keys) in two stages?

1) Sort the list only by first position (comparing with only one tuple_elem_compare).
2) Identify equal-first-position runs (with tuple_elem_compare) and sort each run independently (comparing only positions 1+).

Some experiments I did with this looked good, not sure if too off-topic to post here...
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

> It's not surprising the 2nd positions are rarely equal _given_ that the
> universe has been reduced to the 100 tuples with the same first element.

Yes, exactly.

> In any case, I don't see the point to this exercise ;-)

Just to illustrate why I disagree with what I responded to there. I'm not convinced that many matches at the first tuple position are really an indication that there'll be many matches at the second position as well, so that one should "admit defeat" and that it's "arrogance" to hope for only few matches at the second position. I'd wager it's rather typical that the matching rate at the second position is significantly lower.

If you sort people by last and then first name, I think you'd get the same effect as with my above random example, for the same reason. If you sort a set of 2D coordinates, you'll even *never* get matches at the second position.

Even with that SO question's data and its massive duplication (especially at the second position) *and* its correlation between first and second position (since both "sorted(x)" and "x[::2]" are derived from the same x) and its 43% matching rate at the first position, there's still only a 27% matching rate at the second position (lower than the 1/3 break-even point).

> BTW, don't overlook that tuple_elem_compare can _only_ be used on the very
> first elements of the tuples.

Yes, that's why I only talked about PyObject_RichCompareBool there.

> > 1) Sort the list only by first position (comparing with only
> > one tuple_elem_compare).

> Not sure what that means, exactly. (the "with only one tuple_elem_compare" part;

I just wanted to make clear it wouldn't use two tuple_elem_compare calls or a PyObject_RichCompareBool.

> > 2) Identify equal-first-position-runs (with tuple_elem_compare)
> > and sort each run independently (comparing only positions 1+).

> What are you aiming at? There was no motivation given here.

Much fewer comparisons, especially PyObject_RichCompareBool comparisons. For example, for sorting a million pairs with first/second element chosen from a population of 100,000, I get these numbers of comparisons (per element):

                 current  groupsort
    ----------------------------------------------------------
    == first       18.60       none  (PyObject_RichCompareBool)
    <  first       16.19      19.60  (tuple_elem_compare)
    == second       2.41       2.36  (PyObject_RichCompareBool)
    <  second       2.41       2.36  (PyObject_RichCompareBool)
    ----------------------------------------------------------
    total slow     23.42       4.72  (PyObject_RichCompareBool)
    total fast     16.19      19.60  (tuple_elem_compare)
    ----------------------------------------------------------
    total          39.61      24.32

(With "current" I mean before your optimizations, which I can't test yet.)

> Well, this issue report is about unsafe_tuple_compare(), so, ya, if the
> changes you envision aren't limited to that, I'd say you want to open a
> different issue report and a different PR :-)

Ok, I might try that.
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

Yes, I'm more familiar with the issue in the context of strings or lists. Your example of strings like "'x' * 10_000 + str(i)" looks like something I almost certainly used before as a counterexample to someone's time complexity claim :-)

In the context of multi-criteria sort I might not have thought of it before, I guess because you usually don't have many criteria. But now the optimized tuple_elem_compare makes me think we can take even more advantage of it.

I think those type-specific optimized comparison functions are what I left out when I previously read the sort, that's why it was new to me.

I just saw you also use powersort's merge strategy now. Kinda sad, I had seen you lament that your own strategy "remains hard to grasp why it's always correct now", while I think it's rather straightforward and had considered adding a little explanation in the doc.

Bucket sort is a name I considered, but I wrote two, so named them after their implementation (groupsort first used itertools.groupby).
[issue45530] Improve listobject.c's unsafe_tuple_compare()
Stefan Pochmann added the comment:

> This is already faster in pure Python than list.sort() for cases like:

Seems to depend on the system, it's slower on my laptop but faster on GCE:

Python 3.10.0 on my laptop:
    7.42 s  lexisort
    6.83 s  list.sort
    5.07 s  groupsort

Python 3.9.2 on Google Compute Engine:
    2.09 s  lexisort
    2.64 s  list.sort
    1.52 s  groupsort

This is my groupsort btw (itemgetter is from the operator module):

    from operator import itemgetter

    def groupsort(xs):
        xs.sort(key=itemgetter(0))
        start = 0
        n = len(xs)
        tail = itemgetter(slice(1, None))
        for i in range(1, n + 1):
            if i == n or xs[i-1][0] < xs[i][0]:
                sublist = xs[start:i]
                sublist.sort(key=tail)
                xs[start:i] = sublist
                start = i
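[Editor's note: a minimal driver one could use to reproduce such timings — our own sketch, not from the thread; lexisort isn't shown in the messages, so this only compares groupsort (as defined above) against plain list.sort on random pairs:]

    import random
    import time

    data = [(random.randrange(10_000), random.randrange(10_000))
            for _ in range(10**6)]
    expected = sorted(data)

    for label, func in [("list.sort", list.sort), ("groupsort", groupsort)]:
        xs = data.copy()
        t0 = time.perf_counter()
        func(xs)
        print(label, round(time.perf_counter() - t0, 2), "s", xs == expected)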
[issue45752] copy module doc wrongly says it doesn't copy arrays
New submission from Stefan Pochmann:

The doc https://docs.python.org/3/library/copy.html says:

"This module does not copy types like module, method, stack trace, stack frame, file, socket, window, array, or any similar types."

But it does copy arrays just fine:

    import copy, array

    a = array.array('i', [1, 2])
    b = copy.copy(a)
    a[0] = 3
    print(a)
    print(b)

Output:

    array('i', [3, 2])
    array('i', [1, 2])

----------
assignee: docs@python
components: Documentation
messages: 405962
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: copy module doc wrongly says it doesn't copy arrays
versions: Python 3.10, Python 3.11, Python 3.6, Python 3.7, Python 3.8, Python 3.9
[issue45752] copy module doc wrongly says it doesn't copy arrays
Stefan Pochmann added the comment:

Just saw that it's in copy.py's docstring as well:

"This version does not copy types like module, class, function, method, nor stack trace, stack frame, nor file, socket, window, nor array, nor any similar types."

https://github.com/python/cpython/blob/3.10/Lib/copy.py#L41-L43
[issue37295] Possible optimizations for math.comb()
Stefan Pochmann added the comment:

I wrote a Python solution ("mycomb") that computes comb(100_000, 50_000) faster, maybe of interest:

    1510.4 ms  math.comb(n, k)
     460.8 ms  factorial(n) // (factorial(k) * factorial(n-k))
      27.5 ms  mycomb(n, k)
       6.7 ms  *estimation* for mycomb if written in C

The idea:

                      13 * 12 * 11 * 10 * 9 * 8
        comb(13, 6) = ------------------------- = 13 * 1 * 11 * 1 * 3 * 4
                       1 *  2 *  3 *  4 * 5 * 6

It lists the numerator factors, then divides the denominator factors out of them (using primes), then just multiplies.

Preparing the factors for the final multiplication took most of the time, about 23.1 ms. That part only needs numbers <= n, so it could be done with C ints and be much faster. If it's ten times faster, then mycomb in C would take 23.1/10 + (27.5-23.1) = 6.71 ms.

See the attached comb_with_primes.py file.

----------
nosy: +Stefan Pochmann
Added file: https://bugs.python.org/file50439/comb_with_primes.py
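[Editor's note: to illustrate the flavor of the approach, a minimal prime-factorization sketch of our own — this is *not* the attached comb_with_primes.py; it uses Legendre's formula rather than the factor-cancelling scheme described above:]

    from math import isqrt, prod

    def comb_by_primes(n, k):
        if not 0 <= k <= n:
            return 0
        # simple sieve for primes <= n
        sieve = bytearray([1]) * (n + 1)
        sieve[:2] = b"\x00\x00"
        for p in range(2, isqrt(n) + 1):
            if sieve[p]:
                sieve[p*p::p] = bytes(len(range(p*p, n + 1, p)))

        def e(m, p):
            # exponent of prime p in m! (Legendre's formula)
            total = 0
            while m:
                m //= p
                total += m
            return total

        return prod(p ** (e(n, p) - e(k, p) - e(n - k, p))
                    for p in range(2, n + 1) if sieve[p])

    print(comb_by_primes(13, 6))  # 1716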
[issue37295] Possible optimizations for math.comb()
Stefan Pochmann added the comment:

And for Raymond's case 4), about running very long and not responding to SIGINT, with n=1_000_000 and k=500_000:

    150.91 seconds  math.comb(n, k)
     39.11 seconds  factorial(n) // (factorial(k) * factorial(n-k))
      0.40 seconds  mycomb(n, k)
      0.14 seconds  *estimation* for mycomb if written in C

And for n=10_000_000 and k=5_000_000:

    ~4 hours     *estimation* for math.comb(n, k)
    ~1 hour      *estimation* for factorials solution
    8.3 seconds  mycomb(n, k)
    4.5 seconds  *estimation* for mycomb if written in C
[issue37295] Possible optimizations for math.comb()
Stefan Pochmann added the comment:

Turns out for n=100_000, k=50_000, about 87% of my factors are 1, so they don't even need to be turned into Python ints for multiplication, improving the multiplication part to 3.05 ms. And a C++ version to produce the factors took 0.85 ms. Updated estimation:

    1510.4 ms  math.comb(n, k)
     460.8 ms  factorial(n) // (factorial(k) * factorial(n-k))
       3.9 ms  *estimation* for mycomb if written in C
[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)
New submission from Stefan Pochmann:

The current implementation is:

    def multimode(data):
        counts = Counter(iter(data)).most_common()
        maxcount, mode_items = next(groupby(counts, key=itemgetter(1)), (0, []))
        return list(map(itemgetter(0), mode_items))

The most_common() does a complete sort of the Counter item tuples, taking O(n log n) time and quite big O(n) extra space (mostly for all those tuples).

When Raymond Hettinger suggested it in https://bugs.python.org/issue35892#msg336338 he said it should have "running speed that is optimal for the desired result". But then he detailed that with "Slow O(n log n), loads all data in memory, full sort". Which seems like a mistake, as that's not optimal.

It's easy to do in O(n) time and O(1) extra memory (in addition to the Counter and the result, I mean):

    def multimode(data):
        counts = Counter(iter(data))
        if not counts:
            return []
        maxcount = max(counts.values())
        return [value for value, count in counts.items() if count == maxcount]

If there are only very few *different* values, then the time/space after creating the Counter is insignificant compared to the Counter creation. But if there are many different values, it can be significant.

statistics.mode takes O(n) time and O(1) space, which is optimal, but I found an apparently faster way anyway (code at end). For example for data = random.choices(range(n), k=n):

               |    multimode     |       mode
        n      | current proposal | current proposal
    -----------+------------------+-----------------
             1 |   131%      70%  |  125%      58%
            10 |   144%      73%  |  119%      53%
           100 |   126%      71%  |  108%      29%
         1,000 |   123%      65%  |   62%      22%
        10,000 |   172%      55%  |   53%      18%
       100,000 |   164%      44%  |   55%      20%
     1,000,000 |    85%      20%  |   22%       4%
    10,000,000 |    56%      12%  |   11%       4%

All four start with Counter(iter(data)), so I took that as baseline and the above results show relative additional times. For example 55% means if the Counter construction alone took 10 seconds, the function took 15.5 seconds.

An extreme case, data = list(range(n)):

               |    multimode     |       mode
        n      | current proposal | current proposal
    -----------+------------------+-----------------
             1 |   128%      67%  |  124%      56%
            10 |   187%      93%  |  141%      52%
           100 |   316%     149%  |  181%      45%
         1,000 |   380%     174%  |  213%      46%
        10,000 |   349%     111%  |  146%      30%
       100,000 |   397%     128%  |  159%      34%
     1,000,000 |   336%      95%  |  112%      24%
    10,000,000 |   349%      97%  |  109%      23%

I also tried a bunch of other cases, didn't find one where my versions weren't quite a bit faster.

My mode() version:

    from operator import indexOf
    from itertools import islice

    def mode(data):
        counts = Counter(iter(data))
        if not counts:
            raise StatisticsError('no mode for empty data') from None
        maxcount = max(counts.values())
        index = indexOf(counts.values(), maxcount)
        return next(islice(counts, index, None))

----------
components: Library (Lib)
messages: 406638
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: statistics.multimode is inefficient (time and space) (mode somewhat, too)
type: performance
versions: Python 3.10
[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)
Stefan Pochmann added the comment:

(somehow the benchmark script didn't get attached, trying again)

----------
Added file: https://bugs.python.org/file50453/multimode_mode.py
[issue45852] statistics.mode test doesn't test what it claims to
New submission from Stefan Pochmann:

This test:

    def test_counter_data(self):
        # Test that a Counter is treated like any other iterable.
        data = collections.Counter([1, 1, 1, 2])
        # Since the keys of the counter are treated as data points, not the
        # counts, this should return the first mode encountered, 1
        self.assertEqual(self.func(data), 1)

If the mode() code *were* wrong this way (used Counter(data) instead of Counter(iter(data))), then the test wouldn't detect it, as mode() would still return 1. The test data should be [1, 2, 2, 2] instead, in which case such a wrong mode() would return 2.

It used to be correct but wasn't adjusted correctly when mode() switched from raising an error for multiple modes to returning the first. The old code was:

    def test_counter_data(self):
        # Test that a Counter is treated like any other iterable.
        data = collections.Counter([1, 1, 1, 2])
        # Since the keys of the counter are treated as data points, not the
        # counts, this should raise.
        self.assertRaises(statistics.StatisticsError, self.func, data)

----------
components: Tests
messages: 406642
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: statistics.mode test doesn't test what it claims to
versions: Python 3.10
[issue45851] statistics.multimode is inefficient (time and space) (mode somewhat, too)
Stefan Pochmann added the comment:

Ok, thanks, I had somewhat expected that, as the multimode proposal was rather clearly better but the mode proposal not so much.
[issue39521] reversed(mylist) much slower on Python 3.8.1 32-bit for Windows
New submission from Stefan Pochmann:

Somehow `reversed` became much slower than `iter` on lists:

List with 1,000 elements:

    > python -m timeit -s "a = list(range(1000))" "list(iter(a))"
    5 loops, best of 5: 5.73 usec per loop
    > python -m timeit -s "a = list(range(1000))" "list(reversed(a))"
    2 loops, best of 5: 14.2 usec per loop

List with 1,000,000 elements:

    > python -m timeit -s "a = list(range(250)) * 4000" "list(iter(a))"
    50 loops, best of 5: 7.08 msec per loop
    > python -m timeit -s "a = list(range(250)) * 4000" "list(reversed(a))"
    20 loops, best of 5: 15.5 msec per loop

On another machine I tried ten different Python versions and found out it's only version 3.8.1 and only the 32-bit version:

                32-bit           64-bit
    CPython  iter reversed    iter reversed
    3.5.4    19.8   19.9      22.4   22.7
    3.6.8    19.8   19.9      22.3   22.6
    3.7.6    19.9   19.9      22.3   22.5
    3.8.1    19.8   24.9      22.4   22.6

Another time with 3.8.0 instead of 3.8.1:

                32-bit           64-bit
    CPython  iter reversed    iter reversed
    3.5.4    19.5   19.6      21.9   22.2
    3.6.8    19.5   19.7      21.8   22.1
    3.7.6    19.5   19.6      21.7   22.0
    3.8.0    19.4   24.5      21.7   22.1

I used the "Stable Releases" "executable installer"s from here:
https://www.python.org/downloads/windows/

More details here: https://stackoverflow.com/q/60005302/12671057

----------
components: Build
messages: 361195
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: reversed(mylist) much slower on Python 3.8.1 32-bit for Windows
type: performance
versions: Python 3.8
[issue39521] reversed(mylist) much slower on Python 3.8 32-bit for Windows
Change by Stefan Pochmann:

----------
title: reversed(mylist) much slower on Python 3.8.1 32-bit for Windows -> reversed(mylist) much slower on Python 3.8 32-bit for Windows
[issue39521] reversed(mylist) much slower on Python 3.8 32-bit for Windows
Stefan Pochmann added the comment:

Oh right. The times are correct, I just summarized them wrongly there.
[issue39801] list.insert is slow due to manual memmove
New submission from Stefan Pochmann:

Using a list's insert function is much slower than using slice assignment:

    > python -m timeit -n 100000 -s "a=[]" "a.insert(0,0)"
    100000 loops, best of 5: 19.2 usec per loop
    > python -m timeit -n 100000 -s "a=[]" "a[0:0]=[0]"
    100000 loops, best of 5: 6.78 usec per loop

(Note that the list starts empty but grows to 100,000 elements.)

At first I thought maybe it's the attribute lookup or function call overhead or so, but inserting near the end shows that that's negligible:

    > python -m timeit -n 100000 -s "a=[]" "a.insert(-1,0)"
    100000 loops, best of 5: 79.1 nsec per loop

I asked at StackOverflow and someone pointed out that list.insert uses a manual loop instead of memmove:
https://stackoverflow.com/a/60466572/12671057

----------
components: Interpreter Core
messages: 362986
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: list.insert is slow due to manual memmove
type: performance
versions: Python 3.8
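[Editor's note: the workaround as a small drop-in helper, for illustration; slice_insert is a hypothetical name of ours, and as far as we can tell it matches insert's semantics, including negative and out-of-range indices:]

    def slice_insert(a, i, x):
        # same effect as a.insert(i, x), but via the faster slice assignment
        a[i:i] = (x,)

    a = [1, 2, 4]
    slice_insert(a, 2, 3)
    print(a)  # [1, 2, 3, 4]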
[issue39801] list.insert is slow, likely due to manual memmove
Change by Stefan Pochmann:

----------
title: list.insert is slow due to manual memmove -> list.insert is slow, likely due to manual memmove
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

I believe it also affects bisect.insort, which I occasionally use when I need a "self-sorting list" (I can't easily test it, as I'm not set up to modify the C version of bisect.insort).

And also the popular sortedcontainers package, which relies on such list operations to be fast:
http://www.grantjenks.com/docs/sortedcontainers/
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

Benchmarking with two *Python* versions of bisect.insort, the "insert" version takes 1.08 seconds to insort the shuffled range(10**5) while the slice assignment version only takes 0.46 seconds:

    from timeit import timeit
    import random
    from bisect import bisect_right

    def insort1(a, x):
        lo = bisect_right(a, x)
        a.insert(lo, x)

    def insort2(a, x):
        lo = bisect_right(a, x)
        a[lo:lo] = [x]

    for _ in range(3):
        a = list(range(10**5))
        random.shuffle(a)
        for insort in insort1, insort2:
            it = iter(a)
            s = []
            print(timeit(lambda: insort(s, next(it)), number=len(a)))
        print()
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

Good point, Tim. Although binarysort really moves very few slots (I think at most 62, and on average more like 13). That might not be representative of how people use list.insert, either.

I think I mainly use it for the mentioned case of a "self-sorting list", either naively via bisect.insort with a single list, or via sortedcontainers' SortedList using multiple lists (used it only once so far). If I'm not mistaken, SortedList has a default load factor of 1000 and splits at double that size.

I might do more reasonable benchmarks later (haven't read Raymond's reply yet). In any case, binarysort does its own inserting, so at least it wouldn't be affected if list.insert were changed.
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

I have better benchmarks now and am trying some more, though I guess we roughly agree about when memmove is faster and when it's slower, and that the real question is how large the common case is.

Do you know where I can find those past investigations? Was memmove *faster* for sizes *over* 16? What is the common case, i.e., how do people use list.insert?

----------
status: pending -> open
[issue39801] list.insert is slow, likely due to manual memmove
Change by Stefan Pochmann:

----------
status: open -> pending
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

I think I have a decent way to isolate the memmove-vs-for-loop times, in Python code. Here are example results, five rounds of inserting with list.insert or with slice assignment. The times for each of the two ways are pretty stable:

        insert     slice
      0.024221  0.006754
      0.024157  0.006710
      0.024015  0.006734
      0.024206  0.006731
      0.024123  0.006769

For each run, I build a list of length 2**20 and append one value. Then I can insert 2**17 more values without the list internally resizing. Then I measure inserting at index -1 and consider that the baseline, as it includes all the overhead costs like function calls and other differences between the two insertion methods. Then I measure inserting at index -500 and subtract the baseline time. This difference should reflect how long the memmove or the for-loop took, as that's the difference between inserting at -1 and at -500. It's what I showed in those results above.

I chose -500 here because the use case I have in mind is sortedcontainers.SortedList, which I think mostly involves lists of length 1000 or more (up to 2000), and insertions somewhere in the middle. Results for index -50 instead:

        insert     slice
      0.002479  0.002217
      0.002546  0.002113
      0.002566  0.002007
      0.002425  0.002081
      0.002555  0.002261

I'm btw running Python 3.8.1 32-bit on Windows 10 64-bit on an Intel i5-7200U. The rest of this message is the benchmark code:

    from timeit import repeat

    statements = (
        'a.insert(-1,0)',
        'a.insert(-50,0)',
        'a[-1:0] = 0,',
        'a[-50:0] = 0,',
    )

    # Verify that the statements work and don't cause internal list resizes.
    for stmt in statements:
        a = [0] * 2**20
        a.append(0)
        size_before = a.__sizeof__()
        stmt = compile(stmt, '', 'single')
        for _ in range(2**17):
            exec(stmt)
        assert len(a) == 2**20 + 1 + 2**17
        assert a.__sizeof__() == size_before

    # Run the benchmark.
    print('    insert     slice')
    for _ in range(5):
        times = []
        for stmt in statements:
            t = min(repeat(stmt, 'a=[0]*2**20; a.append(0)', number=2**17, repeat=20))
            times.append(t)
        print('%10.6f' * 2 % (times[1] - times[0], times[3] - times[2]))

----------
status: pending -> open
[issue39801] list.insert is slow, likely due to manual memmove
Stefan Pochmann added the comment:

I misspoke. I do the insertions at -1 and -500 not on the same list, but each on a fresh list (of length 2**20 + 1).
[issue31071] Bad error message about maps not iterable
New submission from Stefan Pochmann:

Python 3.6 makes it sound like maps aren't iterable:

    >>> map(str, *map(int, [[]]))
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
        map(str, *map(int, [[]]))
    TypeError: type object argument after * must be an iterable, not map

More, including a likely explanation, in my question and its answer here:
https://stackoverflow.com/q/45363330/1672429

Apparently the TypeError from int([]) gets mistaken for a TypeError indicating non-iterability of the map object.

----------
messages: 299402
nosy: Stefan Pochmann
priority: normal
severity: normal
status: open
title: Bad error message about maps not iterable
type: behavior
versions: Python 3.6
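[Editor's note: a quick illustration of our own — the exact TypeError wording for int([]) varies between versions. The map object unpacks fine when nothing raises inside it; the real failure is the inner int([]) call:]

    # unpacking a map works when the inner calls succeed:
    print(list(map(str, *map(list, [[1, 2]]))))  # ['1', '2']

    # the actual culprit behind the misleading message:
    try:
        int([])
    except TypeError as e:
        print(e)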
[issue31082] reduce takes iterable, not just sequence
New submission from Stefan Pochmann:

functools.reduce has a parameter called "iterable" and it only needs to be an iterable, not a sequence. The paragraph documenting it says "sequence" instead of "iterable" six times:
https://docs.python.org/3/library/functools.html#functools.reduce

The help shown by executing "help(functools.reduce)" in Python additionally actually names the parameter "sequence" in the signature.

----------
assignee: docs@python
components: Documentation
messages: 299520
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: reduce takes iterable, not just sequence
type: enhancement
versions: Python 3.6, Python 3.7
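[Editor's note: a quick check that reduce really only needs an iterable; here it consumes a one-shot generator, which is not a sequence:]

    from functools import reduce
    import operator

    gen = (i * i for i in range(5))   # a generator, not a sequence
    print(reduce(operator.add, gen))  # 30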
[issue29709] Short-circuiting not only on False and True
New submission from Stefan Pochmann:

The notes at https://docs.python.org/3/library/stdtypes.html#boolean-operations-and-or-not say that `or` "only evaluates the second argument if the first one is False" and that `and` "only evaluates the second argument if the first one is True".

They should say "false" and "true" instead of "False" and "True", since short-circuiting happens for any false or true value, not just the two bool constants.

----------
assignee: docs@python
components: Documentation
messages: 21
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: Short-circuiting not only on False and True
type: enhancement
versions: Python 2.7, Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
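[Editor's note: a small demonstration of why the lowercase wording matters — short-circuiting triggers on any false/true value, not only on the bool constants:]

    print([] or "fallback")  # 'fallback' -- [] is false, so the right side runs
    print(2 or 1 / 0)        # 2 -- true but not True; right side never evaluated
    print(0 and 1 / 0)       # 0 -- false but not False; right side never evaluated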
[issue32767] Mutating a list while iterating: clarify the docs
Stefan Pochmann added the comment:

Yes, there's also `array.array`.

----------
nosy: +Stefan Pochmann
[issue32767] Mutating a list while iterating: clarify the docs
Stefan Pochmann added the comment:

And `bytearray`.
[issue32341] itertools "generators"?
New submission from Stefan Pochmann:

The itertools page https://docs.python.org/3.6/library/itertools.html says "Combinatoric generators:" above the table with product(), permutations(), etc. But as far as I understand, they're not generators and they don't produce generator iterators.

I think the table title should be "Combinatoric iterators:" instead, in line with "Infinite Iterators:" and "Iterators terminating on the shortest input sequence:" of the two tables above it.

Reference: https://docs.python.org/3/glossary.html#term-generator

----------
assignee: docs@python
components: Documentation
messages: 308425
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: itertools "generators"?
versions: Python 2.7, Python 3.5, Python 3.6, Python 3.7
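[Editor's note: a quick check supporting this, our own rather than from the report — the combinatoric callables are iterator types, not generator functions, and their results are not generator iterators:]

    import inspect
    import itertools

    p = itertools.product([1, 2], repeat=2)
    print(type(p))                 # <class 'itertools.product'>, not a generator
    print(inspect.isgenerator(p))  # False
    print(inspect.isgeneratorfunction(itertools.product))  # False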
[issue29580] "Built-in Functions" not being functions
New submission from Stefan Pochmann:

About https://docs.python.org/3/library/functions.html:

The title "Built-in Functions", the table header "Built-in Functions" and the "functions" in the URL all suggest that what's on this page are functions. But many things on that page don't appear to be functions, like "range" or "dict". They appear to be callable types instead. For example "range":

    >>> type(range)
    <class 'type'>
    >>> import types
    >>> isinstance(range, types.FunctionType)
    False
    >>> isinstance(range, types.BuiltinFunctionType)
    False

Compare with "abs":

    >>> type(abs)
    <class 'builtin_function_or_method'>
    >>> isinstance(abs, types.FunctionType)
    False
    >>> isinstance(abs, types.BuiltinFunctionType)
    True

The text between the title and the table talks about "functions and types", but the title/table header/URL disagree.

Also, the table is wrong in saying that "abs()" etc. are functions. It should say "abs" instead, i.e., remove the "()". Many, like "abs", aren't even allowed to be called like that, as it results in "TypeError: abs() takes exactly one argument (0 given)".

----------
assignee: docs@python
components: Documentation
messages: 287961
nosy: Stefan Pochmann, docs@python
priority: normal
severity: normal
status: open
title: "Built-in Functions" not being functions
type: enhancement
versions: Python 3.3, Python 3.4, Python 3.5, Python 3.6, Python 3.7
[issue29580] "Built-in Functions" not being functions
Stefan Pochmann added the comment:

The page also contains many references like "As repr(), return a string containing..." where the "()" should be removed.