Raymond Hettinger <raymond.hettin...@gmail.com> added the comment:

Here's a diff you can try out:

diff --git a/Objects/listobject.c b/Objects/listobject.c
index 3c39c6444b..ac6c9cc07a 100644
--- a/Objects/listobject.c
+++ b/Objects/listobject.c
@@ -272,8 +272,7 @@ ins1(PyListObject *self, Py_ssize_t where, PyObject *v)
     if (where > n)
         where = n;
     items = self->ob_item;
-    for (i = n; --i >= where; )
-        items[i+1] = items[i];
+    memmove(items+where+1, items+where+0, (n - where) * sizeof(PyObject *));
     Py_INCREF(v);
     items[where] = v;
     return 0;

For me, that patch makes little difference in the timings.

Another thing that could be tried is to have ins1() call list_ass_slice() to do 
the work.  That way, both paths will use the same underlying insertion code.  
It would be interesting to see if this makes any difference.
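At the Python level, the two entry points under discussion are observably equivalent, which is what would make such a refactoring safe. A quick sanity check (a minimal sketch, not part of the patch):

```python
# list.insert(0, x) and a[0:0] = [x] produce identical results, so
# unifying their C implementations (ins1() delegating to
# list_ass_slice()) would be a pure refactoring with no behavior change.
a = list(range(10))
b = list(range(10))

a.insert(0, -1)   # goes through ins1()
b[0:0] = [-1]     # goes through list_ass_slice()

assert a == b == [-1] + list(range(10))
```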

A few other thoughts:

* We usually prefer deques to lists when prepending because the former are O(1) 
and the latter are O(n).
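For example, deque.appendleft() is the constant-time analogue of prepending to a list:

```python
from collections import deque

# Prepending to a deque is O(1); list.insert(0, x) is O(n) because
# every existing element must shift one slot to the right.
d = deque([1, 2, 3])
d.appendleft(0)      # O(1)

lst = [1, 2, 3]
lst.insert(0, 0)     # O(n)

assert list(d) == lst == [0, 1, 2, 3]
```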

* Modern compilers are very good at optimizing data movements in a simple 
for-loop.

* It is very difficult to get good timings for these kinds of operations.  
Internally, both paths make a call to list_resize(), which in turn uses 
realloc().  The latter function can be very fast or very slow depending solely 
on whether it can resize in-place or has to move the data.  Accordingly, 
timings are easily confounded by minor changes in memory usage.  To get a 
clear-cut timing, you would need to isolate just the for-loop or memmove 
operation and not include Python calling overhead, realloc operations, and 
everything else that is going on in the given timings.

* Another source of confounding is the compiler itself.  Changes that make 
small improvements on Clang may hurt the GCC build or vice-versa.  The 
comparison can also be thrown off by whether you're timing a PGO build (which 
tends to do an amazing job of optimizing for-loops).  The results may also 
differ between 32-bit builds and 64-bit builds.

* The timing isn't representative of how people use insert(0, x).  It would be 
a mistake to optimize an uncommon case if it comes at the expense of the common 
case (insertions into small lists).  To guard against this, you would need to 
measure inserts into a small list to find out whether memmove() has more setup 
time than the for-loop.  When we've done such investigations in the past, 
for-loops were found to beat memmove() for sizes under 16.  Almost certainly 
this has changed as compilers and processors have changed over the years.
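A small-list check along those lines could look like this (a sketch; the setup recreates the list once per measurement, and insert is paired with pop so the size stays fixed):

```python
import timeit

# Compare the two spellings on short lists.  Which one wins, and at
# what crossover size, depends on the compiler and processor, so the
# numbers below are only meaningful relative to each other on one box.
for n in (4, 16, 64):
    setup = f"a = list(range({n}))"
    t_insert = timeit.timeit("a.insert(0, 0); a.pop()",
                             setup=setup, number=10_000)
    t_slice = timeit.timeit("a[0:0] = [0]; a.pop()",
                            setup=setup, number=10_000)
    print(f"n={n:>3}  insert: {t_insert:.6f}s  slice: {t_slice:.6f}s")
```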

* For smaller list sizes, the calling overhead dominates the insertion time.  
Disassembling the two different calls shows a good deal of call overhead.

>>> from dis import dis
>>> dis('a.insert(0,0)')
  1           0 LOAD_NAME                0 (a)
              2 LOAD_METHOD              1 (insert)
              4 LOAD_CONST               0 (0)
              6 LOAD_CONST               0 (0)
              8 CALL_METHOD              2
             10 RETURN_VALUE
>>> dis('a[0:0]=[0]')
  1           0 LOAD_CONST               0 (0)
              2 BUILD_LIST               1
              4 LOAD_NAME                0 (a)
              6 LOAD_CONST               0 (0)
              8 LOAD_CONST               0 (0)
             10 BUILD_SLICE              2
             12 STORE_SUBSCR
             14 LOAD_CONST               1 (None)
             16 RETURN_VALUE

----------

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue39801>
_______________________________________