On Mar 12, 2020, at 13:22, Marco Sulla <[email protected]>
wrote:
>
> On Thu, 12 Mar 2020 at 18:42, Andrew Barnert via Python-ideas
> <[email protected]> wrote:
>> What if a for loop, instead of nexting the iterator and binding the result
>> to the loop variable, instead unbound the loop variable, nexted the
>> Iterator, and bound the result to the loop variable?
>
> I missed that. But I do not understand how this can speed up any loop.
> I mean, if Python do this, it does an additional operation at every
> loop cycle, the unbounding. How can it be faster?
Because rebinding includes unbinding if it was already bound, so the unbinding
happens either way.
Basically, instead of this pseudocode:

    push it->tp_iternext(it) on the stack
    if f_locals[idx]:
        decref f_locals[idx]
        f_locals[idx] = NULL
    f_locals[idx] = stack pop
    incref f_locals[idx]

… you’d do this:

    if f_locals[idx]:
        decref f_locals[idx]
        f_locals[idx] = NULL
    push it->tp_iternext(it) on the stack
    f_locals[idx] = stack pop
    incref f_locals[idx]

No extra cost (or, if you don’t optimize it out, the only extra cost is
checking whether the variable is already bound an extra time, which is just
checking a pointer in an array against NULL), and the benefit is that the
object is decref’d before you call tp_iternext.
Why does this matter? Well, that’s the whole point of the proposal.
A decref may reduce the count to 0. In this case, the object is freed before
tp_iternext is called, so if tp_iternext needed to do a big allocation for each
value, the object allocator will probably reuse the last one instead of going
back to the heap.
A decref may also reduce the count to 1, if the iterator is storing a copy of
the same object internally. In general this doesn’t help, but if the iterator
is written in C and it knows the object is a special known-safe type like tuple
(which is immutable and has no reference borrowing APIs) it can reuse it
safely. As permutations apparently does.
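A small sketch of what that looks like from the Python side, assuming CPython. The variable names are made up for illustration. The only thing that is guaranteed is the first half: while you still hold a reference to the previous tuple, the iterator has to hand you a different object. The id comparison at the end is an observation about CPython's implementation, not documented behaviour, so it is computed but deliberately not asserted.

```python
from itertools import permutations

# Case 1: the caller keeps the previous result alive, so the iterator
# cannot recycle it and must return a distinct tuple.
it = permutations("abc")
first = next(it)
second = next(it)
held_are_distinct = first is not second

# Case 2: the caller drops each result immediately. On CPython the
# iterator's internal copy is then the only reference left, and the
# same tuple object may be refilled and returned again.
# This is an implementation detail; do not rely on it.
it = permutations("abc")
previous_id = id(next(it))
maybe_reused = id(next(it)) == previous_id

print("distinct while held:", held_are_distinct)
print("same id after release (CPython detail):", maybe_reused)
```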
All that being said, as Guido explained, I don’t think my idea is workable. I
think what we really want is to release the object before tp_iternext iff it’s
not going to raise StopIteration, and there’s no way to predict that in advance
without solving the halting problem, so…
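To make the ordering difference concrete, here is a rough emulation in pure Python, assuming CPython's immediate refcount-based deallocation. `Payload`, `Producer`, `plain_loop` and `unbind_first_loop` are hypothetical names invented for this sketch. The iterator records, on every call to `__next__`, whether the object it handed out last time is still alive.

```python
import weakref


class Payload:
    """Stand-in for a value that is expensive to allocate."""


class Producer:
    """Iterator that notes whether its previous result is still alive."""

    def __init__(self, n):
        self.n = n
        self.prev = None              # weak reference to the last result
        self.alive_during_next = []   # one flag per call after the first

    def __iter__(self):
        return self

    def __next__(self):
        if self.prev is not None:
            self.alive_during_next.append(self.prev() is not None)
        if self.n == 0:
            raise StopIteration
        self.n -= 1
        obj = Payload()
        self.prev = weakref.ref(obj)
        return obj


def plain_loop(n):
    # Ordinary for loop: `item` still holds the old value during next().
    p = Producer(n)
    for item in p:
        pass
    return p.alive_during_next


def unbind_first_loop(n):
    # Emulates "unbind, then next, then bind" by rebinding to None first.
    p = Producer(n)
    it = iter(p)
    while True:
        item = None
        try:
            item = next(it)
        except StopIteration:
            break
    return p.alive_during_next


print(plain_loop(3))
print(unbind_first_loop(3))
```

Note that the emulated loop ends with `item` set to None rather than the last value, which is exactly the StopIteration problem described above.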
> Furthermore, maybe I can be wrong, but reassigning to a variable
> another object does not automatically unbound the variable from the
> previous object?
> For what I know, Python is a "pass by value", where the value is a
> pointer, like Java.
That’s misleading terminology. Java uses “pass by value” and Ruby uses “call by
reference” to mean doing the same thing Python does, so describing it as either
“by value” or “by reference” is just going to confuse as many people as it
helps. Barbara Liskov explained why it was a meaningless distinction for
languages that aren’t sufficiently ALGOL-like back around 1980, and I don’t
know why people keep confusingly trying to force languages to fit anyway 40
years later. Better to just describe what it does.
> Indeed any python variable is a PyObject*, a
> pointer to a PyObject.
No. Any Python _value_ is a PyObject*. It doesn’t matter whether the value is a
temporary, stored in a variable, stored in a list element, stored in 17
different variables, whatever.
And that’s all specific to the CPython implementation. In Jython, a Python
value is a Java object; in PyPy, it’s an object in the underlying RPython
interpreter.
So what’s a variable? Well, Python doesn’t have variables in the same sense as
a language like C++. It has namespaces that map names to values. A variable in
Python’s syntax is just a lookup of a name in a namespace in Python’s
semantics. And a namespace is in general just a dictionary. That’s pretty much
all there is to variables. (There’s an optimization for locals, which are
converted into indexes into a C array of values stored on the frame object
instead, which is why we have all those issues with locals() and exec. And
there’s also the cell thing for closure variables. And there’s nothing stopping
you from replacing a namespace’s __dict__ with an object of a different type
that does almost anything you can imagine. But ignore all of that.) If you
understand dicts, you understand variables, and you don’t need to mention
PyObject* to understand dicts (unless you want to use them from the C API).
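A minimal sketch of the "namespace is just a dict" view, using only the standard library. The names `namespace`, `module`, `answer` and `question` are made up for illustration. It only covers module-style namespaces, not the fast-locals or cell cases mentioned in the parenthetical.

```python
import types

# Running code against a plain dict: every assignment in the code
# becomes a key/value entry in that dict.
namespace = {}
exec("x = 1\ny = x + 1", namespace)
print(namespace["x"], namespace["y"])

# A module's attributes live in its __dict__, and the two views are
# the same mapping seen from different sides.
module = types.ModuleType("demo")
module.answer = 42                    # attribute assignment ...
print(vars(module)["answer"])         # ... shows up as a dict entry
vars(module)["question"] = "unknown"  # a dict update ...
print(module.question)                # ... shows up as an attribute
```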
> When you assign a new object to a variable, what are you really doing
> is change the value of the variable from a pointer to another.
You’re just updating the namespace dict to map the same key to a different
value.
> So the
> variable now points to a new memory location, and the old object has
> no more references other then itself. Am I right?
Well, the dict entry now holds a new value, and the old value has one reference
fewer, which may be 0, in which case it’s garbage and can be cleaned up. It
doesn’t hold a reference to itself (except in special cases, e.g.,
`self.circular = self` or `xs.append(xs)`).
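Here is a sketch of both cases, assuming CPython, where an object is freed the moment its count reaches zero. `Thing`, `rebinding_demo` and `cycle_demo` are hypothetical names. Weak references are used to observe whether an object is still alive without keeping it alive.

```python
import gc
import weakref


class Thing:
    pass


def rebinding_demo():
    a = Thing()
    b = a                      # two names, one object
    watcher = weakref.ref(a)
    a = Thing()                # rebinding a: one reference fewer, b remains
    alive_with_one_ref = watcher() is not None
    b = None                   # last reference gone
    freed_at_zero = watcher() is None
    return alive_with_one_ref, freed_at_zero


def cycle_demo():
    # The cyclic collector is paused so that only refcounting is at work
    # until collect() is called explicitly. Assumes it was enabled before.
    gc.disable()
    try:
        t = Thing()
        t.circular = t         # the object now references itself
        watcher = weakref.ref(t)
        t = None               # the count cannot reach zero on its own
        alive_before_collect = watcher() is not None
        gc.collect()
        alive_after_collect = watcher() is not None
    finally:
        gc.enable()
    return alive_before_collect, alive_after_collect


print(rebinding_demo())
print(cycle_demo())
```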
In CPython, where values are PyObject* under the covers, the hash buckets in a
dict include PyObject* slots for the key and value, and the dict’s __setitem__
takes care of incref’ing the stored value, and incref’ing the key if it’s new
or decref’ing the old value if it’s a replacement. And CPython knows to delete
an object as soon as a decref brings it to 0 refs. (What about fastlocals? The
code to load and store variables to fastlocal slots does the same incref and
decref stuff, but there’s no keys to worry about, because the compiler already
turned them into indexes into an array. And an unbound local variable is a NULL
in the array, as opposed to just not being in the dict. And if you want to dig
into cells, they’re not much more complicated.)
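The "unbound local is an empty slot" point can be seen from Python without touching the C API. This is a small sketch; `read_after_del` is a made-up name. The compiler decides at compile time that `x` is a local, so after `del x` the slot is empty and reading it fails with UnboundLocalError instead of falling back to a global lookup.

```python
def read_after_del():
    x = 1
    del x              # empties the local slot for x
    try:
        return x       # the slot is empty, so the load fails
    except UnboundLocalError:
        return "unbound"


# The compiler recorded x as a local variable of the function;
# the bytecode refers to it by position in this tuple.
print(read_after_del.__code__.co_varnames)
print(read_after_del())
```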
But again, that’s all CPython specific. In, say, Jython, the hash buckets in a
dict just have two Java objects for the key and value, which aren’t pointers
(although under another set of covers your JVM is probably implemented in C
using pointers all over the place), and nobody’s tracking refcounts; the JVM
scans memory whenever it feels like it and deletes any objects (including
Python ones) that aren’t referenced by anyone. This is why any optimizations
like permutations reusing the same tuple if the refcount is 1 only make sense
for CPython (and only from the C API rather than from Python itself).
_______________________________________________
Message archived at
https://mail.python.org/archives/list/[email protected]/message/J6FHWYJAXZV72SB4VUPKG3RKRULE4QQH/