On Nov 6, 2008, at 10:35 PM, Steve Holden wrote:
That's good to hear. Your arguments are sometimes pretty good, and
usually well made, but there's been far too much insistence on all sides
about being right and not enough on reaching agreement about how
Python's well-defined semantics for assignment and function calling
should best be described.
In other words, it's a classic communication problem.
That's a fair point. I'll try to do better.
I must say I find it strange when people try to contradict my assertion
that Python names are references to objects, when the (no pun intended)
reference implementation of the language uses "reference counting" to
track how many assignments have been made.
I agree. It seems like we should be able to take that as a given.
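
A rough sketch of that point, assuming CPython (reference counting is an
implementation detail there, not something the language guarantees).
`Thing` is a made-up class for illustration; `sys.getrefcount` is the
standard-library call. Only the change in the count is compared, since
the absolute number includes temporary references:

```python
import sys

class Thing:
    """Hypothetical placeholder class, used only for this illustration."""

obj = Thing()
before = sys.getrefcount(obj)

alias = obj    # a second name now refers to the same object
box = [obj]    # a list element refers to it as well
after = sys.getrefcount(obj)

added = after - before   # two new references were created above
print(added)
```

Strictly speaking, the count tracks references rather than assignments,
which is why the list element counts too.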
So any argument that the language "doesn't have the concept of object
reference (in the sense of e.g. C++ reference)" is simply stating the
obvious: that Python has no way to declare reference variables. I would
argue myself that it has no need of such a mechanism precisely because
names are object references, and I'd like to hear counter-arguments.
Right. I think of it this way: every variable is an object reference;
no special syntax needed for it because that's the only type of
variable there is. (Just as with Java or .NET, when dealing with any
class type; Python is just a little more extreme in that even simple
things like numbers are wrapped in objects.)
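
A small sketch of what I mean, using nothing beyond built-ins. The
variable names are arbitrary:

```python
n = 3
is_object = isinstance(n, object)   # True: even an int is an object
bits = n.bit_length()               # ints have methods; 3 is 0b11, so 2

a = 10 ** 20           # some int object
b = a                  # copies the reference, not the number
same_object = a is b   # True: both variables refer to one object

print(is_object, bits, same_object)
```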
Note: I tried to say "name" above instead of "variable" but I couldn't
bring myself to do it -- "name" seems too generic to do that job. Lots
of things have names that are not variables: modules have names,
classes have names, methods have names, and so do variables. If I say
"name," an astute listener would reasonably say "name of what" -- and
I don't want to have to say "name of some thing in a name space which
can be flexibly associated with an object" when the simple term
"variable" seems to work as well.
Well that's not true either. If I remember all the way back to my
computational science degree I seem to remember being taught that there
was call by *simple reference*, which is what I understand you to mean.
Suppose I write the following in some not-quite-Python language:
lst = ['one', 'two', 'three']
index = 1
def foo(item, i):
    i = 2
    item = "ouch"
foo(lst[index], index)
...
With call by simple reference, after the call I would expect the
following conditions to be true:
index == 2
lst == ['one', 'ouch', 'three']
Yes, I guess so, though it would require that lst[index] evaluate to
an lvalue to which the 'item' parameter could be an alias. (With the
second parameter, 'i', the situation is more straightforward because
you're passing in a simple variable rather than a more complex
expression.)
With full call by reference, however, arguably the change to the value
of index would induce the post-conditions
index == 2
lst == ['one', 'two', 'ouch']
because the reference made by the first argument depends on the value of
a variable mutated inside the function call.
I confess that I've never heard of "call by simple reference" or "call
by full reference" before. What you're describing in the second case
sounds more like call by name to me.
But I think we can agree that neither of these behaviors describes
Python.
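
For comparison, here is the same snippet run as actual Python. This is
only a sketch of the observable behavior, reusing the names from the
example above:

```python
lst = ['one', 'two', 'three']
index = 1

def foo(item, i):
    i = 2           # rebinds the local name i only
    item = "ouch"   # rebinds the local name item only

foo(lst[index], index)

print(index)   # 1
print(lst)     # ['one', 'two', 'three']
```

Neither set of post-conditions holds: the caller's `index` and `lst` are
untouched, because the function only received copies of the references.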
Why the resistance to these simple and basic terms that apply to any OOP
language?
Ideally I'd like to see this discussion concluded without resorting to
democratic appeals. Otherwise, after all, we should all eat shit: sixty
billion flies can't possibly be wrong.
I think I could make a good argument that the nutritional needs of
flies are different from those of humans. On the other hand, what
argument is there that the Python community should use its own unique
terminology for concepts that apply equally well to other languages?
Wouldn't communication be easier and smoother if we adopted standard
terms for standard behavior?
What does "give a new name to an object" mean? I submit that it means
exactly the same thing as "assigns the name to refer to the object".
I normally internalize "x = 3" as meaning "store a reference to the
object 3 in the slot named x", and when I see "x" in an expression I
understand it to be a reference to some object, and that the value will
be used after dereferencing has taken place.
Works for me.
I've seen various descriptions of Python's name binding behavior in
terms of attaching Post-it notes bearing names to the objects referenced
by the names, and I have never found them convincing. The reason for
this is that names live in namespaces, whereas values live in some other
universe altogether (that I normally describe as "object space" to
beginners, though this is not a term you will come across in the Python
literature).
Agreed. That model implies that all names are global, and completely
fails to explain how one object might be named "x" and a completely
different object might also be "x" (albeit in a different namespace).
I suppose your post-its could be color-coded by namespace, and then
you could add additional warts and caveats and addendums to explain
recursion, or explain why you don't have to search all objects in
existence to find the right one every time a name is dereferenced, but
the whole thing seems like a house of cards to me.
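
To make the namespace point concrete, here is a minimal sketch. The
function `depth` is hypothetical, written only to show that each call
gets its own `x`:

```python
x = "module-level"

def depth(n):
    x = n                  # a distinct x in this call's local namespace
    if n > 0:
        inner = depth(n - 1)
    else:
        inner = []
    return [x] + inner     # this call's x survived the recursive calls

result = depth(2)
print(result)   # [2, 1, 0]
print(x)        # module-level
```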
So I see the Post-it as being attached to a portion of some
namespace, and that little fixed-size piece of object space being
attached by a piece of string to a specific object. Of course any object
can have many pieces of string attached, and not all of them come from
names -- some of them come from container elements, for example.
Right.
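
A quick sketch of the "many pieces of string" picture, assuming a
throwaway class `Thing` invented for the purpose:

```python
class Thing:
    """Hypothetical placeholder class for the illustration."""

name = Thing()
other = name             # a second string, from another name
box = [name]             # a string from a list element
table = {"key": name}    # a string from a dict value

all_same = other is name and box[0] is name and table["key"] is name

del name, other          # remove both names; the object itself survives
still_same = box[0] is table["key"]

print(all_same, still_same)   # True True
```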
There certainly is no difference in behavior that anyone has been able
to point out between what assignment does in Python, and what assignment
does in RB, VB.NET, Java, or C++ (in the context of object pointers, of
course). If the behavior is the same, why should we make up our own
unique and different terminology for it?
One reason would be that in the other languages you have other choices
as well, so you need to distinguish between them. Python is simpler, and
so I don't see us needing the terminological complexity required in the
other contexts you name, for a start.
OK, that's a fair argument, and I do suspect this is a big part of it
-- when your language clearly supports passing object references and
other types by-ref and by-val, and you can easily demonstrate the
difference, then there is little temptation to claim that it doesn't
do either one. But if your language supports only one of these, and
you have no choices about it and can't (within the language itself)
compare and contrast that one against another, then it is easy to make
all sorts of claims about what that one is.
But getting back to your point: is the standard terminology really
more complex than whatever else we can come up with?
Java messed up the whole deal by
having different kinds of objects as a sacrifice to run-time speed,
thereby breeding a whole generation of programmers with little clue
about these matters, and the .NET environment also has to resort to
"boxing" and "unboxing" from time to time. I say away with comparisons
to such horrendously complex issues. One of the reasons for Python's
continued march towards world domination (allow me my fantasies) is its
consistent simplicity. Those last two words would be my candidate for
the definition of "Pythonicity".
I'm with you there. To me, the consistent simplicity is exactly this:
all variables are object references, and these are always passed by
value.
- the parameters of a function are local names for the call arguments
Agreed; they're not aliases of the call arguments.
They are actually names local to the function namespace, containing
references to the arguments. Some of those arguments were provided as
names, in which case the local name contains a copy of the reference
bound to the name provided as an argument. This is, however, merely a
degenerate case of the general instance, in which an expression is
provided as an argument and evaluated, yielding (a reference to) an
object which is then bound to the parameter name in the local
namespace.
Quite right.
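
Here is one way to sketch both cases, the name argument and the general
expression argument. The function `describe` is made up for this
illustration; `id` is the built-in:

```python
def describe(param):
    param.append("seen")   # mutate whatever object param refers to
    return id(param)       # identity of the object bound to param

data = [1, 2]
id_from_name = describe(data)          # argument given as a name
id_from_expr = describe(data + [3])    # argument given as an expression

print(id_from_name == id(data))   # True: param referred to data's object
print(id_from_expr == id(data))   # False: the expression made a new list
print(data)                       # [1, 2, 'seen']
```

In the second call the expression is evaluated first, and the resulting
new list is what gets bound to `param`, so `data` is not touched again.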
(I guess 'pass by object' is a good name).
Well, I'm not sure why that would be. What you've just described is
called "pass by value" in every other language.
Sigh. This surely can only be true if you insist that references are
themselves values. I hold that they are not.
Here's an example of the above, I guess. In a language that supports
integers and doubles as simple types, stored directly in a variable,
then it is an obvious generalization that in the case of an object
type, the value is a reference to an object. (Then you can
"dereference" such a value to get to the values stored within the
object.) It is the only simple and consistent description of such a
language (which includes Java, RB, and .NET, as well as C++ if you
consider an object pointer equivalent to a reference in more modern
languages.)
But Python doesn't have those simple types, so there is a temptation
to try to skip this generalization and say that references are not
values, but rather the values are the objects themselves (despite the
dereferencing step that is still required to get any data out of
them). Well, and of course in the case of immutable objects, there is
very little observable difference between references and values.
However, it seems to me that when you start denying that the value of
an object reference is a reference to an object, this is when you get
led into a quagmire of contradictions. Perhaps I'm wrong and I just
haven't explored that path far enough, because it appears dark and
cobwebby to my eyes. I will try to give it a chance.
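
To illustrate why immutable objects hide the distinction and mutable
ones expose it, here is a small sketch (variable names are arbitrary):

```python
a = 5
b = a
b += 1        # int is immutable: b is rebound to a new object
print(a, b)   # 5 6

p = [5]
q = p
q += [6]      # list is mutable: the shared object is changed in place
print(p)      # [5, 6]
print(p is q) # True
```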
It seems so transparent to me that the parameters are copies of the
references passed as arguments
I find it difficult to understand how, or why, anyone would
conceptualize it differently.
Now you seem to be saying the same thing I've been saying all along.
But this really is called "pass by value" in at least RB, VB.NET, and
Java. And that makes sense to me.
OK, so above you argue quite cogently that Python uses a
reference-passing mechanism.
Yes, of course.
This makes your insistence in the preceding paragraph on calling it
"pass by value" a little stubborn.
Why? Are you really meaning to insist that the RB/VB.NET example:
Function GetAgeInDogYears(ByVal whom As Person) As Integer
    return whom.age * 7
End Function
is not actually using a by-value parameter? Or that it's not passing
an object reference?
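
For what it's worth, a rough Python counterpart might look like the
sketch below. The `Person` class and both function names are my own
assumptions for illustration, not anything from RB or VB.NET:

```python
class Person:
    """Hypothetical class standing in for the Person type above."""
    def __init__(self, age):
        self.age = age

def get_age_in_dog_years(whom):
    return whom.age * 7    # dereferences the reference it was handed

def try_to_replace(whom):
    whom = Person(0)       # rebinds the local parameter only

someone = Person(40)
dog_years = get_age_in_dog_years(someone)
try_to_replace(someone)

print(dog_years)     # 280
print(someone.age)   # 40
```

The parameter receives a copy of the reference, so the function can read
the object through it, but rebinding the parameter has no effect on the
caller's variable.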
Sigh again. You appear to want to have your cake and eat it. You are, in
effect, saying "there are no values in Python, only references",
completely ignoring the fact that it is semantically impossible to have
a reference without having something to *refer to*
Well of course. I'm pretty sure I've said repeatedly that Python
variables refer to objects on the heap. (Please replace "heap" with
"object space" if you prefer.) I'm only saying that Python variables
don't contain any other type of value than references -- no integers
or doubles, for example. This is unlike the other languages under
discussion (and may be at the root of the confusion).
(which we in the Python world, in our usual sloppy way, often call
"a value").
Yes, and as long as we're agreed that this is only a sloppy shorthand,
I'm OK with it (especially in the case of immutable objects, where the
distinction is irrelevant).
I suspect this may be at the root of our equally stubborn insistence
that calling this mechanism "pass by value" is inviting
misunderstanding. If we didn't want to eliminate misunderstanding we
would all have stopped replying to you long ago.
Ditto right back at you. :) So maybe here's the trouble: since all
Python variables are references, there is no need to distinguish
reference types from any other types (there aren't any other types).
So, with the distinction gone, there is a strong temptation to gloss
over the fact that they are references at all, and try to say that the
variables directly contain their objects.
But it seems to me that this claim quickly breaks down -- even as you
said yourself; you need instead some mental model that shows the
variables as pointing to (tied to via strings, associated via a lookup
table, or whatever) the objects, which exist in object space. In
other words, they're references.
But continuing to attempt to gloss over that fact, when you come to
parameter passing, you're then stuck trying to avoid describing it as
call by value, since if you claim that what a variable contains is the
object itself, then that doesn't fit (since clearly the object itself
is not copied). You also have to describe the assignment operator as
different from all other languages, since clearly that's not copying
the object either.
So you end up in this (to me, very strange) state where you're making
up new terms to describe the parameter behavior, and the assignment
behavior, which behavior is exactly the same as any other modern OOP
language. It makes it (again, IMHO) all seem very much more complex
and mysterious than it really is. And this all results inevitably
from trying to gloss over the fact that Python variables are references.
So, while I'm trying this path on for size (and will continue to mull
it over further), please try on this approach: boldly admit that
they're references, and embrace that fact. An assignment copies the
RHS reference into the LHS variable, nothing more or less. A
parameter copies the argument reference into the formal parameter,
nothing more or less. And all this is exactly the same as in any
other OOP language the reader is likely to know. Isn't that simple,
clear, and far easier to explain?
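
The classic check for this is a swap function. A minimal sketch, with
names chosen only for illustration:

```python
def swap(a, b):
    a, b = b, a    # swaps only the local copies of the references

x = ["first"]
y = ["second"]
x_alias = x        # assignment: copies the reference held by x

swap(x, y)

print(x, y)           # ['first'] ['second']
print(x is x_alias)   # True
```

If parameters were aliases of the caller's variables, `x` and `y` would
come back exchanged. They don't, which is what you'd expect when
references are passed by value.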
Well, I started with Simula and Smalltalk back in 1973, so my experience
may be a bit light. Sorry about that. This terminology wasn't made up by
Python beginners, but by the people who invented Python.
Was it? Has our BDFL weighed in on this terminology issue anywhere?
So far, the only "official" words I've found related to this
discussion are the ones plainly admitting that Python uses references
(which some in this thread seem to want to deny, though not you Steve).
I believe they did so on the grounds that it's easier for beginners to
understand Python's semantics without having to reference too many
similar in theory but confusingly different in practice other
environments.
I wonder if that could be tested systematically. Perhaps we could
round up 20 newbies, divide them into two groups of 10, give each one
a 1-page explanation either based on passing object references
by-value, or passing values sort-of-kind-of-by-reference, and then check
their comprehension by having them predict the output of some code
snippets.
That'd be very interesting. It's hard for me to believe that the
glossing-over-references approach really is easier for anybody, but
maybe I'm wrong.
I would even argue that your confusion supports this argument. Your
understanding of Python is perfectly adequate, so get with the program
for Pete's sake!
In my case, my understanding of Python became clear only once I
stopped listening to all the confusing descriptions here, and realized
that Python is no different from other OOP languages I already knew.
Best,
- Joe