Re: Finding the instance reference of an object

Steven D'Aprano Fri, 07 Nov 2008 08:05:51 -0800

On Thu, 06 Nov 2008 21:31:16 -0700, Joe Strout wrote:

>> You're wrong, Python variables don't contain *anything*. Python
>> variables are names in a namespace.
> 
> I think we're saying the same thing.  What's a name?  It's a string of
> characters used to refer to something.  That which refers to something
> is a reference.


In some sense, I have to agree with that. "Reference" as a plain English 
word is very abstract.


> The thing it refers to is a referent.  I was trying to
> avoid saying "the value of an object reference is a reference to an
> object" since that seems tautological and you don't like my use of the
> word "value," but I see you don't like "contains" either.

I'm happier with the idea that a name in Python "refers to" an object 
than your claim that "variables" contain a reference to an object. Let me 
explain:

In languages such as C and Pascal, a variable is a named memory location 
with an implied size. For the sake of the argument, let's assume 
variables are all two bytes in size, i.e. they can hold a single short 
integer. So, if the name 'x' refers to location 0x23A782, and the two 
bytes at that location are 0x0001, then we can legitimately say that the 
location 0x23A782 (otherwise known as 'x') _contains_ 1 because the byte 
pattern representing 1 is at that memory location.

But in Python, what you've been calling "variables" is explicitly a 
*mapping* between a name and a value. Unlike variables above, the 
compiler can't map a name to a memory location. At run time, the VM has 
to search a namespace for the name. If you disassemble Python byte-code, 
you will see things like:

    LOAD_NAME     1 (x)

If you want to talk about something containing the value, that something
would be the namespace, not the name: the name is the key in the hash
table, and is a separate piece of data from the value. The key and the
value are at different locations, so you can't meaningfully say that the
value is contained by the key, for the same reason that given a list

[10, 11, 12, 13, 14, 15]

you wouldn't say that the int 12 was contained by the number 2.
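
A minimal sketch of that picture, assuming a plain dict can stand in for a
module namespace (the variable name `namespace` is purely illustrative).
Running an assignment through exec stores the name as a key and the object
as the value the key maps to:

```python
# Illustrative only: a plain dict used as the global namespace for exec.
namespace = {}
exec("x = 1", namespace)

# The name is a key in the mapping; the object is what the key maps to.
# exec also adds a '__builtins__' entry, which is filtered out here.
user_names = sorted(k for k in namespace if not k.startswith("__"))
print(user_names)        # the names bound by the executed code
print(namespace["x"])    # the object that the name 'x' refers to
```

The key 'x' and the object 1 are separate pieces of data; only the mapping
ties them together.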


> Maybe we can try something even wordier: a variable in Python is, by
> some means we're not specifying, associated with an object.  (This is
> what I mean when I say it "refers" to the object.)  Can we agree that
> far?

So far.


> Now, when you pass a variable into a method, the formal parameter gets
> associated with same object the actual parameter was associated with.

Agreed.


> I like to say that the object reference gets copied into the formal
> parameter, since that's a nice, simple, clear, and standard way of
> describing it.  I think you object to this way of saying it.  But are we
> at least in agreement that this is what happens?

At the implementation level of CPython, yes. In the abstract, no. In the
abstract, we can't make *any* claims about what happens beyond the Python
code, because we don't know how the VM is implemented. Perhaps it is a
giant pattern in Conway's cellular automaton "Life", which is Turing
complete.

In practice, any reasonable implementation of Python on existing computer 
hardware is going to more-or-less do what the CPython implementation 
does. But just because every existing implementation does something 
doesn't mean it's not an implementation detail.

 
>> But putting that aside, consider the Python code "x = 1". Which
>> statement would you agree with?
>>
>> (A) The value of x is 1.
> 
> Only speaking loosely (which we can get away with because numbers are
> immutable types, as pointed out in the last section of [1]).

Why "speaking loosely"?

At the level of Python code, the object you have access to is nothing 
more or less than 1. There is a concrete representation of the abstract 
Platonic number ONE, and that concrete representation is written as 1.

The fact that the object 1 is immutable rather than mutable is 
irrelevant. After x = [] the value of x is the empty list.

As far as I am concerned, this is one place where the plain English 
definition of the word "value" is the only meaningful definition: "what 
is denoted by a symbol". Or if you prefer, "what the symbol represents". 
At the language level, x=1 means that x represents the object 1, nothing 
more and nothing less, regardless of how the mechanics of that 
representation are implemented.

The value of a variable is whatever thing you assign to that variable. If 
that thing is the int 1, then the value is the int 1. If the thing is a 
list, the value is that list. If the thing is a pointer, the value is a 
pointer. (Python doesn't give you access to pointers, but other languages 
do.) Whatever mechanism is used to implement that occurs at a deeper 
level. In Pascal and C, bytes are copied into memory locations (which is 
actually implemented by flipping bits at one location to match the state 
of bits at another location). In CPython and Java, pointers or references 
are created and pointed at complex data structures called objects. That's 
an implementation detail, just like flipping bits is an implementation 
detail.

If you don't agree with me on this, then your understanding of value is
so different from mine that I fear we will never find any common ground.
In this context I consider any other definition of "value" to be obtuse,
obfuscatory and out-and-out harmful.


 
>> (B) The value of x is an implementation-specific thing which is
>> determined at runtime. At the level of the Python virtual machine, the
>> value of x is arbitrary and can't be determined.
> 
> Hmm, this might be true to somebody working at the implementation level,

Okay, we agree on that.


> but I think we're all agreed that that's not the level of this
> discussion.  What's relevant here is how the language actually behaves,
> as observable by tests written in that language.

As far as I can see, that implementation level _is_ the level you are 
talking at. You keep arguing that the value of x is a reference to the 
object, a reference which is implementation specific and determined at 
runtime. 


 
>> If you answer (A), then your claim that Python is call-by-value is
>> false.
> 
> Correct.
> 
>> If you answer (B), then your claim that Python is call-by-value is true
>> but pointless, obtuse and obfuscatory.
> 
> Correct again.

Well. I'm not sure what else I can say to that other than, "Why on earth 
would you prefer a pointless, obtuse and obfuscatory claim over one which 
is equally true but far more useful and simple?"



> My answer is:
> 
> (C) The value of x is a reference to an immutable object with the value
> of 1.  (That's too wordy for casual conversation so we might casually
> reduce this to (A), as long as we all understand that (A) is not
> actually true.  It's a harmless fiction as long as the object is
> immutable; it becomes important when we're dealing with mutable
> objects.)

But it doesn't matter. And that's important. I've seen this before, in 
other people. They get hung up about the difference between mutable and 
immutable objects and start assuming a difference in Python's behaviour 
that simply isn't there. When assigning to a name, Python makes no 
distinction between mutable and immutable objects:


>>> import dis
>>> dis.dis(compile('x=set([1]); y=frozenset([1])', '', 'exec'))
  1           0 LOAD_NAME                0 (set)
              3 LOAD_CONST               0 (1)
              6 BUILD_LIST               1
              9 CALL_FUNCTION            1
             12 STORE_NAME               1 (x)
             15 LOAD_NAME                2 (frozenset)
             18 LOAD_CONST               0 (1)
             21 BUILD_LIST               1
             24 CALL_FUNCTION            1
             27 STORE_NAME               3 (y)
             30 LOAD_CONST               1 (None)
             33 RETURN_VALUE
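
The exact byte-code differs between interpreter versions, so here is a
rough way to repeat the comparison on a current CPython. It assumes
`dis.get_instructions` is available (Python 3.4 or later); the helper name
`opnames` is made up for this sketch. It compares only the operation
names, ignoring operands:

```python
import dis

def opnames(source):
    """Return the bytecode operation names for a snippet, ignoring operands."""
    code = compile(source, "<example>", "exec")
    return [instr.opname for instr in dis.get_instructions(code)]

mutable_ops = opnames("x = set([1])")
immutable_ops = opnames("y = frozenset([1])")

# The same sequence of operations is used whether the object being bound
# is mutable (set) or immutable (frozenset).
same_sequence = (mutable_ops == immutable_ops)
print(same_sequence)
print("STORE_NAME" in mutable_ops)
```

Byte-code is itself a CPython detail, so this only shows that one
implementation treats both cases alike; it is not a statement about the
language definition.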


 
>>> This is explicitly stated in
>>> the Python docs [1], yet many here seem to want to deny it.
>>
>>> [1] http://www.python.org/doc/2.5.2/ext/refcounts.html
>>
>> You have a mysterious and strange meaning of the word "explicitly".
>> Would you care to quote what you imagine is this explicit claim?
> 
> A few samples: "The chosen method is called reference counting. The
> principle is simple: every object contains a counter, which is
> incremented when a reference to the object is stored somewhere, and
> which is decremented when a reference to it is deleted. When the counter
> reaches zero, the last reference to the object has been deleted and the
> object is freed.   ...Python uses the traditional reference counting
> implementation..."

Implementation details again. It says nothing about what is visible at 
the level of Python code.

 
> This seems like a point we really shouldn't need to argue.  Do you
> really want to defend the claim that Python does not use references?

Python does not use references. Python uses names and objects. The 
CPython implementation implements such names and objects using references 
(pointers). Other implementations are free to make other choices at the 
implementation level. 

At the Python level, the programmer has access to objects:

(1, 3, [], None)

is a tuple (an object) consisting of four objects: 1, 3, an empty list and
None. There's no capacity to request a reference to an object: if there
were, it would be like Pascal's var parameters.

At the implementation level, the above tuple is implemented (in part) by 
four pointers. But that's invisible at the Python level. It doesn't exist 
at the Python level: you can't access those pointers, you can't do 
anything with them, except indirectly by manipulating names and objects.

Here's an analogy: at the Python level we say that strings are immutable: 
they can't be changed. But at the implementation level that's clearly 
nonsense: strings are merely bytes no different from any other bytes, and 
they are as mutable as any others. But from Python code, the programmer 
has no way to mutate a string. Such behaviour isn't part of Python. We 
can rightly say that Python has no mutable strings, even though at the 
implementation level strings are mutable.
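
As a small illustration of what "no way to mutate a string" looks like
from Python code (the variable names are arbitrary), an attempted in-place
change is simply refused:

```python
s = "spam"
try:
    s[0] = "S"          # attempt an in-place change to the string
except TypeError as exc:
    outcome = "refused: %s" % type(exc).__name__
else:
    outcome = "mutated"

print(outcome)
print(s)                # the string object is unchanged
```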


 
>> Yes, you are right, Python does not offer pass by reference. The
>> canonical test for "call by reference" behaviour is to write a function
>> that does this:
>>
>> x = 1
>> y = 2
>> swap(x, y)
>> assert x == 2 and y == 1
>>
>> If you can write such a function, your language may be
>> call-by-reference. If you can't, it definitely isn't c-b-r. You can't
>> write such a function in standard Python, so Python isn't c-b-r.
> 
> Whew!  That's a relief.  A week ago (or more?), it certainly sounded
> like some here were claiming that Python is c-b-r (usually followed by
> some extended hemming and hawing and except-for-ing to explain why you
> couldn't do the above).

Yes. Such confusion is very common, because people discover that Python 
isn't call-by-value since mutations to arguments in a function are 
visible outside of the function, and assume that therefore Python must be 
call-by-reference.
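
For concreteness, here is the obvious attempt at the swap test from the
quoted text. The function body is only a sketch of what someone might try;
it rebinds the function's local names and nothing else:

```python
def swap(a, b):
    # Rebinds the local names a and b only; the caller's names are untouched.
    a, b = b, a

x = 1
y = 2
swap(x, y)
print(x, y)   # the caller's bindings are exactly as they were
```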

 
>> The canonical test for "call by value" semantics is if you can write a
>> function like this:
>>
>> x = [1]  # an object that supports mutation 
>> mutate(x)
>> assert x == [1]
>>
>> If mutations to an argument in a function are *not* reflected in the
>> caller's scope, then your language may be call-by-value. But if
>> mutations are visible to the caller, then your language is definitely
>> not c-b-v.
> 
> Aha.  So, in your view, neither C, nor C++, nor Java, nor VB.NET are
> c-b-v, since all of those support passing an object reference into a
> function, and using that reference to mutate the object.

Take C out of that list. C is explicitly call-by-value, since it always 
does copying of values. If you want to avoid copying the value, you have 
to write your function to accept a pointer to the value you care about 
and then dereference the pointer inside the function.

As for C++, Java and VB.NET, I would argue that using the term
call-by-value for them is misleading. I'm not the only person who
believes so:

"As in Java, the calling semantics are call-by-sharing: the formal 
argument variable and the actual argument share the same value, at least 
until the argument variable is assigned to. Assignments to the argument 
variable do not affect the value passed; however, if the value passed was 
an array, assignments to elements of that array will be visible from the 
calling context as well, since it shares the same array object."

http://www.cs.cornell.edu/courses/cs412/2001sp/iota/iota.html

Trust me, I didn't write that.



> Your view is at odds with the standard definition, though; in fact I'm
> pretty sure we could dig up C and Java specs that explicitly spell out
> their c-b-v semantics, and RB and VB.NET pretty clearly mean "ByVal" to
> indicate by-value in those languages.

I accept that C (like Pascal) is c-b-v. Given the following C code:

int count;
int  x[1000];
for( count = 0; count < 1000; count++ )
    x[count] = count;


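the value of the variable 'x' is an array of ints 0,1,2,...999. (Strictly,
a bare array argument in C decays to a pointer to its first element; to
pass the array itself you wrap it in a struct, whereas Pascal passes
array value parameters directly.) When you pass that value to a function,
the entire array is copied. Call-by-value: the variable's value is copied
into the function's scope.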

For Java to be considered c-b-v, we have to agree that the value of x 
following that assignment is not the array of ints that the source code 
suggests it is, but some arbitrary pointer to that array. If we agree on 
that definition, we can agree that Java is c-b-v, but I maintain it is a 
foolish definition.

I can't imagine what reason people had for tossing out the commonsense 
meaning of the word "value". What benefit does it give? For those whose 
first language was Pascal or Fortran or C and had only heard of two 
calling conventions it avoids the need to learn the name for a third 
convention, but the cost is that c-b-v no longer has a single meaning. It 
now has at least two meanings: C call-by-value is different from Java
call-by-value, because when you call a C function with a struct (or a
Pascal function with an array) the entire aggregate is copied, and when
you call a Java function with an array the entire array is *not* copied.
Different program behaviour with the same name.

That's bad enough when it happens with an element of language syntax but 
it is unforgivable when it happens to something which is supposed to be 
generic to all languages. 

Not only do you lose the regular dictionary meaning of the word value, 
but you introduce a second meaning to call-by-value. When you say 
"Language Foo is call-by-value", you have no way of telling what 
listeners will understand by that. If they come from a Pascal or C 
background, they will understand one thing ("values are copied"), and if 
they come from a Java or VB.NET background they will understand something 
very different ("sometimes values are copied, and sometimes pointers to 
the value are copied").


> The canonical test of c-b-v is whether a *reassignment* of the formal
> parameter is visible to the caller.

That's just the same test for c-b-r: it's a variation of swap(x, y) but
using only one variable. That's equivalent to assuming that c-b-r and
c-b-v are a dichotomy: if a language isn't one, it must be the other.
Wrong, false, harmful!


> Simply using the parameter for
> something (such as dereferencing it to find and change data that lives
> on the heap) doesn't prove anything at all about how the parameter was
> passed.

But it does: if a mutation to the argument is visible to the caller, then 
you know you haven't mutated a copy. If no copy was made, then it isn't 
call-by-value.
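
In Python the check looks like this; `mutate` is the hypothetical function
from the quoted test, given the simplest body that changes its argument:

```python
def mutate(arg):
    # Changes the object itself; no new binding is made and no copy is taken.
    arg.append(2)

x = [1]
mutate(x)
print(x)   # the caller sees the change, so no copy was made
```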


>> Python is neither call-by-reference nor call-by-value.
> 
> That can be true only if at least one of the following is true:
> 
> 1. Python's semantics are different from C/C++ (restricted to pointers),
> Java, and RB/VB.NET; or

Why would you restrict C/C++ to pointers? Are you now going to argue that 
these languages have different calling conventions depending on the type 
of the argument?

In C, the calling convention is precisely the same whether the argument 
is a pointer or an int. In both cases, the value is copied.

Python's semantics are different from C since values are not copied when
you pass them to a function, but (as far as I know) the same as those of
the others.


> 2. C/C++ (restricted to pointers), Java, and RB/VB.NET are not
> call-by-value.

Take C out of that list and I will agree.


> I asked you before which of these you believed to be the case, so we
> could focus on that, but I must have missed your reply.  Can you please
> clarify?
> 
>  From your above c-b-v test, I guess you would argue point 2, that
> none of those languages are c-b-v.  If so, then we can proceed to
> examine that in more detail.  Is that right?

Sure.

 
>>> That would indeed be nonsense.  But it's also not what I'm saying. See
>>> [2] again for a detailed discussion and examples.  Call-by-value and
>>> call-by-reference are quite distinct.
>>
>> And also a false dichotomy.
> 
> I've never claimed these are the only options; just that they're the
> only ones actually used in any of the languages under discussion.  If
> you think Python uses call by name, call by need, call by macro
> expansion,

No, none of those.


> or something else at [2], please do say which one.  "Call by object",
> as far as I can tell, is just a made-up term for call-by-value when
> the value is an object reference.

And you think that "call-by-reference" or "call-by-value" is anything 
other than "made-up"? What, you think that these terms have always 
existed, as far back as language? They're made-up terms too. I would say 
that the value is the object, full stop. The reference is an 
implementation detail.

As we've repeatedly said, "call-by-sharing" has a good pedigree: it goes 
back at least to Barbara Liskov and CLU in 1974, in the dawn of object-
oriented programming.

http://en.wikipedia.org/wiki/Barbara_Liskov

The real question isn't why Python should use the term "call-by-sharing", 
but why the Java and VB people didn't use it.



> (And I'm reasonably OK with that
> as long as we're all agreed that that is what it means.)
> 
>>>> "Calling by value" is not a useful definition of Python's behaviour.
>>>
>>> It really is, though.  You have to know how the formal parameter
>>> relates to the actual parameter.  Is it a copy of it, or an alias of
>>> it?

False dichotomy again. The correct answer is, "Neither. It _is_ the 
actual parameter." We can prove this by using Python's scoping rules:

>>> def func(y):
...     print x is y
...
>>> x = ["something arbitrary"]
>>> func(x)
True

In case you think this is an artifact of mutable objects:

>>> x = "something immutable"
>>> func(x)
True


The value of x and the value of y are the same object. But the names are 
different. (They would be different even if the function parameter was 
called 'x', because it is in a different namespace.) Nothing we do to the 
name y can affect the name x. But things that we do to the object bound 
to y *can* affect the object bound to x, because they are the same object.



>> And by definition, "call by value" means that the parameter is a copy.
>> So if you pass a ten megabyte data structure to a function using
>> call-by-value semantics, the entire ten megabyte structure is copied.
> 
> Right.  And if (as is more sensible) you pass a reference to a ten MB
> data structure to a function using call-by-value, then the reference is
> copied.

Sure. I agree. But in this case, the reference is the value. In Pascal, 
you would do something like this:

x := enormous_array();  {returns a 10MB array}
y := func(@x);  {pass a pointer to x; @ as in Turbo Pascal dialects}

if you couldn't or didn't write func to use a var parameter.



>> Since this does not happen in Python, Python is not a call-by-value
>> language. End of story.
> 
> So your claim is that any language that includes references (which is
> all OOP languages, as far as I'm aware), is not call-by-value?

No, I'm not making that claim. Calling conventions and the existence of 
references in the implementation of an OO language are orthogonal: one 
does not imply anything about the other.

Consider a hypothetical OO language where variables are memory locations. 
x=Foo would set the value of the variable x to some object Foo. As an 
implementation detail, this might be implemented just as you say: the 
variable (memory location) x contains a reference to the object Foo, just 
like Java, except without primitive types.

Now we call a function func(x). What happens next?

Well, knowing that the language is OO doesn't tell us *anything* about 
what happens next. We can probably make an educated guess that, if the 
language is running on a von Neumann machine (a safe bet in the real 
world!), there's probably a stack involved, and the implementation will 
probably work by pushing references on the stack. What else can we 
predict? Nothing. Here are a couple of alternative calling behaviours:

- The implementation makes a copy of the object Foo, creates a new 
reference to it, and assigns that reference to the formal parameter of 
the function. I would call that "call-by-value". What would you call it?

- The implementation makes a copy of the reference to object Foo, and 
assigns it to the formal parameter of the function. Following Barbara 
Liskov, I would call that "call-by-sharing". Following the effbot, I 
would also accept "call-by-object".

- The implementation makes a reference to the location of x, rather than 
a reference to the object, and passes that to the function. Reassignments 
to the formal parameter are reflected in the caller's scope. That would 
be call-by-reference.

Any of these would be reasonable choices for a designer to make, although 
the first one has all the disadvantages of call-by-value in languages 
like C and Pascal. Nevertheless, if you want that behaviour in your OOP 
language, you can have it.
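
The three alternatives can be imitated in Python itself. This is only a
sketch under stated assumptions: a dict stands in for the caller's scope,
every function name here is invented for the illustration, and the
call-by-reference case is an emulation that writes the callee's final
binding back to the caller's name rather than a true reference to a
variable.

```python
import copy

def callee(param):
    """Hypothetical callee: mutate the argument, then rebind the parameter."""
    param.append(99)
    param = ["rebound"]
    return param          # the callee's final binding

def call_by_value(func, namespace, name):
    # The callee receives a copy of the object.
    func(copy.deepcopy(namespace[name]))

def call_by_sharing(func, namespace, name):
    # The callee receives the very same object.
    func(namespace[name])

def call_by_reference(func, namespace, name):
    # Emulation only: write the callee's final binding back to the caller's
    # name, so that reassignment becomes visible to the caller.
    namespace[name] = func(namespace[name])

results = {}
for strategy in (call_by_value, call_by_sharing, call_by_reference):
    caller = {"x": [1]}   # a dict standing in for the caller's scope
    strategy(callee, caller, "x")
    results[strategy.__name__] = caller["x"]
    print(strategy.__name__, caller["x"])
```

Under the first strategy the caller sees neither the mutation nor the
rebinding; under the second it sees the mutation only; under the third it
sees the rebinding.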



>>> Without knowing that, you don't know what assignments to the formal
>>> parameter will do, or even what sort of arguments are valid. Answer:
>>> it's a copy of it.
>>
>> Lies, all lies. Python doesn't copy variables unless you explicitly ask
>> for a copy.
> 
> Hmm, I'm struggling to understand why you would say this.  Perhaps you
> mean that Python doesn't copy *objects* unless you explicitly ask for a
> copy.  That's certainly true.

We can agree on this.
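
A short illustration of the point we agree on, using the standard `copy`
module (the variable names are arbitrary): plain assignment never copies
the object, and a copy exists only when one is asked for.

```python
import copy

a = [1, 2, 3]
b = a                # binds a second name to the same object; nothing is copied
c = copy.copy(a)     # an explicit request for a (shallow) copy

b.append(4)          # mutate the object through the name b

print(a is b)        # same object under two names
print(a is c)        # the copy is a distinct object
print(a)             # the mutation shows up through the name a as well
print(c)             # the copy is unaffected
```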


> But it does copy references in many circumstances, including in
> assignment statements, and parameter passing.

At the implementation level, not at the Python level.


>> That some implementations of Python choose to copy pointers rather than
>> move around arbitrarily large blocks of memory instead is an
>> implementation detail. It's an optimization and irrelevant to the
>> semantics of argument passing in Python.
> 
> I agree that under the hood, there are probably other ways to get the
> same behavior.  What's important is to know whether the formal parameter
> is an alias of the actual parameter, or its own independent local
> variable that (let me try to say it more like your way here) happens to
> be initially associated with the same referent as the actual parameter. 
> This obviously has behavioral consequences, as you showed above.
> 
> A concise way to describe the behavior of Python and other languages is
> to simply say: the object reference is copied into the formal parameter.

Hence the alias for "call-by-sharing", "call-by-object-reference".
Personally I find that term too long and unwieldy and prefer either
call-by-sharing or call-by-object.


 
>>> Assignments don't affect the actual parameter at all.  This is exactly
>>> what "call by value" means.
>>
>> Nonsense. I don't know where you get your definitions from, but it
>> isn't a definition anyone coming from a background in C, Pascal or
>> Fortran would agree with.
> 
> Well I can trivially refute that by counterexample: I come from a
> background in C, Pascal, and FORTRAN, and I agree with it.

Well there you go. Serves me right for making a sweeping generalization. 


> As for where I get my definitions from, I draw from several sources:
> 
> 1. Dead-tree textbooks
> 2. Wikipedia [2] (and yes, I know that has to be taken with a grain of
> salt, but it's so darned convenient)
> 3. My wife, who is a computer science professor and does compiler
> research
> 4. http://javadude.com/articles/passbyvalue.htm (a brief but excellent
> article)
> 5. Observations of the "ByVal" (default) mode in RB and VB.NET
> 6. My own experience implementing the RB compiler (not that
> implementation details matter, but it forced me to think very
> carefully about references and parameter passing for a very long time)

Which makes you so close to the trees that you can't see the forest.
You're not thinking at the level of Python code. You're not even thinking
at the level of the Python virtual machine. You're thinking about what
happens to make the Python virtual machine work.


 
> Not that I'm trying to argue from authority; I'm trying to argue from
> logic.  I suspect, though, that your last comment gets to the crux of
> the matter, and reinforces my guess above: you don't think c-b-v means
> what most people think it means.

No, I make no claim about who is in a majority. What I argue is that the 
Java and VB communities have taken a term with an established meaning, 
and are using it for something which is at best pedantically true at a 
deeper implementation level. I think they are foolish to have done so, 
and their actions have led to imprecision in language: "call-by-value" is 
no longer a single strategy with a single observable behaviour, but now 
refers to at least two incompatible behaviours: Pascal/C style, and Java/
VB style. I don't accept that alternative meaning, because I insist that 
at the level of Python code, after x=1 the value of x is 1 and not an 
arbitrary, artificial, implementation-dependent reference to 1.



-- 
Steven
--
http://mail.python.org/mailman/listinfo/python-list
