Re: why cannot assign to function call

Steven D'Aprano Sat, 10 Jan 2009 01:50:49 -0800

On Fri, 09 Jan 2009 20:23:11 +0000, Mark Wooding wrote:

> Steven D'Aprano <st...@remove-this-cybersource.com.au> wrote:
> 
>> I'm pretty sure that no other pure-Python coder has manipulated
>> references either. They've manipulated objects.
> 
> No: not directly.  The Python program deals solely with references;
> anything involving actual objects is mediated by the runtime.


Your claim is ambiguous: when you say "the Python program", are you 
talking about the program I write using the Python language, or the 
Python VM, or something else? If the first, then you are wrong: the 
Python program I write doesn't deal with references. It deals with 
objects.

This discussion flounders because we conflate multiple levels of 
explanation. People say "You do foo" when they mean "the Python VM does 
foo". Earlier, I responded to your claim that I was storing references by 
saying I was pretty sure I didn't store references, and gave an example 
of the line of Python code x=23. Your response was to mix explanatory 
levels:

> You bind names to locations which store immediate representations.
> Python IRs are (in the sense defined above) exclusively references.

I most certainly don't bind names to locations. When I write x=23, I 
don't know what the location of the object 23 is, so how could I bind 
that location to the name? You are conflating what the Python VM does 
with what I do. What *I* do is bind the object 23 to the name x. I don't 
know the location of 23, I don't even know if 23 has a well-defined 
location or if it is some sort of distributed virtual data structure. As 
a Python programmer, that's the level I see: names and objects.

At a lower level, what the Python VM does is store the name 'x' in a 
dictionary, bound to the object 23. No locations come into it, because at 
this level of explanation, dictionaries are an abstract mapping. There's 
no requirement that the abstract dictionary structure works by storing 
addresses. All we know is that it maps the name 'x' to the object 23 
somehow. Maybe there's no persistent storage, and the dict stores 
instructions telling the VM how to recreate the object 23 when it is 
needed. Who knows? But at this explanatory level, there are no locatives 
involved. Names and objects float as disembodied entities in the aether, 
and dicts map one to the other.
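
A minimal sketch of this dictionary-level view. It assumes nothing beyond an ordinary dict used as a namespace via the built-in exec(); the names x and y are purely illustrative:

```python
# Run assignments inside an explicit namespace, which is just a dict.
namespace = {}
exec("x = 23", namespace)

# The name 'x' is now a key, and it maps to the object 23.
print("x" in namespace)    # True
print(namespace["x"])      # 23

# Binding a second name to the same object adds a key; it copies nothing.
exec("y = x", namespace)
print(namespace["y"] is namespace["x"])   # True
```

Nothing in this code mentions an address or a location: all it relies on is that the mapping takes a name and gives back an object.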

At an even lower explanatory level, the CPython implementation of 
dictionaries works by storing a pointer (or if you prefer, a reference) 
to the object in a hash table. Pointers, of course, are locatives, and so 
finally we come to the explanation you prefer. We've gone from abstract 
names-and-objects to concrete pointers-to-bytes. But this is at least two 
levels deeper than what's visible in Python code. Just about the only 
time Python coders work with locatives is when they manually calculate 
some index into a string or list, or similar.

At an even lower explanatory level, all the VM does is copy bytes. And at 
a lower level still, it doesn't even copy bytes, it just flips bits. And 
below that, we're into physics, and I won't go there.

I daresay you probably get annoyed at me when I bring up explanations at 
the level of copying bytes. You probably feel that for the average Python 
programmer, *most of the time* such explanations are more obfuscatory 
than useful. Of course, there are exceptions, such as explaining why 
repeated string concatenation is likely to be slow. There is a time and a 
place for such low level explanations, but not at the high-level overview 
needed by the average Python programmer.
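
For the string concatenation case, here is a rough sketch of the two approaches. The list called words is made up for illustration, and the actual speed difference depends on the implementation (CPython can sometimes optimise the += case):

```python
words = ["spam"] * 1000

# Repeated concatenation: each += may build a new string and copy the
# old contents into it, so the total work can grow quadratically.
slow = ""
for w in words:
    slow += w

# str.join is handed all the pieces at once and builds the result once.
fast = "".join(words)

print(slow == fast)   # True
```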

And you would be right. But I argue that your explanation at the level of 
references is exactly the same: it is too low level. It relies on 
specific details which may not even be true for all implementations of 
Python. It certainly relies on details which won't be true for 
hypothetical versions of Python running on exotic hardware. One can do 
massively parallel calculations using DNA, and such "DNA computers" are 
apparently Turing complete. I have no idea how one would write a Python 
virtual machine in such a biological computer, but I'm pretty sure that 
data values won't have well-defined locations in a machine that consists 
of billions of DNA molecules floating in a liquid.

If that's too bizarre for you, think about simulating a Python VM in your 
own head. If we know one thing about the human brain, it is that thoughts 
and concepts are not stored in single, well-defined locations, so when 
you think of "x=23", there is no pointer to a location in your head.


>> That's why we should try to keep the different layers of explanation
>> separate, without conflating them. Python programmers don't actually
>> flip bits, and neither do they manipulate references. Python
>> programmers don't have access to bits, or references. What they have
>> access to is objects.
> 
> No, that's my point: Python programmers /don't/ have direct access to
> objects.  The objects themselves are kept at arm's length by the
> indirection layer of references.

I think you are wrong. If I want a name 'x' to refer to (be bound to) 
the object 23, I write x=23, not some variation of:

create object 23
give me a reference to that object
bind the reference to name 'x'

Those three steps may take place at some level of the Python VM, but 
that's not what *I* do as a Python programmer.

Note that what we're really doing is manipulating the symbol '23' in 
source code. Normally that makes no difference, but if you've ever tried 
to get the float 1.1 you'll discover the model (metaphor) of "source code 
symbols are programming entities" fails. All models fail sometimes.
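
A small illustration of where the model fails, assuming the usual IEEE 754 double-precision floats that CPython uses on common hardware:

```python
from decimal import Decimal

x = 1.1

# Decimal(x) shows the exact value of the binary float that was stored;
# Decimal("1.1") is the value the source symbol appears to denote.
exact = Decimal(x)
print(exact == Decimal("1.1"))   # False

# The discrepancy shows up in ordinary arithmetic too.
print(x + 2.2 == 3.3)            # False
```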


>> > Python does pass-by-value, but the things it passes -- by value --
>> > are references.
>> 
>> If you're going to misuse pass-by-value to describe what Python does,
>> *everything* is pass-by-value "where the value is foo", for some foo.
> 
> No.  I've tried explaining this before, with apparently little success.
[...]
>   * A /value/ is an item of data.  The range and nature of values is
>     language specific.  Typically, values encompass at least some kinds
>     of numbers, textual data, and compound data structures; they may
>     also include behavioural items such as functions.

Yes. This is an intuitive meaning of the word value. In Python, all 
values are objects. Some typical examples of values are:

5, None, "Fred", True, 3.5, [2, 3, 4], {}, lambda x: x+1

These (and more complicated structures built on top of them) are the 
things of interest to the programmer. They are the values: the things 
which are denoted by the symbols '5', 'None', '"Fred"' etc.


>   * A /location/[1] is an area of memory suitable for storing the
>     /immediate representation/ (which I shall abbreviate to /IR/) of a
>     value.  (A location may be capable of storing things other than IRs,
>     e.g., representations of unevaluated expressions in lazily evaluated
>     languages.  Locations may vary in size, e.g., in order to be capable
>     of storing different types of IRs.)

At the level of Python code, we have no access to such locations. The 
closest we have is the id() function, which uses location in memory as a 
unique ID for objects, but this is an accident of the CPython 
implementation. Whatever the /immediate representation/ of a value is, we 
can't manipulate it directly in Python code.

 
>   * A /variable/ is a location to which has been /bound/ a name.  Given
>     an occurrence of a name in a program's source, there is a language
>     specific rule for determining the variable to which it is bound.

According to this definition, there are no variables in Python, because 
Python's data model is that names are an abstract mapping between symbols 
and values, not between symbols and locations.


>   * /Evaluation/ is the process of determining a value from an
>     expression.  The /value of/ an expression is the result of
>     evaluating the expression.  This value is, in general, dependent on
>     the contents of the locations to which names appearing in the
>     expression are bound.
[...]
> The argument passing model `pass-by-value' has a number of distinctive
> properties.
> 
>   * The argument expression is fully evaluated before the function is
>     called, yielding an argument value.
> 
>   * The corresponding parameter name is bound to a fresh location.
> 
>   * The argument value IR is stored in the parameter's location.

This is an underspecified definition. Without a definition of /immediate 
representation/, we can't determine what this means. I can guess that, 
based on Pascal, Fortran and C, the /immediate representation/ of a value 
is whatever data structure represents that value. However, I fear that 
you are going to try to slip in an open-ended definition, that /immediate 
representation/ could be *anything* -- for ints in C, it will be the 
bytes that represent the int; for C arrays, it will be a pointer to the 
bytes that represent the array; for Python, it will be references to 
objects; for Algol 60, it will be thunks.

To avoid weakening pass-by-value to mean everything and anything at all, 
I'm going to say that the /immediate representation/ is the bytes which 
represent the value. (That is, the value itself.) Given this, we can see 
that Python is not pass-by-value. As I have shown in another post, 
replying to Joe, the location (as exposed by the id() function in 
CPython) of the formal parameter is the same as that of the argument 
value, not a fresh location with a copy of the value. To save you looking 
up my post, here's a simple example:

>>> def function(parrot):
...     return id(parrot)
...
>>> spam = 23
>>> print id(spam), function(spam)
143599192 143599192
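
The same check can be sketched in Python 3 syntax, using a list rather than an int so that no caching of small integers is involved. The actual id() values are implementation-specific, so only their equality is examined:

```python
def function(parrot):
    # Report the identity of whatever object the parameter is bound to.
    return id(parrot)

def identity(parrot):
    # Hand back the very object that was passed in.
    return parrot

spam = [1, 2, 3]

same_id = id(spam) == function(spam)
print(same_id)                 # True
print(identity(spam) is spam)  # True: no copy was made
```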


> By contrast, the `pass-by-reference' model has other distinguishing
> properties.
> 
>   * Whether arbitrary argument expressions are permitted is language
>     dependent; often, only a subset of available expressions -- those
>     that designate locations -- are permitted.

As you said above: "The /value of/ an expression is the result of 
evaluating the expression". Given the expression 2+3, the result of that 
expression is 5, not the location where 5 is stored. There is no reason 
to believe that 5 designates a location, as opposed to designating the 
number of peas in a pod or the average length of a piece of string.

For want of a better description, let me re-word the above to say:

* Whether arbitrary argument expressions are permitted is language
  dependent; often, only a subset of available expressions -- e.g. those
  that evaluate at a named location -- are permitted.

Note that I say they evaluate *at*, not *to*, a fixed location.

A practical example, to ensure we're talking about the same thing. In 
Pascal, I can declare a procedure swap(a, b) taking two VAR parameters, 
which use call-by-reference semantics. I might do something like this:

a := 8;  { number of peas in a pod }
b := 13;  { a baker's dozen }
swap(a, b);

Even though the values of a and b do not designate locations, the 
compiler can pass them to the procedure because the named variables a and 
b exist *at* particular locations. Contrast this with:

swap(a, 10+3);

which will fail in Pascal, because the value of the expression 10+3 
doesn't correspond to a named location. (Presumably this is a design 
choice, because the value of the expression will certainly exist at a 
known location, although possibly not known until runtime.)


>     If the argument
>     expression does designate a location, then this location is the
>     /argument location/.  

Replace this with "If the argument expression does evaluate at an allowed 
location (named location for Pascal), then..." and I will accept it.


>     If arbitrary expressions are permitted, and
>     the expression does not designate a location, then a fresh location
>     is allocated to be the argument location, the expression evaluated,
>     and the resulting IR stored in the argument location.

Modulo similar changes, accepted.


>   * The corresponding parameter name is bound to the argument location.

According to this definition, Python is call-by-reference. Refer to my code 
snippet above. But clearly Python doesn't behave like call-by-reference 
in other languages: you can't write a swap() procedure.
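
To make that concrete, here is a sketch of what happens if you try anyway. The function and variable names are only illustrative:

```python
def swap(a, b):
    # This rebinds the local names a and b only; the caller's names
    # x and y are not accessible from here.
    a, b = b, a

x, y = 8, 13
swap(x, y)
after_call = (x, y)
print(after_call)   # (8, 13): nothing was swapped

# The swap has to be done by rebinding in the caller's own namespace.
x, y = y, x
print((x, y))       # (13, 8)
```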

This is where I quote Barbara Liskov, talking about the language CLU 
which has precisely the same calling semantics as Python:

"In particular it is not call by value because mutations of arguments 
performed by the called routine will be visible to the caller. And it is 
not call by reference because access is not given to the variables of the 
caller, but merely to certain objects."

http://coding.derkeiler.com/Archive/Python/comp.lang.python/2008-11/msg01499.html

or http://snipurl.com/9qd0b
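
Both halves of Liskov's description can be checked in a few lines of Python. This is only a sketch, and the names mutate, rebind and data are made up for the purpose:

```python
def mutate(items):
    # Mutates the object itself, which the caller can also see.
    items.append(4)

def rebind(items):
    # Rebinds the local name only; the caller's variable is out of reach.
    items = [0]

data = [1, 2, 3]

mutate(data)
print(data)   # [1, 2, 3, 4]: not call by value

rebind(data)
print(data)   # [1, 2, 3, 4]: not call by reference either
```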


> There are other models, including value/return and call-by-name.

And Python's call-by-object (also CLU, Ruby, Java -- although Java people 
don't use the term -- and others).

[...]
> I hope that I have convincingly demonstrated that it's possible to
> define `pass-by-value' in a coherent manner, consistent with
> conventional usage, and distinguishing it clearly from `pass-by-
> reference'.

Of course you can define pass-by-value coherently, but not if the 
definition of value can be anything you like. Once you start declaring 
that a language is "pass-by-value, where the value is a Foo rather than 
the actual value", pass-by-value can be used to describe *anything*. 
Pass-by-reference becomes pass-by-value where the value is the location of the 
value. Pass-by-object is pass-by-value where the value is a reference to 
the object (your claim). And so forth.


[...]
>> > I agree with the comment about Pascal, but C is actually pretty
>> > similar to Python here.  C only does pass-by-value.
>> 
>> Except for arrays.
> 
> Even for those.  C doesn't pass arrays at all; instead it passes
> (programmer-visible) pointers.  See other article.

But you are conflating concepts again. The value of an array is the 
array: it's what the programmer asked for when he declared an array. See 
your own definition of value above: "A /value/ is an item of data."

Given a symbol x which represents an array, C doesn't pass the value of x 
(the array). It passes a pointer (reference) to the value of x. This is 
not pass-by-value unless you define value so broadly that it could mean 
anything.

Given the Pascal declaration 

procedure foo(x: array[1..1000] of char)

You get pass-by-value semantics: when you pass an array to foo, the 
entire array is duplicated. Changes to x are not visible in the caller's 
array. C arrays do not behave like this with an equivalent declaration.

Change the declaration to VAR x, which gives pass-by-reference 
semantics, and the array is *not* duplicated, and changes to x *are* 
visible to the caller. C's default handling of arrays is just like 
Pascal's call-by-reference semantics, not like pass-by-value. This is 
AFAIK unique in C to arrays.

[...]
>> In other words... C is call-by-value, and (according to you) Python is
>> call-by-value, but they behaviour differently.
> 
> And this is entirely due to the difference in their immediate
> representations of values.

Values are values. Regardless of whether you are using C or Pascal or 
Python, the value of 1+1 is 2, not some arbitrary memory location. I'm 
going to quote from Fredrik Lundh:

"I'm not aware of any language where a reference to an object, rather 
than the *contents* of the object, is seen as the object's actual value. 
It's definitely not true for Python, at least."

http://coding.derkeiler.com/Archive/Python/comp.lang.python/2008-11/msg01341.html

or http://snipurl.com/9qdze

The viewpoint that values are references is bizarre and counter-intuitive, 
and it leaves us with no simple way of talking about the value 
of expressions in the sense that 2 is the value of the expression 1+1.


-- 
Steven
--
http://mail.python.org/mailman/listinfo/python-list
