On Thu, 06 Nov 2008 21:31:16 -0700, Joe Strout wrote:

>> You're wrong, Python variables don't contain *anything*. Python
>> variables are names in a namespace.
>
> I think we're saying the same thing. What's a name? It's a string of
> characters used to refer to something. That which refers to something
> is a reference.

In some sense, I have to agree with that. "Reference" as a plain English
word is very abstract.

> The thing it refers to is a referent. I was trying to
> avoid saying "the value of an object reference is a reference to an
> object" since that seems tautological and you don't like my use of the
> word "value," but I see you don't like "contains" either.

I'm happier with the idea that a name in Python "refers to" an object
than your claim that "variables" contain a reference to an object. Let
me explain:

In languages such as C and Pascal, a variable is a named memory location
with an implied size. For the sake of the argument, let's assume
variables are all two bytes in size, i.e. they can hold a single short
integer. So, if the name 'x' refers to location 0x23A782, and the two
bytes at that location are 0x0001, then we can legitimately say that the
location 0x23A782 (otherwise known as 'x') _contains_ 1 because the byte
pattern representing 1 is at that memory location.

But in Python, what you've been calling "variables" is explicitly a
*mapping* between a name and a value. Unlike variables above, the
compiler can't map a name to a memory location. At run time, the VM has
to search a namespace for the name. If you disassemble Python byte-code,
you will see things like:

    LOAD_NAME 1 (x)

If you want to talk about something containing the value, that something
would be the namespace, not the name: the name is the key in the hash
table, and is a separate piece of data to the value. The key and the
value are at different locations, you can't meaningfully say that the
value is contained by the key, for the same reason that given a list
[10, 11, 12, 13, 14, 15] you wouldn't say that the int 12 was contained
by the number 2.

> Maybe we can try something even wordier: a variable in Python is, by
> some means we're not specifying, associated with an object. (This is
> what I mean when I say it "refers" to the object.) Can we agree that
> far?

So far.
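
As a minimal sketch of the namespace-as-mapping point, assuming Python 3
and using a plain dict as a stand-in namespace (the dict and the variable
names below are illustrative assumptions, not anything from the thread),
exec() can bind names into an explicit mapping:

```python
# A namespace modelled as a plain dict. This is an assumption made for
# illustration; CPython happens to use dicts for module namespaces.
namespace = {}
exec("x = [10, 11, 12]", namespace)   # binds the name 'x' to a new list
exec("y = x", namespace)              # binds a second name to the same object

# exec() also inserts '__builtins__', so filter out dunder keys.
names = sorted(key for key in namespace if not key.startswith("__"))
same_object = namespace["x"] is namespace["y"]
print(names, same_object)
```

The names here are nothing more than keys in the mapping. The list object
is a separate piece of data, and two different keys can map to it.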

> Now, when you pass a variable into a method, the formal parameter gets
> associated with same object the actual parameter was associated with.

Agreed.

> I like to say that the object reference gets copied into the formal
> parameter, since that's a nice, simple, clear, and standard way of
> describing it. I think you object to this way of saying it. But are we
> at least in agreement that this is what happens?

At the implementation level of CPython, yes. In abstract, no.

In abstract, we can't make *any* claims about what happens beyond the
Python code, because we don't know how the VM is implemented. Perhaps it
is a giant pattern in Conway's cellular automaton "Life", which is
Turing Complete.

In practice, any reasonable implementation of Python on existing
computer hardware is going to more-or-less do what the CPython
implementation does. But just because every existing implementation does
something doesn't mean it's not an implementation detail.

>> But putting that aside, consider the Python code "x = 1". Which
>> statement would you agree with?
>>
>> (A) The value of x is 1.
>
> Only speaking loosely (which we can get away with because numbers are
> immutable types, as pointed out in the last section of [1]).

Why "speaking loosely"? At the level of Python code, the object you have
access to is nothing more or less than 1. There is a concrete
representation of the abstract Platonic number ONE, and that concrete
representation is written as 1.

The fact that the object 1 is immutable rather than mutable is
irrelevant. After x = [] the value of x is the empty list.

As far as I am concerned, this is one place where the plain English
definition of the word "value" is the only meaningful definition: "what
is denoted by a symbol". Or if you prefer, "what the symbol represents".
At the language level, x=1 means that x represents the object 1, nothing
more and nothing less, regardless of how the mechanics of that
representation are implemented.

The value of a variable is whatever thing you assign to that variable.
If that thing is the int 1, then the value is the int 1. If the thing is
a list, the value is that list. If the thing is a pointer, the value is
a pointer. (Python doesn't give you access to pointers, but other
languages do.)

Whatever mechanism is used to implement that occurs at a deeper level.
In Pascal and C, bytes are copied into memory locations (which is
actually implemented by flipping bits at one location to match the state
of bits at another location). In CPython and Java, pointers or
references are created and pointed at complex data structures called
objects. That's an implementation detail, just like flipping bits is an
implementation detail.

If you don't agree with me on this, I'm afraid that your understanding
of value is so different from mine that we will never find any common
ground. In this context I consider any other definition of "value" to be
obtuse and obfuscatory and out-and-out harmful.

>> (B) The value of x is an implementation-specific thing which is
>> determined at runtime. At the level of the Python virtual machine, the
>> value of x is arbitrary and can't be determined.
>
> Hmm, this might be true to somebody working at the implementation level,

Okay, we agree on that.

> but I think we're all agreed that that's not the level of this
> discussion. What's relevant here is how the language actually behaves,
> as observable by tests written in that language.

As far as I can see, that implementation level _is_ the level you are
talking at. You keep arguing that the value of x is a reference to the
object, a reference which is implementation specific and determined at
runtime.

>> If you answer (A), then your claim that Python is call-by-value is
>> false.
>
> Correct.
>
>> If you answer (B), then your claim that Python is call-by-value is true
>> but pointless, obtuse and obfuscatory.
>
> Correct again.

Well.
I'm not sure what else I can say to that other than, "Why on earth
would you prefer a pointless, obtuse and obfuscatory claim over one which
is equally true but far more useful and simple?"

> My answer is:
>
> (C) The value of x is a reference to an immutable object with the value
> of 1. (That's too wordy for casual conversation so we might casually
> reduce this to (A), as long as we all understand that (A) is not
> actually true. It's a harmless fiction as long as the object is
> immutable; it becomes important when we're dealing with mutable
> objects.)

But it doesn't matter. And that's important. I've seen this before, in
other people. They get hung up about the difference between mutable and
immutable objects and start assuming a difference in Python's behaviour
that simply isn't there. When assigning to a name, Python makes no
distinction between mutable and immutable objects:

    >>> import dis
    >>> dis.dis( compile('x=set([1]); y=frozenset([1])', '', 'exec') )
      1           0 LOAD_NAME                0 (set)
                  3 LOAD_CONST               0 (1)
                  6 BUILD_LIST               1
                  9 CALL_FUNCTION            1
                 12 STORE_NAME               1 (x)
                 15 LOAD_NAME                2 (frozenset)
                 18 LOAD_CONST               0 (1)
                 21 BUILD_LIST               1
                 24 CALL_FUNCTION            1
                 27 STORE_NAME               3 (y)
                 30 LOAD_CONST               1 (None)
                 33 RETURN_VALUE

>>> This is explicitly stated in
>>> the Python docs [1], yet many here seem to want to deny it.
>>
>>> [1] http://www.python.org/doc/2.5.2/ext/refcounts.html
>>
>> You have a mysterious and strange meaning of the word "explicitly".
>> Would you care to quote what you imagine is this explicit claim?
>
> A few samples: "The chosen method is called reference counting. The
> principle is simple: every object contains a counter, which is
> incremented when a reference to the object is stored somewhere, and
> which is decremented when a reference to it is deleted. When the counter
> reaches zero, the last reference to the object has been deleted and the
> object is freed. ...Python uses the traditional reference counting
> implementation..."

Implementation details again.
It says nothing about what is visible at the level of Python code.

> This seems like a point we really shouldn't need to argue. Do you
> really want to defend the claim that Python does not use references?

Python does not use references. Python uses names and objects. The
CPython implementation implements such names and objects using
references (pointers). Other implementations are free to make other
choices at the implementation level.

At the Python level, the programmer has access to objects:
(1, 3, [], None) is a tuple (an object) consisting of four objects 1, 3,
an empty list and None. There's no capacity to request a reference to an
object: if there was, it would be like Pascal's var parameters.

At the implementation level, the above tuple is implemented (in part) by
four pointers. But that's invisible at the Python level. It doesn't
exist at the Python level: you can't access those pointers, you can't do
anything with them, except indirectly by manipulating names and objects.

Here's an analogy: at the Python level we say that strings are
immutable: they can't be changed. But at the implementation level that's
clearly nonsense: strings are merely bytes no different from any other
bytes, and they are as mutable as any others. But from Python code, the
programmer has no way to mutate a string. Such behaviour isn't part of
Python. We can rightly say that Python has no mutable strings, even
though at the implementation level strings are mutable.

>> Yes, you are right, Python does not offer pass by reference. The
>> canonical test for "call by reference" behaviour is to write a function
>> that does this:
>>
>> x = 1
>> y = 2
>> swap(x, y)
>> assert x == 2 and y == 1
>>
>> If you can write such a function, your language may be
>> call-by-reference.
>> If you can't, it definitely isn't c-b-r. You can't write such a
>> function in standard Python, so Python isn't c-b-r.
>
> Whew! That's a relief.
> A week ago (or more?), it certainly sounded
> like some here were claiming that Python is c-b-r (usually followed by
> some extended hemming and hawing and except-for-ing to explain why you
> couldn't do the above).

Yes. Such confusion is very common, because people discover that Python
isn't call-by-value since mutations to arguments in a function are
visible outside of the function, and assume that therefore Python must
be call-by-reference.

>> The canonical test for "call by value" semantics is if you can write a
>> function like this:
>>
>> x = [1] # an object that supports mutation
>> mutate(x)
>> assert x == [1]
>>
>> If mutations to an argument in a function are *not* reflected in the
>> caller's scope, then your language may be call-by-value. But if
>> mutations are visible to the caller, then your language is definitely
>> not c-b-v.
>
> Aha. So, in your view, neither C, nor C++, nor Java, nor VB.NET are
> c-b-v, since all of those support passing an object reference into a
> function, and using that reference to mutate the object.

Take C out of that list. C is explicitly call-by-value, since it always
does copying of values. If you want to avoid copying the value, you have
to write your function to accept a pointer to the value you care about
and then dereference the pointer inside the function.

As for C++, Java and VB.NET, I would argue that using the term
call-by-value for them is misleading. I'm not the only such person who
believes so:

"As in Java, the calling semantics are call-by-sharing: the formal
argument variable and the actual argument share the same value, at least
until the argument variable is assigned to. Assignments to the argument
variable do not affect the value passed; however, if the value passed
was an array, assignments to elements of that array will be visible from
the calling context as well, since it shares the same array object."

http://www.cs.cornell.edu/courses/cs412/2001sp/iota/iota.html

Trust me, I didn't write that.
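
The two canonical tests quoted above can be run directly. What follows is
a small sketch, assuming Python 3; swap() and mutate() are hypothetical
helpers written only for this illustration, and the *_visible names are
made up to record the outcome of each test.

```python
def swap(a, b):
    # Rebinds only the local names a and b; the caller's names are untouched.
    a, b = b, a


def mutate(seq):
    # Changes the object itself, which the caller's name is also bound to.
    seq.append(2)


# Call-by-reference test: does swap() exchange the caller's bindings?
x, y = 1, 2
swap(x, y)
swap_visible = (x == 2 and y == 1)

# Call-by-value test: is a mutation inside the function hidden from the caller?
z = [1]
mutate(z)
mutation_visible = (z == [1, 2])

print(swap_visible, mutation_visible)
```

The swap is not visible to the caller, so the call-by-reference test
fails. The mutation is visible, so the call-by-value test fails too. That
combination is the behaviour the Cornell quote calls call-by-sharing.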

> Your view is at odds with the standard definition, though; in fact I'm
> pretty sure we could dig up C and Java specs that explicitly spell out
> their c-b-v semantics, and RB and VB.NET pretty clearly mean "ByVal" to
> indicate by-value in those languages.

I accept that C (like Pascal) is c-b-v. Given the following C code:

    struct table { int item[1000]; } x;
    int count;
    for( count = 0; count < 1000; count++ )
        x.item[count] = count;

the value of the variable 'x' is a struct holding an array of ints
0,1,2,...999. (A bare C array would decay to a pointer to its first
element when passed, so the array is wrapped in a struct here.) When you
call a function with argument x, the entire struct, array included, is
copied. Call-by-value: the variable's value is copied into the
function's scope.

For Java to be considered c-b-v, we have to agree that the value of x
following the equivalent Java assignment is not the array of ints that
the source code suggests it is, but some arbitrary pointer to that
array. If we agree on that definition, we can agree that Java is c-b-v,
but I maintain it is a foolish definition.

I can't imagine what reason people had for tossing out the commonsense
meaning of the word "value". What benefit does it give? For those whose
first language was Pascal or Fortran or C and had only heard of two
calling conventions it avoids the need to learn the name for a third
convention, but the cost is that c-b-v no longer has a single meaning.
It now has at least two meanings: C call-by-value is different from Java
call-by-value, because when you call a C function with a struct holding
an array the entire array is copied, and when you call a Java function
with an array the entire array is *not* copied. Different program
behaviour with the same name. That's bad enough when it happens with an
element of language syntax but it is unforgivable when it happens to
something which is supposed to be generic to all languages.

Not only do you lose the regular dictionary meaning of the word value,
but you introduce a second meaning to call-by-value. When you say
"Language Foo is call-by-value", you have no way of telling what
listeners will understand by that.
If they come from a Pascal or C background, they will understand one
thing ("values are copied"), and if they come from a Java or VB.NET
background they will understand something very different ("sometimes
values are copied, and sometimes pointers to the value are copied").

> The canonical test of c-b-v is whether a *reassignment* of the formal
> parameter is visible to the caller.

That's just the same test for c-b-r: it's a variation of swap(x, y) but
using only one variable. That's equivalent to assuming that c-b-r and
c-b-v are a dichotomy: if a language isn't one, it must be the other.
Wrong, false, harmful!

> Simply using the parameter for
> something (such as dereferencing it to find and change data that lives
> on the heap) doesn't prove anything at all about how the parameter was
> passed.

But it does: if a mutation to the argument is visible to the caller,
then you know you haven't mutated a copy. If no copy was made, then it
isn't call-by-value.

>> Python is neither call-by-reference nor call-by-value.
>
> That can be true only if at least one of the following is true:
>
> 1. Python's semantics are different from C/C++ (restricted to pointers),
> Java, and RB/VB.NET; or

Why would you restrict C/C++ to pointers? Are you now going to argue
that these languages have different calling conventions depending on the
type of the argument? In C, the calling convention is precisely the same
whether the argument is a pointer or an int. In both cases, the value is
copied.

Python's semantics are different from C since values are not copied when
you pass them to a function, but (as far as I know) the same as that of
the others.

> 2. C/C++ (restricted to pointers), Java, and RB/VB.NET are not
> call-by-value.

Take C out of that list and I will agree.

> I asked you before which of these you believed to be the case, so we
> could focus on that, but I must have missed your reply. Can you please
> clarify?
>
> From your above c-b-v test, I guess you would argue point 2, that
> none of those languages are c-b-v. If so, then we can proceed to
> examine that in more detail. Is that right?

Sure.

>>> That would indeed be nonsense. But it's also not what I'm saying. See
>>> [2] again for a detailed discussion and examples. Call-by-value and
>>> call-by-reference are quite distinct.
>>
>> And also a false dichotomy.
>
> I've never claimed these are the only options; just that they're the
> only ones actually used in any of the languages under discussion. If
> you think Python uses call by name, call by need, call by macro
> expansion,

No, none of those.

> or something else at [2], please do say which one. "Call by
> object", as far as I can tell, is just a made-up term for call-by-value
> when the value is an object reference.

And you think that "call-by-reference" or "call-by-value" is anything
other than "made-up"? What, you think that these terms have always
existed, as far back as language? They're made-up terms too.

I would say that the value is the object, full stop. The reference is an
implementation detail.

As we've repeatedly said, "call-by-sharing" has a good pedigree: it goes
back at least to Barbara Liskov and CLU in 1974, in the dawn of
object-oriented programming.

http://en.wikipedia.org/wiki/Barbara_Liskov

The real question isn't why Python should use the term
"call-by-sharing", but why the Java and VB people didn't use it.

> (And I'm reasonably OK with that
> as long as we're all agreed that that is what it means.)
>
>>>> "Calling by value" is not a useful definition of Pythons behaviour.
>>>
>>> It really is, though. You have to know how the formal parameter
>>> relates to the actual parameter. Is it a copy of it, or an alias of it?

False dichotomy again. The correct answer is, "Neither. It _is_ the
actual parameter."

We can prove this by using Python's scoping rules:

    >>> def func(y):
    ...     print(x is y)
    ...
    >>> x = ["something arbitrary"]
    >>> func(x)
    True

In case you think this is an artifact of mutable objects:

    >>> x = "something immutable"
    >>> func(x)
    True

The value of x and the value of y are the same object. But the names are
different. (They would be different even if the function parameter was
called 'x', because it is in a different namespace.) Nothing we do to
the name y can affect the name x. But things that we do to the object
bound to y *can* affect the object bound to x, because they are the same
object.

>> And by definition, "call by value" means that the parameter is a copy.
>> So if you pass a ten megabyte data structure to a function using
>> call-by-value semantics, the entire ten megabyte structure is copied.
>
> Right. And if (as is more sensible) you pass a reference to a ten MB
> data structure to a function using call-by-value, then the reference is
> copied.

Sure. I agree. But in this case, the reference is the value. In Pascal,
you would do something like this:

    x := enormous_array();  {returns a 10MB array}
    y := func(@x);          {pass a pointer to x}

if you couldn't or didn't write func to use a var parameter.

>> Since this does not happen in Python, Python is not a call-by-value
>> language. End of story.
>
> So your claim is that any language that includes references (which is
> all OOP languages, as far as I'm aware), is not call-by-value?

No, I'm not making that claim. Calling conventions and the existence of
references in the implementation of an OO language are orthogonal: one
does not imply anything about the other.

Consider a hypothetical OO language where variables are memory
locations. x=Foo would set the value of the variable x to some object
Foo. As an implementation detail, this might be implemented just as you
say: the variable (memory location) x contains a reference to the object
Foo, just like Java, except without primitive types.

Now we call a function func(x). What happens next?
Well, knowing that the language is OO doesn't tell us *anything* about
what happens next. We can probably make an educated guess that, if the
language is running on a von Neumann machine (a safe bet in the real
world!), there's probably a stack involved, and the implementation will
probably work by pushing references on the stack.

What else can we predict? Nothing. Here are a couple of alternative
calling behaviours:

- The implementation makes a copy of the object Foo, creates a new
  reference to it, and assigns that reference to the formal parameter of
  the function. I would call that "call-by-value". What would you call
  it?

- The implementation makes a copy of the reference to object Foo, and
  assigns it to the formal parameter of the function. Following Barbara
  Liskov, I would call that "call-by-sharing". Following the effbot, I
  would also accept "call-by-object".

- The implementation makes a reference to the location of x, rather than
  a reference to the object, and passes that to the function.
  Reassignments to the formal parameter are reflected in the caller's
  scope. That would be call-by-reference.

Any of these would be reasonable choices for a designer to make,
although the first one has all the disadvantages of call-by-value in
languages like C and Pascal. Nevertheless, if you want that behaviour in
your OOP language, you can have it.

>>> Without knowing that, you don't know what assignments to the formal
>>> parameter will do, or even what sort of arguments are valid. Answer:
>>> it's a copy of it.
>>
>> Lies, all lies. Python doesn't copy variables unless you explicitly ask
>> for a copy.
>
> Hmm, I'm struggling to understand why you would say this. Perhaps you
> mean that Python doesn't copy *objects* unless you explicitly ask for a
> copy.

That's certainly true. We can agree on this.

> But it does copy references in many circumstances, including in
> assignment statements, and parameter passing.

At the implementation level, not at the Python level.
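
The three alternative calling behaviours listed above can be imitated in
plain Python. This is only a toy model under stated assumptions: Cell is
a hypothetical class standing in for a caller's variable (one slot that
holds an object), callee() is a made-up function body, and the three
call_by_* helpers are invented names, not anything Python or any other
language provides. Python 3 is assumed.

```python
import copy


class Cell:
    """Hypothetical stand-in for a variable: one slot holding an object."""

    def __init__(self, obj):
        self.obj = obj


def callee(param):
    """The function body; param is a Cell acting as the formal parameter."""
    param.obj.append("mutated")   # mutate the object bound to the parameter
    param.obj = ["reassigned"]    # then rebind the formal parameter


def call_by_value(var):
    # Copy the object and bind the formal parameter to the copy.
    callee(Cell(copy.deepcopy(var.obj)))


def call_by_sharing(var):
    # New slot for the formal parameter, bound to the very same object.
    callee(Cell(var.obj))


def call_by_reference(var):
    # The formal parameter is the caller's own slot.
    callee(var)


results = {}
for strategy in (call_by_value, call_by_sharing, call_by_reference):
    x = Cell(["original"])
    strategy(x)
    results[strategy.__name__] = x.obj

print(results)
```

In this model the caller sees nothing under call-by-value, sees the
mutation but not the reassignment under call-by-sharing, and sees the
reassignment under call-by-reference.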

>> That some implementations of Python choose to copy pointers rather than
>> move around arbitrarily large blocks of memory instead is an
>> implementation detail. It's an optimization and irrelevant to the
>> semantics of argument passing in Python.
>
> I agree that under the hood, there are probably other ways to get the
> same behavior. What's important is to know whether the formal parameter
> is an alias of the actual parameter, or its own independent local
> variable that (let me try to say it more like your way here) happens to
> be initially associated with the same referent as the actual parameter.
> This obviously has behavioral consequences, as you showed above.
>
> A concise way to describe the behavior of Python and other languages is
> to simply say: the object reference is copied into the formal parameter.

Hence the alias for "call-by-sharing", "call-by-object-reference".
Personally I find that term too long and unwieldy and prefer either
call-by-sharing or call-by-object.

>>> Assignments don't affect the actual parameter at all. This is exactly
>>> what "call by value" means.
>>
>> Nonsense. I don't know where you get your definitions from, but it
>> isn't a definition anyone coming from a background in C, Pascal or
>> Fortran would agree with.
>
> Well I can trivially refute that by counterexample: I come from a
> background in C, Pascal, and FORTRAN, and I agree with it.

Well there you go. Serves me right for making a sweeping generalization.

> As for where I get my definitions from, I draw from several sources:
>
> 1. Dead-tree textbooks
> 2. Wikipedia [2] (and yes, I know that has to be taken with a grain of
> salt, but it's so darned convenient)
> 3. My wife, who is a computer science professor and does compiler
> research
> 4. http://javadude.com/articles/passbyvalue.htm (a brief but excellent
> article)
> 5. Observations of the "ByVal" (default) mode in RB and VB.NET
> 6. My own experience implementing the RB compiler (not that
> implementation details matter, but it forced me to think very
> carefully about references and parameter passing for a very long time)

Which makes you so close to the trees that you can't see the forest.
You're not thinking at the level of Python code. You're not even
thinking at the level of the Python virtual machine. You're thinking
about what happens to make the Python virtual machine work.

> Not that I'm trying to argue from authority; I'm trying to argue from
> logic. I suspect, though, that your last comment gets to the crux of
> the matter, and reinforces my guess above: you don't think c-b-v means
> what most people think it means.

No, I make no claim about who is in a majority. What I argue is that the
Java and VB communities have taken a term with an established meaning,
and are using it for something which is at best pedantically true at a
deeper implementation level. I think they are foolish to have done so,
and their actions have led to imprecision in language: "call-by-value"
is no longer a single strategy with a single observable behaviour, but
now refers to at least two incompatible behaviours: Pascal/C style, and
Java/VB style.

I don't accept that alternative meaning, because I insist that at the
level of Python code, after x=1 the value of x is 1 and not an
arbitrary, artificial, implementation-dependent reference to 1.

--
Steven
--
http://mail.python.org/mailman/listinfo/python-list