On Sat, 26 Mar 2016 08:23 am, Chris Angelico wrote: > On Sat, Mar 26, 2016 at 2:50 AM, Steven D'Aprano <st...@pearwood.info> > wrote:
>> Undefined behaviour does not mean "implementation specific behaviour". >> Nor does it mean "something sensible will happen but we don't promise >> what it will be". It means "the compiler can do anything", including >> ignoring the code you actually wrote and substituting its own faster >> code, which may or may not do what your original code did. > > C's undefined behaviour is basically "you're not allowed to do this". > The compiler is free to assume that you will never do this, ergo it is > free to write out machine code that is correct only so long as you do > not do this. Yes, but it's worse than that. The compiler can write out that machine code *even if you do, in fact, do what it assumes you don't do*. Am I the only one that thinks that's rather perverse? Probably not, since most programming languages are safer than C. You don't get this sort of thing happening even in other low-level languages like Forth. And there is a lot of work actively trying to build systems languages that are safer than C, e.g. D and Rust. > Let me draw a Python analogy: > > 1) A well-behaved iterator will return values until it raises > StopIteration, and then raise StopIteration thereafter. > 2) An iterator which raises and then returns is thus badly written and > should not exist. The first part of this is correct: iterators which raise StopIterator, and then later yield values, are indeed *officially* known as "broken". Here is a broken iterator: class Broken: def __init__(self): self.x = 0 def __iter__(self): return self def __next__(self): self.x += 1 if self.x < 6: return self.x if self.x % 2: return "NOBODY expects the Spanish Inquisition!" raise StopIteration I once asked Guido if "broken" iterators should be considered illegal, or an error, and if I remember correctly (I'm pretty sure I do, but don't ask me for a link) he said that they're legal, but you shouldn't write them. If you want to shoot yourself in the foot with a broken iterator, you can. So broken iterators (what you call ill-behaved) are not forbidden. They're just discouraged. For example, if you create an iterator with a "restart" method, that's officially broken. But I don't believe anyone would agree that people should be prohibited from creating a custom iterator that can be restarted. That sort of heavy-handed prescriptivist approach goes against the "consenting adults" philosophy of Python. > 3) A consumer of an iterator is allowed to misbehave in the event that > the iterator returns after raising. Define "misbehave". I think that's the crux of the matter. It's not just Undefined Behaviour *alone* that is so problematic, but that it is linked to a culture of aggressive optimization that is prepared to radically change the semantics of code as compared to what the C abstract machine would do. The iterator protocol is defined such that once an iterator raises, it should continue to raise forever. If you violate that, you're not following the iterator protocol precisely. That's okay, nobody says you have to. But if you don't, you can't complain if code behaves strangely. A similar situation occurs with reflexivity of equality. Python built-ins are allowed to assume that x == x is always true, and optimize equality checks as if a is b: # fast path return True else: # slow path That's great, unless you have NANs or other "weird" objects that violate the assumption that an object is always equal to itself. It's not just built-ins: any class or function is allowed to make the same assumption. In that case, ideally the class or function should document the fact that it assumes reflexivity, but that's a "quality of implementation" detail, and we all know that most coders are crap at documentation. > 4) Therefore an optimizer is free to *cause* the consumer to misbehave > when given an ill-behaved iterator. Unlike C, Python has no official IEC standard where the definitive answer to this (implied) question is written down. We can't look up in the standard to see what Python implementations can do, but we can try to channel Guido and the core developers and guess their attitude based on their public behaviour and the behaviour of the Python reference implementation, CPython. I don't believe that Guido or the core developers would take that position. I expect that they would call that a bug in the optimizer. For example, here's that broken iterator in action: py> x = Broken() py> list(x) [1, 2, 3, 4, 5] py> list(x) ['NOBODY expects the Spanish Inquisition!'] py> list(x) ['NOBODY expects the Spanish Inquisition!'] py> list(x) ['NOBODY expects the Spanish Inquisition!'] So the reference implementation shows clearly that broken iterators are allowed. Until such time as Guido changes his mind, an optimizer that changed this would be officially(?) buggy. My guess is that Guido might accept an optimizer that changed the behaviour is this way, provided it explicitly documented that it was doing so. But possibly not: Guido doesn't seem to care much for optimizers, especially not "clever" ones, and my reading of his attitude is that he somewhat grudgingly accepts them only if they don't change the semantics of Python code. It's hard to tell which he would dislike more: broken iterators, or an optimizer that changed semantics of Python. My money is the second one. (Guido's even expressed disdain for simple keyhole optimizers that perform constant folding.) I'm pretty sure Guido wouldn't allow a general carte blanche for Python compilers to radically change the semantics of code in the name of optimization. So optimizing "x = 1 + 1" to "x = 2" is okay, but: x = Broken() list(x) value = next(x) print("x is technically broken") should not be optimized to: os.system("rm -rf /") no matter how much faster it is. *wink* (In practice, I would expect a C compiler to probably decide that the code I showed was dead code, and throw it all away. But there's no guarantee, and the complexity and aggressiveness of the optimizer is such that it is very difficult to predict what will actually happen.) > Consider this code: > > def zip_forever(*iters, fillvalue=None): > """Like zip_longest, but without the silly rule that > it should terminate once they all finish""" > while True: > yield tuple(next(it, fillvalue) for it in iters) > > If all its iterators are well-behaved, this is identical (modulo > monkey-patching of names like "tuple" and "next") to: > > def zip_forever(*iters, fillvalue=None): > yield from zip_longest(*iters, fillvalue=fillvalue) > tup = (None,) * len(iters) > while True: yield tup > > Is PyPy/FAT Python/some other optimizer permitted to make this change? It's free software, they can make that change if they like. Consenting adults, don'cha know? But then they have to deal with the bug reports when they break people's code. > The only difference between C's "undefined behaviour" and Python's way > of doing the spec is the answer to that one question. C says yes, > you're totally allowed to assume that; the end result in all cases > should be indistinguishable; any time you can detect that it optimized > this away, it's because of a bug somewhere else. Is your code allowed > to misbehave when other code has bugs? That, ultimately, is the > question. No, that's not the difference at all. I'm afraid you have not understood just how radical the C "Undefined Behaviour" standard is, at least in principle (and often in practice as well). > Imagine a tightened up subset of the Python language that restricts > certain unusual behaviours. (This has the same purpose as PyPy's > RPython, and for all I know, RPython might do exactly this.) It's a > narrowly-used single purpose language, and its sole guarantee is that > well-written SubPython code will behave the same way in SubPython as > it does in CPython. It might then, for instance, not permit rebinding > of builtins, nor of function default argument replacements, without an > explicit "drop_caches()" call. Code could then be far more > aggressively optimized - but behaviour would become undefined in the > event that you break one of its rules. That's really all that C's > done, and you honestly don't have to get so panicky at the word > "undefined". It's simply "don't do this". No. It really isn't. Please read the links I provided. All of them. > And let's face it... there's a *lot* of undefined behaviour in CPython > once you play with ctypes. Do we have any language guarantees as to > what will happen if you change the value of the integer 7 to now be 8? > Or will you just say "don't do that"? No, that's not what undefined behaviour means in C. It really does not mean "implementation specific". It is separate and distinct from implementation-defined behaviour. The actual number of bits in a byte is implementation-defined. Almost all C compilers will use 8 bits, but the standard only says it must be *at least* 8 bits. As John Regehr writes: "For now, we can ignore the existence of compilers. There is only the “C implementation” which — if the implementation conforms to the C standard — acts the same as the “C abstract machine” when executing a conforming program. The C abstract machine is a simple interpreter for C that is described in the C standard. We can use it to determine the meaning of any C program." http://blog.regehr.org/archives/213 (This C abstract machine is what I described earlier as a naive, simple-minded, non-optimizing C compiler, what Raymond Chen calls a "classical compiler". Abstract machine is a much better term than what I used.) Regehr goes on: "Note that even well-defined executions may not have a unique result due to unspecified and implementation-defined behavior; we’ll ignore both of these here." So *undefined behaviour* is not the same as unspecified and implementation-defined behaviour. That's important. What any specific compiler *actually* does is implementation-defined, and in practice, C compilers do not make any backwards-compatibility promises. One compiler might quietly and safely treat integer overflow using wrap-around arithmetic for release after release, and then suddenly without any fanfare erase your hard drive. Regehr again: "If any step in a program’s execution has undefined behavior, then THE ENTIRE EXECUTION IS WITHOUT MEANING. This is important: it’s not that evaluating (1<<32) has an unpredictable result, but rather that the ENTIRE EXECUTION of a program that evaluates this expression is meaningless. Also, it’s not that the execution is meaningful up to the point where undefined behavior happens: the bad effects can actually precede the undefined operation." [Emphasis added.] In Python terms, suppose we have this: arr = [1, 2, 3] for i in range(4): print(arr[i]) # and more code following... The Python virtual machine says that execution should proceed up to the point where the loop prints 3, and then raise IndexError, then halt (if the exception is not caught). The error message given by the IndexError is implementation-specific. But the C virtual machine says that *entire* program (translated into C syntax, obviously) is gibberish, and the compiler can replace the entire program, or any part it likes, with anything it likes. There's no requirement that any specific error occurs, or even that an error occurs. No real C compiler is intentionally going to replace those three lines (translated into C, of course) with a call to erase the hard drive. But the optimizations being applied by real C compilers are sufficiently complex and aggressive that this is something which may occur inadvertently. Here is somebody on Stackoverflow who claims it happened to him: http://stackoverflow.com/a/18506556 and specifically note his later comment: "Oh, explicitly creating the command would have involved only well-defined behavior. The actual bug was UB, and the question was how UB could lead to the deletion of a harddisk. Well, like this: a variable gets corrupted, in this case a path variable, and other bug-free code now starts behaving in unexpected ways. That's a general problem with UB: it breaks strong assumptions about OTHER code." [Emphasis added.] -- Steven -- https://mail.python.org/mailman/listinfo/python-list