On Sat, 26 Mar 2016 01:59 pm, Paul Rubin wrote:

> Steven D'Aprano <st...@pearwood.info> writes:
>> Culturally, C compiler writers have a preference for using undefined
>> behaviour to allow optimizations, even if it means changing the semantics
>> of your code.
>
> If your code has UB then by definition it has no semantics to change.
> Code with UB has no meaning.
Ah, a language lawyer, huh? :-P

By the rules of the C standard, you're right, but those rules make use of a
rather specialised definition of "no meaning" or "meaningless". I'm using
the ordinary English sense.

For example, would you consider that this isolated C code is "meaningless"?

    int i = n + 1;

I'm not talking about type errors where n is not an int. We can assume that
n is also an int. I bet that you know exactly what that line of code means.
But according to the standard, it's "meaningless", since it might overflow,
and signed int overflow is Undefined Behaviour.

Even the C FAQ (as quoted by John Regehr) implies that code which is
defined as "meaningless" may have meaning in the ordinary English sense:

[quote]
Anything at all can happen; the Standard imposes no requirements. The
program may fail to compile, or it may execute incorrectly (either
crashing or silently generating incorrect results), or it may fortuitously
do EXACTLY WHAT THE PROGRAMMER INTENDED.
[end quote -- emphasis added]

http://blog.regehr.org/archives/213

If the code is "meaningless", how can we say that it does what the
programmer intended? In plain English, if the programmer had an intention
for the code, and it was valid C syntax, it's not hard to conclude that
the code has some meaning, even if that meaning isn't quite what the
programmer expected.

Compilers are well known for only doing what you tell them to do, not what
you want them to do. But in the case of C and C++, they don't even do what
you tell them to do.

When I talk about changing the semantics of the code you write, I'm using
the plain English sense of "meaning". Start with a simple-minded,
non-optimizing C compiler -- what Raymond Chen refers to as a "classical
compiler".
For example:

    int table[4];

    bool exists_in_table(int v)
    {
        for (int i = 0; i <= 4; i++) {
            if (table[i] == v) return true;
        }
        return false;
    }

There's an out-of-bounds error there: the loop runs from 0 to 4 inclusive,
but the last valid index of table is 3. As Chen puts it, a classical
compiler would mindlessly generate code that reads past the end of the
array. A bug, but a predictable one: you can reason about it, and the
effect will depend on whatever arbitrary value happens to be in that
memory location. A better compiler would generate an error and refuse to
compile the code. Either way, in plain English, the meaning is obvious:

* Create an array of four ints, naming it "table".

* Declare a function named "exists_in_table", which takes an int "v" as
  argument and returns a bool.

* This function iterates over i = 0 to 4 inclusive, returning true if the
  i-th item of table equals the given v, and false if none of those items
  equals the given v.

I don't believe for a second that you can't read that code well enough to
infer the intended meaning of it. Even I can read C well enough to do
that. Yet according to the C standard, that perfectly understandable code
snippet is deemed to be gibberish, and instead of returning true or false,
the compiler is permitted to erase your hard disk, or turn off your
life-support, if it so chooses. And as Raymond Chen describes, a
post-classical compiler will probably optimize that function into one
which always returns true.

As far as I know, there is no other language apart from C and C++ that
takes such a cavalier approach. I cannot emphasize enough that the
treatment of "undefined behaviour" is intentional by the C standards
committee. Given the absolutely catastrophic effect it has had on the
reliability, safety and security of code written in C, in my opinion the
C standard borders on professional negligence.
Programming in C becomes a battle to defeat the compiler and force it to
do what you tell it to do, all because the C standard was written by a
bunch of people whose number one priority was being able to make their
benchmarks look good.

Imagine a bridge builder who discovers a tiny, technical ambiguity or
error in the blueprints for a bridge. On one page, the documentation
states that there should be four rivets per metre in the supporting beams,
but on another page it is described as five rivets per metre. What should
the builder do?

- ask for clarification and get the blueprints and documentation
  corrected?

- play it safe and use five rivets?

- declare that the entire blueprints are therefore meaningless, and that
  he is free to optimize the bridge and reduce costs by using steel of a
  cheaper grade, half the thickness, and one rivet per metre?

When the bridge collapses under the load of normal traffic, killing
hundreds of people, what comfort should we take from the fact that the
builder was able to optimize it so that it was half the weight, a quarter
of the cost, and finished ahead of schedule, compared to a bridge that
would have actually done the job it was designed for?


-- 
Steven

-- 
https://mail.python.org/mailman/listinfo/python-list