On Wed, 18 Dec 2013 19:51:26 +1100, Chris Angelico wrote:

> On Wed, Dec 18, 2013 at 7:18 PM, Steven D'Aprano <st...@pearwood.info>
> wrote:
>> You want to know why programs written in C are so often full of
>> security holes? One reason is "undefined behaviour". The C language
>> doesn't give a damn about writing *correct* code, it only cares about
>> writing *efficient* code. Consequently, one little error, and does the
>> compiler tell you that you've done something undefined? No, it turns
>> your validator into a no-op -- silently:
>
> I disagree about undefined behaviour causing a large proportion of
> security holes.

I didn't actually specify "large proportion"; those are your words. But
since you mention crashes:

> Maybe it produces some, but it's more likely to produce
> crashes or inoperative code.

*Every* crash is a potential security hole. Not only is it a denial of
service, but a fatal exception[1] is a sign that arbitrary memory has
been executed as if it were code, or an illegal instruction executed.
Every such crash is a potential opportunity for an attacker to run
arbitrary code. There are only two sorts of bugs: bugs with exploits,
and bugs that haven't been exploited *yet*.

I think you are severely under-estimating the role of undefined
behaviour in C in security vulnerabilities. I quote from "Silent
Elimination of Bounds Checks":

"Most of the security vulnerabilities described in my book, Secure
Coding in C and C++, Second Edition, are the result of exploiting
undefined behavior in code."

http://www.informit.com/articles/article.aspx?p=2086870

Undefined behaviour interferes with the ability of the programmer to
understand causality with respect to his source code. That makes bugs
of all sorts more likely, including buffer overflows.

Earlier this year, four researchers at MIT analysed how undefined
behaviour is affecting software, and they found that C compilers are
becoming increasingly aggressive at optimizing such code, resulting in
more bugs and vulnerabilities. They found 32 previously unknown bugs in
the Linux kernel, 9 in Postgres and 5 in Python.

http://www.itworld.com/security/380406/how-your-compiler-may-be-compromising-application-security

I believe that the sheer number of buffer overflows in C is more due to
the language semantics than the (lack of) skill of the programmers. C
the language pushes responsibility for safety onto the developer. Even
expert C programmers cannot always tell what their own code will do.
Why else do you think there are so many applications for checking C
code for buffer overflows, memory leaks, buggy code, and so forth?
Because even expert C programmers cannot detect these things without
help, and they don't get that help from the language or the compiler.

[...]

> Apart from the last one (file system atomicity, not a C issue at all),
> every single issue on that page comes back to one thing: fixed-size
> buffers and functions that treat a char pointer as if it were a string.
> In fact, that one fundamental issue - the buffer overrun - comes up
> directly when I search Google for 'most common security holes in c code'

I think that you have missed the point that buffer overflows are often
a direct consequence of the language. For example:

http://www.kb.cert.org/vuls/id/162289

Quote:

"Some C compilers optimize away pointer arithmetic overflow tests that
depend on undefined behavior without providing a diagnostic (a
warning). Applications containing these tests may be vulnerable to
buffer overflows if compiled with these compilers."

The truly frightening thing about this is that even if the programmer
tries to write safe code that checks the buffer length, the C compiler
is *allowed to silently optimize that check away*.

> Python is actually *worse* than C in this respect.

You've got to be joking.

> I know this
> particular one is reasonably well known now, but how likely is it that
> you'll still see code like this:
>
> def create_file():
>     f = open(".....", "w")
>     f.write(".......")
>     f.write(".......")
>     f.write(".......")
>
> Looks fine, is nice and simple, does exactly what it should. And in
> (current versions of) CPython, this will close the file before the
> function returns, so it'd be perfectly safe to then immediately read
> from that file. But that's undefined behaviour.

No it isn't. I got chastised for (allegedly) conflating undefined and
implementation-specific behaviour. In this case, whether the file is
closed or not is clearly implementation-specific behaviour, not
undefined. An implementation is permitted to delay closing the file.
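
For anyone who doesn't want to depend on when a given implementation
closes the file, here is a minimal sketch of the usual alternative: a
`with` block, which closes the file at a defined point. It is only an
illustration. The function name `create_file_safely`, the `path`
argument and the strings written are hypothetical, not taken from the
quoted code.

```python
import os
import tempfile


def create_file_safely(path):
    """Write three lines, closing the file when the with block exits.

    Hypothetical variant of the quoted create_file(): the name, the
    path argument and the written strings are illustrative assumptions.
    """
    with open(path, "w") as f:
        f.write("first\n")
        f.write("second\n")
        f.write("third\n")
    # The file is closed at this point on any Python implementation,
    # independent of reference counting or garbage-collection timing.
    return f.closed


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "example.txt")
        closed = create_file_safely(target)
        # Reading back immediately is safe because the writer is closed.
        with open(target) as f:
            contents = f.read()
        print(closed, repr(contents))
```

On CPython the quoted version usually happens to close the file as soon
as `f` goes out of scope, because of reference counting. The `with` form
gets the same result without relying on that.
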
An implementation is not permitted to erase your hard drive.

Python doesn't have an ISO standard like C, so where the documentation
doesn't define the semantics of something, CPython behaves as the
reference implementation. CPython allows you to simultaneously open the
same file for reading and writing, in which case subsequent reads and
writes will deterministically depend on the precise timing of when
writes are written to disk. That's not something which the language can
control, given the expected semantics of file I/O. The behaviour is
defined, but it's defined in such a way that what you'll get is
deterministic but unpredictable -- a bit like dict order, or
pseudo-random numbers.

A Python implementation is not permitted to optimize away subsequent
reads, erase your hard drive, or download a copy of Wikipedia from the
Internet. A C compiler is permitted to do any of these.

(Of course, no competent C compiler would actually download all of
Wikipedia, since that would be slow. Instead, it would probably only
download the HTTP headers for the main page.)


[1] I'm talking about low-level exceptions or errors, not Python
exceptions.


-- 
Steven