On Mon, 07 Oct 2013 17:16:35 -0700, Mark Janssen wrote:

> It's like this: there *should* be one-to-one mappings between the
> various high-level constructs to the machine code, varying only between
> different chips (that is the purpose of the compiler after all), yet for
> some operations, in languages like scheme, well... I cannot say what
> happens... hence my challenge.
>
>> But even putting that aside, even if somebody wrote such a description,
>> it would be reductionism gone mad. What possible light on the problem
>> would be shined by a long, long list of machine code operations, even
>> if written using assembly mnemonics?
>
> Only that you've got a consistent, stable (and therefore, formalizable)
> translation from your language to the machine.
You are mistaken to think that there is a single, one-to-one mapping
between high-level code and machine code. In practice, you will get the
same machine code only if you use the exact same source code, the exact
same compiler, the exact same version of that compiler, with the exact
same compiler options, and the exact same version of the language and
all its libraries. And even that is not guaranteed.

Take, for example, the single high-level operation:

    sort(alist)

What machine code will be executed? Obviously that will depend on the
sort algorithm used. There are *dozens*. Here are just a few:

http://en.wikipedia.org/wiki/Category:Sorting_algorithms

Now sorting is pretty high level, but the same principle applies to even
simple operations like "multiply two numbers". There are often multiple
algorithms for performing the operation, and even a single algorithm can
often be implemented in slightly different ways. Expecting all compilers
to generate the same machine code is simply naive.

We can use Python to discuss this, since Python includes a compiler. It
generates byte code, which just means machine code for a virtual machine
instead of a hardware machine, but the principle is the same. Python
even has a disassembler for taking those raw byte codes and turning them
into human-readable pseudo-assembly for the Python Virtual Machine.

Here is some "machine code" corresponding to the high-level instructions:

    x = 23
    y = 42
    z = x + y
    del x, y

py> code = compile('x = 23; y = 42; z = x + y; del x, y', '', 'exec')
py> code.co_code
'd\x00\x00Z\x00\x00d\x01\x00Z\x01\x00e\x00\x00e\x01\x00\x17Z\x02\x00[\x00\x00[\x01\x00d\x02\x00S'

Translated into "assembly":

py> from dis import dis
py> dis(code)
  1           0 LOAD_CONST               0 (23)
              3 STORE_NAME               0 (x)
              6 LOAD_CONST               1 (42)
              9 STORE_NAME               1 (y)
             12 LOAD_NAME                0 (x)
             15 LOAD_NAME                1 (y)
             18 BINARY_ADD
             19 STORE_NAME               2 (z)
             22 DELETE_NAME              0 (x)
             25 DELETE_NAME              1 (y)
             28 LOAD_CONST               2 (None)
             31 RETURN_VALUE

You should be able to see that there are all sorts of changes that the
compiler could have made, which would have led to different "machine
code" but with the exact same behaviour. This particular compiler emits
code for x=23; y=42 in the order they were given, but that's not
actually required in this case. The compiler might have reversed the
order, generating something like:

              0 LOAD_CONST               1 (42)
              3 STORE_NAME               1 (y)
              6 LOAD_CONST               0 (23)
              9 STORE_NAME               0 (x)

or even:

              0 LOAD_CONST               1 (42)
              3 LOAD_CONST               0 (23)
              6 STORE_NAME               0 (x)
              9 STORE_NAME               1 (y)

without changing the behaviour of the code. Or it might even have
optimized the entire thing to:

              0 LOAD_CONST               0 (65)
              3 STORE_NAME               0 (z)

since x and y are temporary variables and a smart compiler could perform
the computation at compile time and throw them away. (Nitpicks about
"what if x and y already existed?" aside.)

CPython even does this sort of optimization, although in a more limited
fashion:

py> dis(compile('x = 23 + 42', '', 'exec'))
  1           0 LOAD_CONST               3 (65)
              3 STORE_NAME               0 (x)
              6 LOAD_CONST               2 (None)
              9 RETURN_VALUE

This is called peephole optimization. It's not beyond possibility for a
smarter compiler to look beyond the peephole and make optimizations
based on the whole program.

So you can see that there is no one-to-one correspondence from
high-level source code to low-level machine code, even for a single
machine. The same is even more true for languages such as C, Fortran,
Pascal, Lisp, Scheme, Haskell, Java, etc. where people have spent years
or decades working on compiler technology.
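Here's one more quick illustration along the same lines. This is just a
sketch of my own, and the exact opcodes you'll see depend on which
version of CPython you run it under, but the point stands: two different
spellings of "add one to x" behave identically for integers, yet compile
to different byte code.

    from dis import dis

    # Two spellings of "add one to x". For integers they behave
    # identically, but they compile to different byte code: on the
    # CPython versions I have tried, the first uses BINARY_ADD and
    # the second INPLACE_ADD. Other versions may use other opcodes.
    dis(compile('x = x + 1', '<demo>', 'exec'))
    print('-' * 30)
    dis(compile('x += 1', '<demo>', 'exec'))

Same observable behaviour, different byte code.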
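And to come back to the sort(alist) example from the top, here's a rough
sketch, with bubble sort standing in as just one of the dozens of
possible algorithms (CPython's own list.sort and sorted() actually use
Timsort). Two completely different algorithms, hence two completely
different streams of byte code and machine code, yet exactly the same
observable result:

    def bubble_sort(alist):
        # Deliberately naive O(n**2) algorithm: repeatedly swap
        # adjacent out-of-order pairs until everything is in order.
        items = list(alist)
        for end in range(len(items) - 1, 0, -1):
            for i in range(end):
                if items[i] > items[i + 1]:
                    items[i], items[i + 1] = items[i + 1], items[i]
        return items

    data = [5, 3, 8, 1, 9, 2]

    # sorted() uses a completely different algorithm (Timsort), yet
    # the result is indistinguishable from the naive version above:
    print(bubble_sort(data) == sorted(data))    # prints True

Which instructions actually execute to produce [1, 2, 3, 5, 8, 9] is an
implementation detail; the high-level behaviour is all the language
promises.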
Compilers differ in the quality and efficiency of the machine code they
emit, and in the choices they make when translating source code into
machine code.

> That's all. Everything else is magic. Do you know that the Warren
> Abstraction Engine used to power the predicate logic in Prolog into
> machine code for a VonNeumann machine is so complex, no one has
> understood it

That's nonsense. In your next post, you even supply a link to a book
describing how the WAM works, with detailed step-by-step instructions
for writing your own. For those reading, here it is:

www.cvc.uab.es/shared/teach/a25002/wambook.pdf

Prolog is not some dark black magic that nobody understands. There are
multiple implementations of Prolog-like logic languages for Python:

http://pyke.sourceforge.net/
https://github.com/logpy/logpy
https://github.com/enriquepablo/nl/
http://code.google.com/p/fuxi/
https://sites.google.com/site/pydatalog/

to say nothing of similar, more advanced languages like Mercury.

Just because *you personally* don't understand something, don't think
that nobody else does.


--
Steven
--
https://mail.python.org/mailman/listinfo/python-list