On Sat, 26 Jan 2008 14:47:50 +0100, Bjoern Schliessmann <[EMAIL PROTECTED]> wrote:
>[EMAIL PROTECTED] wrote:
>
>> Intel processors can only process machine language[...] There's no
>> way for a processor to understand any higher level language, even
>> assembler, since it is written with hexadecimal codes and basic
>> instructions like MOV, JMP, etc. The assembler compiler can
>> convert an assembler file to a binary executable, which the
>> processor can understand.
>
>This may be true, but I think it's not bad to assume that machine
>language and assembler are "almost the same" in this context, since
>the translation between them is non-ambiguous (It's
>just "recoding"; this is not the case with HLLs).

I have no problem with your explanation. It's nearly impossible to
program in machine code, which is all 1's and 0's. Assembler makes it
infinitely easier by representing the machine 1's and 0's in
hexadecimal and assigning a mnemonic name to each opcode, like PUSH,
MOV, CALL, etc.

Even so, the older processors were programmed directly in machine
code: switches were set to the 1's and 0's, or the machine code was
fed in on punched cards or tapes. The computer read the switches,
cards or tapes, and set voltages according to what it scanned. The
difference is that machine code can be executed directly, whereas
assembly source has to be assembled in order to convert the mnemonics
to binary opcodes.

>
>> Both Linux and Windows compile down to binary files, which are
>> essentially 1's and 0's arranged in codes that are meaningful to
>> the processor.
>
>(Not really -- object code files are composed of header data and
>different segments, data and code, and only the code segments are
>really meaningful to the processor.)

I agree that the code segments, and the data, are all that's
meaningful to the processor. There are a few other things, like
interrupts, that affect the processor directly. I understand what
you're saying, but I'm referring to an executable file ready to be
loaded into memory. It's stored on disk as a series of 1's and 0's.
As you say, there is also extra information on disk: control codes
separating each byte, CRC codes, timing codes, etc. However, all of
that is stripped off by the hard drive electronics. The file itself
is stored in a particular format that the operating system knows how
to read. Once the code is read in, it goes into memory locations,
each of which holds a fixed number of bits (a byte on x86); an
instruction occupies one or more consecutive locations. For example,
the ret instruction you mention below is represented by hex C3
(0xC3), which is the bit pattern 11000011. That's a machine code:
with eight bits, from 00000000 to 11111111, you have 256 different
codes available. When those 1's and 0's are converted to voltages,
the processor can decode them and set circuits in action which bring
about the desired operation. Since Linux is written in C, it must be
compiled down to machine code, just as Windows must.

>
>> Once a python py file is compiled into a pyc file, I can
>> disassemble it into assembler.
>
>But you _do_ know that pyc files are Python byte code, and you could
>only directly disassemble them to Python byte code directly?

That's the part I did not understand, so thanks for pointing that
out. What I disassembled did not make sense. I was looking for
assembler code, but I do understand a little bit about how the
interpreter reads them. For example, from os.py, here's part of the
script:

# Note: more names are added to __all__ later.
__all__ = ["altsep", "curdir", "pardir", "sep", "pathsep", "linesep",
           "defpath", "name", "path", "devnull"]

Here's the disassembly from os.pyc:

00000C04 06 00 00 00            dd 6
00000C08 61 6C 74 73 65 70 74   db 'altsept'
00000C0F 06 00 00 00            dd 6
00000C13 63 75 72 64 69 72 74   db 'curdirt'
00000C1A 06 00 00 00            dd 6
00000C1E 70 61 72 64 69 72 74   db 'pardirt'
00000C25 03 00 00 00            dd 3
00000C29 73 65 70               db 'sep'
00000C2C 74 07 00 00            dd 774h
00000C30 00                     db 0
00000C31 70 61 74 68 73 65 70   db 'pathsep'
00000C38 74 07 00 00            dd 774h
00000C3C 00                     db 0
00000C3D 6C 69 6E 65 73 65 70   db 'linesep'
00000C44 74 07 00 00            dd 774h
00000C48 00                     db 0
00000C49 64 65 66 70 61 74 68   db 'defpath'
00000C50 74 04 00 00            dd offset unk_474
00000C54 00                     db 0
00000C55 6E 61 6D 65            db 'name'
00000C59 74 04 00 00            dd offset unk_474
00000C5D 00                     db 0
00000C5E 70 61 74 68            db 'path'
00000C62 74 07 00 00            dd 774h
00000C66 00                     db 0
00000C67 64 65 76 6E 75 6C 6C   db 'devnull'

You can see all the ASCII names in the disassembly, like altsep,
curdir, etc. I'm not clear as to why they are all terminated with
0x74 = 't', or if that's my poor interpretation; some ASCII strings
don't use a 0 terminator. The point is that all the ASCII strings
have numbers between them which mean something to the interpreter.
They are also at particular addresses, and the interpreter has to
know where to find them. The script itself is essentially gone.

I'd like to know how to read pyc files, but that's getting away from
my point that there is a link between Python scripts and assembler.
At this point, I admit the code above is NOT assembler, but sooner or
later it will be converted to machine code by the interpreter and the
OS, and that can be disassembled as assembler. I realize this is a
complicated process, and I can understand people thinking I'm full of
beans. Python needs an OS like Windows or Linux to interface it to
the processor. And all a processor can understand is machine code.
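One possible reading of those 0x74 bytes, assuming this os.pyc was written by a Python 2 interpreter: its marshal format stores an interned string as the type byte 't' (0x74), then a 4-byte little-endian length, then the characters, with no terminator. On that reading each 't' belongs to the *next* string, the dd values are lengths, and "dd 774h / db 0" is really the byte 't' followed by the length 7. The sketch below walks the bytes copied from the listing, starting at offset 00000C04; `walk_interned_strings` is a hypothetical helper, not part of any library.

```python
import struct

# Bytes copied from the os.pyc listing, starting at offset 00000C04.
# The 't' tag presumably sitting just before 00000C04 is not shown in the
# listing, so the walk starts at the first 4-byte length.
LISTING = bytes.fromhex(
    "06 00 00 00 61 6C 74 73 65 70 74 "
    "06 00 00 00 63 75 72 64 69 72 74 "
    "06 00 00 00 70 61 72 64 69 72 74 "
    "03 00 00 00 73 65 70 "
    "74 07 00 00 00 70 61 74 68 73 65 70 "
    "74 07 00 00 00 6C 69 6E 65 73 65 70 "
    "74 07 00 00 00 64 65 66 70 61 74 68 "
    "74 04 00 00 00 6E 61 6D 65 "
    "74 04 00 00 00 70 61 74 68 "
    "74 07 00 00 00 64 65 76 6E 75 6C 6C"
)

# Assumed marshal type code for an interned string (Python 2 format).
TYPE_INTERNED = ord("t")


def walk_interned_strings(data):
    """Decode back-to-back <length><chars> records separated by 't' tags."""
    names = []
    pos = 0
    while pos < len(data):
        # 4-byte little-endian length, then exactly that many characters.
        (length,) = struct.unpack_from("<I", data, pos)
        pos += 4
        names.append(data[pos:pos + length].decode("ascii"))
        pos += length
        if pos < len(data):
            # The byte that looked like a terminator is the next string's tag.
            if data[pos] != TYPE_INTERNED:
                raise ValueError(f"unexpected type byte {data[pos]:#04x} at {pos}")
            pos += 1
    return names


print(walk_interned_strings(LISTING))
```

The recovered names match `__all__` in os.py. For a .pyc produced by the interpreter you are actually running, the standard `marshal` module can load the code object (after skipping the .pyc header, whose size depends on the Python version) and `dis.dis` will print its byte code. Newer Python 3 releases also use additional marshal type codes, so their .pyc bytes will not look exactly like this listing.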
>
>> Assembler is nothing but codes, which are combinations of 1's and
>> 0's.
>
>No, assembly language source is readable text like this (gcc):
>
>.LCFI4:
>        movl    $0, %eax
>        popl    %ecx
>        popl    %ebp
>        leal    -4(%ecx), %esp
>        ret
>

Yes, the source is readable like that, but the assembled binary is
not. A disassembly shows both the mnemonics and the opcodes. The ret
statement above is a mnemonic for hex C3 in assembler. You have left
out the opcodes. Here's another example of assembler, disassembled
from python.exe:

1D001250 FF 74 24 04      push    [esp+arg_0]
1D001254 E8 D1 FF FF FF   call    1D00122A
1D001259 F7 D8            neg     eax
1D00125B 1B C0            sbb     eax, eax
1D00125D F7 D8            neg     eax
1D00125F 59               pop     ecx
1D001260 48               dec     eax
1D001261 C3               retn

The first column is the address in memory. The second column holds
the opcode bytes, and the third column holds the mnemonics, English
words attached to the codes to give them meaning. The second and
third columns mean the same thing.

Single-byte instructions like 59 = pop ecx and 48 = dec eax are
self-explanatory. 59 is hexadecimal for binary 01011001, which is a
binary code. When the processor receives that binary as voltages, it
is wired to pop the value on top of the stack into the ecx register.

The second instruction, call 1D00122A, is not as straightforward. It
is made up of two parts: E8 is the opcode for CALL, and the rest,
'D1 FF FF FF', is the operand, the data the call references. In this
case it's a displacement: the distance from the end of the call
instruction to the routine being called. The bytes are stored in
little-endian order (least significant byte first), as x86 requires,
so D1 FF FF FF actually means FF FF FF D1. The leading F's mark a
negative two's-complement number, telling the processor to call
backward: the signed number FFFFFFD1 = -2F. A call counts from the
address of the next instruction, which is 1D001259, and
1D001259 - 2F = 1D00122A, the address being called. As you can see,
it's all done with binary codes.
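The call arithmetic can be checked mechanically. Below is a toy decoder, a sketch that recognizes only the exact byte patterns in the python.exe listing; real x86 decoding also has to handle prefixes, the general ModRM/SIB forms and much more. The `FIXED` table and `disassemble` function are hypothetical helpers, and `[esp+arg_0]` is written as `[esp+4]`, which is what FF 74 24 04 encodes.

```python
import struct

# Hypothetical table covering only the fixed encodings in the listing.
FIXED = {
    bytes.fromhex("FF 74 24 04"): "push dword [esp+4]",  # FF /6, ModRM+SIB, disp8 = 4
    bytes.fromhex("F7 D8"): "neg eax",                   # F7 /3 with ModRM D8
    bytes.fromhex("1B C0"): "sbb eax, eax",
    bytes.fromhex("59"): "pop ecx",                      # 58 + register number 1 (ecx)
    bytes.fromhex("48"): "dec eax",                      # 48 + register number 0 (eax)
    bytes.fromhex("C3"): "retn",
}


def disassemble(code, base):
    """Toy 32-bit x86 decoder: returns (address, text) pairs."""
    out = []
    pos = 0
    while pos < len(code):
        addr = base + pos
        if code[pos] == 0xE8:
            # CALL rel32: a signed, little-endian 32-bit displacement that is
            # added to the address of the *next* instruction (5 bytes later).
            (rel,) = struct.unpack_from("<i", code, pos + 1)
            next_addr = addr + 5
            out.append((addr, f"call {next_addr + rel:08X}"))
            pos += 5
            continue
        for pattern, text in FIXED.items():
            if code.startswith(pattern, pos):
                out.append((addr, text))
                pos += len(pattern)
                break
        else:
            raise ValueError(f"unsupported byte {code[pos]:02X} at {addr:08X}")
    return out


CODE = bytes.fromhex("FF 74 24 04 E8 D1 FF FF FF F7 D8 1B C0 F7 D8 59 48 C3")

for addr, text in disassemble(CODE, 0x1D001250):
    print(f"{addr:08X}  {text}")

rel = struct.unpack("<i", bytes.fromhex("D1 FF FF FF"))[0]
print(rel, hex(rel))        # -47 -0x2f
print(format(0x59, "08b"))  # 01011001
```

The decoded call target, 1D00122A, matches the listing, and the displacement comes out as -0x2F measured from 1D001259, the address of the instruction after the call.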

The English mnemonics are purely for the convenience of the
programmer. If you look at Intel's definitions of the instructions,
they list both the opcodes and the mnemonics. I would agree with what
you said earlier, that there is a close similarity between machine
code and assembler. You can actually write in machine code, but it is
usually entered in hexadecimal, which then has to be converted to
binary. In that case, the similarity to assembled code is quite close.

>Machine language is binary codes, yes.
>
>> You can't read a pyc file in a hex editor,

If I knew what the intervening numbers meant, I could. :-)

>By definition, you can read every file in a hex editor ...
>
>> but you can read it in a disassembler. It doesn't make a lot of
>> sense to me right now, but if I was trying to trace through it
>> with a debugger, the debugger would disassemble it into
>> assembler, not python.
>
>Not at all. Again: It's Python byte code. Try experimenting with
>pdb.

I will eventually...thanks for the reply.

--
http://mail.python.org/mailman/listinfo/python-list