------- Comment #10 from rogerio at rilhas dot com 2010-08-11 01:57 -------

I'm replying now not in the context of the bug (since, as I mentioned, I must move on), but just as a conversation between two people. So please don't get me wrong for insisting.

The cdecl calling convention on x86-32 machines says that for any function call the arguments are placed on the stack as a series of 4-byte pushed values. I've checked that GCC respects this convention when calling a function without optimization. So, for a typical function call, GCC pushes the parameters correctly according to the cdecl calling convention:

    func(a, b, c, d)

... the parameters will be pushed right-to-left like:

    push d    (4-byte)
    push c    (4-byte)
    push b    (4-byte)
    push a    (4-byte)
    call func

Obviously there are exceptions for doubles and structs, but this is not the case here, as char* and int are pushed exactly the same way, as 4-byte values. This is well defined, right? The cdecl calling convention defines this clearly, correct? Or does GCC somehow use another calling convention?

Another thing well defined in C is that when I request the address of a variable I get the address of it. So, inside func, when I request &a I get the stack address where "a" was pushed to. This is also well defined, right?

Another thing well defined in C is what happens when navigating an array out of its bounds. Unlike what you say, the behaviour is deterministic on absolutely every machine: if my PTR4 pointer to a 4-byte value contains the value X, then PTR4[1] accesses memory address X+4. In the same way, if my PTR2 points to a 2-byte value at address Y, then PTR2[1] will access address Y+2. There is no hack here either; the behaviour is very well defined, and C doesn't care about or check the bounds of the array.

So, if the calling convention states (in the example above) that "b" is placed 4 bytes after "a", and that "c" is placed 4 bytes after "b", and so on, it follows, without any doubt, that if I get a PTR4 to point to "a", and the address of "a" on the stack is X, then PTR4[0] accesses "a", PTR4[1] accesses "b", PTR4[2] accesses "c", and so on. It doesn't matter what the size of the array is; C will not check nor care about it, so I can navigate the stack with it.
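
The stride part of this (PTR4[1] at X+4, PTR2[1] at Y+2) can be checked with a small sketch. The function names are made up for illustration, and each helper measures the distance inside a real two-element array, so it says nothing about the stack itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: byte distance between PTR4[0] and PTR4[1]. */
size_t stride_ptr4(void)
{
    int32_t values[2] = {0, 0};
    int32_t *ptr4 = values;
    /* Casting to unsigned char * measures the distance in bytes. */
    return (size_t)((unsigned char *)&ptr4[1] - (unsigned char *)&ptr4[0]);
}

/* Hypothetical helper: byte distance between PTR2[0] and PTR2[1]. */
size_t stride_ptr2(void)
{
    int16_t values[2] = {0, 0};
    int16_t *ptr2 = values;
    return (size_t)((unsigned char *)&ptr2[1] - (unsigned char *)&ptr2[0]);
}

/* Aborts if either stride differs from the element size. */
void check_strides(void)
{
    assert(stride_ptr4() == sizeof(int32_t));
    assert(stride_ptr2() == sizeof(int16_t));
}
```

Calling check_strides() from any test program is enough to exercise both helpers.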

That is what I do with format_address: since it is a char** and a char* has size 4, format_address is a PTR4. So, format_address[0] is "a" (or it should be if &format returned the true address of "a"), and *without any doubt or hack* format_address[1] is "b", and *with exactly the same confidence* format_address[2] is "c", and so on. This is one of the most well established bases of C, for several decades now, so there is really no arguing about it.

The problem is that GCC, when optimizing, places "a" somewhere on the stack which is not contiguous to "b". I think (although I'm not sure) that GCC does something like this:

    push d
    push c
    push b
    push a
    call format_direct

    format_direct:
        push some varied stuff
        push [a copy of the original "a" again somewhere]
        push [address of the above copy of "a"]
        call format_indirect

    format_indirect: (example)
        mov eax,[address of the copy of "a"+0]  // not the original "a", but the copy
        mov ebx,[address of the copy of "a"+4]  // contains some varied stuff, not "b"

I'm not sure if GCC does exactly this, but I'm sure that, when optimizing, GCC does not place "a" adjacent to "b", and it should. Since I cannot get to "b", I'm not sure if the same happens to "c". This clearly violates the calling convention. If the calling convention were respected then "a", "b", "c", and "d" would all be adjacent, so I could navigate the 1-entry (4-byte) array using format_address[0], format_address[1], format_address[2], and format_address[3].

I don't see how you can refute one of the most well established properties of C, as the language allows me to populate any memory buffer with whichever stuff and then navigate it with a 4-byte pointer (as long as it is properly aligned, which is the case here). I'm just doing that with the stack, which should be (by the calling convention) a 4-byte aligned memory buffer with 4 adjacent 4-byte values "abcd".
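
To make the buffer analogy concrete, here is a minimal sketch. It is only a model: frame is an ordinary 16-byte array standing in for the four adjacent slots "abcd", not the real stack, and read_slot is a made-up name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of four adjacent 4-byte slots "a b c d".
   Fills a 16-byte, 4-byte-aligned buffer byte-wise, then reads
   slot `index` back through a 4-byte pointer. */
uint32_t read_slot(uint32_t a, uint32_t b, uint32_t c, uint32_t d, size_t index)
{
    uint32_t frame[4];                            /* 16 bytes, aligned for uint32_t */
    unsigned char *raw = (unsigned char *)frame;  /* byte view of the same buffer */
    uint32_t *ptr4 = frame;                       /* points at slot "a" */

    assert(index < 4);                            /* stay inside the model buffer */

    memcpy(raw + 0,  &a, sizeof a);               /* bytes  0..3  */
    memcpy(raw + 4,  &b, sizeof b);               /* bytes  4..7  */
    memcpy(raw + 8,  &c, sizeof c);               /* bytes  8..11 */
    memcpy(raw + 12, &d, sizeof d);               /* bytes 12..15 */

    return ptr4[index];                           /* ptr4[1] reads bytes 4..7, etc. */
}
```

For example, read_slot(10, 20, 30, 40, 1) returns the value stored in the second slot, which is the access pattern that format_address[1] is expected to perform on the stack.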

So, my point is: if

- req1) GCC placed the parameters on the stack adjacent to one another (as it should as a result of the selected calling convention) *AND*
- req2) it gave me the address of the original format parameter on the stack (as it should by the definition of getting the address of a function parameter)

then the code would work correctly, since

- req3) it is well established that C does not perform array boundary checking and, thus, PTR4[1] accesses the memory location exactly 4 bytes after memory location PTR4[0].

Since I believe none of these 3 requirements is refutable, and since they are all well established, then

- bug1) GCC is not putting the parameters adjacent on the stack as it should *OR*
- bug2) GCC is not giving me the correct address when I ask for &format.

The remaining possibility,

- bug3) pointer arithmetic not being done correctly because the array has size 1 (your argument),

is simply not true. I checked it while debugging, and I've seen it to be correct on the most varied machines (from PIC microcontrollers to PCs and mobile phones); otherwise it would not be C. The only thing that is allowed to vary (for performance reasons) is the parameter width, as PIC16 microcontrollers have 1-byte parameters, PIC24 has 2-byte parameters, and x86-32 has 4-byte parameters.

Another interpretation of your comment about an "array of size 1" would be to think that a char* would take up more or less than 4 bytes, resulting in something like:

    option a:
        push32 d
        push32 c
        push32 b
        push16 a    (less than 4 bytes, so accessing format_address[1] would be wrong)

    option b:
        push32 d
        push32 c
        push32 b
        push64 a    (more than 4 bytes, so accessing format_address[1] would be wrong)

For performance reasons it is fairly obvious that neither example would apply, but I checked them anyway and that is not what happens: the format pointer is put on the stack as 4 bytes.

So, to use your reasoning, it is well established that if I have a buffer of 16 bytes, and if I declare a 1-entry (4-byte) int* PTR4 to point to the first byte of that buffer, it is absolutely well defined that PTR4[0] will access bytes 0 to 3 of the buffer, PTR4[1] will access bytes 4 to 7, PTR4[2] will access bytes 8 to 11, and PTR4[3] will access bytes 12 to 15. It has been established for several decades now that this is not undefined behaviour even if PTR4 is declared as a 1-entry (4-byte) pointer, and this type of code runs predictably on every platform in a very well-defined way. It is, in fact, the principle behind serialization, which I have done numerous times over the last 20 years.

I think it is clear why I think your arguments are missing the point: it is simply not true that this behaviour is undefined.


-- 

rogerio at rilhas dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|INVALID                     |


http://gcc.gnu.org/bugzilla/show_bug.cgi?id=45249