Of course, when I say "your" code, it's the code you have shown, and probably "our" (VMMaker) code ;)
Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier < [email protected]> a écrit : > Hi Pablo (again), > no, not a bug. > > The problem is in the source code. The compiler has the right to presume > that your code is exempt of UB, because you cannot depend on UB (obviously). > So it can eliminate all code which corresponds to UB. > > The compiler has the right to assume that a pointer to an int cannot point > to a long (UB). > So modifying a long cannot have any sort of impact on the content of the > int pointer. > So the compiler can decouple both path return int content and assign long. > But assigning the long has no effect, so the code can be suppressed > altogether. > > Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a > écrit : > >> Hi Aliaksei, >> to me it looks like a bug of GCC optimization. Basically, it is >> assuming that the x variable is used but never read or its value is >> never used. Also it assumes the same of the i variable, as we are only >> accessing indirectly to the memory where it locates (the code is even >> assuming that the variable exists, but it can be optimize out as in >> this scenario). Even though, the original C code is valid C code, we >> are not helping the compiler writing code like that. So I have >> rewritten the code in a way that does not use indirect memory access >> to the stack space. >> >> One thing more that makes me think is a bug, if you use an int >> constant as the index and not a parameter, the error does not occur >> (the code is not badly optimized) and there is a warning about the >> not-so-great access to the stack. >> >> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]> >> wrote: >> > >> > Hi Pablo, >> > >> > Wow! Thank you for the detective story :) >> > >> > Do I understand correctly that the original code causes undefined >> behavior and therefore can be changed (or even removed) by the compiler? >> > (because it returns something that is referencing memory on the stack) >> > >> > Please keep posting similar things in future! It is very educative :) >> > >> > Cheers, >> > Alex >> > >> > >> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]> >> wrote: >> >> >> >> Hi, >> >> this mail is related to Pharo because it is knowledge I found >> >> debugging the build of the VM, but the rest is to document it and >> >> perhaps someone will found it interesting (also I couldn't find it >> >> easily using Google). Sorry for the long mail! >> >> >> >> The problem >> >> ========== >> >> >> >> The following code does not produce good code in 8.3 when using >> optimizations: >> >> >> >> long __attribute__ ((noinline)) myFunc(long i, int index){ >> >> long v; >> >> long x = i >> 3; >> >> >> >> v = x; >> >> return ((int*)(&v))[index]; >> >> } >> >> >> >> #include <stdio.h> >> >> >> >> int main(){ >> >> >> >> long i; >> >> int x; >> >> >> >> scanf("%ld", &i); >> >> scanf("%d", &x); >> >> >> >> printf("%ld",myFunc(i,x)); >> >> } >> >> >> >> Basically, with -02, it generates the following code: >> >> >> >> myFunc: >> >> movslq %esi, %rsi >> >> movslq -8(%rsp,%rsi,4), %rax >> >> ret >> >> >> >> And with -01 it generates the following code: >> >> >> >> myFunc: >> >> sarq $3, %rdi >> >> movq %rdi, -8(%rsp) >> >> movslq %esi, %rsi >> >> movslq -8(%rsp,%rsi,4), %rax >> >> ret >> >> >> >> As you can see, the more optimized version is losing the bit shift (or >> >> any other operation). >> >> To detect the problem it was not easy, basically, I have used the >> >> Pharo Tests to detect the failing function and then to understand the >> >> generation of code in GCC, we need to debug its generation. >> >> >> >> The first snippet is generated using: >> >> >> >> gcc -O2 prb.c -S -o prb.s -Wall >> >> >> >> and the second using: >> >> >> >> gcc -O1 prb.c -S -o prb.s -Wall >> >> >> >> The GCC pipeline has different steps, I have centered my self in the >> >> tree optimizations. >> >> GCC dumps each of the intermediate states of the compilation, working >> >> with C like trees. For example to get the optimized version of the >> >> program, before generating the Assembler we have to do: >> >> >> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout >> >> >> >> This will generate: >> >> >> >> __attribute__((noinline)) >> >> myFunc (long int i, int index) >> >> { >> >> long int v; >> >> long unsigned int _1; >> >> long unsigned int _2; >> >> int * _3; >> >> int _4; >> >> long int _8; >> >> >> >> <bb 2> [local count: 1073741825]: >> >> _1 = (long unsigned int) index_7(D); >> >> _2 = _1 * 4; >> >> _3 = &v + _2; >> >> _4 = *_3; >> >> _8 = (long int) _4; >> >> v ={v} {CLOBBER}; >> >> return _8; >> >> >> >> } >> >> >> >> This is the optimized SSA (static single assign) version of the >> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html) >> >> As you can see in this version the code is already optimized, and our >> >> operations not correctly generated. We expect to see something like: >> >> >> >> __attribute__((noinline)) >> >> myFunc (long int i, int index) >> >> { >> >> long int x; >> >> long int v; >> >> long unsigned int _1; >> >> long unsigned int _2; >> >> int * _3; >> >> int _4; >> >> long int _10; >> >> >> >> <bb 2> [local count: 1073741825]: >> >> x_6 = i_5(D) >> 3; >> >> v = x_6; >> >> _1 = (long unsigned int) index_9(D); >> >> _2 = _1 * 4; >> >> _3 = &v + _2; >> >> _4 = *_3; >> >> _10 = (long int) _4; >> >> v ={v} {CLOBBER}; >> >> return _10; >> >> >> >> } >> >> >> >> In the first SSA version, we are lacking the shift operation: >> >> >> >> x_6 = i_5(D) >> 3; >> >> v = x_6; >> >> >> >> So, we need to see in which of the optimizations and transformations >> >> our code is lost: >> >> To see all the steps we should execute: >> >> >> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all >> >> >> >> This will generate a lot, really a lot, of files in the same directory. >> >> They have the name: prb.c.xxx.yyyy >> >> Where xxx is the number of the step, and yyyy is the name of the step. >> >> Each of the files contains the result of applying the changes, so what >> >> I have done is looking in binary search where my code was lost. >> >> >> >> I have found that the problem was in the first dead code elimination >> >> step (prb.c.041t.cddce1) >> >> GCC does not like the crap code that we are writing, as we are copying >> >> a stack variable and then accessing directly to the memory: >> >> >> >> v = x; >> >> return ((int*)(&v))[index]; >> >> >> >> So, basically, it was assuming that we were only using v and not x, so >> >> all the code modifying x is described (this optimization is called >> >> "dead store code deletion"). Basically tries to remove all the code >> >> that is affecting stack (temporary) variables that are never used. >> >> >> >> I am not sure if this is a bug in GCC, so I will report it. But I have >> >> fixed the problem writing the code in a proper way. >> >> >> >> Sorry again for the long mail, I wanted to store the knowledge and >> >> propagate it before I eventually will flush it. I like these things, >> >> but I am not debugging the GCC optimization pipeline all the days. >> >> >> >> -- >> >> Pablo Tesone. >> >> [email protected] >> >> >> >> >> -- >> Pablo Tesone. >> [email protected] >> >>
