On Wed, Dec 11, 2019 at 12:03 PM Nicolas Cellier < [email protected]> wrote:
> Yes, > But we have to replace natural crappy code (split a long in 2 ints) that > was once legal, by an even more crappy code (memcpy), so all in all, it's a > crappy art. > :-) Indeed. Personally I liked it when C was a portable assembler. That's what it's fit for and that's what it should be good at. Trying to pretend it's Pascal is, um, pretentious? > > Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a > écrit : > >> Hi Nicolas, >> thanks for the comment, you are right the problem is the bad >> original code. But my opinion is that it just is not reporting the >> situation correctly, generating a warning or non-optimizing the code >> looks more like a expected behavior. Because as I have said, using a >> constant as index in the last statement generates a meaningful warning >> and the non-optimizated version of the function. >> >> And again as you said, the only thing to learn about all this is that >> we should not write crappy code. >> >> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier >> <[email protected]> wrote: >> > >> > Of course, when I say "your" code, it's the code you have shown, and >> probably "our" (VMMaker) code ;) >> > >> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier < >> [email protected]> a écrit : >> >> >> >> Hi Pablo (again), >> >> no, not a bug. >> >> >> >> The problem is in the source code. The compiler has the right to >> presume that your code is exempt of UB, because you cannot depend on UB >> (obviously). >> >> So it can eliminate all code which corresponds to UB. >> >> >> >> The compiler has the right to assume that a pointer to an int cannot >> point to a long (UB). >> >> So modifying a long cannot have any sort of impact on the content of >> the int pointer. >> >> So the compiler can decouple both path return int content and assign >> long. >> >> But assigning the long has no effect, so the code can be suppressed >> altogether. >> >> >> >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a >> écrit : >> >>> >> >>> Hi Aliaksei, >> >>> to me it looks like a bug of GCC optimization. Basically, it is >> >>> assuming that the x variable is used but never read or its value is >> >>> never used. Also it assumes the same of the i variable, as we are only >> >>> accessing indirectly to the memory where it locates (the code is even >> >>> assuming that the variable exists, but it can be optimize out as in >> >>> this scenario). Even though, the original C code is valid C code, we >> >>> are not helping the compiler writing code like that. So I have >> >>> rewritten the code in a way that does not use indirect memory access >> >>> to the stack space. >> >>> >> >>> One thing more that makes me think is a bug, if you use an int >> >>> constant as the index and not a parameter, the error does not occur >> >>> (the code is not badly optimized) and there is a warning about the >> >>> not-so-great access to the stack. >> >>> >> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]> >> wrote: >> >>> > >> >>> > Hi Pablo, >> >>> > >> >>> > Wow! Thank you for the detective story :) >> >>> > >> >>> > Do I understand correctly that the original code causes undefined >> behavior and therefore can be changed (or even removed) by the compiler? >> >>> > (because it returns something that is referencing memory on the >> stack) >> >>> > >> >>> > Please keep posting similar things in future! It is very educative >> :) >> >>> > >> >>> > Cheers, >> >>> > Alex >> >>> > >> >>> > >> >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]> >> wrote: >> >>> >> >> >>> >> Hi, >> >>> >> this mail is related to Pharo because it is knowledge I found >> >>> >> debugging the build of the VM, but the rest is to document it and >> >>> >> perhaps someone will found it interesting (also I couldn't find it >> >>> >> easily using Google). Sorry for the long mail! >> >>> >> >> >>> >> The problem >> >>> >> ========== >> >>> >> >> >>> >> The following code does not produce good code in 8.3 when using >> optimizations: >> >>> >> >> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){ >> >>> >> long v; >> >>> >> long x = i >> 3; >> >>> >> >> >>> >> v = x; >> >>> >> return ((int*)(&v))[index]; >> >>> >> } >> >>> >> >> >>> >> #include <stdio.h> >> >>> >> >> >>> >> int main(){ >> >>> >> >> >>> >> long i; >> >>> >> int x; >> >>> >> >> >>> >> scanf("%ld", &i); >> >>> >> scanf("%d", &x); >> >>> >> >> >>> >> printf("%ld",myFunc(i,x)); >> >>> >> } >> >>> >> >> >>> >> Basically, with -02, it generates the following code: >> >>> >> >> >>> >> myFunc: >> >>> >> movslq %esi, %rsi >> >>> >> movslq -8(%rsp,%rsi,4), %rax >> >>> >> ret >> >>> >> >> >>> >> And with -01 it generates the following code: >> >>> >> >> >>> >> myFunc: >> >>> >> sarq $3, %rdi >> >>> >> movq %rdi, -8(%rsp) >> >>> >> movslq %esi, %rsi >> >>> >> movslq -8(%rsp,%rsi,4), %rax >> >>> >> ret >> >>> >> >> >>> >> As you can see, the more optimized version is losing the bit shift >> (or >> >>> >> any other operation). >> >>> >> To detect the problem it was not easy, basically, I have used the >> >>> >> Pharo Tests to detect the failing function and then to understand >> the >> >>> >> generation of code in GCC, we need to debug its generation. >> >>> >> >> >>> >> The first snippet is generated using: >> >>> >> >> >>> >> gcc -O2 prb.c -S -o prb.s -Wall >> >>> >> >> >>> >> and the second using: >> >>> >> >> >>> >> gcc -O1 prb.c -S -o prb.s -Wall >> >>> >> >> >>> >> The GCC pipeline has different steps, I have centered my self in >> the >> >>> >> tree optimizations. >> >>> >> GCC dumps each of the intermediate states of the compilation, >> working >> >>> >> with C like trees. For example to get the optimized version of the >> >>> >> program, before generating the Assembler we have to do: >> >>> >> >> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout >> >>> >> >> >>> >> This will generate: >> >>> >> >> >>> >> __attribute__((noinline)) >> >>> >> myFunc (long int i, int index) >> >>> >> { >> >>> >> long int v; >> >>> >> long unsigned int _1; >> >>> >> long unsigned int _2; >> >>> >> int * _3; >> >>> >> int _4; >> >>> >> long int _8; >> >>> >> >> >>> >> <bb 2> [local count: 1073741825]: >> >>> >> _1 = (long unsigned int) index_7(D); >> >>> >> _2 = _1 * 4; >> >>> >> _3 = &v + _2; >> >>> >> _4 = *_3; >> >>> >> _8 = (long int) _4; >> >>> >> v ={v} {CLOBBER}; >> >>> >> return _8; >> >>> >> >> >>> >> } >> >>> >> >> >>> >> This is the optimized SSA (static single assign) version of the >> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html) >> >>> >> As you can see in this version the code is already optimized, and >> our >> >>> >> operations not correctly generated. We expect to see something >> like: >> >>> >> >> >>> >> __attribute__((noinline)) >> >>> >> myFunc (long int i, int index) >> >>> >> { >> >>> >> long int x; >> >>> >> long int v; >> >>> >> long unsigned int _1; >> >>> >> long unsigned int _2; >> >>> >> int * _3; >> >>> >> int _4; >> >>> >> long int _10; >> >>> >> >> >>> >> <bb 2> [local count: 1073741825]: >> >>> >> x_6 = i_5(D) >> 3; >> >>> >> v = x_6; >> >>> >> _1 = (long unsigned int) index_9(D); >> >>> >> _2 = _1 * 4; >> >>> >> _3 = &v + _2; >> >>> >> _4 = *_3; >> >>> >> _10 = (long int) _4; >> >>> >> v ={v} {CLOBBER}; >> >>> >> return _10; >> >>> >> >> >>> >> } >> >>> >> >> >>> >> In the first SSA version, we are lacking the shift operation: >> >>> >> >> >>> >> x_6 = i_5(D) >> 3; >> >>> >> v = x_6; >> >>> >> >> >>> >> So, we need to see in which of the optimizations and >> transformations >> >>> >> our code is lost: >> >>> >> To see all the steps we should execute: >> >>> >> >> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all >> >>> >> >> >>> >> This will generate a lot, really a lot, of files in the same >> directory. >> >>> >> They have the name: prb.c.xxx.yyyy >> >>> >> Where xxx is the number of the step, and yyyy is the name of the >> step. >> >>> >> Each of the files contains the result of applying the changes, so >> what >> >>> >> I have done is looking in binary search where my code was lost. >> >>> >> >> >>> >> I have found that the problem was in the first dead code >> elimination >> >>> >> step (prb.c.041t.cddce1) >> >>> >> GCC does not like the crap code that we are writing, as we are >> copying >> >>> >> a stack variable and then accessing directly to the memory: >> >>> >> >> >>> >> v = x; >> >>> >> return ((int*)(&v))[index]; >> >>> >> >> >>> >> So, basically, it was assuming that we were only using v and not >> x, so >> >>> >> all the code modifying x is described (this optimization is called >> >>> >> "dead store code deletion"). Basically tries to remove all the code >> >>> >> that is affecting stack (temporary) variables that are never used. >> >>> >> >> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I >> have >> >>> >> fixed the problem writing the code in a proper way. >> >>> >> >> >>> >> Sorry again for the long mail, I wanted to store the knowledge and >> >>> >> propagate it before I eventually will flush it. I like these >> things, >> >>> >> but I am not debugging the GCC optimization pipeline all the days. >> >>> >> >> >>> >> -- >> >>> >> Pablo Tesone. >> >>> >> [email protected] >> >>> >> >> >>> >> >>> >> >>> -- >> >>> Pablo Tesone. >> >>> [email protected] >> >>> >> >> >> -- >> Pablo Tesone. >> [email protected] >> >> -- _,,,^..^,,,_ best, Eliot
