Of course, when I say "your" code, it's the code you have shown, and
probably "our" (VMMaker) code ;)

Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier <
[email protected]> a écrit :

> Hi Pablo (again),
> no, not a bug.
>
> The problem is in the source code. The compiler has the right to presume
> that your code is exempt of UB, because you cannot depend on UB (obviously).
> So it can eliminate all code which corresponds to UB.
>
> The compiler has the right to assume that a pointer to an int cannot point
> to a long (UB).
> So modifying a long cannot have any sort of impact on the content of the
> int pointer.
> So the compiler can decouple both path return int content and assign long.
> But assigning the long has no effect, so the code can be suppressed
> altogether.
>
> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a
> écrit :
>
>> Hi Aliaksei,
>>       to me it looks like a bug of GCC optimization. Basically, it is
>> assuming that the x variable is used but never read or its value is
>> never used. Also it assumes the same of the i variable, as we are only
>> accessing indirectly to the memory where it locates (the code is even
>> assuming that the variable exists, but it can be optimize out as in
>> this scenario). Even though, the original C code is valid C code, we
>> are not helping the compiler writing code like that. So I have
>> rewritten the code in a way that does not use indirect memory access
>> to the stack space.
>>
>> One thing more that makes me think is a bug, if you use an int
>> constant as the index and not a parameter, the error does not occur
>> (the code is not badly optimized) and there is a warning about the
>> not-so-great access to the stack.
>>
>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]>
>> wrote:
>> >
>> > Hi Pablo,
>> >
>> > Wow! Thank you for the detective story :)
>> >
>> > Do I understand correctly that the original code causes undefined
>> behavior and therefore can be changed (or even removed) by the compiler?
>> > (because it returns something that is referencing memory on the stack)
>> >
>> > Please keep posting similar things in future! It is very educative :)
>> >
>> > Cheers,
>> > Alex
>> >
>> >
>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]>
>> wrote:
>> >>
>> >> Hi,
>> >>     this mail is related to Pharo because it is knowledge I found
>> >> debugging the build of the VM, but the rest is to document it and
>> >> perhaps someone will found it interesting (also I couldn't find it
>> >> easily using Google). Sorry for the long mail!
>> >>
>> >> The problem
>> >> ==========
>> >>
>> >> The following code does not produce good code in 8.3 when using
>> optimizations:
>> >>
>> >> long __attribute__ ((noinline)) myFunc(long i, int index){
>> >>    long v;
>> >>    long x = i >> 3;
>> >>
>> >>    v = x;
>> >>    return ((int*)(&v))[index];
>> >> }
>> >>
>> >> #include <stdio.h>
>> >>
>> >> int main(){
>> >>
>> >>     long i;
>> >>     int x;
>> >>
>> >>     scanf("%ld", &i);
>> >>     scanf("%d", &x);
>> >>
>> >>     printf("%ld",myFunc(i,x));
>> >> }
>> >>
>> >> Basically, with -02, it generates the following code:
>> >>
>> >> myFunc:
>> >>      movslq %esi, %rsi
>> >>      movslq -8(%rsp,%rsi,4), %rax
>> >>      ret
>> >>
>> >> And with -01 it generates the following code:
>> >>
>> >> myFunc:
>> >>      sarq $3, %rdi
>> >>      movq %rdi, -8(%rsp)
>> >>      movslq %esi, %rsi
>> >>      movslq -8(%rsp,%rsi,4), %rax
>> >>      ret
>> >>
>> >> As you can see, the more optimized version is losing the bit shift (or
>> >> any other operation).
>> >> To detect the problem it was not easy, basically, I have used the
>> >> Pharo Tests to detect the failing function and then to understand the
>> >> generation of code in GCC, we need to debug its generation.
>> >>
>> >> The first snippet is generated using:
>> >>
>> >> gcc -O2 prb.c -S -o prb.s -Wall
>> >>
>> >> and the second using:
>> >>
>> >> gcc -O1 prb.c -S -o prb.s -Wall
>> >>
>> >> The GCC pipeline has different steps, I have centered my self in the
>> >> tree optimizations.
>> >> GCC dumps each of the intermediate states of the compilation, working
>> >> with C like trees. For example to get the optimized version of the
>> >> program, before generating the Assembler we have to do:
>> >>
>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
>> >>
>> >> This will generate:
>> >>
>> >> __attribute__((noinline))
>> >> myFunc (long int i, int index)
>> >> {
>> >>   long int v;
>> >>   long unsigned int _1;
>> >>   long unsigned int _2;
>> >>   int * _3;
>> >>   int _4;
>> >>   long int _8;
>> >>
>> >>   <bb 2> [local count: 1073741825]:
>> >>   _1 = (long unsigned int) index_7(D);
>> >>   _2 = _1 * 4;
>> >>   _3 = &v + _2;
>> >>   _4 = *_3;
>> >>   _8 = (long int) _4;
>> >>   v ={v} {CLOBBER};
>> >>   return _8;
>> >>
>> >> }
>> >>
>> >> This is the optimized SSA (static single assign) version of the
>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
>> >> As you can see in this version the code is already optimized, and our
>> >> operations not correctly generated. We expect to see something like:
>> >>
>> >> __attribute__((noinline))
>> >> myFunc (long int i, int index)
>> >> {
>> >>   long int x;
>> >>   long int v;
>> >>   long unsigned int _1;
>> >>   long unsigned int _2;
>> >>   int * _3;
>> >>   int _4;
>> >>   long int _10;
>> >>
>> >>   <bb 2> [local count: 1073741825]:
>> >>   x_6 = i_5(D) >> 3;
>> >>   v = x_6;
>> >>   _1 = (long unsigned int) index_9(D);
>> >>   _2 = _1 * 4;
>> >>   _3 = &v + _2;
>> >>   _4 = *_3;
>> >>   _10 = (long int) _4;
>> >>   v ={v} {CLOBBER};
>> >>   return _10;
>> >>
>> >> }
>> >>
>> >> In the first SSA version, we are lacking the shift operation:
>> >>
>> >>   x_6 = i_5(D) >> 3;
>> >>   v = x_6;
>> >>
>> >> So, we need to see in which of the optimizations and transformations
>> >> our code is lost:
>> >> To see all the steps we should execute:
>> >>
>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
>> >>
>> >> This will generate a lot, really a lot, of files in the same directory.
>> >> They have the name: prb.c.xxx.yyyy
>> >> Where xxx is the number of the step, and yyyy is the name of the step.
>> >> Each of the files contains the result of applying the changes, so what
>> >> I have done is looking in binary search where my code was lost.
>> >>
>> >> I have found that the problem was in the first dead code elimination
>> >> step (prb.c.041t.cddce1)
>> >> GCC does not like the crap code that we are writing, as we are copying
>> >> a stack variable and then accessing directly to the memory:
>> >>
>> >> v = x;
>> >> return ((int*)(&v))[index];
>> >>
>> >> So, basically, it was assuming that we were only using v and not x, so
>> >> all the code modifying x is described (this optimization is called
>> >> "dead store code deletion"). Basically tries to remove all the code
>> >> that is affecting stack (temporary) variables that are never used.
>> >>
>> >> I am not sure if this is a bug in GCC, so I will report it. But I have
>> >> fixed the problem writing the code in a proper way.
>> >>
>> >> Sorry again for the long mail, I wanted to store the knowledge and
>> >> propagate it before I eventually will flush it. I like these things,
>> >> but I am not debugging the GCC optimization pipeline all the days.
>> >>
>> >> --
>> >> Pablo Tesone.
>> >> [email protected]
>> >>
>>
>>
>> --
>> Pablo Tesone.
>> [email protected]
>>
>>

Reply via email to