On Wed, Dec 11, 2019 at 12:03 PM Nicolas Cellier <
[email protected]> wrote:

> Yes,
> But we have to replace natural crappy code (split a long in 2 ints) that
> was once legal, by an even more crappy code (memcpy), so all in all, it's a
> crappy art.
>

:-)  Indeed.  Personally I liked it when C was a portable assembler.
That's what it's fit for and that's what it should be good at.  Trying to
pretend it's Pascal is, um, pretentious?


>
> Le mer. 11 déc. 2019 à 19:30, [email protected] <[email protected]> a
> écrit :
>
>> Hi Nicolas,
>>   thanks for the comment, you are right the problem is the bad
>> original code. But my opinion is that it just is not reporting the
>> situation correctly, generating a warning or non-optimizing the code
>> looks more like a expected behavior. Because as I have said, using a
>> constant as index in the last statement generates a meaningful warning
>> and the non-optimizated version of the function.
>>
>> And again as you said, the only thing to learn about all this is that
>> we should not write crappy code.
>>
>> On Wed, Dec 11, 2019 at 7:11 PM Nicolas Cellier
>> <[email protected]> wrote:
>> >
>> > Of course, when I say "your" code, it's the code you have shown, and
>> probably "our" (VMMaker) code ;)
>> >
>> > Le mer. 11 déc. 2019 à 19:05, Nicolas Cellier <
>> [email protected]> a écrit :
>> >>
>> >> Hi Pablo (again),
>> >> no, not a bug.
>> >>
>> >> The problem is in the source code. The compiler has the right to
>> presume that your code is exempt of UB, because you cannot depend on UB
>> (obviously).
>> >> So it can eliminate all code which corresponds to UB.
>> >>
>> >> The compiler has the right to assume that a pointer to an int cannot
>> point to a long (UB).
>> >> So modifying a long cannot have any sort of impact on the content of
>> the int pointer.
>> >> So the compiler can decouple both path return int content and assign
>> long.
>> >> But assigning the long has no effect, so the code can be suppressed
>> altogether.
>> >>
>> >> Le mer. 11 déc. 2019 à 18:54, [email protected] <[email protected]> a
>> écrit :
>> >>>
>> >>> Hi Aliaksei,
>> >>>       to me it looks like a bug of GCC optimization. Basically, it is
>> >>> assuming that the x variable is used but never read or its value is
>> >>> never used. Also it assumes the same of the i variable, as we are only
>> >>> accessing indirectly to the memory where it locates (the code is even
>> >>> assuming that the variable exists, but it can be optimize out as in
>> >>> this scenario). Even though, the original C code is valid C code, we
>> >>> are not helping the compiler writing code like that. So I have
>> >>> rewritten the code in a way that does not use indirect memory access
>> >>> to the stack space.
>> >>>
>> >>> One thing more that makes me think is a bug, if you use an int
>> >>> constant as the index and not a parameter, the error does not occur
>> >>> (the code is not badly optimized) and there is a warning about the
>> >>> not-so-great access to the stack.
>> >>>
>> >>> On Wed, Dec 11, 2019 at 6:01 PM Aliaksei Syrel <[email protected]>
>> wrote:
>> >>> >
>> >>> > Hi Pablo,
>> >>> >
>> >>> > Wow! Thank you for the detective story :)
>> >>> >
>> >>> > Do I understand correctly that the original code causes undefined
>> behavior and therefore can be changed (or even removed) by the compiler?
>> >>> > (because it returns something that is referencing memory on the
>> stack)
>> >>> >
>> >>> > Please keep posting similar things in future! It is very educative
>> :)
>> >>> >
>> >>> > Cheers,
>> >>> > Alex
>> >>> >
>> >>> >
>> >>> > On Wed, 11 Dec 2019 at 17:35, [email protected] <[email protected]>
>> wrote:
>> >>> >>
>> >>> >> Hi,
>> >>> >>     this mail is related to Pharo because it is knowledge I found
>> >>> >> debugging the build of the VM, but the rest is to document it and
>> >>> >> perhaps someone will found it interesting (also I couldn't find it
>> >>> >> easily using Google). Sorry for the long mail!
>> >>> >>
>> >>> >> The problem
>> >>> >> ==========
>> >>> >>
>> >>> >> The following code does not produce good code in 8.3 when using
>> optimizations:
>> >>> >>
>> >>> >> long __attribute__ ((noinline)) myFunc(long i, int index){
>> >>> >>    long v;
>> >>> >>    long x = i >> 3;
>> >>> >>
>> >>> >>    v = x;
>> >>> >>    return ((int*)(&v))[index];
>> >>> >> }
>> >>> >>
>> >>> >> #include <stdio.h>
>> >>> >>
>> >>> >> int main(){
>> >>> >>
>> >>> >>     long i;
>> >>> >>     int x;
>> >>> >>
>> >>> >>     scanf("%ld", &i);
>> >>> >>     scanf("%d", &x);
>> >>> >>
>> >>> >>     printf("%ld",myFunc(i,x));
>> >>> >> }
>> >>> >>
>> >>> >> Basically, with -02, it generates the following code:
>> >>> >>
>> >>> >> myFunc:
>> >>> >>      movslq %esi, %rsi
>> >>> >>      movslq -8(%rsp,%rsi,4), %rax
>> >>> >>      ret
>> >>> >>
>> >>> >> And with -01 it generates the following code:
>> >>> >>
>> >>> >> myFunc:
>> >>> >>      sarq $3, %rdi
>> >>> >>      movq %rdi, -8(%rsp)
>> >>> >>      movslq %esi, %rsi
>> >>> >>      movslq -8(%rsp,%rsi,4), %rax
>> >>> >>      ret
>> >>> >>
>> >>> >> As you can see, the more optimized version is losing the bit shift
>> (or
>> >>> >> any other operation).
>> >>> >> To detect the problem it was not easy, basically, I have used the
>> >>> >> Pharo Tests to detect the failing function and then to understand
>> the
>> >>> >> generation of code in GCC, we need to debug its generation.
>> >>> >>
>> >>> >> The first snippet is generated using:
>> >>> >>
>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall
>> >>> >>
>> >>> >> and the second using:
>> >>> >>
>> >>> >> gcc -O1 prb.c -S -o prb.s -Wall
>> >>> >>
>> >>> >> The GCC pipeline has different steps, I have centered my self in
>> the
>> >>> >> tree optimizations.
>> >>> >> GCC dumps each of the intermediate states of the compilation,
>> working
>> >>> >> with C like trees. For example to get the optimized version of the
>> >>> >> program, before generating the Assembler we have to do:
>> >>> >>
>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout
>> >>> >>
>> >>> >> This will generate:
>> >>> >>
>> >>> >> __attribute__((noinline))
>> >>> >> myFunc (long int i, int index)
>> >>> >> {
>> >>> >>   long int v;
>> >>> >>   long unsigned int _1;
>> >>> >>   long unsigned int _2;
>> >>> >>   int * _3;
>> >>> >>   int _4;
>> >>> >>   long int _8;
>> >>> >>
>> >>> >>   <bb 2> [local count: 1073741825]:
>> >>> >>   _1 = (long unsigned int) index_7(D);
>> >>> >>   _2 = _1 * 4;
>> >>> >>   _3 = &v + _2;
>> >>> >>   _4 = *_3;
>> >>> >>   _8 = (long int) _4;
>> >>> >>   v ={v} {CLOBBER};
>> >>> >>   return _8;
>> >>> >>
>> >>> >> }
>> >>> >>
>> >>> >> This is the optimized SSA (static single assign) version of the
>> >>> >> function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
>> >>> >> As you can see in this version the code is already optimized, and
>> our
>> >>> >> operations not correctly generated. We expect to see something
>> like:
>> >>> >>
>> >>> >> __attribute__((noinline))
>> >>> >> myFunc (long int i, int index)
>> >>> >> {
>> >>> >>   long int x;
>> >>> >>   long int v;
>> >>> >>   long unsigned int _1;
>> >>> >>   long unsigned int _2;
>> >>> >>   int * _3;
>> >>> >>   int _4;
>> >>> >>   long int _10;
>> >>> >>
>> >>> >>   <bb 2> [local count: 1073741825]:
>> >>> >>   x_6 = i_5(D) >> 3;
>> >>> >>   v = x_6;
>> >>> >>   _1 = (long unsigned int) index_9(D);
>> >>> >>   _2 = _1 * 4;
>> >>> >>   _3 = &v + _2;
>> >>> >>   _4 = *_3;
>> >>> >>   _10 = (long int) _4;
>> >>> >>   v ={v} {CLOBBER};
>> >>> >>   return _10;
>> >>> >>
>> >>> >> }
>> >>> >>
>> >>> >> In the first SSA version, we are lacking the shift operation:
>> >>> >>
>> >>> >>   x_6 = i_5(D) >> 3;
>> >>> >>   v = x_6;
>> >>> >>
>> >>> >> So, we need to see in which of the optimizations and
>> transformations
>> >>> >> our code is lost:
>> >>> >> To see all the steps we should execute:
>> >>> >>
>> >>> >> gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all
>> >>> >>
>> >>> >> This will generate a lot, really a lot, of files in the same
>> directory.
>> >>> >> They have the name: prb.c.xxx.yyyy
>> >>> >> Where xxx is the number of the step, and yyyy is the name of the
>> step.
>> >>> >> Each of the files contains the result of applying the changes, so
>> what
>> >>> >> I have done is looking in binary search where my code was lost.
>> >>> >>
>> >>> >> I have found that the problem was in the first dead code
>> elimination
>> >>> >> step (prb.c.041t.cddce1)
>> >>> >> GCC does not like the crap code that we are writing, as we are
>> copying
>> >>> >> a stack variable and then accessing directly to the memory:
>> >>> >>
>> >>> >> v = x;
>> >>> >> return ((int*)(&v))[index];
>> >>> >>
>> >>> >> So, basically, it was assuming that we were only using v and not
>> x, so
>> >>> >> all the code modifying x is described (this optimization is called
>> >>> >> "dead store code deletion"). Basically tries to remove all the code
>> >>> >> that is affecting stack (temporary) variables that are never used.
>> >>> >>
>> >>> >> I am not sure if this is a bug in GCC, so I will report it. But I
>> have
>> >>> >> fixed the problem writing the code in a proper way.
>> >>> >>
>> >>> >> Sorry again for the long mail, I wanted to store the knowledge and
>> >>> >> propagate it before I eventually will flush it. I like these
>> things,
>> >>> >> but I am not debugging the GCC optimization pipeline all the days.
>> >>> >>
>> >>> >> --
>> >>> >> Pablo Tesone.
>> >>> >> [email protected]
>> >>> >>
>> >>>
>> >>>
>> >>> --
>> >>> Pablo Tesone.
>> >>> [email protected]
>> >>>
>>
>>
>> --
>> Pablo Tesone.
>> [email protected]
>>
>>

-- 
_,,,^..^,,,_
best, Eliot

Reply via email to