[Pharo-dev] Debugging GCC code generation

[email protected] Wed, 11 Dec 2019 08:36:08 -0800

Hi,
    this mail is related to Pharo because it is knowledge I found
debugging the build of the VM, but the rest is to document it and
perhaps someone will found it interesting (also I couldn't find it
easily using Google). Sorry for the long mail!


The problem
==========

The following code does not produce good code in 8.3 when using optimizations:

long __attribute__ ((noinline)) myFunc(long i, int index){
   long v;
   long x = i >> 3;

   v = x;
   return ((int*)(&v))[index];
}

#include <stdio.h>

int main(){

    long i;
    int x;

    scanf("%ld", &i);
    scanf("%d", &x);

    printf("%ld",myFunc(i,x));
}

Basically, with -02, it generates the following code:

myFunc:
     movslq %esi, %rsi
     movslq -8(%rsp,%rsi,4), %rax
     ret

And with -01 it generates the following code:

myFunc:
     sarq $3, %rdi
     movq %rdi, -8(%rsp)
     movslq %esi, %rsi
     movslq -8(%rsp,%rsi,4), %rax
     ret

As you can see, the more optimized version is losing the bit shift (or
any other operation).
To detect the problem it was not easy, basically, I have used the
Pharo Tests to detect the failing function and then to understand the
generation of code in GCC, we need to debug its generation.

The first snippet is generated using:

gcc -O2 prb.c -S -o prb.s -Wall

and the second using:

gcc -O1 prb.c -S -o prb.s -Wall

The GCC pipeline has different steps, I have centered my self in the
tree optimizations.
GCC dumps each of the intermediate states of the compilation, working
with C like trees. For example to get the optimized version of the
program, before generating the Assembler we have to do:

gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-optimized=/dev/stdout

This will generate:

__attribute__((noinline))
myFunc (long int i, int index)
{
  long int v;
  long unsigned int _1;
  long unsigned int _2;
  int * _3;
  int _4;
  long int _8;

  <bb 2> [local count: 1073741825]:
  _1 = (long unsigned int) index_7(D);
  _2 = _1 * 4;
  _3 = &v + _2;
  _4 = *_3;
  _8 = (long int) _4;
  v ={v} {CLOBBER};
  return _8;

}

This is the optimized SSA (static single assign) version of the
function (https://gcc.gnu.org/onlinedocs/gccint/SSA.html)
As you can see in this version the code is already optimized, and our
operations not correctly generated. We expect to see something like:

__attribute__((noinline))
myFunc (long int i, int index)
{
  long int x;
  long int v;
  long unsigned int _1;
  long unsigned int _2;
  int * _3;
  int _4;
  long int _10;

  <bb 2> [local count: 1073741825]:
  x_6 = i_5(D) >> 3;
  v = x_6;
  _1 = (long unsigned int) index_9(D);
  _2 = _1 * 4;
  _3 = &v + _2;
  _4 = *_3;
  _10 = (long int) _4;
  v ={v} {CLOBBER};
  return _10;

}

In the first SSA version, we are lacking the shift operation:

  x_6 = i_5(D) >> 3;
  v = x_6;

So, we need to see in which of the optimizations and transformations
our code is lost:
To see all the steps we should execute:

gcc -O2 prb.c -S -o prb.s -Wall -fdump-tree-all

This will generate a lot, really a lot, of files in the same directory.
They have the name: prb.c.xxx.yyyy
Where xxx is the number of the step, and yyyy is the name of the step.
Each of the files contains the result of applying the changes, so what
I have done is looking in binary search where my code was lost.

I have found that the problem was in the first dead code elimination
step (prb.c.041t.cddce1)
GCC does not like the crap code that we are writing, as we are copying
a stack variable and then accessing directly to the memory:

v = x;
return ((int*)(&v))[index];

So, basically, it was assuming that we were only using v and not x, so
all the code modifying x is described (this optimization is called
"dead store code deletion"). Basically tries to remove all the code
that is affecting stack (temporary) variables that are never used.

I am not sure if this is a bug in GCC, so I will report it. But I have
fixed the problem writing the code in a proper way.

Sorry again for the long mail, I wanted to store the knowledge and
propagate it before I eventually will flush it. I like these things,
but I am not debugging the GCC optimization pipeline all the days.

--
Pablo Tesone.
[email protected]

[Pharo-dev] Debugging GCC code generation

Reply via email to