Sent from my iPhone

On Mar 6, 2009, at 7:00 PM, Silvius Rus <r...@google.com> wrote:

I'm thinking about adding bitwise dataflow analysis support to RTL.

Is this a good idea? Bad idea? Already done? Please review if interested.

There is already some bitwise dataflow implemented in combine. And I think df supports bytewise already but it is turned off because of problems with some targets and modes. I also think Steven B. had an implementation on the tree level or at least ideas on how to implement there.



Thank you,
Silvius


Motivation
==========

int foo(int x)
{
 int i = 100;

 do
   {
     if (x > 0)
       x = x & 1;  /* After this insn, all bits except 1 are 0.  */
     else
       x = x & 2;  /* After this insn, all bits except 2 are 0.  */
     i--;
   }
 while (i);

return x & 4; /* Before this insn, only bits 1 and 2 may be nonzero. */
}

"foo" should simply return 0.
This optimization is currently missed at -O2 and -O3 on x86_64.
(Cases with simpler control are solved by the "combine" pass.)


Proposal
========

1. Dataflow framework to propagate bitwise register properties.
  (Integrated with the current dataflow framework.)
2. Forward bitwise dataflow analysis: constant bit propagation.
3. Backward bitwise dataflow analysis: dead bit propagation.
4. Target applications: improve dce and see.  (Others?)


Preliminary Design
==================

1. I don't have enough understanding of GCC to decide whether it should be done at RTL level or tree level. After some brainstorming with Ian Taylor,
  Diego Novillo and others, we decided to go with RTL.

2. This problem could be solved using iterative dataflow with bitmaps.
  However, memory size would increase significantly compared to scalar
  analysis, as would processing time.
For constant bit propagation, we need to keep 2 bits of state for each
  register bit.  For 64 bit registers, that's a factor of 128x over
  the scalar reaching definition problem.

Instead, I propose a sparse bitwise dataflow framework. We would still use the existing RTL dataflow framework to build scalar DU/UD chains.
  Once they are available, bitwise information is propagated only over
  these chains, analogous to the sparse constant propagation described
  by Wegman & Zadeck TOPLAS 1991.

3. This might be too much detail at this point, but just in case, here is
  a brief description of a bit constant propagation algorithm.

  For each instruction I in the function body
    For each register R in instruction I
def_constant_bits(I, R) = collect constants from AND/OR/... operations.

  Iterate until the def_constant_bits don't change:
    For each instruction I in the function body
      For each register R used at I
        use_constant_bits(I, R) = merge (def_constant_bits(D, R))
across all definitions D of R that reach I
      For each register R defined at I
        def_constant_bits(I, R) = transfer (use_constant_bits(I, RU))
for all register uses RU, based on opcodes.

The data structures and routines "collect", "merge" and "transfer"
depend on the problem solved.

4. Complexity considerations.
The solver visits every DU edge once for each iteration of the fixed point
  convergence loop.  The maximum number of iterations is given by the
  height of the state lattice multiplied by the number of bits.
Although this can be as high as 128 for constant bit propagation on x86_64,
  in practice we expect much lower times.
Also, lower complexity guarantees can be given if less accurate information is allowed, e.g., byte level rather than bit level. For byte constants,
  the upper bound constant factor drops from 128 to 16.

Some of these ideas came from discussions with Preston Briggs,
Sriraman Tallam and others.

Reply via email to