Question regarding the "Clean up how cse works" project

Steven Bosscher Mon, 19 Jun 2006 14:24:02 -0700

Hi,

I have a question about the "Clean up how cse works" project on
http://gcc.gnu.org/projects/optimize.html


Let me first explain what I am trying to do.  I have seen Vlad's patch
to make CSE path following remember its state at the end of a path, so
that when a new path is followed, a re-scan is unnecessary for all the
insns up to the point where a state is stored.  I believe that in the
long run we want CSE to work on extended basic blocks, i.e. more like
a tree walk, without looking path following as we know it.  And IMHO
that special hash table implementation in cse.c shouldn't be necessary
so I'd like to replace it with libiberty's hashtab (which seems to do
Just Fine for e.g. cselib).

So far I've mostly used Vlad's code for learning but it seems to me that
his code is not easily adapted to my "do CSE on extended basic blocks"
idea, and I have also found out that with his patch the order of the
elements in the hash table is not restored properly.  This even causes
some test suite failures for me with gcc 3.2 (the last version that the
patch will apply to without too much effort).
I don't really see an easy way to efficiently implement a scoped hash
table the cse.c way, such that you can invalidate and roll back while
maintaining the order of the linked list of equal-valued exprs.  And the
cse.c hash table is alos simply too slow (fixed number of buckets, so
potentially quadratic if you record lots of expressions).

The first problem I immediately ran into while trying to figure out how
to make cse.c use libiberty's hashtab is that we seem to use different
"equivalent" checks depending on how strict we want to be.  For some
lookups, apparently if we only want to find the first_same_value element
in the hash table, we lookup without validating in exp_equiv_p.  For other
lookups we call exp_equiv_p with validate set to true.  The most obvious
example is lookup_as_function.

In the projects page "Clean up how cse works" there is a scheme described
to make cse.c work without first_same_value and next_same_value. Apparently,
at some point someone decided that we should not _have_ this whole thing
with  multiple expressions describing the same value.  I couldn't agree
more.

But I am _still_ not sufficiently familiar with cse.c to fully understand
what it can do (other than sending email).  E.g. I have been trying to
figure out why we record multiple expressions with the same value in the
hash table.  I would like to know what the benefit is, and whether we would
lose optimizations if I make it go away.

It turns out that we sometimes record widely different expressions that get
the same value (due to canonicalization and so on).  Usually the different
expression with the same value comes from a REG_EQUAL note.  When you look
from gdb what we are recording, you get things like:

2: debug_rtx (elt->exp) = (reg:SI 91)
void
1: dump_class (classp) = Equivalence chain for (reg:SI 82):
(reg:SI 82)
(plus:SI (reg:SI 81)
    (reg:SI 59 [ D.6710 ]))
(mult:SI (reg:SI 59 [ D.6710 ])
    (const_int 3 [0x3]))

2: debug_rtx (elt->exp) = (reg:SI 92)
void
1: dump_class (classp) = Equivalence chain for (reg:SI 83):
(reg:SI 83)
(ashift:SI (reg:SI 82)
    (const_int 2 [0x2]))
(mult:SI (reg:SI 59 [ D.6710 ])
    (const_int 12 [0xc]))

2: debug_rtx (elt->exp) = (reg:QI 184 [ D.5910 ])
void
1: dump_class (classp) = Equivalence chain for (reg:QI 385):
(reg:QI 385)
(eq:QI (reg/v:SI 136 [ spec_long ])
    (const_int 0 [0x0]))
(eq:QI (reg:CCZ 17 flags)
    (const_int 0 [0x0]))

In my collection of cc1-i files (half a million lines of preprocessed
code), at -O2, we record multiple expressions with the same value in
4460 cases.  My guess is that in most of these cases we record a
SET_SRC and a REG_EQUAL note.  I of course still need to make sure
that assumption is correct ;-)
In all cases where the value leader is a constant, we can apparently
fold_rtx the expression to that constant so those are not interesting
expressions to count as dups.  That still leaves more than 3000 cases.

I suspect we may benefit from recording these different expressions
in e.g. find_best_addr.  For some machines, the ashift may be better
and for others the mult is cheaper.  So to know that these expressions
have the same value is very important.

That brings me back to the CSE project on the projects page.  Assume
we'd be looking at the first case again, which comes from the following
insns:

(insn 54 53 55 6 (parallel [
            (set (reg:SI 82)
                (plus:SI (reg:SI 81)
                    (reg:SI 59 [ D.6710 ])))
            (clobber (reg:CC 17 flags))
        ]) 208 {*addsi_1} (nil)
    (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ])
            (const_int 3 [0x3]))
        (nil)))

(insn 77 76 78 8 (set (reg:SI 91)
        (reg:SI 82)) 40 {*movsi_1} (nil)
    (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ])
            (const_int 3 [0x3]))
        (nil)))

With the current cse, we simply brute-force record the REG_EQUAL note
and the expression and reg 82, and the whole bunch is equivalenced. Later
on we make reg 91 equivalent to reg 82.  We can at all times find both
the plus and the mult expression as possible values for reg 82 and reg 91.

I don't see how I could do the same with the new scheme from the projects
page, which goes like this (quoted from that page):
-----------------
For arithmetic, each hash table elt has the following slots:

    * Operation. This is an rtx code.
    * Mode.
    * Operands 0, 1 and 2. These point to other hash table elements.

So, if we want to enter (plus:SI (reg:SI 30) (const_int 104)), we first
enter (const_int 104) and find the entry that (reg:SI 30) now points to.
Then we put these elts into operands 0 and 1 of a new elt. We put PLUS
and SI into the new elt.

Registers and mem refs would never be entered into the table as such.
However, the values they contain would be entered. There would be a
table indexed by regno which points at the hash entry for the value in
that reg.
-----------------

With this new scheme, assuming we record REG_EQUAL notes first, we would
record the mult expression, then the plus expression, and then make the
equivalence entry for reg 82 point to the hash table element for either
the plus or the mult.  The mult and the plus would hash to different
buckets and there wouldn't be any way to equivalence the two (same_value
links are not there, remember? ;-).  Whenever we now see reg 82 and we
want to try to substitute its known value, and substituting the mult
fails, we can't try to substitute the plus because we have no way to find
it.

In summary, I don't believe the idea would actually work very well.
Apparently, cselib roughly follows the suggested implementation from the
projects page, but it just does not deal with this problem at all, because
it does not record anything from REG notes.

That brings me to a few questions:

Does anyone see some way to "fix" the idea in the projects page to make
it possible to equivalence expressions that don't at first look the same?

Perhaps it would be enough to only put the expression in the REG_EQUIV
or REG_EQUAL note of an expression in the table, and hang the alternative
forms to some kind of "equivalent value" list much like next_same_value
without putting those in the hash table (a bit like cselib's locs?)?
You would not have something like first_same_value, but you would know
that an expression had a REG_EQUIV/REG_EQUAL note.  I believe this would
not make us lose much because so far all expressions (i.e. equivalent
plus/mult, or ashift/mult) that I've seen recorded as equivalent came
from REG_EQUAL notes.
I'm not sure how this would work for cse's "related values" optimizations
because I haven't really looked at how this works.

Does anyone have some insights and/or ideas to share? :-)

Gr.
Steven

Question regarding the "Clean up how cse works" project

Reply via email to