Hi, I have a question about the "Clean up how cse works" project on http://gcc.gnu.org/projects/optimize.html
Let me first explain what I am trying to do. I have seen Vlad's patch to make CSE path following remember its state at the end of a path, so that when a new path is followed, a re-scan is unnecessary for all the insns up to the point where a state is stored. I believe that in the long run we want CSE to work on extended basic blocks, i.e. more like a tree walk, without looking path following as we know it. And IMHO that special hash table implementation in cse.c shouldn't be necessary so I'd like to replace it with libiberty's hashtab (which seems to do Just Fine for e.g. cselib). So far I've mostly used Vlad's code for learning but it seems to me that his code is not easily adapted to my "do CSE on extended basic blocks" idea, and I have also found out that with his patch the order of the elements in the hash table is not restored properly. This even causes some test suite failures for me with gcc 3.2 (the last version that the patch will apply to without too much effort). I don't really see an easy way to efficiently implement a scoped hash table the cse.c way, such that you can invalidate and roll back while maintaining the order of the linked list of equal-valued exprs. And the cse.c hash table is alos simply too slow (fixed number of buckets, so potentially quadratic if you record lots of expressions). The first problem I immediately ran into while trying to figure out how to make cse.c use libiberty's hashtab is that we seem to use different "equivalent" checks depending on how strict we want to be. For some lookups, apparently if we only want to find the first_same_value element in the hash table, we lookup without validating in exp_equiv_p. For other lookups we call exp_equiv_p with validate set to true. The most obvious example is lookup_as_function. In the projects page "Clean up how cse works" there is a scheme described to make cse.c work without first_same_value and next_same_value. Apparently, at some point someone decided that we should not _have_ this whole thing with multiple expressions describing the same value. I couldn't agree more. But I am _still_ not sufficiently familiar with cse.c to fully understand what it can do (other than sending email). E.g. I have been trying to figure out why we record multiple expressions with the same value in the hash table. I would like to know what the benefit is, and whether we would lose optimizations if I make it go away. It turns out that we sometimes record widely different expressions that get the same value (due to canonicalization and so on). Usually the different expression with the same value comes from a REG_EQUAL note. When you look from gdb what we are recording, you get things like: 2: debug_rtx (elt->exp) = (reg:SI 91) void 1: dump_class (classp) = Equivalence chain for (reg:SI 82): (reg:SI 82) (plus:SI (reg:SI 81) (reg:SI 59 [ D.6710 ])) (mult:SI (reg:SI 59 [ D.6710 ]) (const_int 3 [0x3])) 2: debug_rtx (elt->exp) = (reg:SI 92) void 1: dump_class (classp) = Equivalence chain for (reg:SI 83): (reg:SI 83) (ashift:SI (reg:SI 82) (const_int 2 [0x2])) (mult:SI (reg:SI 59 [ D.6710 ]) (const_int 12 [0xc])) 2: debug_rtx (elt->exp) = (reg:QI 184 [ D.5910 ]) void 1: dump_class (classp) = Equivalence chain for (reg:QI 385): (reg:QI 385) (eq:QI (reg/v:SI 136 [ spec_long ]) (const_int 0 [0x0])) (eq:QI (reg:CCZ 17 flags) (const_int 0 [0x0])) In my collection of cc1-i files (half a million lines of preprocessed code), at -O2, we record multiple expressions with the same value in 4460 cases. My guess is that in most of these cases we record a SET_SRC and a REG_EQUAL note. I of course still need to make sure that assumption is correct ;-) In all cases where the value leader is a constant, we can apparently fold_rtx the expression to that constant so those are not interesting expressions to count as dups. That still leaves more than 3000 cases. I suspect we may benefit from recording these different expressions in e.g. find_best_addr. For some machines, the ashift may be better and for others the mult is cheaper. So to know that these expressions have the same value is very important. That brings me back to the CSE project on the projects page. Assume we'd be looking at the first case again, which comes from the following insns: (insn 54 53 55 6 (parallel [ (set (reg:SI 82) (plus:SI (reg:SI 81) (reg:SI 59 [ D.6710 ]))) (clobber (reg:CC 17 flags)) ]) 208 {*addsi_1} (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ]) (const_int 3 [0x3])) (nil))) (insn 77 76 78 8 (set (reg:SI 91) (reg:SI 82)) 40 {*movsi_1} (nil) (expr_list:REG_EQUAL (mult:SI (reg:SI 59 [ D.6710 ]) (const_int 3 [0x3])) (nil))) With the current cse, we simply brute-force record the REG_EQUAL note and the expression and reg 82, and the whole bunch is equivalenced. Later on we make reg 91 equivalent to reg 82. We can at all times find both the plus and the mult expression as possible values for reg 82 and reg 91. I don't see how I could do the same with the new scheme from the projects page, which goes like this (quoted from that page): ----------------- For arithmetic, each hash table elt has the following slots: * Operation. This is an rtx code. * Mode. * Operands 0, 1 and 2. These point to other hash table elements. So, if we want to enter (plus:SI (reg:SI 30) (const_int 104)), we first enter (const_int 104) and find the entry that (reg:SI 30) now points to. Then we put these elts into operands 0 and 1 of a new elt. We put PLUS and SI into the new elt. Registers and mem refs would never be entered into the table as such. However, the values they contain would be entered. There would be a table indexed by regno which points at the hash entry for the value in that reg. ----------------- With this new scheme, assuming we record REG_EQUAL notes first, we would record the mult expression, then the plus expression, and then make the equivalence entry for reg 82 point to the hash table element for either the plus or the mult. The mult and the plus would hash to different buckets and there wouldn't be any way to equivalence the two (same_value links are not there, remember? ;-). Whenever we now see reg 82 and we want to try to substitute its known value, and substituting the mult fails, we can't try to substitute the plus because we have no way to find it. In summary, I don't believe the idea would actually work very well. Apparently, cselib roughly follows the suggested implementation from the projects page, but it just does not deal with this problem at all, because it does not record anything from REG notes. That brings me to a few questions: Does anyone see some way to "fix" the idea in the projects page to make it possible to equivalence expressions that don't at first look the same? Perhaps it would be enough to only put the expression in the REG_EQUIV or REG_EQUAL note of an expression in the table, and hang the alternative forms to some kind of "equivalent value" list much like next_same_value without putting those in the hash table (a bit like cselib's locs?)? You would not have something like first_same_value, but you would know that an expression had a REG_EQUIV/REG_EQUAL note. I believe this would not make us lose much because so far all expressions (i.e. equivalent plus/mult, or ashift/mult) that I've seen recorded as equivalent came from REG_EQUAL notes. I'm not sure how this would work for cse's "related values" optimizations because I haven't really looked at how this works. Does anyone have some insights and/or ideas to share? :-) Gr. Steven