On 08/06/2012 11:34 AM, Ulrich Weigand wrote: > Richard Henderson wrote: > > > Some more comments on this patch. > >> +; Different from movdi_31 in that we have no splitters. >> +(define_insn "atomic_loaddi_1" >> + [(set (match_operand:DI 0 "register_operand" "=d,d,!*f,!*f") >> + (unspec:DI [(match_operand:DI 1 "s_operand" "Q,S,Q,m")] > > The constraint line doesn't look right. ld(y) accept base+index, > so they should get R and T constraints, respectively.
Yes, but since I'd limited to s_operand, R seemed weird. > In fact, I wouldn't use "s_operand" as a predicate, since this > excludes valid addresses for those cases, and since in general > I've found it to be preferable to have reload fix up addresses. > So I'd just use "memory_operand" here, and actually in all the > other patterns as well ... (This also simplifes the expander.) In the first patch I did, I had memory_operand and QS, but that ran into reload failures. I assumed I'd just made a mistake. I'll see if I can replicate this for your debugging enjoyment... > "m" is wrong here (and basically everywhere), since it includes > SYMBOL_REFs for lrl etc. on z10. For lpq we need "RT". Ah right. > LPQ and STPQ probably should get a "type" of "other". This may not > model the pipeline exactly, but it's better than just assuming > "integer" by default. These instructions are special (and slow). Noted. > There is one particular inefficiency I have noticed. This code: > > if (!__atomic_compare_exchange_n (&v, &expected, max, 0 , 0, 0)) > abort (); > > from atomic-compare-exchange-3.c gets compiled into: > > l %r3,0(%r2) > larl %r1,v > cs %r3,%r4,0(%r1) > ipm %r1 > sra %r1,28 > st %r3,0(%r2) > ltr %r1,%r1 > jne .L3 > > which is extremely inefficient; it converts the condition code into > an integer using the slow ipm, sra sequence, just so that it can > convert the integer back into a condition code via ltr and branch > on it ... ... > Is there a way of structuring the atomic optabs/expander so that the store > gets done either before or after all the CC operations? No. But I'm a bit confused since a store doesn't affect the flags, so I'm not sure why the EQ can't be combined with the branch. I'll give that another look. All that said, one of the things that MacLeod's gimple_atomic will achieve is allowing two register outputs, which (for the common case) will eliminate the store entirely. r~