On 2012-07-30 07:09, Ulrich Weigand wrote:
> Richard Henderson wrote:
>
>> Tested only as far as cross-compile.  I had a browse through
>> objdump of libatomic for a brief sanity check.
>>
>> Can you please test on real hw and report back?
>
> I'll run a test, but a couple of things I noticed:
>
>
>>    /* Shift the values to the correct bit positions.  */
>> -  if (!(ac.aligned && MEM_P (cmp)))
>> -    cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
>> -  if (!(ac.aligned && MEM_P (new_rtx)))
>> -    new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
>> +  cmp = s390_expand_mask_and_shift (cmp, mode, ac.shift);
>> +  new_rtx = s390_expand_mask_and_shift (new_rtx, mode, ac.shift);
>
> This seems to disable use of ICM / STCM to perform byte or
> aligned halfword access.  Why is this necessary?  Those operations
> are supposed to provide the required operand consistency ...

Because MEM_P for cmp and new_rtx are always false.  The expander always
requests register_operand for those.  I suppose I could back out merging
those cases into the macro.

I presume a good test case to examine for ICM is with such an operand
coming from a global.  What about STCM?  I don't see the output from
sync_compare_and_swap ever being allowed in memory...

> This seems to force DImode accesses through floating-point
> registers, which is quite inefficient.  Why not allow LM/STM?
> Those are supposed to provide doubleword consistency if the
> operand is sufficiently aligned ...

... because I only looked at the definition of LM which itself doesn't
mention consistency, and the definition of LPQ which talks about LM not
being suitable for quadword consistency, and came to the wrong conclusion.

So now, looking at movdi_31, I see two problems that prevent just using a
"normal" move for the atomic_load/store_di: the o/d and d/b alternatives
which are split.  Is there some specific goodness that those alternatives
provide that is not had by reloading into the Q/S memory patterns?


r~
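
For readers following the mask-and-shift part of the thread, here is a rough sketch in plain C of the general idea: a byte-sized compare-and-swap done with a word-sized one, by shifting the compare and new values into position within the containing aligned word. It is not the s390 backend code. The helper name cas_byte_via_word is hypothetical, and it assumes the GCC __atomic builtins and that the byte lives inside a naturally aligned 32-bit object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper (not from the patch): compare-and-swap one byte by
   doing a 32-bit CAS on the aligned word that contains it.  On failure,
   *expected is updated with the byte actually found.  */
static bool
cas_byte_via_word (uint8_t *ptr, uint8_t *expected, uint8_t desired)
{
  uintptr_t addr = (uintptr_t) ptr;
  uint32_t *word = (uint32_t *) (addr & ~(uintptr_t) 3);
  unsigned int byte_off = (unsigned int) (addr & 3);

  /* Bit position of the byte within the word depends on endianness.  */
#if __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
  unsigned int shift = (3 - byte_off) * 8;
#else
  unsigned int shift = byte_off * 8;
#endif

  /* Shift the values to the correct bit positions.  */
  uint32_t mask = (uint32_t) 0xff << shift;
  uint32_t cmp = (uint32_t) *expected << shift;
  uint32_t new_val = (uint32_t) desired << shift;

  uint32_t old = __atomic_load_n (word, __ATOMIC_RELAXED);
  for (;;)
    {
      /* Only the selected byte takes part in the comparison.  */
      if ((old & mask) != cmp)
        {
          *expected = (uint8_t) ((old & mask) >> shift);
          return false;
        }

      /* Keep the neighbouring bytes, replace the selected one.  */
      uint32_t want = (old & ~mask) | new_val;
      if (__atomic_compare_exchange_n (word, &old, want, false,
                                       __ATOMIC_SEQ_CST, __ATOMIC_RELAXED))
        return true;
      /* A neighbouring byte (or ours) changed; OLD was reloaded, retry.  */
    }
}
```

The retry loop is needed because the word-sized CAS can fail when only a neighbouring byte changed, which must not be reported as a failure of the byte-sized operation.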