Re: [PATCH 0/2] Convert s390 to atomic optabs, v2

Ulrich Weigand Tue, 31 Jul 2012 11:17:41 -0700

Richard Henderson wrote:

> I've had a go at generating better code in the HQImode CAS
> loop for aligned memory, but I don't know that I'd call it
> the most efficient thing ever.


Thanks for having a look at this!

>   (3) Support for IC, and ICM via the insv pattern is lacking.
>       I've added a tiny bit of support here, in the form of using
>       the existing strict_low_part patterns, but most definitely we
>       could do better.

This doesn't look correct:
+      /* Emit a strict_low_part pattern if possible.  */
+      if (bitpos == 0 && GET_MODE_BITSIZE (smode) == bitsize)

With bitpos == 0 we need to insert into the *high* part, not
the low part, on a big-endian platform.  This is probably what
causes the incorrect code below:
         icm     %r5,3,0(%r12)
We'd need icm mask 12, not 3, to load into the two upper bytes.

[ This is also probably causing the testing failures I'm seeing
with the patch as-is.  I haven't looked into them in detail yet.  ]
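
To make the mask numbering concrete, here is a small C model of ICM
acting on the low 32 bits of a register.  It is only an illustrative
sketch: icm_model and icm_mask_for_field are hypothetical names, not
anything in the s390 back end, and the helper assumes a byte-aligned
field in a 32-bit container where bit 0 is the most significant bit.

```c
#include <assert.h>
#include <stdint.h>

/* Model of INSERT CHARACTERS UNDER MASK on a 32-bit value.  Mask bit 8
   selects the leftmost (most significant) byte, mask bit 1 the
   rightmost.  Successive bytes from MEM fill the selected positions
   from left to right; unselected bytes are left unchanged.  */
static uint32_t
icm_model (uint32_t reg, unsigned mask, const uint8_t *mem)
{
  assert (mask <= 0xF);
  for (int i = 0; i < 4; i++)
    if (mask & (8u >> i))
      {
        int shift = 8 * (3 - i);
        reg = (reg & ~(UINT32_C (0xFF) << shift))
              | ((uint32_t) *mem++ << shift);
      }
  return reg;
}

/* Hypothetical helper: the ICM mask covering a byte-aligned field that
   starts at big-endian bit position BITPOS (0 = most significant bit)
   and is BITSIZE bits wide, inside a 32-bit container.  */
static unsigned
icm_mask_for_field (unsigned bitpos, unsigned bitsize)
{
  assert (bitpos % 8 == 0 && bitsize % 8 == 0);
  assert (bitsize >= 8 && bitpos + bitsize <= 32);
  unsigned nbytes = bitsize / 8;
  unsigned first = bitpos / 8;
  return ((1u << nbytes) - 1) << (4 - first - nbytes);
}
```

With the two bytes AB CD in memory and 0x11223344 in the register,
mask 12 yields 0xABCD3344 while mask 3 yields 0x1122ABCD.  In this
model a 16-bit field at bitpos 0 maps to mask 12, and mask 3 only
results for bitpos 16.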

>   (4) The *sethighpartsi and *sethighpartdi_64 patterns ought to be
>       more different.  As is, we can't insert into bits 48-56 of a
>       DImode quantity, because we don't generate ICM for DImode,
>       only ICMH.
> 
>   (5) Missing support for RISBGZ in the form of an extv/z expander.
>       The existing *extv/z splitters probably ought to be conditionalized
>       on !Z10.
> 
>   (6) The strict_low_part patterns should allow registers for at
>       least Z10.  The SImode strict_low_part can use LR everywhere.
> 
>   (7) RISBGZ could be used for a 3-address constant lshrsi3 before
>       srlk is available.

Good points, agreed with all of that.  None of that ought to be
a prerequisite for the atomic patch, of course ...

>    * Given that we're having to zap the mask in %r1 for the second
>      compare anyway, I wonder if RISBG is really beneficial over OR.
>      Is RISBG (or ICM for that matter) any faster (or even smaller)?

Just a plain OR is preferable to a RISBG.  I guess the point of the
RISBG is that you can avoid the extra shift ...  Now, if that shift
can be moved ahead of the loop, that may not be all that big of a
win.  On the other hand, these loops hopefully don't loop very often
if we don't have a lot of contention ...
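
For illustration, a halfword compare-and-swap built on a word-sized
CAS might look roughly like the C11 sketch below, with all shifting
done once ahead of the loop so that the loop body only needs AND and
OR.  This is not the code the patch generates; cas_halfword is a
hypothetical name, and the sketch works on the word's value, with
byte_off counted from the most significant byte as on s390.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: compare-and-swap a 16-bit field inside an
   aligned 32-bit word.  BYTE_OFF is 0 for the upper halfword and 2 for
   the lower one (big-endian numbering within the word's value).
   Returns true if the field held EXPECTED and now holds DESIRED.  */
static bool
cas_halfword (_Atomic uint32_t *word, unsigned byte_off,
              uint16_t expected, uint16_t desired)
{
  assert (byte_off == 0 || byte_off == 2);

  /* All shifts happen here, once, before the loop.  */
  unsigned shift = 8 * (2 - byte_off);
  uint32_t mask = UINT32_C (0xFFFF) << shift;
  uint32_t exp_sh = (uint32_t) expected << shift;
  uint32_t des_sh = (uint32_t) desired << shift;

  uint32_t old = atomic_load (word);
  for (;;)
    {
      /* Fail if the halfword itself does not match.  */
      if ((old & mask) != exp_sh)
        return false;

      /* Keep the neighbouring bytes and merge in the pre-shifted new
         value with a plain OR.  */
      uint32_t rest = old & ~mask;
      if (atomic_compare_exchange_weak (word, &old, rest | des_sh))
        return true;

      /* The failed CAS refreshed OLD with the current word; retry.  */
    }
}
```

Without contention the loop body runs once, so whatever is saved per
iteration inside the loop only pays off when the CAS has to retry.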

Bye,
Ulrich

-- 
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com
