On 4/16/20 7:42 AM, Stephen Long wrote:
> +static inline uint8_t do_histseg_cnt(uint8_t n, uint64_t m0, uint64_t m1)
> +{
> +    int esz = 0;

Clearer to use MO_8.

> +    int bits = 8 << esz;
> +    uint64_t ones = dup_const(esz, 1);
> +    uint64_t signs = ones << (bits - 1);
> +    uint64_t cmp0, cmp1;
> +
> +    cmp1 = dup_const(1, n);

Error in the esz argument here.

> +    cmp0 = cmp1 ^ m0;
> +    cmp1 = cmp1 ^ m1;
> +    cmp0 = (cmp0 - ones) & ~cmp0;
> +    cmp1 = (cmp1 - ones) & ~cmp1;
> +    return ctpop64((cmp0 | cmp1) & signs);
> +}

Ah, well, I may have been too brief in my suggestion before.  I encourage
you to have a look at the bithacks patch and understand the algorithm
here -- it's quite clever.

We cannot simply OR the two halves together, since 8 | 8 == 8 loses one
from the count of bits.  So:

    cmp0 = (cmp0 - ones) & ~cmp0 & signs;
    cmp1 = (cmp1 - ones) & ~cmp1 & signs;

    /*
     * Combine the two compares in a way that the bits do
     * not overlap, and so preserves the count of set bits.
     * If the host has an efficient instruction for ctpop,
     * then ctpop(x) + ctpop(y) has the same number of
     * operations as ctpop(x | (y >> 1)).  If the host does
     * not have an efficient ctpop, then we only want to
     * use it once.
     */
    return ctpop64(cmp0 | (cmp1 >> 1));

> +    for (j = 0; j < 64; j += 8) {
> +        uint8_t count0 = do_histseg_cnt(n0 >> j, m0, m1);
> +        out0 |= count0 << j;
> +
> +        uint8_t count1 = do_histseg_cnt(n1 >> j, m0, m1);
> +        out1 |= count1 << j;
> +    }

Wrong type for count0/count1 for shifting by e.g. 56.  You might as well
just use uint64_t as the return value from do_histseg_cnt(), so that we
don't get unnecessary zero-extensions from the compiler.

r~