On 8/1/23 00:47, Robin Dapp via Gcc-patches wrote:
>>> I'm not against continuing with the more well-known approach for now
>>> but we should keep in mind that there might still be potential for improvement.
>
> No. I don't think it's faster.
I did a quick check on my x86 laptop and it's roughly 25% faster there.
That's consistent with the literature.
…'s meaningless.
Thanks.
juzhe.zh...@rivai.ai
From: Robin Dapp
Date: 2023-08-01 03:38
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng
Subject: Re: [PATCH V2] RISC-V: Support POPCOUNT auto-vectorization
Hi Juzhe,
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +/* FIXME: We don't allow vectorize "__builtin_popcountll" yet since it needs "vec_pack_trunc" support
> + and such pattern may cause inferior codegen.
> + We will enable "vec_pack_trunc" when we support reasonable vector cost model. */
Wait, why do we need vec_pack_trunc f
Hi Juzhe,
> +/* Expand Vector POPCOUNT by parallel popcnt:
> +
> + int parallel_popcnt(uint32_t n) {
> + #define POW2(c) (1U << (c))
> + #define MASK(c) (static_cast<uint32_t>(-1) / (POW2(POW2(c)) + 1U))
> + #define COUNT(x, c) ((x) & MASK(c)) + (((x)>>(POW2(c))) & MASK(c))
> + n = CO
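The quoted comment describes the classic SWAR ("SIMD within a register") bit count. As a minimal sketch, assuming C++14 and using hypothetical helper names (`pow2`, `mask`, `count_step`, `parallel_popcount32`, `naive_popcount32`) that mirror the quoted `POW2`/`MASK`/`COUNT` macros rather than GCC's actual expander, the 32-bit version can be written and checked at compile time:

```cpp
#include <cstdint>

// Hypothetical helpers mirroring the quoted POW2/MASK/COUNT macros;
// this is an illustration, not GCC's RVV expander code.

// 2^c as an unsigned 32-bit value.
constexpr std::uint32_t pow2(unsigned c) { return 1U << c; }

// All-ones divided by (2^(2^c) + 1) gives alternating runs of 2^c one bits
// and 2^c zero bits, with ones in the least-significant run:
//   c=0: 0x55555555, c=1: 0x33333333, c=2: 0x0F0F0F0F,
//   c=3: 0x00FF00FF, c=4: 0x0000FFFF.
constexpr std::uint32_t mask(unsigned c) {
  return static_cast<std::uint32_t>(-1) / (pow2(pow2(c)) + 1U);
}

// One reduction step: add each pair of adjacent 2^c-bit fields,
// producing fields twice as wide that hold the partial bit counts.
constexpr std::uint32_t count_step(std::uint32_t x, unsigned c) {
  return (x & mask(c)) + ((x >> pow2(c)) & mask(c));
}

// Five steps (c = 0..4) reduce 32 one-bit fields to a single 32-bit count.
constexpr std::uint32_t parallel_popcount32(std::uint32_t n) {
  for (unsigned c = 0; c < 5; ++c) {
    n = count_step(n, c);
  }
  return n;
}

// Reference implementation: test one bit at a time.
constexpr std::uint32_t naive_popcount32(std::uint32_t n) {
  std::uint32_t count = 0;
  while (n != 0) {
    count += n & 1U;
    n >>= 1;
  }
  return count;
}

// Compile-time checks against known values and the naive loop.
static_assert(parallel_popcount32(0u) == 0, "no bits set");
static_assert(parallel_popcount32(0xFFFFFFFFu) == 32, "all bits set");
static_assert(parallel_popcount32(0xF0F0F0F0u) == 16, "half the bits set");
static_assert(parallel_popcount32(0x12345678u) ==
                  naive_popcount32(0x12345678u),
              "matches naive count");
```

Each of the five steps (log2 of 32) uses only AND, shift and add, so presumably the same sequence can be emitted element-wise from basic vector operations; a 64-bit variant would need a 64-bit mask type and one more step.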