> Ok. Just sent V2. I will adjust comment and send V3 again :)
Sorry, was too slow.
Regards
Robin
Ok. Just sent V2. I will adjust comment and send V3 again :)
juzhe.zh...@rivai.ai
From: Robin Dapp
Date: 2023-06-20 16:55
To: juzhe.zh...@rivai.ai; gcc-patches
CC: rdapp.gcc; kito.cheng; Kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize codegen of VLA SLP
> + /* Step 2: VID AND -NPATTERNS:
> + { 0&-4, 1&-4, 2&-4, 3 &-4, 4 &-4, 5 &-4, 6 &-4, 7 &-4, ... }
> + */
Before that, just add something simple like:
We want to create a pattern where value[ix] = floor (ix / NPATTERNS).
As NPATTERNS is always a power of two we
.
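A minimal scalar sketch of the AND step quoted above, assuming NPATTERNS is a power of two. The helper names is_pow2, and_neg_npatterns and fill_step2 are hypothetical and not taken from the patch; the loop only models the VID vector lane by lane. Note that the AND by itself yields ix rounded down to a multiple of NPATTERNS; a further right shift by log2 (NPATTERNS) would be needed to obtain floor (ix / NPATTERNS).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: nonzero iff X is a power of two.  */
static int
is_pow2 (uint32_t x)
{
  return x != 0 && (x & (x - 1)) == 0;
}

/* Hypothetical helper: IX & -NPATTERNS.  For a power-of-two NPATTERNS,
   -NPATTERNS (in unsigned arithmetic) is a mask with the low
   log2 (NPATTERNS) bits cleared, so the AND rounds IX down to a
   multiple of NPATTERNS.  */
static uint32_t
and_neg_npatterns (uint32_t ix, uint32_t npatterns)
{
  assert (is_pow2 (npatterns));
  return ix & -npatterns;
}

/* Hypothetical scalar model of "VID AND -NPATTERNS": lane IX of VID
   holds IX, so OUT[IX] = IX & -NPATTERNS.  */
static void
fill_step2 (uint32_t *out, size_t n, uint32_t npatterns)
{
  for (size_t ix = 0; ix < n; ++ix)
    out[ix] = and_neg_npatterns ((uint32_t) ix, npatterns);
}
```

With NPATTERNS = 4 the first eight lanes come out as { 0, 0, 0, 0, 4, 4, 4, 4 }, which matches the { 0&-4, 1&-4, ... } sequence in the comment.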
Thanks.
juzhe.zh...@rivai.ai
From: Robin Dapp
Date: 2023-06-20 16:03
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize codegen of VLA SLP
> This is a nice improvement. Even though we're in the SLP realm
Date: 2023-06-20 15:55
To: Juzhe-Zhong; gcc-patches
CC: rdapp.gcc; kito.cheng; kito.cheng; palmer; palmer; jeffreyalaw
Subject: Re: [PATCH] RISC-V: Optimize codegen of VLA SLP
Hi Juzhe,
> Case 1:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
> for (int i = 0; i < 100; ++i)
> This is a nice improvement. Even though we're in the SLP realm I would
> still add an assert that documents that we're indeed operating with
> pow2_p (NPATTERNS) and some comment as to why we can use AND.
> Sure we're doing exact_log2 et al later anyway, just to make things
> clearer.
Actually
Hi Juzhe,
> Case 1:
> void
> f (uint8_t *restrict a, uint8_t *restrict b)
> {
> for (int i = 0; i < 100; ++i)
> {
> a[i * 8] = b[i * 8 + 37] + 1;
> a[i * 8 + 1] = b[i * 8 + 37] + 2;
> a[i * 8 + 2] = b[i * 8 + 37] + 3;
> a[i * 8 + 3] = b[i * 8 + 37] + 4;
> a[i *
Recently, I figured out a better approach to codegen for VLA stepped
vectors.
Here is a detailed description:
Case 1:
void
f (uint8_t *restrict a, uint8_t *restrict b)
{
  for (int i = 0; i < 100; ++i)
    {
      a[i * 8] = b[i * 8 + 37] + 1;
      a[i * 8 + 1] = b[i * 8 + 37] + 2;