>> Yes, we just want to add the whole vector register length in bytes.
OK, understood. Thank you for the information.

>> I wonder if you also want WHILE_LEN to have the implicit effect
>> to update vector length register?
>> From this perspective, Richi's suggestion on "tieing the scalar result
>> with the uses" looks better IMHO.
No, I don't want WHILE_LEN to have an implicit side effect.
It should just tie the scalar result to its uses.
Updating the vector length register is left to the RISC-V backend port.
I don't want to pull any RISC-V specific feature into the GCC middle-end.

>> No, for both cases, IV is variable, the dumping at loop2_doloop for the
>> proposed sequence says "Doloop: Possible infinite iteration case.", it
>> seems to show that for the proposed sequence compiler isn't able to
>> figure out the loop is finite, it may miss the range information on n,
>> or it isn't able to analyze how the invariant involves, but I didn't
>> look into it, all my guesses.
Ok, I think it may be fixed in the future.

So, I wonder whether you basically agree with the concept of this patch?
Would you mind giving more suggestions on how I can change this patch so that
it is more beneficial for IBM (s390 or rs6000)?
For example, could you try this patch to see whether it works for IBM in the
case of multiple rgroups with SLP?
 
Thanks.


juzhe.zh...@rivai.ai
 
From: Kewen.Lin
Date: 2023-04-14 10:54
To: 钟居哲
CC: gcc-patches; Jeff Law; rdapp; richard.sandiford; rguenther
Subject: Re: [PATCH] VECT: Add WHILE_LEN pattern for decrement IV support for 
auto-vectorization
Hi Juzhe,
 
on 2023/4/13 21:44, 钟居哲 wrote:
> Thanks Kewen.
> 
> The current flow in this patch is, as you said:
> ....
> len = WHILE_LEN (n,vf);
> ...
> v = len_load (addr,len);
> ..
> addr = addr + vf (in byte align);
> ....
> 
> This patch just keeps advancing the address by the vectorization factor
> (scaled to bytes).
> For example, if the vector length is 512 bits, this patch just updates the
> address as
> addr = addr + 64;
> 
> However, after reading the RVV ISA more closely today, I think it would be
> more appropriate to update the address as addr = addr + (len * 4), where
> len is the number of INT32 elements.
> len is the result computed by WHILE_LEN.
 
I just read your detailed explanation of how the vsetvli insn is used (I
really appreciate that).  It looks like WHILE_LEN needs more semantics than
MIN, so I assume you still want to introduce WHILE_LEN.
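
To make the difference between the two address updates concrete, here is a minimal scalar C sketch of the decrement-IV loop. All names in it are hypothetical and not part of the patch. `while_len_min` models WHILE_LEN as plain MIN, while `while_len_even_split` models one length choice that, as far as I understand the RVV spec, vsetvli is permitted to make when the remaining count lies between VLMAX and 2 * VLMAX. The address is advanced by len elements (len * 4 bytes for INT32).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical length policy: how many elements to handle this iteration.  */
typedef size_t (*len_policy) (size_t remaining, size_t vf);

/* WHILE_LEN modeled as MIN (remaining, vf).  */
static size_t while_len_min (size_t remaining, size_t vf)
{
  return remaining < vf ? remaining : vf;
}

/* Assumed vsetvli-like policy: when vf < remaining < 2 * vf, split the
   tail evenly instead of taking a full vf first.  */
static size_t while_len_even_split (size_t remaining, size_t vf)
{
  if (remaining <= vf)
    return remaining;
  if (remaining < 2 * vf)
    return (remaining + 1) / 2;
  return vf;
}

/* Sum n INT32 elements with a decrementing IV.  */
static int64_t sum_decrement_iv (const int32_t *addr, size_t n, size_t vf,
                                 len_policy while_len)
{
  assert (vf > 0);
  int64_t sum = 0;
  size_t remaining = n;
  while (remaining > 0)
    {
      size_t len = while_len (remaining, vf);
      for (size_t i = 0; i < len; i++)  /* stands in for len_load + add */
        sum += addr[i];
      addr += len;                      /* addr = addr + len * 4 bytes */
      remaining -= len;
    }
  return sum;
}
```

With the MIN policy, len equals vf on every iteration except possibly the last, so advancing by the whole vector size touches the same elements. With the even-split policy, len can be smaller than vf before the last iteration, and only the len-based update stays correct.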
 
> 
> I assume that for the IBM targets it's better to update the address IV
> directly by adding the whole register size in bytes, since I think the
> second way (address = addr + (len * 4)) is too RVV specific and won't be
> suitable for IBM. Is that right?
 
Yes, we just want to add the whole vector register length in bytes.
 
> If so, I will keep the current flow of this patch (and won't change it to
> address = addr + (len * 4)) and see what else I need to do for IBM.
> I would rather do that in the RISC-V backend port.
 
IMHO, you don't need to push this down to the RV backend.  Just query the
ports that have len_{load,store} support, with a target hook or a special
operand in the while_len optab (see internal_len_load_store_bias), and
generate different code accordingly.  IIUC, for WHILE_LEN you want the same
semantics as what vsetvli performs, but for IBM ports it would be just like
MIN_EXPR, so maybe we can also generate MIN or WHILE_LEN based on this kind
of target information.
 
If the above assumption holds, I wonder if you also want WHILE_LEN to have the
implicit effect of updating the vector length register?  If yes, the code with
multiple rgroups looks unexpected:
 
+ _76 = .WHILE_LEN (ivtmp_74, vf * nitems_per_ctrl);
+ _79 = .WHILE_LEN (ivtmp_77, vf * nitems_per_ctrl);
 
as the latter one seems to override the former.  Besides, if the given
operands are known constants, the call can't simply be folded into a constant
and propagated further.  From this perspective, Richi's suggestion on "tieing
the scalar result with the uses" looks better IMHO.
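
As a rough illustration of the concern with multiple rgroups, the following C sketch contrasts the two designs. It is only a model under my own assumptions: `vl_reg` is a hypothetical stand-in for an implicit vector length register, and both WHILE_LEN variants are modeled as MIN. None of these names come from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an implicit vector length register.  */
static size_t vl_reg;

static size_t min_size (size_t a, size_t b)
{
  return a < b ? a : b;
}

/* Assumed variant A: WHILE_LEN also writes the length register.  */
static size_t while_len_implicit (size_t remaining, size_t limit)
{
  vl_reg = min_size (remaining, limit);
  return vl_reg;
}

/* Assumed variant B: WHILE_LEN is a pure function of its operands.  */
static size_t while_len_pure (size_t remaining, size_t limit)
{
  return min_size (remaining, limit);
}

/* Length a load of the first rgroup would see if it read the implicit
   register after both WHILE_LEN calls were issued.  */
static size_t first_group_len_implicit (size_t rem0, size_t rem1, size_t limit)
{
  assert (limit > 0);
  (void) while_len_implicit (rem0, limit);  /* plays the role of _76 */
  (void) while_len_implicit (rem1, limit);  /* plays the role of _79 */
  return vl_reg;                            /* holds the second result */
}

/* Length a load of the first rgroup sees when the scalar result is
   tied to its use.  */
static size_t first_group_len_tied (size_t rem0, size_t rem1, size_t limit)
{
  assert (limit > 0);
  size_t len0 = while_len_pure (rem0, limit);
  size_t len1 = while_len_pure (rem1, limit);
  (void) len1;                              /* used only by the second rgroup */
  return len0;
}
```

In the implicit variant the first rgroup ends up with the second rgroup's length. In the tied variant each use names its own result, and because the call is pure, a call with constant operands could in principle be folded.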
 
> 
>>> I tried
>>>to compile the above source files on Power, the former can adopt doloop
>>>optimization but the latter fails to. 
> You mean GCC cannot do hardware loop optimization when the IV loop control
> is variable?
 
No, in both cases the IV is variable.  The dump at loop2_doloop for the
proposed sequence says "Doloop: Possible infinite iteration case.", which
seems to show that for the proposed sequence the compiler isn't able to
figure out that the loop is finite.  It may be missing the range information
on n, or it may not be able to analyze how the invariant is involved, but I
didn't look into it; these are all my guesses.
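
For reference, this is what a finite trip count looks like for the decrement-IV loop if WHILE_LEN is modeled as MIN. It is only a sketch with hypothetical function names, and it says nothing about why the doloop pass reports the message above. Since len is at least 1 whenever elements remain, the loop runs ceil(n / vf) times.

```c
#include <assert.h>
#include <stddef.h>

/* Count iterations by running the decrement-IV loop, with WHILE_LEN
   modeled as MIN (remaining, vf).  */
static size_t trip_count_simulated (size_t n, size_t vf)
{
  assert (vf > 0);
  size_t iters = 0;
  while (n > 0)
    {
      size_t len = n < vf ? n : vf;  /* len >= 1 here, so n strictly drops */
      n -= len;
      iters++;
    }
  return iters;
}

/* Closed form: ceil (n / vf) without overflow.  */
static size_t trip_count_closed_form (size_t n, size_t vf)
{
  assert (vf > 0);
  return n / vf + (n % vf != 0);
}
```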
 
BR,
Kewen
 
