Hi Juzhe,

on 2023/4/12 21:22, 钟居哲 wrote:
> Thanks Kewen.
>
> It seems that this proposal WHILE_LEN can help s390 when using --param
> vect-partial-vector-usage=2 compile option.
>

Yeah, IMHO, the previous sequence vs. the proposed sequence are like:

  int
  foo (int *__restrict a, int *__restrict b, int n)
  {
    if (n <= 0)
      return 0;

    int iv = 0;
    int len = MIN (n, 16);
    int sum = 0;
    do
      {
        sum += a[len] + b[len];
        iv += 16;
        int n1 = MIN (n, iv);  // line A
        int n2 = n - n1;
        len = MIN (n2, 16);
      }
    while (n > iv);

    return sum;
  }

vs.

  int
  foo (int *__restrict a, int *__restrict b, int n)
  {
    if (n <= 0)
      return 0;

    int len;
    int sum = 0;
    do
      {
        len = MIN (n, 16);
        sum += a[len] + b[len];
        n -= len;
      }
    while (n > 0);

    return sum;
  }

The proposed sequence saves at least one MIN (at line A) and one length
preparation in the last iteration (which is useless since the loop ends
there).  But I think the concern that this proposed IV isn't recognized
as a simple IV may remain.  I tried compiling the above source files on
Power: the former can adopt the doloop optimization but the latter
fails to.

> Would you mind apply this patch && support WHILE_LEN in s390 backend and test
> it to see the overall benefits for s390
> as well as the correctness of this sequence ?

Sure, if all of you think this approach and this revision are good
enough to go forward for this kind of evaluation, I'm happy to give it
a shot, but only for rs6000.  ;-)  I noticed that there are some
discussions on withdrawing this WHILE_LEN by using MIN_EXPR instead,
I'll stay tuned.

btw, for now we only adopt vector with length in the epilogues rather
than in the main vectorized loops, because length preparation has
non-trivial extra costs compared with just using the normal vector
load/store (all lanes), so we don't care much about the performance
with --param vect-partial-vector-usage=2.  Even if this new proposal
can optimize the length preparation for --param
vect-partial-vector-usage=2, the extra costs for length preparation are
still unavoidable (MIN, shifting, one more GPR used), so we would still
stay with the default --param vect-partial-vector-usage=1 (which can't
benefit from this new proposal).

BR,
Kewen
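
For anyone who wants to check the two length-control schemes above
outside the vectorizer, here is a minimal scalar sketch.  It is only an
illustration and not what the vectorizer emits: the inner `for` loop
stands in for a length-controlled vector load/add, and the names `VF`,
`sum_prev`, `sum_new` and `schemes_agree` are hypothetical helpers made
up for this sketch.  Unlike the schematic `a[len] + b[len]` above, it
indexes the actual elements so the results can be compared with a plain
scalar loop.

```c
#include <assert.h>

#define VF 16                              /* assumed vector length in elements */
#define MIN(a, b) ((a) < (b) ? (a) : (b))

/* Previous scheme: iv steps by VF and the next length is rebuilt from
   n and iv with two MINs per iteration.  */
static int
sum_prev (const int *a, const int *b, int n)
{
  if (n <= 0)
    return 0;

  int iv = 0;
  int len = MIN (n, VF);
  int sum = 0;
  do
    {
      for (int i = 0; i < len; i++)        /* stand-in for a len-controlled op */
        sum += a[iv + i] + b[iv + i];
      iv += VF;
      int n1 = MIN (n, iv);                /* line A */
      int n2 = n - n1;
      len = MIN (n2, VF);                  /* computed even when the loop exits */
    }
  while (n > iv);

  return sum;
}

/* Proposed scheme: the remaining count itself is the IV and is
   decremented by the length just processed.  */
static int
sum_new (const int *a, const int *b, int n)
{
  if (n <= 0)
    return 0;

  int sum = 0;
  do
    {
      int len = MIN (n, VF);
      for (int i = 0; i < len; i++)        /* stand-in for a len-controlled op */
        sum += a[i] + b[i];
      a += len;
      b += len;
      n -= len;
    }
  while (n > 0);

  return sum;
}

/* Returns 1 if both schemes match a plain scalar loop for every n in
   [0, max_n], and 0 otherwise.  */
static int
schemes_agree (int max_n)
{
  enum { CAP = 256 };
  int a[CAP], b[CAP];

  assert (max_n >= 0 && max_n <= CAP);
  for (int i = 0; i < CAP; i++)
    {
      a[i] = i;
      b[i] = 2 * i + 1;
    }

  for (int n = 0; n <= max_n; n++)
    {
      int ref = 0;
      for (int i = 0; i < n; i++)
        ref += a[i] + b[i];
      if (sum_prev (a, b, n) != ref || sum_new (a, b, n) != ref)
        return 0;
    }
  return 1;
}
```

Calling `schemes_agree (256)` from a test driver exercises trip counts
that are smaller than, equal to, and not a multiple of `VF`.  It only
checks that the two sequences compute the same result; it says nothing
about the doloop or cost questions discussed above.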