https://gcc.gnu.org/bugzilla/show_bug.cgi?id=117722

--- Comment #12 from Li Pan <pan2.li at intel dot com> ---
(In reply to Robin Dapp from comment #11)
> (In reply to Li Pan from comment #9)
> > Created attachment 59663 [details]
> > before_vs_after when outer loop is 128
> 
> Ok, that's a different loop then.  I'm seeing vmv1rs in the current version,
> is that what you're referring to as problematic?  Do they result from the
> lack of overlap constraints?  I'd prefer a bit more context rather than just
> code dumps :)

Oh, I forgot to include these. Below are the code and build options for the PNG above.

#include <stdint.h>
#include <stdlib.h>

#define T1 uint8_t
#define T2 int32_t

T2
foo (T2 * restrict op_0, T1 * restrict op_1,
     T1 * restrict op_2, T2 op_3, T2 op_4)
{
  T2 sum = 0;
  for (unsigned i = 0; i < 128; i++) // x264_pixel_sad_4x4 is i < 4.
    {
      for (unsigned k = 0; k < 8; k++)
        sum += abs (op_1[k] - op_2[k]);

      op_1 += op_3;
      op_2 += op_4;
    }

  return sum;
}

after.S:
-O3 -march=rv64gcv -mabi=lp64d -c -S u_sad.c -o after.S -fno-schedule-insns -fno-schedule-insns2

before.S:
-O3 -march=rv64gcv -mabi=lp64d -c -S u_sad.c -mno-vector-strict-align -o before.S -fno-schedule-insns -fno-schedule-insns2
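If it helps to confirm that before.S and after.S still compute the same value, a plain scalar reference along these lines could be compiled separately (e.g. at -O0) and compared against foo from each object on identical inputs. This is only a sketch: the name sad_8xn_ref and its rows parameter are hypothetical and not part of the original reproducer; it simply mirrors foo's arithmetic with the outer trip count made explicit (128 here, 4 for x264_pixel_sad_4x4).

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar reference for foo: sum of absolute differences
   over `rows` rows of 8 bytes each.  After each row the two pointers
   advance by their own strides, exactly as op_1/op_2 do in foo.  */
int32_t
sad_8xn_ref (const uint8_t *p1, const uint8_t *p2,
             int32_t stride1, int32_t stride2, unsigned rows)
{
  int32_t sum = 0;
  for (unsigned i = 0; i < rows; i++)
    {
      /* uint8_t operands are widened to int before subtracting,
         so the difference can be negative and abs() is meaningful.  */
      for (unsigned k = 0; k < 8; k++)
        sum += abs ((int) p1[k] - (int) p2[k]);

      p1 += stride1;
      p2 += stride2;
    }
  return sum;
}
```

With rows = 128 and the same buffers and strides passed to foo, both builds should return the same sum as this reference if the vectorized code is correct.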
