Hi,

On Thu, Jan 28 2021, Richard Biener wrote:
> On Thu, Jan 28, 2021 at 7:32 AM Hongtao Liu via Gcc-patches
> <gcc-patches@gcc.gnu.org> wrote:
>>
>> Hi:
>>    GCC11 will be the system GCC 2 years from now, and the processors
>> of that time should not even need to split a 256-bit vector into two
>> 128-bit vectors.
>>    E.g. testing SPEC2017 with the 2 options below on Zen3/ICL shows
>> that Option B is better than Option A.
>> Option A:
>> -march=x86-64 -mtune=generic -mavx2 -mfma -Ofast
>>
>> Option B:
>> Option A + 
>> -mtune-ctrl="256_unaligned_load_optimal,256_unaligned_store_optimal"
>>
>>   Bootstrapped and regtested on x86_64-linux-gnu{-m32,}.
>
> Given the explicit list for unaligned loads it's a no-brainer to change that
> for X86_TUNE_AVX256_UNALIGNED_LOAD_OPTIMAL.  Given both
> BDVER and ZNVER1 are listed for X86_TUNE_AVX256_UNALIGNED_STORE_OPTIMAL
> we should try to benchmark the effect on ZNVER1 - Martin, do we still
> have a znver1 machine around?

Sorry, I kept forgetting about this, and whenever I remembered, the
machine was busy.  I did just one SPEC CPU rate (single-threaded)
reference run each without and with the patch (on top of 6b1633378b7),
both times with -Ofast -mavx2 -mtune=generic and with LTO, and the
results were actually rather good (run times, smaller is better):

  SPEC 2017 FPrate (time):
  | Benchmark       | Before | After |      % |
  |-----------------+--------+-------+--------|
  | 503.bwaves_r    |    217 |   209 |  -3.69 |
  | 507.cactuBSSN_r |    236 |   235 |  -0.42 |
  | 508.namd_r      |    252 |   242 |  -3.97 |
  | 510.parest_r    |    384 |   383 |  -0.26 |
  | 511.povray_r    |    486 |   495 |  +1.85 |
  | 519.lbm_r       |    172 |   173 |  +0.58 |
  | 521.wrf_r       |    292 |   277 |  -5.14 |
  | 526.blender_r   |    300 |   303 |  +1.00 |
  | 527.cam4_r      |    255 |   248 |  -2.75 |
  | 538.imagick_r   |    400 |   400 |  +0.00 |
  | 544.nab_r       |    316 |   316 |  +0.00 |
  | 549.fotonik3d_r |    366 |   351 |  -4.10 |
  | 554.roms_r      |    283 |   248 | -12.37 |
  #+TBLFM: $4=100*$3/$2-100;%+.2f


  SPEC 2017 INTrate (time):
  | Benchmark       | Before | After |     % |
  |-----------------+--------+-------+-------|
  | 500.perlbench_r |    446 |   443 | -0.67 |
  | 502.gcc_r       |    267 |   267 | +0.00 |
  | 505.mcf_r       |    285 |   285 | +0.00 |
  | 520.omnetpp_r   |    437 |   436 | -0.23 |
  | 523.xalancbmk_r |    302 |   308 | +1.99 |
  | 525.x264_r      |    217 |   219 | +0.92 |
  | 531.deepsjeng_r |    316 |   311 | -1.58 |
  | 541.leela_r     |    500 |   499 | -0.20 |
  | 548.exchange2_r |    314 |   315 | +0.32 |
  | 557.xz_r        |    391 |   392 | +0.26 |
  #+TBLFM: $4=100*$3/$2-100;%+.2f

If we regard any regression smaller than 2% as noise, then there were
none.  And 554.roms_r really liked the change, even on znver1.
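
For anyone who wants to re-check the % columns and the 2% reading, here
is a minimal Python sketch.  It only re-applies the table formula
(100*After/Before - 100) to the times listed above; the names `times`,
`pct_change` and `NOISE` are made up for illustration and are not part of
any SPEC or GCC tooling.

```python
# (before, after) run times copied from the two tables above.
times = {
    "503.bwaves_r": (217, 209), "507.cactuBSSN_r": (236, 235),
    "508.namd_r": (252, 242), "510.parest_r": (384, 383),
    "511.povray_r": (486, 495), "519.lbm_r": (172, 173),
    "521.wrf_r": (292, 277), "526.blender_r": (300, 303),
    "527.cam4_r": (255, 248), "538.imagick_r": (400, 400),
    "544.nab_r": (316, 316), "549.fotonik3d_r": (366, 351),
    "554.roms_r": (283, 248),
    "500.perlbench_r": (446, 443), "502.gcc_r": (267, 267),
    "505.mcf_r": (285, 285), "520.omnetpp_r": (437, 436),
    "523.xalancbmk_r": (302, 308), "525.x264_r": (217, 219),
    "531.deepsjeng_r": (316, 311), "541.leela_r": (500, 499),
    "548.exchange2_r": (314, 315), "557.xz_r": (391, 392),
}

NOISE = 2.0  # percent; assumed noise threshold, as in the text above


def pct_change(before, after):
    """Relative change in run time, in percent (negative means faster)."""
    return 100.0 * after / before - 100.0


changes = {name: pct_change(b, a) for name, (b, a) in times.items()}

# Slowdowns at or above the threshold count as real regressions.
regressions = {n: c for n, c in changes.items() if c >= NOISE}

# Benchmark with the largest improvement (most negative change).
best = min(changes, key=changes.get)

for name, change in changes.items():
    print(f"{name:16s} {change:+.2f}")
print("regressions above noise:", sorted(regressions))
print("largest improvement:", best)
```

With a single run per configuration the 2% threshold is only a rule of
thumb; repeated runs would be needed to estimate the actual noise.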

Martin
