On Mon, 7 Apr 2025 14:23:52 GMT, Hamlin Li <m...@openjdk.org> wrote:

> Hi,
> Can you help to review this patch?
> On riscv, CMoveI/L already were implemented, but there are some gap:
> 1. CMoveI/L does not support comparison with float/double, corresponding 
> tests are not turn on either.
> 2. Some optimization of C2 is not turned on, e.g. `Phi -> CMove -> min_max`.
> 3. lack of some corresponding performance tests.
> 
> Also there are some issue with current Zicond:
> 1. UseZicond is turned on automatically by hwprobe, but jmh tests show that 
> it's not always bring benefit, in some situation it even brings regression, 
> the reason is the generated code by Zicond is much larger than branch 
> version, in particular when it's in a loop and unrolled.
> 
> This patch on riscv is to:
> 1. add CMoveI/L comparing float/double, and corresponding tests,
> 2. enable more C2 optimization,
> 3. add more benchmark tests,
> 4. turn off UseZicond by default.
> 
> Thanks!
> 
> ## Performance
> 
> ### MinMax
> test data
> <google-sheets-html-origin style="caret-color: rgb(0, 0, 0); color: rgb(0, 0, 
> 0); font-style: normal; font-variant-caps: normal; font-weight: 400; 
> letter-spacing: normal; orphans: auto; text-align: start; text-indent: 0px; 
> text-transform: none; white-space: normal; widows: auto; word-spacing: 0px; 
> -webkit-text-stroke-width: 0px; text-decoration: none;">
> Benchmark | Mode | Cnt | Score - master | Score - master+UseZbb | Score - 
> -master+UseZicond | Score - master+UseZicond+UseZbb | Score - cmovei | Score 
> - cmovei+UseZbb | Score - cmovei+UseZicond | Score - cmovei+UseZicond+UseZbb 
> | Error | Units | Opt (master/cmovei)
> -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | -- | --
> o.o.b.vm.compiler.IfMinMax.testReductionInt | avgt | 40 | 17152.075 | 
> 17216.592 | 17272.493 | 17296.89 | 17127.844 | 17036.605 | 17299.333 | 
> 17250.566 | 73.179 | ns/op | 1.001
> o.o.b.vm.compiler.IfMinMax.testReductionLong | avgt | 40 | 19770.828 | 
> 19967.578 | 20268.905 | 20166.165 | 20065.552 | 20059.095 | 20161.914 | 
> 20151.295 | 131.428 | ns/op | 0.985
> o.o.b.vm.compiler.IfMinMax.testSingleInt | avgt | 40 | 114.734 | 114.402 | 
> 114.887 | 114.384 | 114.4 | 110.631 | 112.162 | 110.915 | 0.333 | ns/op | 
> 1.003
> o.o.b.vm.compiler.IfMinMax.testSingleLong | avgt | 40 | 121.53 | 121.711 | 
> 120.91 | 121.665 | 121.309 | 120.57 | 118.639 | 119.373 | 0.451 | ns/op | 
> 1.002
> o.o.b.vm.compiler.IfMinMax.testVectorInt | avgt | 40 | 60130.165 | 60062.303 
> | 61839.776 | 61895.194 | 15887.398 | 15924.502 | 15874.835 | 15667.936 | 
> 101.94 | ns/op | 3.785
> o.o.b.vm.compiler.IfMinMax.testVectorLong | avgt | 40 | 63855.379 | 6309...

because zicond code is bigger, e.g.

void MacroAssembler::cmov_eq(Register cmp1, Register cmp2, Register dst, 
Register src) {
  if (UseZicond) {
    xorr(t0, cmp1, cmp2);
    czero_eqz(dst, dst, t0);
    czero_nez(t1 , src, t0);
    orr(dst, dst, t1);
    return;
  }
  Label no_set;
  bne(cmp1, cmp2, no_set);
  mv(dst, src);
  bind(no_set);
}

-------------

PR Comment: https://git.openjdk.org/jdk/pull/24490#issuecomment-2786284949

Reply via email to