Issue 153402
Summary [RISCV] Inefficient constant pool access
Labels backend:RISC-V
Assignees
Reporter asb
    This issue documents the problem of inefficient constant pool access on RISC-V. It focuses on `double` values, but it could equally apply to e.g. strings.

In general, constant `double` values are accessed from the constant pool (unless they can be generated with 2 instructions). Such a value is emitted in the assembly output like:
```
.LCPI0_0:
        .quad 0xbff0000000000000              # double -1
```
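The bit pattern can be checked directly. A minimal sketch in Python (illustrative only) showing that the `.quad` value above is the raw IEEE 754 encoding of `-1.0`:

```python
import struct

# Pack -1.0 as a big-endian IEEE 754 double and show its raw bit pattern;
# this matches the .quad constant-pool entry above.
bits = struct.pack(">d", -1.0).hex()
print(f"0x{bits}")  # → 0xbff0000000000000
```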

When compiling with PIE (default for Linux targets), this will be loaded with a code sequence like:
```
.Lpcrel_hi1:
auipc   a2, %pcrel_hi(.LCPI0_0)
fld     ft10, %pcrel_lo(.Lpcrel_hi1)(a2)
```

For some workloads (e.g. lbm from SPEC), we see very poor codegen where, after the `auipc`, that value is spilled and later reloaded when the constant pool access actually happens. We can gain some control over the hoisting of these accesses with e.g. `isAsCheapAsAMove`, but for the PIE codepath we start with a `PseudoLLA` that is later expanded. Potential options for addressing this:
* Re-evaluate how we handle PseudoLLA and when it is expanded with the aim of preventing hoisting of the auipc. This isn't completely trivial, as keeping PseudoLLA later in the pipeline would have an impact on things like RISCVMergeBaseOffset.
* Although it doesn't fix the problem directly, reducing usage of the constant pool where it isn't necessary will reduce the impact of this kind of poor code generation. e.g. being more liberal about materialising an integer and converting to double, or adding more optimisations around generating one constant using another as a base.
* Don't access such constants via separate symbols, sidestepping the issue. i.e. access the constants for a function at offsets to a common base.
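On the second option above, a quick way to reason about which doubles could avoid the constant pool is to check whether they are exactly representable as a small integer, so an integer materialisation plus an int-to-double convert (e.g. `li` + `fcvt.d.w`) could produce them. A hedged Python sketch (the helper name and the 32-bit cutoff are illustrative assumptions, not what the backend actually does):

```python
import math

def int_materializable(d: float, bits: int = 32) -> bool:
    """Illustrative check: True if d is exactly a signed `bits`-wide
    integer, i.e. it could plausibly be built via integer materialisation
    followed by an int-to-double convert instead of a constant pool load."""
    # NaN, infinities, and -0.0 cannot result from converting an integer.
    if not math.isfinite(d) or (d == 0.0 and math.copysign(1.0, d) < 0):
        return False
    i = int(d)
    return float(i) == d and -(1 << (bits - 1)) <= i < (1 << (bits - 1))

print(int_materializable(100.0))  # → True
print(int_materializable(0.1))    # → False
```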

I prototyped the last option as something of a limit study to see what we could gain, and found a simple pass actually works very well.

## Promoting constants

There are multiple ways this could be done, but I prototyped a pass that will:
* Iterate over each function in a module.
* Examine all `double` constants used within a function and collect any that would otherwise be accessed via the constant pool into a new private global array.
* Replace all uses of those constants with an explicit load from that global array.
  * Most commonly the calculation of the pool's base address should happen near the function entry. But I found that explicitly placing it there, versus placing the calculation of the array's address next to each use, made no difference in practice, as later passes clean it up appropriately.
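The steps above can be modelled abstractly. A hypothetical Python sketch of the core bookkeeping (the real pass operates on LLVM IR; names and representation here are illustrative only): gather the distinct double constants used in a function into one per-function pool, then rewrite each use as an indexed access into that pool.

```python
def promote_constants(uses):
    """uses: the double constants a function uses, in use order.
    Returns (pool, rewritten), where pool is the per-function global
    array and each use becomes an indexed ('load', i) access into it."""
    pool = []
    index = {}
    for c in uses:
        if c not in index:        # deduplicate within the function
            index[c] = len(pool)
            pool.append(c)
    rewritten = [("load", index[c]) for c in uses]
    return pool, rewritten

pool, rewritten = promote_constants([3.5, -1.0, 3.5])
print(pool)       # → [3.5, -1.0]
print(rewritten)  # → [('load', 0), ('load', 1), ('load', 0)]
```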

For the worst affected SPEC benchmark (lbm), this results in significantly better codegen in `LBM_performStreamCollideTRT` (where the majority of execution time is spent and many constants are accessed). The rough cut patch is [here](https://gist.github.com/asb/6a3db97b6498f1c149e530bb2d88dae7). The impact on executed instruction count for SPEC 2017 benchmarks (compiled for rva22u64 -O3) is:

```
Benchmark                     Baseline  WithDoublePromotion  Diff (%)
======================================================================
500.perlbench_r           180485513127         180486905438     0.00%
502.gcc_r                 221180482926         221181065104     0.00%
505.mcf_r                 131217768764         131217768764     0.00%
508.namd_r                220536219869         220612466820     0.03%
510.parest_r              291735122413         291853988954     0.04%
511.povray_r               30915528669          30990240444     0.24%
519.lbm_r                  91960216684          87897183782    -4.42%
520.omnetpp_r             137704535105         138074982921     0.27%
523.xalancbmk_r           283930786269         284431739183     0.18%
525.x264_r                379364891237         379357617460    -0.00%
526.blender_r             660274620672         660527712107     0.04%
531.deepsjeng_r           350830911210         350830911210     0.00%
538.imagick_r             238456376537         238486280612     0.01%
541.leela_r               406267274643         406267274643     0.00%
544.nab_r                 397560964084         390704449786    -1.72%
557.xz_r                  129480350182         129480350182     0.00%
```

As noted, `lbm` sees the largest impact, with `nab` also positively affected. I looked at the cause of the slightly increased instruction count for povray: it came down to slightly different register allocation in one function (leading to more stack accesses), but it seemed more like expected variation than a case where the transformation was forcing an obviously "bad" choice.

Notes:
* We lose out on deduplication of constants, but at least for double as handled here it seems unlikely this will be a big deal (compared to e.g. materialising the constants in the instruction stream, which other targets might do).
* If large constant pools turn out to be common, it may be worth ordering the constants by access frequency and so on.
* AArch64 has an AArch64PromoteConstant pass but it only targets vectors.
* The pass could have alternatively promoted constants to globals and then let globals merging handle it. For the sake of testing out the approach, having a simple pass directly implementing the desired transformation seemed most straightforward.
* More optimisations are possible but not yet implemented (or tested for how often they might kick in). e.g. the pass could recognise when a function accesses both a constant and its negation, put just one of those values in the array, and use `fneg` when accessing the negated form.
* In rare cases fp constants can be introduced later in the pipeline (e.g. SelectionDAG legalisation). If moving forward with this pass, we'd want to look more at that.
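The negation-folding idea from the notes can be sketched in the same abstract style as before (hedged, illustrative Python; a real pass would compare bit patterns, so 0.0/-0.0 would need special care): constants whose negation is already in the pool reuse that slot and record a negate flag for `fneg` at the access site.

```python
def fold_negations(pool):
    """pool: distinct double constants for one function.
    Returns (kept, access): kept is the reduced pool, and access maps each
    original constant to (slot, negate) — negate meaning apply fneg on use."""
    kept, access = [], {}
    for c in pool:
        if -c in access:
            slot, _ = access[-c]
            access[c] = (slot, True)   # reuse the negated value's slot
        else:
            access[c] = (len(kept), False)
            kept.append(c)
    return kept, access

kept, access = fold_negations([1.0, -1.0, 2.5])
print(kept)          # → [1.0, 2.5]
print(access[-1.0])  # → (0, True)
```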

## What's next

The point of this issue was to document my investigations and approaches so far, and to get feedback / experiences from anyone else looking in a similar area. As noted above, I was surprised that the pass-based approach worked so well with no real downside to it. Possibly it makes sense to package that up and ship the improvement while looking at other approaches in parallel.

_______________________________________________
llvm-bugs mailing list
llvm-bugs@lists.llvm.org
https://lists.llvm.org/cgi-bin/mailman/listinfo/llvm-bugs