> On 06.09.2024 at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
>
> Hi,
>
> I'm working on optimising assignments to the AArch64 Floating-point Mode
> Register (FPMR), as part of our FP8 enablement work. Claudio has already
> implemented FPMR as a hard register, with the intention that FP8 intrinsic
> functions will compile to a combination of an fpmr register set, followed by
> an FP8 operation that takes fpmr as an input operand.
>
> It would clearly be inefficient to retain an explicit FPMR assignment prior to
> each FP8 instruction (especially in the common case where every assignment
> uses the same FPMR value). I think the best way to optimise this would be to
> implement a new pass that can optimise assignments to individual hard
> registers.
>
> There are a number of existing passes that do similar optimisations, but which
> I believe are unsuitable for this scenario for various reasons. For example:
>
> - cse1 can already optimise FPMR assignments within an extended basic block,
> but can't handle broader optimisations.
> - pre (in gcse.c) doesn't work with assigning constant values, which would
> miss many potential usages. It also has limits on how far code can be moved,
> based around ideas of register pressure that don't apply to the context of a
> single hard register that shouldn't be used by the register allocator for
> anything else. Additionally, it doesn't run at -Os.
> - hoist (also using gcse.c) only handles constant values, and only runs when
> optimising for size. It also has the rest of the issues that pre does.
> - mode_sw only handles a small finite set of modes. The mode requirements are
> determined solely by the instructions that require the specific mode, so mode
> switches don't depend on the output of previous instructions.
>
>
> My intention would be for the new pass to reuse ideas, and hopefully some of
> the existing code, from the mode-switching and gcse passes. In particular,
> gcse.c (or its dependencies) has code that could identify when values
> assigned to the FPMR are known to be the same (although we may not need the
> full CSE capabilities of gcse.c), and mode-switching.cc knows how to globally
> optimise mode assignments (and unlike gcse.c, doesn't use cautious heuristics
> to avoid excessively increasing register pressure).
>
> Initially the new pass would only apply to the AArch64 FPMR register, but in
> future it could also be used for other hard registers with similar properties.
>
> Does anyone have any comments on this approach, before I start writing any
> code?
Can you explain in more detail why the mode-switching pass infrastructure isn’t
a good fit? ISTR it already is customizable via target hooks.

Richard

> Thanks,
> Andrew
>
>
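For readers following the thread, the simplest part of the proposal (dropping an assignment when the register is already known to hold the same value on every incoming path) can be sketched as a small forward dataflow problem. This is only an illustrative model under stated assumptions, not GCC code: the CFG encoding, the instruction tuples `("set", v)`, `("use",)` and `("clobber",)`, and the function names are all hypothetical. It removes fully redundant assignments only, and does not attempt the hoisting or partial-redundancy placement that the mode-switching and gcse passes perform.

```python
from collections import deque

# Hypothetical lattice for the value held by one hard register:
#   UNVISITED (top): no path has reached this point yet
#   a concrete value: every path so far agrees on this value
#   UNKNOWN (bottom): paths disagree, or the register was clobbered
UNVISITED = object()
UNKNOWN = object()


def meet(a, b):
    """Combine the register values arriving from two predecessors."""
    if a is UNVISITED:
        return b
    if b is UNVISITED:
        return a
    return a if a == b else UNKNOWN


def transfer(insns, value):
    """Return the register value after executing a block's instructions."""
    for insn in insns:
        if insn[0] == "set":
            value = insn[1]
        elif insn[0] == "clobber":  # e.g. a call that may change the register
            value = UNKNOWN
    return value


def remove_redundant_sets(blocks, succs, entry):
    """Drop sets that store a value the register is already known to hold."""
    preds = {b: [] for b in blocks}
    for b, targets in succs.items():
        for t in targets:
            preds[t].append(b)

    value_in = {b: UNVISITED for b in blocks}
    value_out = {b: UNVISITED for b in blocks}
    value_in[entry] = UNKNOWN  # nothing is known on function entry

    work = deque(blocks)
    while work:
        b = work.popleft()
        if b != entry:
            v = UNVISITED
            for p in preds[b]:
                v = meet(v, value_out[p])
            value_in[b] = v
        out = transfer(blocks[b], value_in[b])
        if out != value_out[b]:
            value_out[b] = out
            work.extend(succs.get(b, []))

    result = {}
    for b, insns in blocks.items():
        value = value_in[b]
        kept = []
        for insn in insns:
            if insn[0] == "set":
                known = value is not UNKNOWN and value is not UNVISITED
                if known and value == insn[1]:
                    continue  # register already holds this value
                value = insn[1]
            elif insn[0] == "clobber":
                value = UNKNOWN
            kept.append(insn)
        result[b] = kept
    return result


# Diamond-shaped CFG in which every FP8 use wants the same register value.
blocks = {
    "entry": [("set", 1), ("use",)],
    "left": [("set", 1), ("use",)],
    "right": [("use",)],
    "join": [("set", 1), ("use",)],
}
succs = {"entry": ["left", "right"], "left": ["join"], "right": ["join"], "join": []}
optimised = remove_redundant_sets(blocks, succs, "entry")
```

In this example only the assignment in `entry` survives. If `right` instead contained a `("clobber",)`, the value reaching `join` would be unknown and its assignment would be kept.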