> On 06.09.2024 at 17:38, Andrew Carlotti <andrew.carlo...@arm.com> wrote:
> 
> Hi,
> 
> I'm working on optimising assignments to the AArch64 Floating-point Mode
> Register (FPMR), as part of our FP8 enablement work.  Claudio has already
> implemented FPMR as a hard register, with the intention that FP8 intrinsic
> functions will compile to a combination of an fpmr register set, followed by an
> FP8 operation that takes fpmr as an input operand.
> 
> It would clearly be inefficient to retain an explicit FPMR assignment prior to
> each FP8 instruction (especially in the common case where every assignment uses
> the same FPMR value).  I think the best way to optimise this would be to
> implement a new pass that can optimise assignments to individual hard registers.
> 
> There are a number of existing passes that do similar optimisations, but which
> I believe are unsuitable for this scenario for various reasons.  For example:
> 
> - cse1 can already optimise FPMR assignments within an extended basic block,
>  but can't handle broader optimisations.
> - pre (in gcse.c) doesn't work with assigning constant values, which would miss
>  many potential usages.  It also has limits on how far code can be moved,
>  based around ideas of register pressure that don't apply to the context of a
>  single hard register that shouldn't be used by the register allocator for
>  anything else.  Additionally, it doesn't run at -Os.
> - hoist (also using gcse.c) only handles constant values, and only runs when
>  optimising for size.  It also has the rest of the issues that pre does.
> - mode_sw only handles a small finite set of modes.  The mode requirements are
>  determined solely by the instructions that require the specific mode, so mode
>  switches don't depend on the output of previous instructions.
> 
> 
> My intention would be for the new pass to reuse ideas, and hopefully some of
> the existing code, from the mode-switching and gcse passes.  In particular,
> gcse.c (or its dependencies) has code that could identify when values assigned
> to the FPMR are known to be the same (although we may not need the full CSE
> capabilities of gcse.c), and mode-switching.cc knows how to globally optimise
> mode assignments (and unlike gcse.c, doesn't use cautious heuristics to avoid
> excessively increasing register pressure).
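
To make the goal concrete, here is a rough sketch of the simplest global case:
a forward dataflow over the CFG that tracks the value known to be in a single
hard register and drops sets that are already satisfied on every incoming
path.  This is not GCC code and not the proposed pass; the instruction tuples
("set", "use", "clobber") and all function names are hypothetical.  It only
removes fully redundant sets; moving or hoisting partially redundant ones, as
the LCM-based passes do, is not attempted.

```python
TOP = object()      # optimistic "no information yet" (e.g. unvisited back edge)
UNKNOWN = object()  # register may hold different values on different paths


def meet(a, b):
    """Combine the register state arriving from two predecessors."""
    if a is TOP:
        return b
    if b is TOP:
        return a
    return a if a == b else UNKNOWN


def transfer(insns, value):
    """Return the register state after executing a block's instructions."""
    for insn in insns:
        if insn[0] == "set":
            value = insn[1]
        elif insn[0] == "clobber":   # e.g. a call that may change the register
            value = UNKNOWN
    return value


def remove_redundant_sets(blocks, succs, entry):
    """blocks: {name: [insn, ...]}, succs: {name: [successor names]}."""
    preds = {b: [] for b in blocks}
    for b, ss in succs.items():
        for s in ss:
            preds[s].append(b)

    def block_in(b, out):
        value = UNKNOWN if b == entry else TOP
        for p in preds[b]:
            value = meet(value, out[p])
        return value

    # Iterate to a fixed point; the lattice has height 2, so this terminates.
    out = {b: TOP for b in blocks}
    changed = True
    while changed:
        changed = False
        for b in blocks:
            new_out = transfer(blocks[b], block_in(b, out))
            if new_out is not out[b] and new_out != out[b]:
                out[b] = new_out
                changed = True

    # Rewrite each block, dropping sets whose value is already in place.
    result = {}
    for b, insns in blocks.items():
        value = block_in(b, out)
        kept = []
        for insn in insns:
            if insn[0] == "set" and value == insn[1]:
                continue             # register already holds this value
            kept.append(insn)
            value = transfer([insn], value)
        result[b] = kept
    return result
```

With a loop body that repeats the set already done in the preheader, the set
inside the loop is removed even though the loop header has two predecessors,
which is the kind of case an extended-basic-block CSE cannot see.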
> 
> Initially the new pass would only apply to the AArch64 FPMR register, but in
> future it could also be used for other hard registers with similar properties.
> 
> Does anyone have any comments on this approach, before I start writing any
> code?

Can you explain in more detail why the mode-switching pass infrastructure isn’t
a good fit?  ISTR it already is customizable via target hooks.

Richard 

> Thanks,
> Andrew
> 
> 
