On Wed, 26 Jul 2023 08:34:14 PDT (-0700), gcc-patches@gcc.gnu.org wrote:
I would say LCM/PRE is the key to this set of static rounding mode
intrinsics; otherwise I think it will push people toward the dynamic approach
with fesetround or inline asm to set the rounding mode for performance
reasons. That runs against the design concept: we want to provide
a reliable, performant way to precisely control the rounding mode.

In theory, the function call issue could be resolved by the fenv_access
pragma, since it can serve as an annotation telling the compiler whether a
function modifies fenv, but unfortunately it's not well modeled within GCC
yet, so we must be conservative to make sure we don't break anything.

And also the LLVM side is trying to implement some simple LCM/PRE to
optimize that, so I believe we need LCM/PRE based mode switching to do that.

IMO that's a perfectly reasonable way to start: let's just get something that's correct and simple. If we need to do more complicated stuff later, we can always add it.

There's going to be a very small amount of this code written by a very small number of people (who are likely very close to the compiler teams doing the optimizations here), so we can all just work with each other to sort out any important performance issues as we go.

I think whether LCM or entry/exit performs better is probably just going to boil down to some uarch/workload specific decisions, so as long as whatever we have is correct and reasonably simple it seems fine for now. Given how little of this code there's going to be it's probably not worth spending a ton of time on things until we have a concrete use case to drive things.

Let's just make sure to also update the intrinsic spec to get rid of the grey area here, that way we can point to something if we want to optimize differently in the future.

Li, Pan2 <pan2...@intel.com> wrote on Wed, Jul 26, 2023, 22:31:

As Juzhe mentioned, the problem of the CALL is resolved by LCM/PRE,
similar to the VSETVL pass, which is well proven up to a point.



I would like to propose that we stay focused and move forward with this patch
itself; the rest of the RVV floating-point API support and the full RVV
intrinsic API tests depend on it.



Of course, I am working on PATCH v8 and thanks again for Robin’s comments.



Pan



*From:* 钟居哲 <juzhe.zh...@rivai.ai>
*Sent:* Wednesday, July 26, 2023 10:18 PM
*To:* rdapp.gcc <rdapp....@gmail.com>; Li, Pan2 <pan2...@intel.com>
*Cc:* rdapp.gcc <rdapp....@gmail.com>; kito.cheng <kito.ch...@sifive.com>;
gcc-patches <gcc-patches@gcc.gnu.org>; Wang, Yanzhang <
yanzhang.w...@intel.com>
*Subject:* Re: Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
dynamic rounding



Explicitly backup and restore for each intrinsic just the same as we did
for CALL in this patch.



I don't have data to prove how well LCM/PRE-based mode switching works,
but I trust it.



LCM/PRE is the key optimization method of the VSETVL pass, which
is doing a good job on VSETVL instruction optimizations.



I don't think we should give up the chance to use LCM/PRE and just back up
and restore for each intrinsic blindly.
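
To make the difference concrete, here is a minimal toy model in C. It is not
GCC code and not the LCM algorithm itself (which works on a control-flow
graph); it only counts modeled FRM reads and writes for a straight-line run of
intrinsics. The names frm_model, run_backup_restore_each and run_deduplicated
are hypothetical, and the enum values are meant to mirror the RISC-V frm
encodings.

```c
#include <stddef.h>

/* Rounding modes, mirroring the RISC-V frm field encodings. */
enum { RNE = 0, RTZ = 1, RDN = 2, RUP = 3, RMM = 4 };

/* Toy stand-in for the FRM CSR: counts accesses instead of touching hardware. */
typedef struct {
    int frm;          /* current modeled rounding mode */
    unsigned reads;   /* number of modeled CSR reads   */
    unsigned writes;  /* number of modeled CSR writes  */
} frm_model;

static int frm_read(frm_model *m) { m->reads++; return m->frm; }
static void frm_write(frm_model *m, int mode) { m->frm = mode; m->writes++; }

/* Strategy 1: back up, set, and restore around every single intrinsic. */
static void run_backup_restore_each(frm_model *m, const int *modes, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        int saved = frm_read(m);
        frm_write(m, modes[i]);
        /* ... the static rounding mode intrinsic would execute here ... */
        frm_write(m, saved);
    }
}

/* Strategy 2: one backup at entry, a write only when the required mode
   changes, and one restore at exit. This is a straight-line simplification
   of what a PRE-style placement aims for. */
static void run_deduplicated(frm_model *m, const int *modes, size_t n)
{
    int entry = frm_read(m);
    int known = entry;  /* mode tracked statically, no CSR read needed */
    for (size_t i = 0; i < n; i++) {
        if (known != modes[i]) {
            frm_write(m, modes[i]);
            known = modes[i];
        }
        /* ... the static rounding mode intrinsic would execute here ... */
    }
    if (known != entry)
        frm_write(m, entry);
}
```

For a run of five intrinsics that needs only two distinct modes, the first
strategy performs five reads and ten writes in this model, while the second
performs one read and three writes. How much that matters on real hardware is
exactly the data nobody in the thread has yet.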




------------------------------

juzhe.zh...@rivai.ai



*From:* Robin Dapp <rdapp....@gmail.com>

*Date:* 2023-07-26 21:46

*To:* juzhe.zhong <juzhe.zh...@rivai.ai>; Li, Pan2 <pan2...@intel.com>

*CC:* rdapp.gcc <rdapp....@gmail.com>; Kito Cheng <kito.ch...@sifive.com>;
gcc-patches@gcc.gnu.org; Wang, Yanzhang <yanzhang.w...@intel.com>

*Subject:* Re: [PATCH v7] RISC-V: Support CALL for RVV floating-point
dynamic rounding

> current llvm didn't do any pre optimization.  They always
> backup+restore for each rounding mode intrinsic



I see.  There is still the option of lazily restoring the
(entry) FRM before a function call but not reading the FRM
after every call.  Do we have any data on how good or bad the
mode-switching LCM works when we explicitly backup and restore
for each intrinsic?
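
A rough sketch of the two call-handling variants, as a toy cost model in C.
This is only my reading of the options discussed here, not the patch's
implementation: both variants restore the entry FRM before a call, and they
differ only in whether FRM is read again after each call. OP_CALL,
AT_ENTRY_VALUE, frm_cost and count_frm_accesses are hypothetical names.

```c
#include <stddef.h>

/* An op is either a call or an intrinsic identified by its (non-negative)
   rounding mode. */
enum { OP_CALL = -1, AT_ENTRY_VALUE = -2 };

typedef struct {
    unsigned reads;   /* modeled FRM reads  */
    unsigned writes;  /* modeled FRM writes */
} frm_cost;

/* Count modeled FRM accesses for a straight-line sequence of ops.
   reread_after_call != 0: read FRM again after every call.
   reread_after_call == 0: lazy variant, no read after calls. */
static frm_cost count_frm_accesses(const int *ops, size_t n,
                                   int reread_after_call)
{
    frm_cost c = {1, 0};       /* one read: back up FRM at function entry */
    int cur = AT_ENTRY_VALUE;  /* statically tracked FRM contents */

    for (size_t i = 0; i < n; i++) {
        if (ops[i] == OP_CALL) {
            if (cur != AT_ENTRY_VALUE) {  /* restore entry FRM before call */
                c.writes++;
                cur = AT_ENTRY_VALUE;
            }
            if (reread_after_call)
                c.reads++;                /* refresh the backup after call */
        } else if (cur != ops[i]) {
            c.writes++;                   /* switch to the intrinsic's mode */
            cur = ops[i];
        }
    }
    if (cur != AT_ENTRY_VALUE)
        c.writes++;                       /* restore at function exit */
    return c;
}
```

In this model the writes are identical for both variants and only the reads
differ: one per call for the re-reading variant, none for the lazy one. The
model counts accesses only; it does not settle what should happen when a
callee deliberately changes FRM.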



Regards

Robin



