On 5/22/25 05:12, Robin Dapp wrote:
>>> AFAICT the main difference to standard mode switching is that we (ab)use it
>>> to set the rounding mode to the value it had initially, either at function
>>> entry or after a call. That's different to regular mode switching which
>>> assumes "static" rounding modes for different instructions.
>>>
>>> Standard could e.g. be:
>>> - insn1 demands frm1
>>> - call1 demands frm4
>>> - call2 demands frm5
>>>
>>> Whereas we have:
>>> - insn1 demands frm1
>>> - call1 demands "frm at the start of the function"
>>> - call2 demands "frm after call1 that could have called fesetround"
>> Weird, call2 can demand the frm as it existed after call1?!? I'm going
>> to try not to cry and return to my bubble :-)
> Maybe demand was not really accurate. It's rather that we want call2 to
> operate with the global rounding mode that call1 might have changed.
> So the demand is not a specific rounding mode but rather the global one.
>
> And as we might have changed the rounding mode without having restored it
> since we need to jump through those hoops. Not defending, just describing
> the status quo, I don't like it either ;) and yeah it's contrary to what we
> usually assume of liveness etc.
>
> I'd much rather see only a local backup of the rounding mode in the
> mode-changing intrinsics. Like
>
> backup FRM
> set specific FRM
> insn with specific FRM
> restore FRM
>
> so we'd always be sure the rounding mode is back to "normal"/"unknown" after
> an intrinsic and wouldn't need to do anything for calls and exits.
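
For reference, the local backup/restore sequence quoted above can be modelled in a few lines of plain C. This is only a toy sketch, not what the backend emits: the frm CSR is modelled as an ordinary global variable, and div_cur_frm(), div_static_rm() and callee_changes_frm() are hypothetical names made up for illustration. The enum values follow the RISC-V frm encoding, but only round-down and round-up are modelled.

```c
#include <assert.h>

/* Toy model only: the frm CSR is an ordinary global variable.
   Values follow the RISC-V frm encoding; only RDN/RUP are modelled. */
enum { RM_RNE = 0, RM_RTZ = 1, RM_RDN = 2, RM_RUP = 3 };

static int frm = RM_RNE;

/* Stand-in for an insn that uses the dynamic (global) rounding mode:
   integer division rounded up under RM_RUP, truncated otherwise. */
static int div_cur_frm(int a, int b)
{
    assert(a >= 0 && b > 0);
    int q = a / b;
    if (frm == RM_RUP && a % b != 0)
        q++;
    return q;
}

/* Hypothetical "intrinsic" with a static rounding mode, using the
   local backup/restore scheme. */
static int div_static_rm(int a, int b, int rm)
{
    int saved = frm;            /* backup FRM */
    frm = rm;                   /* set specific FRM */
    int q = div_cur_frm(a, b);  /* insn with specific FRM */
    frm = saved;                /* restore FRM */
    return q;
}

/* Stand-in for a call that changes the global mode (e.g. fesetround). */
static void callee_changes_frm(void)
{
    frm = RM_RUP;
}
```

In this model the global mode is whatever the last call left it as, both before and after div_static_rm(), so nothing extra has to be done around calls or at function exit.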

I have a prototype which implements the simple/dumb strategy. Indeed calls are
no longer special. This does change codegen semantics somewhat (although
there's no clearly documented ABI, so we have some wiggle room).

1. OK: Inline asm based FRM writes are now considered the same as calls. They
   are the only way to facilitate a global FRM change at all, and now this
   happens unconditionally, whereas in the old regime it could behave
   differently "under certain conditions" (a restore could overwrite a later
   global update with a prior one), see PR/120404 [1]. Gory details in the PR,
   but it seems this was not really designed to be, just a fallout of an
   implementation detail (eager/early backup before any inline asm got
   inspected).

   [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=120404

2. OK'ish: A bunch of testcases see more reads/writes, as PRE of redundant
   reads/writes is punted to later passes, which obviously needs more work.

3. NOK: We lose the ability to instrument local RM writes, especially in the
   testsuite, e.g.
     a. intrinsic setting a static RM
     b. get_frm() to ensure that happened (inline asm to read out frm)
   The tightly coupled restore kicks in before get_frm could be emitted, so
   get_frm fails to observe #a. This is a deal breaker for the testsuite, as
   many of the frm tests report as failing even if the actual codegen is sane.

> Another argument would also be that we technically aren't allowed to change
> the rounding mode without -frounding-math (as seen in a PR a while ago)
> because passes might not do the right thing. Therefore, increasing the
> region of non-default rounding mode, could lead to incorrect optimizations.
>
> Anyway, I think Vineet's patches improve on what we have right now. I'd
> still like to understand if this "abuse" of mode switching gets globally
> better results than a very simple approach like above. If the other (non
> mode-switching) LCMs cannot really optimize FRM reads and writes the simple
> approach could indeed be worse.
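
To make the #3 problem concrete, here is a toy model in plain C (a sketch under assumptions, not the actual testsuite code). frm is again just a global variable, get_frm() stands in for the inline asm read, and insn_static_rm_tight() / insn_static_rm_lazy() are hypothetical names. The "lazy" variant only approximates the existing scheme by leaving the restore to the caller.

```c
#include <assert.h>

/* Toy model only: the frm CSR is an ordinary global variable. */
enum { RM_RNE = 0, RM_RUP = 3 };

static int frm = RM_RNE;

/* Models the testsuite helper that reads frm via inline asm. */
static int get_frm(void)
{
    return frm;
}

/* Tightly coupled scheme: the restore is emitted right after the insn,
   so a later get_frm() can no longer see the static RM. */
static void insn_static_rm_tight(int rm)
{
    int saved = frm;   /* backup FRM */
    frm = rm;          /* set specific FRM */
    /* ... the insn with the static rounding mode would run here ... */
    frm = saved;       /* restore FRM */
}

/* Deferred restore (rough approximation of the existing scheme): the
   backup is handed back and restored later, so get_frm() in between
   observes the static RM. */
static int insn_static_rm_lazy(int rm)
{
    int saved = frm;   /* backup FRM */
    frm = rm;          /* set specific FRM */
    return saved;      /* restore is left to the caller */
}
```

With the tight variant, a check of the form "a. intrinsic, b. get_frm()" reads back the entry value rather than the static RM, which is why such tests report as failing even when the codegen around the intrinsic itself is fine.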

At this point I think we need to reconsider how to proceed. The simple
approach seems promising, but we need to solve #3 above first.

OTOH we could just get the low-hanging fruit by accepting the incremental
updates to the existing state machine (my v1 series). This solves the optim
issue PR/119164 and the correctness issue PR/120203. The pending things would
be the optim issue PR/120245 and the correctness issue PR/120404.

Thx,
-Vineet