On Tue, 11 Jul 2023, Richard Biener wrote:
> > > If a function contains calls then GCC can't know which
> > > parts of the XMM regset is clobbered by that, it may be parts
> > > which don't even exist yet (say until avx2048 comes out), so we must
> > > restrict ourself to only save/restore the SSE2 parts and then of course
> > > can only claim to not clobber those parts.
> >
> > Hm, I guess this is kinda the reason a "weak" form is needed. But this
> > highlights the difference between the two: the "weak" form will actively
> > preserve some state (so it cannot preserve future extensions), while
> > the "strong" form may just passively not touch any state, preserving
> > any state it doesn't know about.
> >
> > > To that end I introduce actually two related attributes (for naming
> > > see below):
> > > * nosseclobber: claims (and ensures) that xmm8-15 aren't clobbered
> >
> > This is the weak/active form; I'd suggest "preserve_high_sse".
>
> Isn't it the opposite? "preserves_low_sse", unless you suggest
> the name applies to the caller which has to preserve high parts
> when calling nosseclobber.

This is the form where the function annotated with this attribute
consumes 128 bytes on the stack to "blindly" save/restore xmm8-15
if it calls anything with a vanilla ABI.

(Actually, thinking about it more, I'd like to suggest shelving this part
and only implementing the zero-cost variant, noanysseclobber.)

> > > * noanysseclobber: claims (and ensures) that nothing of any of the
> > > registers overlapping xmm8-15 is clobbered (not even future, as of
> > > yet unknown, parts)
> >
> > This is the strong/passive form; I'd suggest "only_low_sse".
>
> Likewise.

Sorry if I managed to sow confusion here. In my mind, this is the form
where only xmm0-xmm7 can be written in the function annotated with the
attribute, including its callees. I was thinking that writing to
zmm16-31 would be disallowed too. The initial example was memcpy, where
eight vector registers are sufficient for the job.
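
To spell out where the 128-byte figure for the "weak" form comes from:
xmm8-15 is eight registers, and only their 128-bit (16-byte) SSE2 parts
are saved. A minimal sketch of that arithmetic follows. The constant and
function names are made up for illustration, and the attribute appears
only in a comment, since it exists solely as a proposal in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Illustration only: these names are hypothetical, not part of GCC.
   The proposed attribute would be spelled something like
   __attribute__((nosseclobber)); it is deliberately not used here. */
enum {
    XMM_BYTES   = 16, /* SSE2 part of one vector register: 128 bits */
    FIRST_SAVED = 8,  /* xmm8 */
    LAST_SAVED  = 15, /* xmm15 */
    SAVE_AREA_BYTES = (LAST_SAVED - FIRST_SAVED + 1) * XMM_BYTES
};

/* Compile-time check of the figure quoted in the mail. */
static_assert(SAVE_AREA_BYTES == 128, "xmm8-15 save area is 128 bytes");

/* Stack bytes needed to blindly save/restore xmm8-15 around a call
   to a function with the vanilla ABI. */
static size_t sse_save_area_bytes(void)
{
    return (size_t)SAVE_AREA_BYTES;
}
```

The "strong" form needs no such area, because nothing above xmm0-7 is
written in the first place.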

> As for mask registers I understand we'd have to split the 8 register
> set into two halves to make the same approach work, otherwise
> we'd have no registers left to allocate from.

I'd suggest looking at how many mask registers OpenMP SIMD AVX-512 clones
can receive as implicit arguments, as one data point.

Alexander