On Wed, Jul 1, 2015 at 8:23 AM, Vladimir Makarov <vmaka...@redhat.com> wrote:
>
>
> On 06/30/2015 05:37 PM, Jakub Jelinek wrote:
>>
>> On Tue, Jun 30, 2015 at 02:22:33PM -0700, Andy Lutomirski wrote:
>>>
>>> I'm working on a massive set of cleanups to Linux's syscall handling.
>>> We currently have a nasty optimization in which we don't save rbx,
>>> rbp, r12, r13, r14, and r15 on x86_64 before calling C functions.
>>> This works, but it makes the code a huge mess.  I'd rather save all
>>> regs in asm and then call C code.
>>>
>>> Unfortunately, this will add five cycles (on SNB) to one of the
>>> hottest paths in the kernel.  To counteract it, I have a gcc feature
>>> request that might not be all that crazy.  When writing C functions
>>> intended to be called from asm, what if we could do:
>>>
>>> __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14",
>>> "r15"))) void func(void);
>>>
>>> This will save enough pushes and pops that it could easily give us our
>>> five cycles back and then some.  It's also easy to be compatible with
>>> old GCC versions -- we could just omit the attribute, since preserving
>>> a register is always safe.
>>>
>>> Thoughts?  Is this totally crazy?  Is it easy to implement?
>>>
>>> (I'm not necessarily suggesting that we do this for the syscall bodies
>>> themselves.  I want to do it for the entry and exit helpers, so we'd
>>> still lose the five cycles in the full fast-path case, but we'd do
>>> better in the slower paths, and the slower paths are becoming
>>> increasingly important in real workloads.)
>>
>> GCC already supports -ffixed-REG, -fcall-used-REG and -fcall-saved-REG
>> options, which allow to tweak the calling conventions; but it is per
>> translation unit right now.  It isn't clear which of these options
>> you mean with the extra_clobber.
>> I assume you are looking for a possibility to change this to be
>> per-function, with caller with a different calling convention having to
>> adjust for different ABI callee.  To some extent, recent GCC versions
>> do that automatically with -fipa-ra already - if some call used registers
>> are not clobbered by some call and the caller can analyze that callee,
>> it can stick values in such registers across the call.
>> I'd say the most natural API for this would be to allow
>> f{fixed,call-{used,saved}}-REG in target attribute.
>>
>>
> One consequence of frequent changing calling convention per function or
> register usage could be GCC slowdown.  RA calculates too many data and it
> requires a lot of time to recalculate them after something in the register
> usage convention is changed.

Do you mean that RA precalculates things based on the calling
convention and saves them across functions?  Hmm.  I don't think this
would be a big problem in my intended use case -- there would only be
a handful of functions using this extension, and they'd have very few
non-asm callers.

>
> Another consequence would be that RA fails generate the code in some cases
> and even worse the failure might depend on version of GCC (I already saw PRs
> where RA worked for an asm in one GCC version because a pseudo was changed
> by equivalent constant and failed in another GCC version where it did not
> happen).
>

Would this be a problem when generating code for a function with extra
"used" regs, or only when generating code to call such a function?  I
imagine that, in the former case, RA's job would be easier, not
harder, since there would be more registers to work with.  In
practice, though, I think it would just end up changing the prologue
and epilogue.

--Andy
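
For reference, here is a minimal sketch of the compatibility point from the original proposal, namely that the attribute can simply be omitted on compilers that lack it. Everything in it is an assumption for illustration: extra_clobber is only the attribute proposed in this thread, not something known to exist in any released GCC, and ASM_CALLED, HAVE_EXTRA_CLOBBER and entry_helper are hypothetical names. The only real feature used is the __has_attribute preprocessor check.

```c
/*
 * ASSUMPTION: extra_clobber is the attribute proposed in this thread.
 * It is not known to be implemented by any released compiler, so on
 * today's toolchains the check below is expected to fail and the
 * fallback branch is taken.
 */
#if defined(__has_attribute)
#  if __has_attribute(extra_clobber)
#    define ASM_CALLED \
        __attribute__((extra_clobber("rbx", "rbp", "r12", "r13", "r14", "r15")))
#    define HAVE_EXTRA_CLOBBER 1
#  endif
#endif

/* Fallback: omit the attribute; the callee preserves the registers as usual. */
#ifndef ASM_CALLED
#  define ASM_CALLED
#  define HAVE_EXTRA_CLOBBER 0
#endif

/* Hypothetical stand-in for an entry/exit helper that is called from asm. */
ASM_CALLED long entry_helper(long nr)
{
    return nr + 1;
}
```

When the attribute is unavailable, ASM_CALLED expands to nothing and entry_helper is built with the normal calling convention. The asm caller would then save registers that the callee also preserves, which costs the extra pushes and pops but is safe.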