On 10.09.2020 22:30, Andrew Cooper wrote:
> On 10/09/2020 15:57, Jan Beulich wrote:
>> On 09.09.2020 11:59, Andrew Cooper wrote:
>>> Split into two functions.  Passing a load of zeros in results in somewhat
>>> poor register scheduling in __context_switch().
>> I'm afraid I don't understand why this would be, no matter that
>> I trust you having observed this being the case: The registers
>> used for passing parameters are all call-clobbered anyway, so
>> the compiler can't use them for anything across the call. And
>> it would look pretty poor code generation wise if the XORs to
>> clear them (which effectively have no latency at all) would be
>> scheduled far ahead of the call, especially when there's better
>> use for the registers. The observation wasn't possibly from
>> before your recent dropping of two of the parameters, when they
>> couldn't all be passed in registers (albeit even then it would
>> be odd, as the change then should merely have led to a slightly
>> smaller stack frame of the function)?
>
> Hmm yes.  I wrote this patch before I did the assertion fix, and the
> comment didn't rebase very well.
>
> Back then, one of the zeros was on the stack, which was definitely an
> unwanted property.  Even though the XORs are mostly free, they're not
> totally free, as they cost decode bandwidth and instruction cache space
> (Trivial amounts, but still...).
>
> In general, LTO's inter-procedural-analysis can figure out that
> svm_load_segs_prefetch() doesn't use many registers, and the caller can
> be optimised based on the fact that some registers aren't actually
> clobbered.  (Then again, in this case with a sole caller, LTO really
> ought to be able to inline and delete the function.)
>
> How about "results in unnecessary caller setup code" ?

Yeah, that's probably better as a description.

Reviewed-by: Jan Beulich <jbeul...@suse.com>

Jan