Re: [RFC] Kernel livepatching support in GCC

2015-05-30 Thread Li Bin
On 2015/5/28 16:39, Maxim Kuvyrkov wrote:
> Hi,
> 
> Akashi-san and I have been discussing required GCC changes to make kernel's 
> livepatching work for AArch64 and other architectures.  At the moment 
> livepatching is supported for x86[_64] using the following options: "-pg 
> -mfentry -mrecord-mcount -mnop-mcount" which is geek-speak for "please add 
> several NOPs at the very beginning of each function, and make a section with 
> addresses of all those NOP pads".
> 
> The above long-ish list of options is a historical artifact of how 
> livepatching support evolved for x86.  The end result is that for 
> livepatching (or ftrace, or possible future kernel features) to work, the 
> compiler needs to generate a little bit of empty code space at the beginning 
> of each function.  The kernel can later use that space to insert call 
> sequences for various hooks.
> 
> Our proposal is that instead of adding -mfentry/-mnop-mcount/-mrecord-mcount 
> options to other architectures, we should implement a target-independent 
> option -fprolog-pad=N, which will generate a pad of N nops at the beginning 
> of each function and add a section entry describing the pad similar to 
> -mrecord-mcount [1].
> 
> Since adding NOPs is much less architecture-specific than outputting call 
> instruction sequences, this option can be handled in a target-independent way 
> at least for some/most architectures.
> 
> Comments?
> 

This proposal sounds good to me, and I look forward to it being merged soon :)
Then I'll make the appropriate changes in the kernel.
Thanks!
Li Bin

> As I found out today, the team from Huawei has implemented [2], which follows 
> x86 example of -mfentry option generating a hard-coded call sequence.  I hope 
> that this proposal can be easily incorporated into their work since most of 
> the livepatching changes are in the kernel.
> 
> [1] Technically, generating a NOP pad and adding a section entry in 
> .__mcount_loc are two separate actions, so we may want to have a 
> -fprolog-pad-record option.  My instinct is to stick with a single option for 
> now, since we can always add more later.
> 
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2015-May/346905.html
> 
> --
> Maxim Kuvyrkov
> www.linaro.org
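
As a rough illustration of the proposal quoted above, here is a toy model in C
of "a pad of N NOPs at the start of each function, plus a table recording where
each pad is". Everything in it is an assumption made for illustration: the
names emit_function, patch_pads, PAD_WORDS and HOOK_WORD are hypothetical, the
"text" is just an array of 32-bit words, and the hook is a made-up marker
rather than a real call sequence. It is not what GCC or the kernel does.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define PAD_WORDS 2                      /* stands in for N in -fprolog-pad=N */
#define NOP_WORD  UINT32_C(0xd503201f)   /* AArch64 NOP encoding */
#define HOOK_WORD UINT32_C(0xffffffff)   /* made-up marker, not a real call */

/* "Compiler" side: emit one function as PAD_WORDS NOPs followed by
   body_words placeholder words, and record the pad's offset in pad_loc[],
   much like an entry in a __mcount_loc-style section.
   Returns the offset just past the emitted function. */
size_t emit_function(uint32_t *text, size_t pos, size_t body_words,
                     size_t *pad_loc, size_t *n_pads)
{
    assert(text && pad_loc && n_pads);
    pad_loc[(*n_pads)++] = pos;
    for (size_t i = 0; i < PAD_WORDS; i++)
        text[pos++] = NOP_WORD;
    for (size_t i = 0; i < body_words; i++)
        text[pos++] = (uint32_t)(i + 1);   /* placeholder body */
    return pos;
}

/* "Kernel" side: walk the recorded pad locations and overwrite the first
   word of each untouched pad with the hook marker.
   Returns how many pads were patched. */
size_t patch_pads(uint32_t *text, const size_t *pad_loc, size_t n_pads)
{
    assert(text && pad_loc);
    size_t patched = 0;
    for (size_t i = 0; i < n_pads; i++) {
        if (text[pad_loc[i]] == NOP_WORD) {
            text[pad_loc[i]] = HOOK_WORD;
            patched++;
        }
    }
    return patched;
}
```

The point of the sketch is only that the patcher never needs to scan code: it
relies entirely on the recorded locations, which is why the pad and the section
entry are described together in the proposal.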




Re: Better info for combine results in worse code generated

2015-05-30 Thread Segher Boessenkool
On Sat, May 30, 2015 at 10:47:27AM +0930, Alan Modra wrote:
> > > > I think this is too simplistic though.  For example, AND with -7 is not
> > > > zero-extended (rlwinm rD,rA,0,31,28 sets the high 32 bits of rD to the
> > > > low 32 bits of rA).
> > > 
> > > We take some pains in rs6000.md to ensure that the wrap-around case
> > > for rlwinm does not occur for TARGET_POWERPC64.
> > 
> > I consider that a bug; it pessimises code.
> 
> At the time I added the checks for wrap-around, I recall that gcc
> generated wrong code without the fix.

It still does: some of the things that use mask_operand cannot handle
a wrapped around (MB > ME) 32-bit mask in DImode.
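
For readers following along, here is a small C model of what a wrapped-around
mask does on a 64-bit implementation. It is a sketch based on my reading of the
ISA description of rlwinm (the low word is rotated, replicated into both
halves, then ANDed with MASK(MB+32, ME+32)); the function names mask64 and
rlwinm64 are made up for this example and are not anything in rs6000.

```c
#include <assert.h>
#include <stdint.h>

/* MASK(mb, me) in IBM bit numbering, where bit 0 is the most significant
   of 64 bits: ones from bit mb through bit me.  If mb > me, the run of
   ones wraps around (mb..63 and 0..me). */
uint64_t mask64(unsigned mb, unsigned me)
{
    assert(mb < 64 && me < 64);
    uint64_t from_mb = ~UINT64_C(0) >> mb;            /* bits mb..63 */
    uint64_t through_me = ~UINT64_C(0) << (63 - me);  /* bits 0..me  */
    return mb <= me ? (from_mb & through_me) : (from_mb | through_me);
}

/* Model of rlwinm rD,rA,SH,MB,ME as seen in a 64-bit register. */
uint64_t rlwinm64(uint64_t ra, unsigned sh, unsigned mb, unsigned me)
{
    assert(sh < 32 && mb < 32 && me < 32);
    uint32_t lo = (uint32_t)ra;
    uint32_t rot = sh ? (lo << sh) | (lo >> (32 - sh)) : lo;
    uint64_t both = ((uint64_t)rot << 32) | rot;      /* replicated word */
    return both & mask64(mb + 32, me + 32);
}
```

With MB=31 and ME=28 the mask works out to -7 over all 64 bits, so the
replicated high word survives: under this model rlwinm64(0xdeadbeef, 0, 31, 28)
gives 0xdeadbeefdeadbee9.  A non-wrapping mask such as MB=16, ME=17 (0xc000)
clears the high word, which is the only case where the result looks
zero-extended.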

> > > You'll find that an
> > > SImode AND with any value is in fact zero extending.
> > 
> > int f(int x) { return x & 0xc000; }
> > 
> > is a counter-example with current trunk (it does a rldicr).
> 
> Huh, that does look like you've destroyed my claim about SImode AND.

Carefully worded :-)


I don't think it is a good idea to optimise code based on assumptions
of what SImode SETs will do to the dest seen as DImode, without making
those assumptions explicit in the RTL.


Segher