On Thu, Apr 14, 2016 at 04:58:12PM +0100, Szabolcs Nagy wrote:
> On 14/04/16 14:15, Andrew Pinski wrote:
> > On Thu, Apr 14, 2016 at 9:08 PM, Maxim Kuvyrkov
> > <maxim.kuvyr...@linaro.org> wrote:
> >> On Mar 14, 2016, at 11:14 AM, Li Bin <huawei.li...@huawei.com> wrote:
> >>>
> >>> As ARM64 is entering enterprise world, machines can not be stopped for
> >>> some critical enterprise production environment, that is, live patch as
> >>> one of the RAS features is increasing more important for ARM64 arch now.
> >>>
> >>> Now, the mainstream live patch implementation which has been merged in
> >>> Linux kernel (x86/s390) is based on the 'ftrace with regs' feature, and
> >>> this feature needs the help of gcc.
> >>>
> >>> This patch proposes a generic solution for arm64 gcc which called mfentry,
> >>> following the example of x86, mips, s390, etc. and on these archs, this
> >>> feature has been used to implement the ftrace feature 'ftrace with regs'
> >>> to support live patch.
> >>>
> >>> By now, there is an another solution from linaro [1], which proposes to
> >>> implement a new option -fprolog-pad=N that generate a pad of N nops at the
> >>> beginning of each function. This solution is a arch-independent way for
> >>> gcc,
> >>> but there may be some limitations which have not been recognized for Linux
> >>> kernel to adapt to this solution besides the discussion on [2]
> >>
> >> It appears that implementing -fprolog-pad=N option in GCC will not enable
> >> kernel live-patching support for AArch64. The proposal for the option was
> >> to make GCC output a given number of NOPs at the beginning of each
> >> function, and then the kernel could use that NOP pad to insert whatever
> >> instructions it needs. The modification of kernel instruction stream
> >> needs to be done atomically, and, unfortunately, it seems the kernel can
> >> use only architecture-provided atomicity primitives -- i.e., changing at
> >> most 8 bytes at a time.
> >>
> >
> > Can't we add a 16byte atomic primitive for ARM64 to the kernel?
> > Though you need to align all functions to a 16 byte boundary if the
> > -fprolog-pag=N needs to happen. Do you know what the size that needs
> > to be modified? It does seem to be either 12 or 16 bytes.
> >
>
> looking at [2] i don't see why
>
> func:
>   mov x9, x30
>   bl _tracefunc
>   <function body>

Actually,
    mov x9, x30
    bl _tracefunc
    mov x30, x9
    <function body>

> is not good for the kernel.
>
> mov x9, x30 is a nop at function entry, so in
> theory 4 byte atomic write should be enough
> to enable/disable tracing.

Please see my previous reply to Maxim.

Thanks,
-Takahiro AKASHI

> >> From the kernel discussion thread it appears that the pad needs to be more
> >> than 8 bytes, and that the kernel can't update that atomically. However
> >> if -mfentry approach is used, then we need to update only 4 (or 8) bytes
> >> of the pad, and we avoid the atomicity problem.
> >
> > I think you are incorrect, you could add a 16 byte atomic primitive if
> > needed.
> >
> >>
> >> Therefore, [unless there is a clever multi-stage update process to
> >> atomically change NOPs to whatever we need,] I think we have to go with
> >> Li's -mfentry approach.
> >
> > Please consider the above of having a 16 byte (128bit) atomic
> > instructions be available would that be enough?
> >
> > Thanks,
> > Andrew
> >
> >>
> >> Comments?
> >>
> >> --
> >> Maxim Kuvyrkov
> >> www.linaro.org
> >>
> >>
> >>> , typically
> >>> for powerpc archs. Furthermore I think there are no good reasons to
> >>> promote
> >>> the other archs (such as x86) which have implemented the feature 'ftrace
> >>> with regs'
> >>> to replace the current method with the new option, which may bring heavily
> >>> target-dependent code adaption, as a result it becomes a arm64 dedicated
> >>> solution, leaving kernel with two different forms of implementation.
> >>>
> >>> [1] https://gcc.gnu.org/ml/gcc/2015-10/msg00090.html
> >>> [2]
> >>> http://lists.infradead.org/pipermail/linux-arm-kernel/2016-January/401854.html
> >>
> >
>

-- 
Thanks,
-Takahiro AKASHI
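
For reference, the "4 byte atomic write" idea quoted above can be modelled in user space roughly as follows. This is only a sketch under stated assumptions, not kernel code: struct func_entry, a64_bl() and set_tracing() are hypothetical names invented for this illustration, the entry is modelled as plain data rather than executable text, and a real implementation would also need instruction cache maintenance and cross-CPU synchronisation. It follows the two-instruction sequence from the quoted text; the three-instruction variant with the trailing "mov x30, x9" is not modelled.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* AArch64 instruction encodings. */
#define A64_NOP        0xd503201fu  /* nop */
#define A64_MOV_X9_X30 0xaa1e03e9u  /* mov x9, x30 (orr x9, xzr, x30) */
#define A64_BL_OPCODE  0x94000000u  /* bl with imm26 == 0 */

/* Hypothetical model of a patchable function entry: two 4-byte slots. */
struct func_entry {
    uint32_t save_lr;            /* always "mov x9, x30"; harmless when tracing is off */
    _Atomic uint32_t call_site;  /* either "nop" or "bl <tracer>" */
};

/* Encode "bl" for a byte offset relative to the bl instruction itself. */
static uint32_t a64_bl(int32_t byte_offset)
{
    assert((byte_offset & 3) == 0);  /* branch targets are 4-byte aligned */
    return A64_BL_OPCODE | (((uint32_t)byte_offset >> 2) & 0x03ffffffu);
}

/*
 * Enable or disable tracing by rewriting only the 4-byte call-site slot.
 * A single aligned 32-bit store is all that changes in this model.
 */
static void set_tracing(struct func_entry *e, int enable, int32_t tracer_offset)
{
    uint32_t insn = enable ? a64_bl(tracer_offset) : A64_NOP;
    atomic_store_explicit(&e->call_site, insn, memory_order_release);
}
```

In this model the first slot never changes, so toggling tracing touches one aligned 32-bit word, which is the point made in the quoted reply. Whether that is sufficient for the kernel is exactly what the thread is debating.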