Re: scripts/kallsyms: Avoid ARM veneer symbols
On Sat, Jul 06, 2013 at 01:34:56AM +0200, Arnd Bergmann wrote:
> On Friday 05 July 2013, Dave P Martin wrote:
> > On Fri, Jul 05, 2013 at 05:42:44PM +0100, Arnd Bergmann wrote:
> > > On Friday 05 July 2013, Dave P Martin wrote:
> > > > On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> >
> > I think there are a small number of patterns to check for.
> > __*_veneer, __*_from_arm and __*_from_thumb should cover most cases.
>
> Ok.
>
> > > * There are actually symbols without a name on ARM, which screws up the
> > >   kallsyms.c parser. These also seem to be veneers, but attached to some
> > >   random function:
> >
> > Hmmm, I don't know what those are. By default, we should probably ignore
> > those too. Maybe they have something to do with link-time relocation
> > processing.
>
> Definitely link-time. It only shows up after the final link, and only
> with ld.bfd, not with ld.gold as I found out now.
>
> > > $ nm obj-tmp/.tmp_vmlinux1 | head
> > > c09e8db1 t
> > > c09e8db5 t
> > > c09e8db9 t    # <==
> > > c09e8dbd t
> > > c0abfc29 t
> > > c0008000 t $a
> > > c0f7b640 t $a
> > >
> > > $ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
> > > c0851fcc :
> > > c0851fcc:   b538        push    {r3, r4, r5, lr}
> > > c0851fce:   b500        push    {lr}
> > > c0851fd0:   f7bb d8dc   bl      c000d18c <__gnu_mcount_nc>
> > > c0851fd4:   f240 456b   movw    r5, #1131   ; 0x46b
> > > c0851fd8:   4604        mov     r4, r0
> > > c0851fda:   f880 14d5   strb.w  r1, [r0, #1237]  ; 0x4d5
> > > c0851fde:   462a        mov     r2, r5
> > > c0851fe0:   f44f 710b   mov.w   r1, #556    ; 0x22c
> > > c0851fe4:   f7ff fe6d   bl      c0851cc2
> > > c0851fe8:   4620        mov     r0, r4
> > > c0851fea:   462a        mov     r2, r5
> > > c0851fec:   f240 212d   movw    r1, #557    ; 0x22d
> > > c0851ff0:   f7ff fe67   bl      c0851cc2
> > > c0851ff4:   4620        mov     r0, r4
> > > c0851ff6:   f240 212e   movw    r1, #558    ; 0x22e
> > > c0851ffa:   f44f 7270   mov.w   r2, #960    ; 0x3c0
> > > c0851ffe:   f196 fedb   bl      c09e8db8
> > >                                 # <===
> > > c0852002:   4620        mov     r0, r4
> > > c0852004:   f240 212f   movw    r1, #559    ; 0x22f
> > > c0852008:   f44f 7270   mov.w   r2, #960    ; 0x3c0
> > > c085200c:   e8bd 4038   ldmia.w sp!, {r3, r4, r5, lr}
> > > c0852010:   f7ff be57   b.w     c0851cc2
> > >
> > > ...                             # in tpci200_free_irq:
> > > c09e8d9e:   e003        b.n     c09e8da8
> > > c09e8da0:   f06f 0415   mvn.w   r4, #21
> > > c09e8da4:   e000        b.n     c09e8da8
> > > c09e8da6:   4c01        ldr     r4, [pc, #4]    ; (c09e8dac)
> > > c09e8da8:   4620        mov     r0, r4
> > > c09e8daa:   bdf8        pop     {r3, r4, r5, r6, r7, pc}
> > > c09e8dac:   fe00        ; <UNDEFINED> instruction: 0xfe00
> > > c09e8db0:   f4cf b814   b.w     c06b7ddc
> > >
> > > c09e8db4:   f53e bed8   b.w     c0727b68
> > > c09e8db8:   f668 bf83   b.w     c0851cc2
> > >                                 # <==
> > > c09e8dbc:   d101        bne.n   c09e8dc2
> > > c09e8dbe:   f435 b920   b.w     c061e002
> > >
> > > It makes no sense to me at all that a function in one driver can just call
> > > write_phy_reg a couple of times, but need a veneer in the middle, and put
> > > that veneer in a totally unrelated function in another driver!
> >
> > I think that if ld inserts a veneer for a function anywhere, branches
> > from any object in the link to that target symbol can reuse the same
> > veneer as a trampoline, effectively appearing to branch through an
> > unrelated location to reach the destination.
>
> That part makes sense, but it doesn't explain why ld would do that just
> for the third out of four identical function calls in the example above.
>
> > ld inserts veneers between individual input sections, but I don't
> > think they have to go next to the same section the branch originates
> > from. In the above code, it look
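(For illustration, a minimal sketch of the name-based filtering being
discussed for scripts/kallsyms.c; the helper name and call site are
hypothetical, not Arnd's actual patch, which appears later in the thread.)

    /*
     * Sketch only: drop linker-generated ARM veneer symbols by name,
     * matching the __*_veneer, __*_from_arm and __*_from_thumb patterns
     * discussed above.
     */
    #include <string.h>

    static int symbol_is_arm_veneer(const char *name)
    {
            static const char *const suffixes[] = {
                    "_veneer", "_from_arm", "_from_thumb",
            };
            size_t i, len = strlen(name);

            if (name[0] != '_' || name[1] != '_')
                    return 0;       /* veneers are named __*_veneer etc. */

            for (i = 0; i < sizeof(suffixes) / sizeof(suffixes[0]); i++) {
                    size_t slen = strlen(suffixes[i]);

                    if (len > slen && !strcmp(name + len - slen, suffixes[i]))
                            return 1;
            }

            return 0;
    }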
Re: scripts/kallsyms: Avoid ARM veneer symbols
On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> On Tuesday 25 June 2013, Dave Martin wrote:
> > On Mon, Jun 24, 2013 at 04:01:43PM +0200, Arnd Bergmann wrote:
> > > When building ARM kernels that exceed a certain size, the linker
> > > will add "veneers" that allow long branches. Unfortunately,
> > > using a longer kallsyms section may lead to extra veneers
> > > being needed, which leads to inconsistent kallsyms data with the
> > > message
> > >
> > >   Inconsistent kallsyms data
> > >   Try make KALLSYMS_EXTRA_PASS=1 as a workaround
> > >
> > > In some cases, not just one but up to three extra passes were
> > > needed before getting to a state that needed no extra veneers.
> >
> > Do you understand why this was happening? It sounds like there
> > must have been branches crossing from one side of the kallsyms
> > data to the other, triggering veneer insertion any time the
> > kallsyms data grows.
> >
> > Branches between .{init,exit}.text and .text (crossing .rodata) seem the
> > likeliest culprits, but that's guesswork on my part.
>
> kallsyms is part of rodata, which is between .text and .init.text.
> I understand this is intentional, and we already use two passes
> of kallsyms to make sure the size is right. However the assumption
> is that only the data of the kallsyms table changes between runs,
> not the size of it.

I guess sticking the init stuff at the end makes it tidier to discard
without leaving a gap in the middle of the kernel image. There might be
other reasons.

> > If that's what's going on, multiple kallsyms passes is actually a correct
> > approach here: I think they should terminate after a number of steps
> > roughly proportional to log(number of branches across .rodata). We
> > could come up with a heuristic which allows us to choose the right
> > limit with a high degree of reliability, since branch density in
> > compiled C code is likely to be roughly constant.
>
> It also seems to be a part of the design to do exactly two passes
> and treat anything beyond that as bugs, doing it the way you

Sure -- I was making the point that the implementation is based on an
assumption that isn't true here.

> suggest would significantly increase the build time since the
> kallsyms+linker stage cannot be done in parallel or sped up
> using ccache.
>
> > But including the veneer symbols in kallsyms is arguably not all
> > that useful.
> >
> > The main potential effect is that profiling might occasionally
> > sample the PC as being in a completely bogus function which it
> > never passed through at all, because of the way kallsyms tracks
> > only symbol locations and not sizes (if I remember right) --
> > so a veneer will actually get accounted against some arbitrary
> > adjacent function.
> >
> > I don't know how much this matters.
>
> Interesting point. No idea how often that happens. All the veneers
> for one section are in one place though, so we could in theory
> add a kallsyms entry for that section as long as we can identify
> it.

We could collapse any contiguous sequence of __veneer_* symbols down to
a single symbol to mark those holes.

However, many kallsyms passes could still be needed: each pass might add
extra veneer blocks, unless the size of kallsyms data is identical to
that in the previous pass. The expected convergence rate is faster,
though.

> > > The easiest solution is to skip veneers in kallsyms in the
> > > same way we already skip a couple of other symbols.
Your suggestion of omitting the symbols completely seems to be the only
way to ensure convergence in 2 passes, so far as I can see.

Adding size information to every entry in kallsyms would make it
possible to identify the "holes" without potentially requiring many
kallsyms passes, but it would bloat the table. The extra info would be
interesting only for a tiny subset of the symbols. I expect people
aren't going to like that much.

So I guess your original suggestion may be the best thing to do for now,
after all...

There is no proper reserved symbol namespace for linker-generated
symbols, so a real symbol could have the name __*_veneer, at which point
things start to get really confusing. But hopefully that won't happen
much. I don't see any such symbol right now, and hopefully nobody will
be so silly as to add one...

If we eventually need to fix the bogus symbol resolution problem, I
can't see an alternative to adding size info to every symbol. But we
should leave that for now. It doesn't sound like a serious problem.

> > The other symbols are not stripped for the purpose of making
> > kallsyms terminate quickly. The mapping symbols are rather
> > different: masses of symbols with duplicate names which are
> > not very interesting for most people.
>
> Right.
>
> > > Signed-off-by: Arnd Bergmann
> > > ---
> > > diff --git a/scripts/kallsyms.c b/scripts/kallsyms.c
> > > index 487ac6f..53ec0bb 100644
> > > --- a/scripts/kallsyms.c
> > > +++ b/scripts/kallsyms.c
> > > @@ -69,14 +69,32 @@ st
Re: [PATCH] ARM: move body of head-common.S back to text section
On Wed, Jul 03, 2013 at 08:22:35PM -0400, Paul Gortmaker wrote:
> [Re: [PATCH] ARM: move body of head-common.S back to text section] On
> 03/07/2013 (Wed 18:20) Russell King - ARM Linux wrote:
>
> > On Wed, Jul 03, 2013 at 11:30:12AM -0400, Paul Gortmaker wrote:
> > > [Re: [PATCH] ARM: move body of head-common.S back to text section] On
> > > 03/07/2013 (Wed 11:00) Russell King - ARM Linux wrote:
> > >
> > > > On Wed, Jul 03, 2013 at 01:19:07AM -0400, Paul Gortmaker wrote:
> > > > > As an aside, I'm now thinking any __INIT that implicitly rely on
> > > > > EOF for closure are nasty traps waiting to happen and it might be
> > > > > worthwhile to audit and explicitly __FINIT them before someone
> > > > > appends to the file...
> > > >
> > > > That hides a different kind of bug though - I hate __FINIT for
> > > > exactly that reason. Consider this:
> > >
> > > Agreed - perhaps masking that it is a ".previous" just hides the fact
> > > that it is more like a pop operation vs. an on/off operation, or per
> > > function as we have in C.
> >
> > I read the info pages, because I thought it was a pop operation too.
> > I was concerned that .section didn't push the previous section onto the
> > stack.
> >
> > However, .popsection is the pseudo-op which pops. .previous just toggles
> > the current section with the section immediately before it.
> >
> > So:
> >
> >     .text
> >     .data
> >     .previous
> >     /* this is .text */
> >     .previous
> >     /* this is .data */
> >     .previous
> >     /* this is .text */
> >     .previous
> >     /* this is .data */
>
> Cool -- I bet we weren't the only ones thinking it was a pop. Thanks.
>
> Does that make __FINIT less evil than we previously assumed? I think
> your example was the following pseudo-patch:
>
>       .text
>
> +     .data
> +
>       __INIT
>
>       __FINIT
>       /* this below used to be text */
>
> Even if it is a toggle (vs. pop), the end text will now become data,
> so the no-op __FINIT with an explicit section called out just below
> it may still be the most unambiguous way to indicate what is going on.
>
> > > That seems reasonable to me. I can't think of any self auditing that
> > > is reasonably simple to implement. One downside of __FINIT as a no-op
> > > vs. what it is today, is that a dangling __FINIT in a file with no
> > > other previous sections will emit a warning. But that is a small low
> > > value corner case I think.
> >
> > That warning from __FINIT will only happen if there has been no section
> > or .text or .data statement in the file at all. As soon as you have any
> > statement setting any kind of section, .previous doesn't warn.
> >
> > So:
> >
> >     .text
> >     ...
> >     __FINIT
> >
> > produces no warning.
>
> Yep -- we are both saying the same thing here - hence why I called it a
> small low value corner case.

Note that .previous has another important gotcha. Consider:

        __INIT
        /* now in .text.init */
        ALT_UP(...)
        /* now in .text.init */
        __FINIT
        /* now in .alt.smp.text! */

.previous (or macros containing a dangling .previous) shouldn't be used
unless you're absolutely certain what the previous section was.

In general:

label:
        <code>
        .previous

restores to the section which was current at label, only if there are no
section directives in <code>, nor anything which could contain a section
directive after macro expansion.

The same goes for the hidden, dangling .previous embedded in __FINIT and
friends.
Cheers
---Dave
Re: scripts/kallsyms: Avoid ARM veneer symbols
On Fri, Jul 05, 2013 at 05:42:44PM +0100, Arnd Bergmann wrote:
> On Friday 05 July 2013, Dave P Martin wrote:
> > On Wed, Jul 03, 2013 at 06:03:04PM +0200, Arnd Bergmann wrote:
> > > On Tuesday 25 June 2013, Dave Martin wrote:
> > > suggest would significantly increase the build time since the
> > > kallsyms+linker stage cannot be done in parallel or sped up
> > > using ccache.
> > >
> > > > But including the veneer symbols in kallsyms is arguably not all
> > > > that useful.
> > > >
> > > > The main potential effect is that profiling might occasionally
> > > > sample the PC as being in a completely bogus function which it
> > > > never passed through at all, because of the way kallsyms tracks
> > > > only symbol locations and not sizes (if I remember right) --
> > > > so a veneer will actually get accounted against some arbitrary
> > > > adjacent function.
> > > >
> > > > I don't know how much this matters.
> > >
> > > Interesting point. No idea how often that happens. All the veneers
> > > for one section are in one place though, so we could in theory
> > > add a kallsyms entry for that section as long as we can identify
> > > it.
> >
> > We could collapse any contiguous sequence of __veneer_* symbols down
> > to a single symbol to mark those holes.
> >
> > However, many kallsyms passes could still be needed: each pass might add
> > extra veneer blocks, unless the size of kallsyms data is identical to
> > that in the previous pass. The expected convergence rate is faster,
> > though.
>
> Right, it wouldn't guarantee to converge after two passes, just make
> it very likely.
>
> > > > > The easiest solution is to skip veneers in kallsyms in the
> > > > > same way we already skip a couple of other symbols.
> >
> > Your suggestion of omitting the symbols completely seems to be the only
> > way to ensure convergence in 2 passes, so far as I can see.
> >
> > Adding size information to every entry in kallsyms would make it
> > possible to identify the "holes" without potentially requiring many
> > kallsyms passes, but it would bloat the table. The extra info would be
> > interesting only for a tiny subset of the symbols. I expect people
> > aren't going to like that much.
> >
> > So I guess your original suggestion may be the best thing to do for now,
> > after all...
> >
> > There is no proper reserved symbol namespace for linker-generated
> > symbols, so a real symbol could have the name __*_veneer, at which point
> > things start to get really confusing. But hopefully that won't happen
> > much. I don't see any such symbol right now, and hopefully nobody will
> > be so silly as to add one...
> >
> > If we eventually need to fix the bogus symbol resolution problem, I
> > can't see an alternative to adding size info to every symbol. But we
> > should leave that for now. It doesn't sound like a serious problem.
>
> Unfortunately I have run into additional problems now after doing a few
> hundred more builds:
>
> * not all veneers end in _veneer, some also end in _from_arm or
>   _from_thumb. This is easy enough to check for in the same way I did
>   for _veneer.

I think there are a small number of patterns to check for.
__*_veneer, __*_from_arm and __*_from_thumb should cover most cases.

> * There are actually symbols without a name on ARM, which screws up the
>   kallsyms.c parser. These also seem to be veneers, but attached to some
>   random function:

Hmmm, I don't know what those are. By default, we should probably ignore
those too. Maybe they have something to do with link-time relocation
processing.
> > $ nm obj-tmp/.tmp_vmlinux1 | head
> c09e8db1 t
> c09e8db5 t
> c09e8db9 t    # <==
> c09e8dbd t
> c0abfc29 t
> c0008000 t $a
> c0f7b640 t $a
>
> $ objdump -Dr obj-tmp/.tmp_vmlinux1 | grep -C 30 c09e8db.
> c0851fcc :
> c0851fcc:   b538        push    {r3, r4, r5, lr}
> c0851fce:   b500        push    {lr}
> c0851fd0:   f7bb d8dc   bl      c000d18c <__gnu_mcount_nc>
> c0851fd4:   f240 456b   movw    r5, #1131   ; 0x46b
> c0851fd8:   4604        mov     r4, r0
> c0851fda:   f880 14d5   strb.w  r1, [r0, #1237]  ; 0x4d5
> c0851fde:   462a        mov     r2, r5
> c0851fe0:   f44f 710b   mov.w   r1, #556    ; 0x22c
> c0851fe4:   f7ff fe6d   bl      c0851cc2
> c0851fe8:   4620        mov
Re: [PATCH 1/7] EFI stub documentation updates
On Tue, Aug 06, 2013 at 12:56:12AM +0100, Roy Franz wrote:
> On Mon, Aug 5, 2013 at 7:12 AM, Dave Martin wrote:
> > On Fri, Aug 02, 2013 at 02:29:02PM -0700, Roy Franz wrote:
> >> The ARM kernel also has an EFI stub which works largely the same way
> >> as the x86 stub, so move the documentation out of the x86 directory and
> >> update it to reflect that it is generic, and add ARM specific text.
> >>
> >> Signed-off-by: Roy Franz
> >> ---
> >>  Documentation/efi-stub.txt     | 78
> >>  Documentation/x86/efi-stub.txt | 65 -
> >>  arch/x86/Kconfig               |  2 +-
> >>  3 files changed, 79 insertions(+), 66 deletions(-)
> >>  create mode 100644 Documentation/efi-stub.txt
> >>  delete mode 100644 Documentation/x86/efi-stub.txt
> >>
> >> diff --git a/Documentation/efi-stub.txt b/Documentation/efi-stub.txt
> >> new file mode 100644
> >> index 000..7837df1
> >> --- /dev/null
> >> +++ b/Documentation/efi-stub.txt
> >> @@ -0,0 +1,78 @@
> >> +          The EFI Boot Stub
> >> +          -----------------
> >> +
> >> +On the x86 and ARM platforms, a bzImage can masquerade as a PE/COFF image,
> >> +thereby convincing EFI firmware loaders to load it as an EFI
> >> +executable. The code that modifies the bzImage header, along with the
> >
> > Minor nit, I don't think there is such a thing as "bzImage" for ARM.
> >
> > Cheers
> > ---Dave
>
> Yeah, I don't think so either... How about:
>
>    On the x86 and ARM platforms, a kernel zImage/bzImage can masquerade

Sounds fine to me.

Cheers
---Dave
Re: [PATCH 6/7] Add EFI stub for ARM
On Mon, Aug 05, 2013 at 04:33:19PM +0100, Leif Lindholm wrote:
> On Mon, Aug 05, 2013 at 03:11:49PM +0100, Dave Martin wrote:
> > > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> > > index 75189f1..4c70b9e 100644
> > > --- a/arch/arm/boot/compressed/head.S
> > > +++ b/arch/arm/boot/compressed/head.S
> > > @@ -122,19 +122,106 @@
> > >          .arm                @ Always enter in ARM state
> > >  start:
> > >          .type   start,#function
> > > -        .rept   7
> > > +#ifdef CONFIG_EFI_STUB
> > > +        @ Magic MSDOS signature for PE/COFF + ADD opcode
> > > +        .word   0x62805a4d
> >
> > What about BE32?
>
> The ARM bindings for UEFI specify that the processor must be in
> little-endian mode.
>
> > In that case, the instruction is a coprocessor load, that loads from a
> > random address to a coprocessor that almost certainly doesn't exist.
> > This will probably fault.
> >
> > Since BE32 is only for older platforms (
> > solvable, it might be sensible to make the EFI stub support depend on
> > !CPU_ENDIAN_BE32.
>
> Well, it would make more sense to make EFI_STUB depend on EFI and
> EFI depend on !CPU_ENDIAN_BE32. Which is something I can add to
> my next set of general ARM UEFI patches.

Thanks. That sounds fair enough.

Cheers
---Dave
Re: [PATCH 6/7] Add EFI stub for ARM
On Tue, Aug 06, 2013 at 01:06:17AM +0100, Roy Franz wrote:
> On Mon, Aug 5, 2013 at 8:33 AM, Leif Lindholm wrote:
> > On Mon, Aug 05, 2013 at 03:11:49PM +0100, Dave Martin wrote:
> >> > diff --git a/arch/arm/boot/compressed/head.S b/arch/arm/boot/compressed/head.S
> >> > index 75189f1..4c70b9e 100644
> >> > --- a/arch/arm/boot/compressed/head.S
> >> > +++ b/arch/arm/boot/compressed/head.S
> >> > @@ -122,19 +122,106 @@
> >> >          .arm                @ Always enter in ARM state
> >> >  start:
> >> >          .type   start,#function
> >> > -        .rept   7
> >> > +#ifdef CONFIG_EFI_STUB
> >> > +        @ Magic MSDOS signature for PE/COFF + ADD opcode
> >> > +        .word   0x62805a4d

Is BE8 supported? If so, this would put the bytes 62 80 5A 4D in the
binary, which is not the right magic. For this magic, you could use
.byte instead.

To help future maintainers, I suggest noting in a comment that executing
through this magic relies on little-endian instruction byte order (so,
LE or BE8), and the ARM instruction set.

What about the endianness of the other PE/COFF header fields? Are they
always little-endian, or are some fields native-endian (and if so, how
is the endianness of the header determined by the loader)?

> >>
> >> What about BE32?
> >
> > The ARM bindings for UEFI specify that the processor must be in
> > little-endian mode.
> >
> >> In that case, the instruction is a coprocessor load, that loads from a
> >> random address to a coprocessor that almost certainly doesn't exist.
> >> This will probably fault.
> >>
> >> Since BE32 is only for older platforms (
> >> solvable, it might be sensible to make the EFI stub support depend on
> >> !CPU_ENDIAN_BE32.
> >
> > Well, it would make more sense to make EFI_STUB depend on EFI and
> > EFI depend on !CPU_ENDIAN_BE32. Which is something I can add to
> > my next set of general ARM UEFI patches. Thanks.
> >
> > /
> >     Leif
>
> I had EFI_STUB depend on EFI at one point during my development, but
> took it out because there was no actual dependency (the stub will work
> fine without other EFI features.) The features will most likely be used
> together, but I wasn't sure if we would want to enforce this with a
> config dependency. I don't care one way or the other, I'd just like the
> dependencies to be correct and follow best practices.

I guess it's up to you, so long as the constraint is expressed in
Kconfig somehow.

Cheers
---Dave
Re: [PATCH 16/17] Add EFI stub for ARM
On Thu, Aug 08, 2013 at 10:57:29PM +0100, Roy Franz wrote:
> On Wed, Aug 7, 2013 at 11:05 AM, Dave Martin wrote:
> > On Tue, Aug 06, 2013 at 08:45:12PM -0700, Roy Franz wrote:
> >> This patch adds EFI stub support for the ARM Linux kernel. The EFI stub
> >> operates similarly to the x86 stub: it is a shim between the EFI firmware
> >> and the normal zImage entry point, and sets up the environment that the
> >> zImage is expecting. This includes loading the initrd (optionally) and
> >> device tree from the system partition based on the kernel command line.
> >> The stub updates the device tree as necessary, including adding reserved
> >> memory regions and adding entries for EFI runtime services. The PE/COFF
> >> "MZ" header at offset 0 results in the first instruction being an add
> >> that corrupts r5, which is not used by the zImage interface.
> >
> > Some more comments below ... note that I haven't really looked at the C
> > code in depth.
>
> Responses below, and I'm working on incorporating suggested changes
> for the next version.

A few responses-to-responses from me inline. Your response supersedes
most of this anyhow.

Cheers
---Dave

> Thanks,
> Roy
>
> > Cheers
> > ---Dave
> >
> >> Signed-off-by: Roy Franz
> >> ---
> >>  arch/arm/boot/compressed/Makefile     |  18 +-
> >>  arch/arm/boot/compressed/efi-header.S | 114
> >>  arch/arm/boot/compressed/efi-stub.c   | 514 +
> >>  arch/arm/boot/compressed/head.S       |  90 +-
> >>  4 files changed, 728 insertions(+), 8 deletions(-)
> >>  create mode 100644 arch/arm/boot/compressed/efi-header.S
> >>  create mode 100644 arch/arm/boot/compressed/efi-stub.c
> >>
> >> diff --git a/arch/arm/boot/compressed/Makefile b/arch/arm/boot/compressed/Makefile
> >> index 7ac1610..c62826a 100644
> >> --- a/arch/arm/boot/compressed/Makefile
> >> +++ b/arch/arm/boot/compressed/Makefile
> >> @@ -106,8 +106,22 @@ $(addprefix $(obj)/,$(libfdt) $(libfdt_hdrs)): $(obj)/%: $(srctree)/scripts/dtc/
> >>  $(addprefix $(obj)/,$(libfdt_objs) atags_to_fdt.o): \
> >>          $(addprefix $(obj)/,$(libfdt_hdrs))
> >>
> >> +$(addprefix $(obj)/,$(libfdt_objs) efi-stub.o): \
> >> +        $(addprefix $(obj)/,$(libfdt_hdrs))
> >> +
> >
> > Don't we make $(libfdt_objs) depend on $(libfdt_hdrs) twice, now?
> >
> > Would it make sense just to add efi-stub.o to the list of targets in the
> > original rule?
>
> Yes, change made.
>
> >>  ifeq ($(CONFIG_ARM_ATAG_DTB_COMPAT),y)
> >> -OBJS   += $(libfdt_objs) atags_to_fdt.o
> >> +OBJS   += atags_to_fdt.o
> >> +USE_LIBFDT = y
> >> +endif
> >> +
> >> +ifeq ($(CONFIG_EFI_STUB),y)
> >> +CFLAGS_efi-stub.o += -DTEXT_OFFSET=$(TEXT_OFFSET)
> >> +OBJS   += efi-stub.o
> >> +USE_LIBFDT = y
> >> +endif
> >> +
> >> +ifeq ($(USE_LIBFDT),y)
> >> +OBJS   += $(libfdt_objs)
> >>  endif
> >>
> >>  targets       := vmlinux vmlinux.lds \
> >> @@ -125,7 +139,7 @@ ORIG_CFLAGS := $(KBUILD_CFLAGS)
> >>  KBUILD_CFLAGS = $(subst -pg, , $(ORIG_CFLAGS))
> >>  endif
> >>
> >> -ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj)
> >> +ccflags-y := -fpic -mno-single-pic-base -fno-builtin -I$(obj) -fno-stack-protector
> >
> > You don't appear to explain this change anywhere.
>
> Prior to my changes, even though the stack protector was not disabled,
> it was not actually used. GCC uses a heuristic based on the size of the
> stack to decide whether to enable the stack protector, and the threshold
> to trigger its use was not met, so no stack checking was actually being
> done. In order to do stack protection, a few __stack_chk_* functions/
> variables need to be provided by the application.
> I worked a bit on adding these but could not get them
> working in the stub/decompressor. The x86 arch also has
> "-fno-stack-protector" defined for its compressed boot stub, so I
> decided to go that route as well.
>
> >>  asflags-y := -DZIMAGE
> >>
> >>  # Supply kernel BSS size to the decompressor via a linker symbol.
> >> diff --git a/arch/arm/boot/compressed/efi-header.S b/arch/arm/boot/compressed/efi-header.S
> >> new file mode 100644
> >> index 000..6ff32cc
> >> --- /dev/null
> >> +++ b/arch/arm/boot/compressed/efi-header.S
> >> @@ -0,0 +1,114 @@
> >> +@ Copyright (C) 2013 Linaro Ltd;
> >> +@
> >> +@ This file contains the PE/COFF header that is part of the
> >> +@ EFI stub.
> >> +@
> >> +
> >> +        .org    0x3c
> >> +        @
> >> +        @ The PE header can be anywhere in the file, but for
> >> +        @ simplicity we keep it together with the MSDOS header.
> >> +        @ The offset to the PE/COFF header needs to be at offset
> >> +        @ 0x3C in the MSDOS header.
> >> +        @ The only 2 fields of the MSDOS header that are used are this
> >> +        @ PE/COFF offset, and the "MZ" bytes at offset 0x0.
> >> +        @
> >> +        .long   pe_header       @ Offset to the PE header.
> >
> > Is there any chance of merging this with the equivalent x86 code?
> >
> > The PE/COFF header is much
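(On the stack-protector point above: the minimal runtime a freestanding
decompressor would otherwise have to supply looks roughly like the
following. This is a hypothetical sketch, not the code Roy experimented
with; the canary value here is arbitrary.)

    /* Hypothetical sketch: the two symbols GCC's stack protector expects. */
    unsigned long __stack_chk_guard = 0x000a0dff;   /* canary; value arbitrary */

    void __stack_chk_fail(void)
    {
            /* Stack corrupted: no safe way to continue in a boot stub. */
            while (1)
                    ;
    }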
Re: [PATCH 33/38] arm64: Implement thread_struct whitelist for hardened usercopy
On Thu, Jan 11, 2018 at 02:03:05AM +0000, Kees Cook wrote:
> This whitelists the FPU register state portion of the thread_struct for
> copying to userspace, instead of the default entire structure.
>
> Cc: Catalin Marinas
> Cc: Will Deacon
> Cc: Christian Borntraeger
> Cc: Ingo Molnar
> Cc: James Morse
> Cc: "Peter Zijlstra (Intel)"
> Cc: Dave Martin
> Cc: zijun_hu
> Cc: linux-arm-ker...@lists.infradead.org
> Signed-off-by: Kees Cook
> ---
>  arch/arm64/Kconfig                 | 1 +
>  arch/arm64/include/asm/processor.h | 8 ++++++++
>  2 files changed, 9 insertions(+)
>
> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
> index a93339f5178f..c84477e6a884 100644
> --- a/arch/arm64/Kconfig
> +++ b/arch/arm64/Kconfig
> @@ -90,6 +90,7 @@ config ARM64
>          select HAVE_ARCH_MMAP_RND_BITS
>          select HAVE_ARCH_MMAP_RND_COMPAT_BITS if COMPAT
>          select HAVE_ARCH_SECCOMP_FILTER
> +        select HAVE_ARCH_THREAD_STRUCT_WHITELIST
>          select HAVE_ARCH_TRACEHOOK
>          select HAVE_ARCH_TRANSPARENT_HUGEPAGE
>          select HAVE_ARCH_VMAP_STACK
> diff --git a/arch/arm64/include/asm/processor.h b/arch/arm64/include/asm/processor.h
> index 023cacb946c3..e58a5864ec89 100644
> --- a/arch/arm64/include/asm/processor.h
> +++ b/arch/arm64/include/asm/processor.h
> @@ -113,6 +113,14 @@ struct thread_struct {
>          struct debug_info       debug;          /* debugging */
>  };
>
> +/* Whitelist the fpsimd_state for copying to userspace. */
> +static inline void arch_thread_struct_whitelist(unsigned long *offset,
> +                                                unsigned long *size)
> +{
> +        *offset = offsetof(struct thread_struct, fpsimd_state);
> +        *size = sizeof(struct fpsimd_state);

This should be fpsimd_state.user_fpsimd (fpsimd_state.cpu is important
for correct context switching and not supposed to be user-accessible.
A user copy that encompasses that is definitely a bug).

Cheers
---Dave
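(A sketch of the corrected window, assuming arm64's thread_struct at the
time, where fpsimd_state embeds the user-visible registers as a struct
user_fpsimd_state member named user_fpsimd:)

    /* Whitelist only the user-visible FPSIMD registers, not fpsimd_state.cpu. */
    static inline void arch_thread_struct_whitelist(unsigned long *offset,
                                                    unsigned long *size)
    {
            *offset = offsetof(struct thread_struct, fpsimd_state.user_fpsimd);
            *size = sizeof(struct user_fpsimd_state);
    }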
Re: [PATCH 0/3] arm64/sve: UAPI: Disentangle ptrace.h from sigcontext.h
On Fri, Dec 14, 2018 at 07:00:07PM +0000, Szabolcs Nagy wrote:
> On 14/12/2018 18:25, Dave Martin wrote:
> > On Fri, Dec 14, 2018 at 06:13:33PM +0000, Szabolcs Nagy wrote:
> >> On 11/12/2018 19:26, Dave Martin wrote:
> >>> This patch refactors the UAPI header definitions for the Arm SVE
> >>> extension to avoid multiple-definition problems when userspace mixes its
> >>> own sigcontext.h definitions with the kernel's ptrace.h (which is
> >>> apparently routine).
> >>>
> >>> A common backend header is created to hold common definitions, suitably
> >>> namespaced, and with an appropriate header guard.
> >>>
> >>> See the commit message in patch 3 for further explanation of why this
> >>> is needed.
> >>>
> >>> Because of the non-trivial header guard in the new sve_context.h, patch
> >>> 1 adds support to headers_install.sh to munge #if defined(_UAPI_FOO) in
> >>> a similar way to the current handling of #ifndef _UAPI_FOO.
> >>
> >> thanks for doing this.
> >>
> >> the patches fix the gdb build issue on musl libc with an
> >> additional gdb patch:
> >> https://sourceware.org/ml/gdb-patches/2018-12/msg00152.html
> >> (in userspace i'd expect users relying on signal.h providing
> >> whatever is in asm/sigcontext.h.)
> >>
> >> i think sve_context.h could be made to work with direct include,
> >> even if that's not useful because there is no public api there.
> >> (and then you dont need the first patch)
> >
> > My general view is that if you want the sigframe types, userspace should
> > usually include <ucontext.h> and refer to mcontext_t.
>
> ucontext.h does not expose the asm/sigcontext.h types in glibc,
> but it is compatible with the inclusion of asm/sigcontext.h
> (or signal.h).
>
> in musl ucontext.h includes signal.h and signal.h provides
> the asm/sigcontext.h api with abi compatible definitions.
>
> > Because the prototype for sa_sigaction() specifies a void * for the
> > ucontext argument, I've generally assumed that <signal.h> is not
> > sufficient to get ucontext_t (or mcontext_t) (but maybe I'm too paranoid
> > there).
>
> http://pubs.opengroup.org/onlinepubs/9699919799/basedefs/signal.h.html
>
> "The header shall define the ucontext_t type as a structure
> that shall include at least the following members:
> ...
> mcontext_t uc_mcontext  A machine-specific representation of the saved
>                         context."
>
> so signal.h must define ucontext_t but mcontext_t can be opaque.
> (it is opaque with posix conform feature tests to avoid
> namespace pollution, but with _GNU_SOURCE defined all
> asm/sigcontext.h apis are there and mcontext_t matches
> struct sigcontext)

I see. Sounds reasonable.

> > Non-POSIX-flavoured software might include <asm/sigcontext.h> directly.
> > In glibc/musl libc will that conflict with <signal.h>, or can the two
> > coexist?
>
> if you compile e.g. in standard conform mode then
> i think signal.h and asm/sigcontext.h are compatible.

So long as we don't break any existing usage (?) I guess this is fine.

Cheers
---Dave
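(To make the include question concrete, the usual pattern under
discussion is something like this hypothetical handler:)

    /* Sketch: POSIX-portable access to the machine context from a handler. */
    #include <signal.h>
    #include <ucontext.h>

    static void handler(int sig, siginfo_t *si, void *uc_)
    {
            ucontext_t *uc = uc_;   /* void * per sa_sigaction's prototype */
            mcontext_t *mc = &uc->uc_mcontext;

            (void)mc;       /* arch-specific layout; opaque in strict POSIX mode */
    }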
Re: [PATCH] arch/arm64 : fix error in dump_backtrace
On Tue, Nov 06, 2018 at 11:00:19AM +0000, Mark Rutland wrote:
> On Tue, Nov 06, 2018 at 08:57:51AM +0000, Daniel Thompson wrote:
> > On Tue, Nov 06, 2018 at 08:39:01AM +0000, Mark Rutland wrote:
> > > On Tue, Nov 06, 2018 at 03:19:35PM +0800, Zhaoyang Huang wrote:
> > > > From: Zhaoyang Huang
> > > >
> > > > In some cases, the instruction "bl foo1" will be the last one of
> > > > foo2[1], which will cause the lr to be the first instruction of the
> > > > adjacent foo3[2]. Hence, the backtrace will show the weird result as
> > > > below[3]. The patch will fix it by subtracting 4 from the lr when
> > > > dump_backtrace
> > >
> > > This has come up in the past (and a similar patch has been applied,
> > > then reverted).
> > >
> > > In general, we don't know that a function call was made via BL, and
> > > therefore cannot know that LR - 4 is the address of the caller. The
> > > caller could set up the LR as it likes, then B or BR to the callee,
> > > and depending on how the basic blocks get laid out in memory, LR - 4
> > > might point at something completely different.
> > >
> > > More ideally, the compiler wouldn't end a function with a BL. When
> > > does that happen, and is there some way we could arrange for that to
> > > not happen? e.g. somehow pad a NOP after the BL.
> >
> > It's a consequence of having __noreturn isn't it? __noreturn frees the
> > compiler from the burden of having to produce a valid return stack... so
> > it doesn't and unwinding becomes hard.
>
> In that case, the compiler could equally just use B rather than BL,
> which this patch doesn't solve.
>
> The documentation for the GCC noreturn attribute [1] says:
>
> | In order to preserve backtraces, GCC will never turn calls to noreturn
> | functions into tail calls.
>
> ... so clearly it's not intended to mess up backtracing.

Which is a bit odd, since every call to a noreturn function is a
tail-call by definition, and other tail-calls are typically optimised
to a B (thus interfering with backtracing in all except the noreturn
case).

Avoiding this would require a change to the compiler, and since there's
no obvious correct answer anyway, I guess we shouldn't hold our breath.

> IIUC we mostly use noreturn to prevent warnings about uninitialised
> variables and such after a call to a noreturn function. I think
> optimization is a secondary concern.

Agreed.

> We could ask the GCC folk if they can ensure that a noreturn function
> call leaves the LR pointing into the caller, e.g. by padding with a NOP:
>
>         BL      <noreturn function>
>         NOP
>
> That seems cheap enough, and would keep backtraces reliable.

-fpatchable-function-entry=1,1 does almost the right thing, by inserting
1 NOP at the start of each function, and putting the function entry
point after that (1) NOP. This works on gcc-8.1. I haven't experimented
with earlier compilers.

It could be considered an abuse, since we'd not quite be using the
option for its intended purpose. It also causes every function entry
point to be misaligned due to the NOP insertion, which we may or may not
care about.

Cheers
---Dave
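(A hypothetical C fragment showing the failure mode under discussion:)

    /*
     * Hypothetical illustration: if the call to die() is the last
     * instruction of broken() (a trailing BL), the saved LR points at
     * whatever function the linker happens to place next, and the
     * backtrace names a bogus caller.
     */
    void die(const char *msg) __attribute__((noreturn));

    void broken(int err)
    {
            if (err)
                    die("fatal");   /* may compile to a BL ending the function */
    }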
Re: [RFC PATCH v6 1/9] tracing: Add array printing helpers
On Fri, Dec 05, 2014 at 07:04:12PM +0000, Javi Merino wrote:
> From: Dave Martin
>
> If a trace event contains an array, there is currently no standard
> way to format this for text output. Drivers are currently hacking
> around this by a) local hacks that use the trace_seq functionality
> directly, or b) just not printing that information. For fixed size
> arrays, formatting of the elements can be open-coded, but this gets
> cumbersome for arrays of non-trivial size.
>
> These approaches result in non-standard content of the event format
> description delivered to userspace, so userland tools need to be
> taught to understand and parse each array printing method
> individually.
>
> This patch implements common __print_<type>_array() helpers that
> tracepoint implementations can use instead of reinventing them. A
> simple C-style syntax is used to delimit the array and its elements
> {like,this}.
>
> So that the helpers can be used with large static arrays as well as
> dynamic arrays, they take a pointer and element count: they can be
> used with __get_dynamic_array() for use with dynamic arrays.
>
> Cc: Steven Rostedt
> Cc: Ingo Molnar
> Signed-off-by: Dave Martin
> ---
>  include/linux/ftrace_event.h |  9
>  include/trace/ftrace.h       | 17 +++
>  kernel/trace/trace_output.c  | 51
>  3 files changed, 77 insertions(+)
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index 28672e87e910..415afc53fa51 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -44,6 +44,15 @@ const char *ftrace_print_bitmask_seq(struct trace_seq *p, void *bitmask_ptr,
>  const char *ftrace_print_hex_seq(struct trace_seq *p,
>                                   const unsigned char *buf, int len);
>
> +const char *ftrace_print_u8_array_seq(struct trace_seq *p,
> +                                      const u8 *buf, int count);
> +const char *ftrace_print_u16_array_seq(struct trace_seq *p,
> +                                       const u16 *buf, int count);
> +const char *ftrace_print_u32_array_seq(struct trace_seq *p,
> +                                       const u32 *buf, int count);
> +const char *ftrace_print_u64_array_seq(struct trace_seq *p,
> +                                       const u64 *buf, int count);
> +
>  struct trace_iterator;
>  struct trace_event;
>
> diff --git a/include/trace/ftrace.h b/include/trace/ftrace.h
> index 26b4f2e13275..15bc5d417aea 100644
> --- a/include/trace/ftrace.h
> +++ b/include/trace/ftrace.h
> @@ -263,6 +263,19 @@
>  #undef __print_hex
>  #define __print_hex(buf, buf_len) ftrace_print_hex_seq(p, buf, buf_len)
>
> +#undef __print_u8_array
> +#define __print_u8_array(array, count) \
> +        ftrace_print_u8_array_seq(p, array, count)
> +#undef __print_u16_array
> +#define __print_u16_array(array, count) \
> +        ftrace_print_u16_array_seq(p, array, count)
> +#undef __print_u32_array
> +#define __print_u32_array(array, count) \
> +        ftrace_print_u32_array_seq(p, array, count)
> +#undef __print_u64_array
> +#define __print_u64_array(array, count) \
> +        ftrace_print_u64_array_seq(p, array, count)
> +
>  #undef DECLARE_EVENT_CLASS
>  #define DECLARE_EVENT_CLASS(call, proto, args, tstruct, assign, print) \
>  static notrace enum print_line_t                                        \
> @@ -676,6 +689,10 @@ static inline void ftrace_test_probe_##call(void)      \
>  #undef __get_dynamic_array_len
>  #undef __get_str
>  #undef __get_bitmask
> +#undef __print_u8_array
> +#undef __print_u16_array
> +#undef __print_u32_array
> +#undef __print_u64_array
>
>  #undef TP_printk
>  #define TP_printk(fmt, args...) \
"\"" fmt "\", " __stringify(args) > diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c > index c6977d5a9b12..4a6ee61f30b3 100644 > --- a/kernel/trace/trace_output.c > +++ b/kernel/trace/trace_output.c > @@ -186,6 +186,57 @@ ftrace_print_hex_seq(struct trace_seq *p, const unsigned > char *buf, int buf_len) > } > EXPORT_SYMBOL(ftrace_print_hex_seq); > > +static const char * > +ftrace_print_array_seq(struct trace_seq *p, const void *buf, int buf_len, > +bool (*iterator)(struct trace_seq *p, const char *prefix, > + const void **buf, int *buf_len)) > +{ > + const char *ret = trace_seq_buffer_ptr(p); > + const char *prefix = ""; > + > + trace_seq_putc(p, '{'); > + > + while (iterator(p, prefix, &buf, &buf_len)) > + prefix = ","; Makes sense. > + > + trace_seq_putc(p, '}'); > + trace_seq_putc(p, 0); > + > + return ret; > +} > + > +#define DEFINE_PRINT_ARRAY(type, printk_type, format) > \ > +static bool \ > +ftrace_print_array_iterator_##type(struct trace_
Re: [RFC PATCH v6 1/9] tracing: Add array printing helpers
On Mon, Dec 08, 2014 at 03:42:10PM +0000, Steven Rostedt wrote:
> On Fri, 5 Dec 2014 19:04:12 +0000
> "Javi Merino" wrote:

[...]

> > +
> > +DEFINE_PRINT_ARRAY(u8, unsigned int, "0x%x");
> > +DEFINE_PRINT_ARRAY(u16, unsigned int, "0x%x");
> > +DEFINE_PRINT_ARRAY(u32, unsigned int, "0x%x");
> > +DEFINE_PRINT_ARRAY(u64, unsigned long long, "0x%llx");
> > +
>
> I would really like to avoid adding a bunch of macros for each type.
> Can't we have something like this:
>
> ftrace_print_array(struct trace_seq *p, void *buf, int buf_len,
>                    int size)
> {
>         char *prefix = "";
>         void *ptr = buf;
>
>         while (ptr < buf + buf_len) {
>                 switch (size) {
>                 case 8:
>                         trace_seq_printf("%s0x%x", prefix,
>                                          *(unsigned char *)ptr);

I think this should be *(u8 *) etc.

Otherwise, I don't have a problem with this approach. It's less ugly
than my original.

>                         break;
>                 case 16:
>                         trace_seq_printf("%s0x%x", prefix,
>                                          *(unsigned short *)ptr);
>                         break;
>                 case 32:
>                         trace_seq_printf("%s0x%x", prefix,
>                                          *(unsigned int *)ptr);
>                         break;
>                 case 64:
>                         trace_seq_printf("%s0x%llx", prefix,
>                                          *(unsigned long long *)ptr);
>                         break;
>                 default:
>                         BUG();
>                 }
>                 prefix = ",";
>                 ptr += size;
>         }
>
> }
>
> We probably could even make the "BUG()" into a build bug, with a little
> work.

That sounds possible. Javi?

Cheers
---Dave
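(Folding in the fixed-width casts -- and switching the size parameter to
bytes so the pointer arithmetic works -- the helper might end up as
below. A sketch only, not a tested patch.)

    /* Sketch: one helper with u8/u16/u32/u64 casts, element size in bytes. */
    const char *
    ftrace_print_array_seq(struct trace_seq *p, const void *buf, int buf_len,
                           size_t el_size)
    {
            const char *ret = trace_seq_buffer_ptr(p);
            const char *prefix = "";
            const void *ptr = buf;

            trace_seq_putc(p, '{');

            while (ptr < buf + buf_len) {
                    switch (el_size) {
                    case 1:
                            trace_seq_printf(p, "%s0x%x", prefix,
                                             *(const u8 *)ptr);
                            break;
                    case 2:
                            trace_seq_printf(p, "%s0x%x", prefix,
                                             *(const u16 *)ptr);
                            break;
                    case 4:
                            trace_seq_printf(p, "%s0x%x", prefix,
                                             *(const u32 *)ptr);
                            break;
                    case 8:
                            trace_seq_printf(p, "%s0x%llx", prefix,
                                             *(const u64 *)ptr);
                            break;
                    default:
                            BUG();  /* could become a build bug for constant sizes */
                    }
                    prefix = ",";
                    ptr += el_size;
            }

            trace_seq_putc(p, '}');
            trace_seq_putc(p, 0);

            return ret;
    }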
Re: [PATCH] tracing: make ftrace_print_array_seq compute buf_len
On Mon, Apr 27, 2015 at 01:17:48PM +0100, Alex Bennée wrote:
> The only caller to this function was getting it wrong. I favoured

What caller? Wrong in what way?

> pushing the calculation to as close to the need as possible rather than
> fixing the one caller.

This seems reasonable, but...

>
> Signed-off-by: Alex Bennée
> ---
>  include/linux/ftrace_event.h | 2 +-
>  kernel/trace/trace_output.c  | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index c674ee8..e6b0262 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -45,7 +45,7 @@ const char *ftrace_print_hex_seq(struct trace_seq *p,
>                                   const unsigned char *buf, int len);
>
>  const char *ftrace_print_array_seq(struct trace_seq *p,
> -                                   const void *buf, int buf_len,
> +                                   const void *buf, int len,

How is the name "len" less confusing than "buf_len"?

I suggest matching the name to the equivalent argument of the
__print_array macro -- i.e., "count".

Cheers
---Dave

[...]
Re: [PATCH v2] tracing: make ftrace_print_array_seq compute buf_len
On Wed, Apr 29, 2015 at 04:18:46PM +0100, Alex Bennée wrote:
> The only caller to this function (__print_array) was getting it wrong by
> passing the array length instead of buffer length. As the element size
> was already being passed for other reasons it seems reasonable to push
> the calculation of buffer length into the function.
>
> Signed-off-by: Alex Bennée

That looks better, thanks.

Reviewed-by: Dave Martin

>
> ---
> v2:
>   - more explicit commit message
>   - rename len -> count to reduce ambiguity
> ---
>  include/linux/ftrace_event.h | 2 +-
>  kernel/trace/trace_output.c  | 3 ++-
>  2 files changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/include/linux/ftrace_event.h b/include/linux/ftrace_event.h
> index c674ee8..33a66e6 100644
> --- a/include/linux/ftrace_event.h
> +++ b/include/linux/ftrace_event.h
> @@ -45,7 +45,7 @@ const char *ftrace_print_hex_seq(struct trace_seq *p,
>                                   const unsigned char *buf, int len);
>
>  const char *ftrace_print_array_seq(struct trace_seq *p,
> -                                   const void *buf, int buf_len,
> +                                   const void *buf, int count,
>                                     size_t el_size);
>
>  struct trace_iterator;
> diff --git a/kernel/trace/trace_output.c b/kernel/trace/trace_output.c
> index 692bf71..25a086b 100644
> --- a/kernel/trace/trace_output.c
> +++ b/kernel/trace/trace_output.c
> @@ -178,12 +178,13 @@ ftrace_print_hex_seq(struct trace_seq *p, const unsigned char *buf, int buf_len)
>  EXPORT_SYMBOL(ftrace_print_hex_seq);
>
>  const char *
> -ftrace_print_array_seq(struct trace_seq *p, const void *buf, int buf_len,
> +ftrace_print_array_seq(struct trace_seq *p, const void *buf, int count,
>                         size_t el_size)
>  {
>          const char *ret = trace_seq_buffer_ptr(p);
>          const char *prefix = "";
>          void *ptr = (void *)buf;
> +        size_t buf_len = count * el_size;
>
>          trace_seq_putc(p, '{');
>
> --
> 2.3.5
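(For reference, a tracepoint would then use the macro like this
hypothetical TP_printk -- the event field names are made up:)

    /* Hypothetical usage: print 'count' u32 elements of a dynamic array. */
    TP_printk("values=%s",
              __print_array(__get_dynamic_array(values), __entry->count,
                            sizeof(u32)))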
Re: [kvmtool PATCH v6 6/6] arm/kvm: arm64: Add a vcpu feature for pointer authentication
On Fri, Mar 01, 2019 at 10:37:54AM +0000, Amit Daniel Kachhap wrote:
> Hi,
>
> On 2/21/19 9:24 PM, Dave Martin wrote:
> > On Tue, Feb 19, 2019 at 02:54:31PM +0530, Amit Daniel Kachhap wrote:

[...]

> >> diff --git a/arm/aarch64/include/kvm/kvm-config-arch.h b/arm/aarch64/include/kvm/kvm-config-arch.h
> >> index 04be43d..2074684 100644
> >> --- a/arm/aarch64/include/kvm/kvm-config-arch.h
> >> +++ b/arm/aarch64/include/kvm/kvm-config-arch.h
> >> @@ -8,7 +8,9 @@
> >>                  "Create PMUv3 device"),                         \
> >>          OPT_U64('\0', "kaslr-seed", &(cfg)->kaslr_seed,         \
> >>                  "Specify random seed for Kernel Address Space " \
> >> -                "Layout Randomization (KASLR)"),
> >> +                "Layout Randomization (KASLR)"),                \
> >> +        OPT_BOOLEAN('\0', "ptrauth", &(cfg)->has_ptrauth,       \
> >> +                    "Enable address authentication"),
> >
> > Nit: doesn't this enable address *and* generic authentication? The
> > discussion on what capabilities and enables the ABI exposes probably
> > needs to conclude before we can finalise this here.
>
> ok.
>
> > However, I would recommend that we provide a single option here that
> > turns both address authentication and generic authentication on, even
> > if the ABI treats them independently. This is expected to be the common
> > case by far.
>
> ok
>
> > We can always add more fine-grained options later if it turns out to be
> > necessary.
>
> Mark suggested to provide 2 flags [1] for Address and Generic
> authentication so I was thinking of adding 2 features like,
>
> +#define KVM_ARM_VCPU_PTRAUTH_ADDR     4 /* CPU uses pointer address authentication */
> +#define KVM_ARM_VCPU_PTRAUTH_GENERIC  5 /* CPU uses pointer generic authentication */
>
> And supply both of them concatenated in the VCPU_INIT stage. Kernel KVM
> would expect both features requested together.

Seems reasonable.

Do you mean the kernel would treat it as an error if only one of these
flags is passed to KVM_ARM_VCPU_INIT, or would KVM simply treat them as
independent?

[...]

> >> diff --git a/arm/kvm-cpu.c b/arm/kvm-cpu.c
> >> index 7780251..4ac80f8 100644
> >> --- a/arm/kvm-cpu.c
> >> +++ b/arm/kvm-cpu.c
> >> @@ -68,6 +68,12 @@ struct kvm_cpu *kvm_cpu__arch_init(struct kvm *kvm, unsigned long cpu_id)
> >>                  vcpu_init.features[0] |= (1UL << KVM_ARM_VCPU_PSCI_0_2);
> >>          }
> >>
> >> +        /* Set KVM_ARM_VCPU_PTRAUTH if available */
> >> +        if (kvm__supports_extension(kvm, KVM_CAP_ARM_PTRAUTH)) {
> >> +                if (kvm->cfg.arch.has_ptrauth)
> >> +                        vcpu_init.features[0] |= ARM_VCPU_PTRAUTH_FEATURE;
> >> +        }
> >> +
> >
> > I'm not too keen on requiring a dummy #define for AArch32 here. How do
> > we handle other subarch-specific feature flags? Is there something we
> > can reuse?
>
> I will check it.

OK

Cheers
---Dave
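(Putting the two proposals together -- a sketch using the bit numbers
from Amit's snippet; the combined mask matches the
ARM_VCPU_PTRAUTH_FEATURE name used in the kvmtool hunk above:)

    /* Sketch: two UAPI feature bits, requested together by userspace. */
    #define KVM_ARM_VCPU_PTRAUTH_ADDR       4 /* CPU uses address authentication */
    #define KVM_ARM_VCPU_PTRAUTH_GENERIC    5 /* CPU uses generic authentication */

    /* kvmtool-side convenience mask setting both bits in vcpu_init.features[0] */
    #define ARM_VCPU_PTRAUTH_FEATURE                        \
            ((1UL << KVM_ARM_VCPU_PTRAUTH_ADDR) |           \
             (1UL << KVM_ARM_VCPU_PTRAUTH_GENERIC))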
Re: [PATCH v6 4/6] arm64/kvm: add a userspace option to enable pointer authentication
On Fri, Mar 01, 2019 at 09:41:20AM +0000, Amit Daniel Kachhap wrote:
> Hi,
>
> On 2/21/19 9:23 PM, Dave Martin wrote:
> > On Tue, Feb 19, 2019 at 02:54:29PM +0530, Amit Daniel Kachhap wrote:
> >> This feature will allow the KVM guest to allow the handling of
> >> pointer authentication instructions or to treat them as undefined
> >> if not set. It uses the existing vcpu API KVM_ARM_VCPU_INIT to
> >> supply this parameter instead of creating a new API.
> >>
> >> A new register is not created to pass this parameter via
> >> SET/GET_ONE_REG interface as just a flag (KVM_ARM_VCPU_PTRAUTH)
> >> supplied is enough to enable this feature.
> >>
> >> Signed-off-by: Amit Daniel Kachhap
> >> Cc: Mark Rutland
> >> Cc: Marc Zyngier
> >> Cc: Christoffer Dall
> >> Cc: kvm...@lists.cs.columbia.edu

[...]

> >> diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c
> >> index 12529df..f7bcc60 100644
> >> --- a/arch/arm64/kvm/sys_regs.c
> >> +++ b/arch/arm64/kvm/sys_regs.c
> >> @@ -1055,7 +1055,7 @@ static bool access_cntp_cval(struct kvm_vcpu *vcpu,
> >>  }
> >>
> >>  /* Read a sanitised cpufeature ID register by sys_reg_desc */
> >> -static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
> >> +static u64 read_id_reg(struct kvm_vcpu *vcpu, struct sys_reg_desc const *r, bool raz)
> >>  {
> >>          u32 id = sys_reg((u32)r->Op0, (u32)r->Op1,
> >>                           (u32)r->CRn, (u32)r->CRm, (u32)r->Op2);
> >> @@ -1071,7 +1071,7 @@ static u64 read_id_reg(struct sys_reg_desc const *r, bool raz)
> >>                           (0xfUL << ID_AA64ISAR1_API_SHIFT) |
> >>                           (0xfUL << ID_AA64ISAR1_GPA_SHIFT) |
> >>                           (0xfUL << ID_AA64ISAR1_GPI_SHIFT);
> >> -        if (!kvm_supports_ptrauth()) {
> >> +        if (!kvm_arm_vcpu_ptrauth_allowed(vcpu)) {
> >>                  kvm_debug("ptrauth unsupported for guests, suppressing\n");
> >>                  val &= ~ptrauth_mask;
> >>          }
> >> @@ -1095,7 +1095,7 @@ static bool __access_id_reg(struct kvm_vcpu *vcpu,
> >>          if (p->is_write)
> >>                  return write_to_read_only(vcpu, p, r);
> >>
> >> -        p->regval = read_id_reg(r, raz);
> >> +        p->regval = read_id_reg(vcpu, r, raz);
> >>          return true;
> >>  }
> >
> > The SVE KVM series makes various overlapping changes to propagate vcpu
> > into the relevant places, but hopefully the rebase is not too painful.
> > Many of the changes are probably virtually identical between the two
> > series.
> >
> > See for example [1]. Maybe you could cherry-pick and drop the
> > equivalent changes here (though if your series is picked up first, I
> > will live with it ;)
>
> Yes no issue. I will cherry-pick your specific patch and rebase mine on it.

OK, thanks.

Unfortunately it is likely to churn a bit due to review -- my v6 series
will rename some stuff. Hopefully it will be stable from then on.

Cheers
---Dave
debugfs and module unloading
Hi there,

I have a quick debugfs question: How can I protect against a module
being unloaded while debugfs files it provides are open?

I can do something like the following:

static int my_debugfs_file_open(struct inode *inode, struct file *file)
{
        if (!try_module_get(THIS_MODULE))
                return -EIO;

        /* ... */
}

static int my_debugfs_file_release(struct inode *inode, struct file *file)
{
        /* ... */

        module_put(THIS_MODULE);
        return 0;
}

static const struct file_operations my_debugfs_fops = {
        /* ... */
        .open = my_debugfs_file_open,
        .release = my_debugfs_file_release,
        /* ... */
};

static struct device_driver my_driver = {
        /* ... */
        .owner = THIS_MODULE,
        /* ... */
};

static int my_module_init(void)
{
        driver_register(&my_driver);
        debugfs_create_file(..., &my_debugfs_fops);
}

static void my_module_exit(void)
{
        debugfs_remove_recursive(...);
        driver_unregister(&my_driver);
}

... but it doesn't quite work.

debugfs_remove_recursive() prevents any new file handles being opened,
but files already open remain open -- so it's still possible for the
module text to get unloaded between the module refcount being decreased
inside module_put() and the time my_debugfs_file_release() returns.

The scenario to consider is when a request to unload the module races
with closure of the last debugfs file.

The only obvious way I can see to solve this without changing the
debugfs code is to make the module impossible to unload by calling
__module_get() during initialisation, before any debugfs file is
created.

A similar dependency problem exists when a pointer to some device
instance data is passed to debugfs_create_file(). For pluggable devices,
the device might go away at any time. I hoped this could be solved by
calling get_device() ... put_device() in the debugfs file open and
release methods, but I found that a device instance can get removed,
and the module unloaded, even though the struct device refcount is not
zero.

Am I missing something?

Cheers
---Dave
Re: [PATCH] serial/amba-pl011: fix minor bugs for pio mode
Hi,

On Tue, May 05, 2015 at 04:00:25AM +0100, Leo Yan wrote:
> When using pio mode, there are two issues that can be observed:
>
> - In commit 2240197 "serial/amba-pl011: Leave the TX IRQ alone when
>   the UART is not open", it will skip clearing the TX IRQ across
>   pl011_shutdown() and pl011_startup(); so the next time the uart
>   port is opened, there is a chance that pl011_tx_chars() will not
>   be executed if the TX IRQ is not triggered; finally the console
>   cannot output anymore.
>
>   This is caused by the uart FIFO still holding data above the
>   threshold. So revert this patch to make sure that every time the
>   uart port is opened, it will force a call to pl011_tx_chars().
>
> - Sometimes duplicate chars are output. pl011_tx_char() will first
>   send the char and then check whether the FIFO is full, and if the
>   FIFO is full it will return false; the caller will consider the
>   char _NOT_ to have been sent out and will resend it, finally
>   sending duplicate chars. So change it to check whether the FIFO is
>   full first: if full, return false; otherwise send out the char and
>   return true.
>
> Signed-off-by: Leo Yan

[...]

Thanks for the fixes, but I already posted patches that probably fix
the issues you are observing, and Greg took them into his tty-testing
branch. Can you give them a try instead of your patches?

Dave Martin (2):
  Revert "serial/amba-pl011: Leave the TX IRQ alone when the UART is
    not open"
  serial/amba-pl011: Refactor and simplify TX FIFO handling

from

  git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/tty.git tty-testing

It would be good to have some feedback on them.

Cheers
---Dave
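(The second fix described above amounts to testing the FIFO-full flag
before writing the character. A sketch against the PL011 register
definitions in include/linux/amba/serial.h -- not Leo's actual diff or
the queued patches:)

    /* Sketch: only report the char as sent if there was FIFO space for it. */
    static bool pl011_tx_char(struct uart_amba_port *uap, unsigned char c)
    {
            if (readw(uap->port.membase + UART01x_FR) & UART01x_FR_TXFF)
                    return false;   /* FIFO full: caller keeps the char queued */

            writew(c, uap->port.membase + UART01x_DR);
            uap->port.icount.tx++;

            return true;
    }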
Re: [PATCH 1/2] ARM: PIE infrastructure
On Sat, Apr 23, 2016 at 01:15:03AM +0200, Alexandre Belloni wrote:
> On 04/04/2016 at 11:00:52 +0100, Russell King - ARM Linux wrote:
> > > +        /* Copy chunk specific code/data */
> > > +        fncpy((char *)chunk->addr, code_start, code_sz);
> >
> > Sorry, NAK. This abuses fncpy(). There is extensive documentation on
> > the proper use of this in asm/fncpy.h, and anything that does not
> > conform, or which uses memcpy() to copy functions, gets an immediate
> > NAK from me. fncpy() exists to avoid people doing broken things, and
> > it's written in such a way to help people get it right.
>
> Well, do you want me to iterate and use fncpy on all the functions from
> the generated binary?
>
> I'm not sure this is necessary as the generated binary is self contained
> and doing so will force me to also ensure the offsets are kept the same.
> Doing only one copy is much more convenient. However, I still need to
> ensure the destination address is properly 8-byte aligned and the
> flush_icache_range().
> I understand this is abusing fncpy() but it does what I need (still, I'm
> planning to avoid the BUG() by always passing a properly aligned
> destination address).

fncpy() was only intended for a single, self-contained function. It
bakes in assumptions that are not going to apply to PIEs in general.

The main purpose of this was to avoid (possibly buggy) reinvention of
this bit of code in every driver or board file that needed to copy a
function to SRAM.

The PIE mechanism supersedes this approach, in that it should completely
hide the mechanics of copying to SRAM from the users of PIEs -- so it's
worth the PIE infrastructure having its own code to do this.

Since PIEs will have their own requirements that go beyond what fncpy()
does, using fncpy() to implement the PIE infrastructure is a
misfactoring.

In particular, what is the alignment requirement for a PIE? It can be
anything that ELF allows, not simply "8". The "thumb bit" is obviously
also meaningless for section base addresses.

Cheers
---Dave
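(For contrast, fncpy()'s intended per-function usage pattern is roughly
the following sketch; my_sram_fn and its size symbol are hypothetical:)

    #include <linux/types.h>
    #include <asm/fncpy.h>

    extern int my_sram_fn(int arg);         /* self-contained, PC-relative code */
    extern const size_t my_sram_fn_sz;      /* hypothetical size of the function */

    static int (*sram_fn)(int);

    static void copy_fn_to_sram(void *sram)
    {
            /* fncpy() itself checks the FNCPY_ALIGN (8-byte) alignment of sram */
            sram_fn = fncpy(sram, &my_sram_fn, my_sram_fn_sz);
    }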
Re: [PATCH 1/3] arm64: cpufeature: Fix the type of no FP/SIMD capability
On Mon, Oct 14, 2019 at 04:45:40PM +0100, Suzuki K Poulose wrote: > > > On 14/10/2019 15:52, Dave Martin wrote: > > On Fri, Oct 11, 2019 at 06:28:43PM +0100, Suzuki K Poulose wrote: > >> > >> > >> On 11/10/2019 15:21, Dave Martin wrote: > >>> On Fri, Oct 11, 2019 at 01:13:18PM +0100, Suzuki K Poulose wrote: > Hi > >>> Dave > > On 11/10/2019 12:36, Dave Martin wrote: > > On Thu, Oct 10, 2019 at 06:15:15PM +0100, Suzuki K Poulose wrote: > >> The NO_FPSIMD capability is defined with scope SYSTEM, which implies > >> that the "absence" of FP/SIMD on at least one CPU is detected only > >> after all the SMP CPUs are brought up. However, we use the status > >> of this capability for every context switch. So, let us change > >> the scop to LOCAL_CPU to allow the detection of this capability > >> as and when the first CPU without FP is brought up. > >> > >> Also, the current type allows hotplugged CPU to be brought up without > >> FP/SIMD when all the current CPUs have FP/SIMD and we have the > >> userspace > >> up. Fix both of these issues by changing the capability to > >> BOOT_RESTRICTED_LOCAL_CPU_FEATURE. > >> > >> Fixes: 82e0191a1aa11abf ("arm64: Support systems without FP/ASIMD") > >> Cc: Will Deacon > >> Cc: Mark Rutland > >> Cc: Catalin Marinas > >> Signed-off-by: Suzuki K Poulose > >> --- > >> arch/arm64/kernel/cpufeature.c | 2 +- > >> 1 file changed, 1 insertion(+), 1 deletion(-) > >> > >> diff --git a/arch/arm64/kernel/cpufeature.c > >> b/arch/arm64/kernel/cpufeature.c > >> index 9323bcc40a58..0f9eace6c64b 100644 > >> --- a/arch/arm64/kernel/cpufeature.c > >> +++ b/arch/arm64/kernel/cpufeature.c > >> @@ -1361,7 +1361,7 @@ static const struct arm64_cpu_capabilities > >> arm64_features[] = { > >>{ > >>/* FP/SIMD is not implemented */ > >>.capability = ARM64_HAS_NO_FPSIMD, > >> - .type = ARM64_CPUCAP_SYSTEM_FEATURE, > >> + .type = ARM64_CPUCAP_BOOT_RESTRICTED_CPU_LOCAL_FEATURE, > > > > ARM64_HAS_NO_FPSIMD is really a disability, not a capability. > > > > Although we have other things that smell like this (CPU errata for > > example), I wonder whether inverting the meaning in the case would > > make the situation easier to understand. > > Yes, it is indeed a disability, more on that below. > > > > > So, we'd have ARM64_HAS_FPSIMD, with a minimum (signed) feature field > > value of 0. Then this just looks like an ARM64_CPUCAP_SYSTEM_FEATURE > > IIUC. We'd just need to invert the sense of the check in > > system_supports_fpsimd(). > > This is particularly something we want to avoid with this patch. We want > to make sure that we have the up-to-date status of the disability right > when it happens. i.e, a CPU without FP/SIMD is brought up. With > SYSTEM_FEATURE > you have to wait until we bring all the CPUs up. Also, for HAS_FPSIMD, > you must wait until all the CPUs are up, unlike the negated capability. > >>> > >>> I don't see why waiting for the random defective early CPU to come up is > >>> better than waiting for all the early CPUs to come up and then deciding. > >>> > >>> Kernel-mode NEON aside, the status of this cap should not matter until > >>> we enter userspace for the first time. > >>> > >>> The only issue is if e.g., crypto drivers that can use kernel-mode NEON > >>> probe for it before all early CPUs are up, and so cache the wrong > >>> decision. The current approach doesn't cope with that anyway AFAICT. > >> > >> This approach does in fact. 
With LOCAL_CPU scope, the moment a defective > >> CPU turns up, we mark the "capability" and thus the kernel cannot use > >> NEON from then onwards, unlike the existing case where we have time till > >> we boot all the CPUs (even when the boot CPU may be defective). > > > > I guess that makes sense. > > > > I'm now wondering what happens if anything tries to use kernel-mode NEON > > before SVE is initialised -- which doesn't happen until cpufeatures > > configures the system features. > > > > I don't think your proposed change makes anything worse here, but it may > > need looking into. > > We could throw in a WARN_ON() in kernel_neon() to make sure that SVE > is initialised? Could do, at least as an experiment. Ard, do you have any thoughts on this? Cheers ---Dave
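Something like the following, perhaps (a sketch only: the real entry point is kernel_neon_begin() in arch/arm64/kernel/fpsimd.c, and the finalisation check is a stand-in for whatever helper we would actually use):

	void kernel_neon_begin(void)
	{
		/*
		 * Catch kernel-mode NEON use before cpufeature/SVE setup
		 * has completed (hypothetical check):
		 */
		WARN_ON(!system_capabilities_finalized());

		BUG_ON(!may_use_simd());

		/* ... existing save/disable logic unchanged ... */
	}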
Re: [REVIEW][PATCH 03/26] signal/arm64: Use force_sig not force_sig_fault for SIGKILL
On Thu, May 23, 2019 at 03:53:06PM +0100, Eric W. Biederman wrote: > Dave Martin writes: > > > On Thu, May 23, 2019 at 01:38:53AM +0100, Eric W. Biederman wrote: > >> It really only matters to debuggers but the SIGKILL does not have any > >> si_codes that use the fault member of the siginfo union. Correct this > >> the simple way and call force_sig instead of force_sig_fault when the > >> signal is SIGKILL. > > > > I haven't fully understood the context for this, but why does it matter > > what's in siginfo for SIGKILL? My understanding is that userspace > > (including ptrace) never gets to see it anyway for the SIGKILL case. > > Yes. In practice I think it would take tracing or something very > exotic to notice anything going wrong because the task will be killed. > > > Here it feels like SIGKILL is logically a synchronous, thread-targeted > > fault: we must ensure that no subsequent insn in current executes (just > > like any other fault signal). In this case, I thought we fall back to > > SIGKILL not because there is no fault, but because we failed to > > properly diagnose or report the type of fault that occurred. > > > > So maybe handling it consistently with other fault signals makes > > sense. The fact that delivery of this signal destroys the process > > before anyone can look at the resulting siginfo feels like a > > side-effect rather than something obviously wrong. > > > > The siginfo is potentially useful diagnostic information that we could > > subsequently provide a means to access post-mortem. > > > > I just dived in on this single patch, so I may be missing something more > > fundamental, or just being pedantic... > > Not really. I was working on another cleanup and this usage of SIGKILL > came up. > > A synchronous, thread-targeted fault gets us as far as the force_sig > family of functions. That only leaves the question of which union > member in struct siginfo we are using. The union members are _kill, > _fault, _timer, _rt, _sigchld, _sigfault, _sigpoll, and _sigsys. > > As it has proven quite error prone for people to fill out struct siginfo > in the past by hand, I have provided a couple of helper functions for > the common cases that come up such as: force_sig_fault, > force_sig_mceerr, force_sig_bnderr, force_sig_pkuerr. Each of those > helper functions takes the information needed to fill out the union > member of struct siginfo that the corresponding kind of fault uses. > > For the SIGKILL case the only si_code I see being passed is SI_KERNEL. > The SI_KERNEL si_code corresponds to the _kill union member, while > force_sig_fault fills in fields for the _fault union member. > > Because of the mismatch between the union member SIGKILL should be using > and the union member force_sig_fault fills in, alarm bells ring in my head > when I read the current arm64 kernel code. Somewhat doubly so because > the other fields passed to force_sig_fault appear to be somewhat > random when SIGKILL is the signal. > > So I figured let's preserve the usage of SIGKILL as a synchronous > exception. That seems legitimate and other folks do that as well, but > let's use force_sig instead of force_sig_fault. I don't know if > userspace will notice but at the very least we won't be providing a bad > example for other kernel code to follow, and we won't wind up making > assumptions that are true today and false tomorrow when some > implementation detail changes. > > For information on what signals and si_codes correspond to which > union members you can look at siginfo_layout.
That function > is the keeper of the magic decoder key. Currently the only two > si_codes defined for SIGKILL are SI_KERNEL and SI_USER, both of which > correspond to the _kill union member. I see. Assuming we cannot have a dummy internal si_code for this special case (probably a bad idea), I think Will's suggestion of at least pushing the special-case handling down into arm64_force_sig_fault() is probably a bit cleaner here, especially if other callers of that function may pass in SIGKILL (I haven't looked though). Cheers ---Dave
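For concreteness, that suggestion as I understand it would look something like this (a sketch against arm64_force_sig_fault() in arch/arm64/kernel/traps.c; not the final patch):

	static void arm64_force_sig_fault(int signo, int code,
					  void __user *addr, const char *str)
	{
		arm64_show_signal(signo, str);
		if (signo == SIGKILL)
			force_sig(SIGKILL);		/* _kill union member */
		else
			force_sig_fault(signo, code, addr); /* _sigfault */
	}

That keeps each caller oblivious to the SIGKILL special case while still using the right siginfo union member.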
Re: [PATCH v5 5/5] arm64/kvm: control accessibility of ptrauth key registers
On Wed, Feb 13, 2019 at 05:35:46PM +, Kristina Martsenko wrote: > On 28/01/2019 06:58, Amit Daniel Kachhap wrote: > > According to userspace settings, ptrauth key registers are conditionally > > present in guest system register list based on user specified flag > > KVM_ARM_VCPU_PTRAUTH. > > > > Signed-off-by: Amit Daniel Kachhap > > Cc: Mark Rutland > > Cc: Christoffer Dall > > Cc: Marc Zyngier > > Cc: Kristina Martsenko > > Cc: kvm...@lists.cs.columbia.edu > > Cc: Ramana Radhakrishnan > > Cc: Will Deacon > > --- > > Documentation/arm64/pointer-authentication.txt | 3 ++ > > arch/arm64/kvm/sys_regs.c | 42 > > +++--- > > 2 files changed, 34 insertions(+), 11 deletions(-) > > [...] > > diff --git a/arch/arm64/kvm/sys_regs.c b/arch/arm64/kvm/sys_regs.c [...] > > @@ -2487,18 +2493,22 @@ static int walk_one_sys_reg(const struct > > sys_reg_desc *rd, > > } > > > > /* Assumed ordered tables, see kvm_sys_reg_table_init. */ > > -static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind) > > +static int walk_sys_regs(struct kvm_vcpu *vcpu, u64 __user *uind, > > +const struct sys_reg_desc *desc, unsigned int len) > > { > > const struct sys_reg_desc *i1, *i2, *end1, *end2; > > unsigned int total = 0; > > size_t num; > > int err; > > > > +if (desc == ptrauth_reg_descs && !kvm_arm_vcpu_ptrauth_allowed(vcpu)) > > +return total; > > + > > /* We check for duplicates here, to allow arch-specific overrides. */ > > i1 = get_target_table(vcpu->arch.target, true, &num); > > end1 = i1 + num; > > -i2 = sys_reg_descs; > > -end2 = sys_reg_descs + ARRAY_SIZE(sys_reg_descs); > > +i2 = desc; > > +end2 = desc + len; > > > > BUG_ON(i1 == end1 || i2 == end2); > > > > @@ -2526,7 +2536,10 @@ unsigned long kvm_arm_num_sys_reg_descs(struct > > kvm_vcpu *vcpu) > > { > > return ARRAY_SIZE(invariant_sys_regs) > > + num_demux_regs() > > -+ walk_sys_regs(vcpu, (u64 __user *)NULL); > > ++ walk_sys_regs(vcpu, (u64 __user *)NULL, sys_reg_descs, > > +ARRAY_SIZE(sys_reg_descs)) > > ++ walk_sys_regs(vcpu, (u64 __user *)NULL, ptrauth_reg_descs, > > +ARRAY_SIZE(ptrauth_reg_descs)); > > } > > > > int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu *vcpu, u64 __user > > *uindices) > > @@ -2541,7 +2554,12 @@ int kvm_arm_copy_sys_reg_indices(struct kvm_vcpu > > *vcpu, u64 __user *uindices) > > uindices++; > > } > > > > -err = walk_sys_regs(vcpu, uindices); > > +err = walk_sys_regs(vcpu, uindices, sys_reg_descs, > > ARRAY_SIZE(sys_reg_descs)); > > +if (err < 0) > > +return err; > > +uindices += err; > > + > > +err = walk_sys_regs(vcpu, uindices, ptrauth_reg_descs, > > ARRAY_SIZE(ptrauth_reg_descs)); > > if (err < 0) > > return err; > > uindices += err; > > @@ -2575,6 +2593,7 @@ void kvm_sys_reg_table_init(void) > > BUG_ON(check_sysreg_table(cp15_regs, ARRAY_SIZE(cp15_regs))); > > BUG_ON(check_sysreg_table(cp15_64_regs, ARRAY_SIZE(cp15_64_regs))); > > BUG_ON(check_sysreg_table(invariant_sys_regs, > > ARRAY_SIZE(invariant_sys_regs))); > > +BUG_ON(check_sysreg_table(ptrauth_reg_descs, > > ARRAY_SIZE(ptrauth_reg_descs))); > > > > /* We abuse the reset function to overwrite the table itself. */ > > for (i = 0; i < ARRAY_SIZE(invariant_sys_regs); i++) > > @@ -2616,6 +2635,7 @@ void kvm_reset_sys_regs(struct kvm_vcpu *vcpu) > > > > /* Generic chip reset first (so target could override). 
*/ > > reset_sys_reg_descs(vcpu, sys_reg_descs, ARRAY_SIZE(sys_reg_descs)); > > +reset_sys_reg_descs(vcpu, ptrauth_reg_descs, > > ARRAY_SIZE(ptrauth_reg_descs)); > > > > table = get_target_table(vcpu->arch.target, true, &num); > > reset_sys_reg_descs(vcpu, table, num); > > This isn't very scalable, since we'd need to duplicate all the above > code every time we add new system registers that are conditionally > accessible. Agreed, putting feature-specific checks in walk_sys_regs() is probably best avoided. Over time we would likely accumulate a fair number of these checks. > It seems that the SVE patches [1] solved this problem by adding a > check_present() callback into struct sys_reg_desc. It probably makes > sense to rebase onto that patch and just implement the callback for the > ptrauth key registers as well. > > [1] > https://lore.kernel.org/linux-arm-kernel/1547757219-19439-13-git-send-email-dave.mar...@arm.com/ Note, I'm currently refactoring this so that enumeration through KVM_GET_REG_LIST can be disabled independently of access to the register. This may not be the best approach, but for SVE I have a feature-specific ID register to handle too (ID_AA64ZFR0_EL1), which needs to be hidden from the enumeration but still accessible with read-as-zero behaviour. This changes the API a bit: I move to a .restrictions() callback which returns flags to say what is disabled, and this callback is used in the common code so that you don't have to repeat your "feature present" check in every accessor, as is currently the case. I'm aiming to post a respun series in the next day or two.
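Roughly like this, for illustration (the field and flag names here are invented for the sketch, not taken from the posted series):

	#define REG_HIDDEN_USER		(1 << 0) /* hide from KVM_GET_REG_LIST */
	#define REG_RAZ			(1 << 1) /* read-as-zero to userspace  */

	struct sys_reg_desc {
		/* ... existing fields ... */
		unsigned int (*restrictions)(const struct kvm_vcpu *vcpu,
					     const struct sys_reg_desc *rd);
	};

	static unsigned int ptrauth_restrictions(const struct kvm_vcpu *vcpu,
						 const struct sys_reg_desc *rd)
	{
		/* Keys invisible unless ptrauth is enabled for this vcpu: */
		return kvm_arm_vcpu_ptrauth_allowed(vcpu) ? 0 : REG_HIDDEN_USER;
	}

The common walk/access code would then consult .restrictions() once, instead of every accessor open-coding its own "feature present" test.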
Re: [PATCH] gcc-plugins: Add structleak for more stack initialization
On Mon, Jan 16, 2017 at 08:22:24PM +0100, PaX Team wrote: > On 16 Jan 2017 at 11:54, Mark Rutland wrote: [...] > > I assume that this is only guaranteed to initialise fields in a struct, > > and not padding, is that correct? I ask due to the issue described in: > > > > https://lwn.net/Articles/417989/ > > the 'issue' is that before C11 the standard didn't make it clear that in > case of a partial initializer list the compiler has to initialize not only > the remaining fields but also padding as well. Which version of the C11 spec clarifies this? The one I'm looking at (n1570, Apr 12 2011) says: "6.7.9 21 If there are fewer initializers in a brace-enclosed list than there are elements or members of an aggregate, or fewer characters in a string literal used to initialize an array of known size than there are elements in the array, the remainder of the aggregate shall be initialized implicitly the same as objects that have static storage duration." What is meant by "the remainder of the object" is unclear. Is this just the tail of the object, or all uninitialised members -- which may be sparse in the case of an initialiser with designators? And is padding (and if so, which padding) considered to be part of (the remainder of) the object for the purpose of this paragraph? This can be read with the interpretation you suggest, but the wording doesn't seem rock-solid. For the kernel, I guess it's sufficient if GCC commits to this interpretation though. A related, perhaps more interesting issue is that C11 still explicitly states that padding is undefined after assignment: "6.2.6.1 6 When a value is stored in an object of structure or union type, including in a member object, the bytes of the object representation that correspond to any padding bytes take unspecified values. 51)" "51) Thus, for example, structure assignment need not copy any padding bits." which means that in:

	{
		struct foo foo1 = FOO1_INITIALIZER;
		struct foo foo2;

		foo2 = foo1;
	}

the contents of padding in foo2 are unspecified. Similarly, piecemeal initialisation of an object by assigning to every member leaves the padding unspecified, e.g.:

	{
		struct timeval tv;

		tv.tv_sec = secs;
		tv.tv_usec = usecs;
	}

(On most or all arches struct timeval contains no padding, but relying implicitly on this is still a bit iffy.) It's highly likely that the kernel passes objects resulting from one of the above to put_user() and friends. It would be good if we had a way to catch at least some of these. (I don't know whether that falls naturally under this plugin.) Cheers ---Dave
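For what it's worth, the defensive pattern that avoids both problems when such an object is copied to userspace is to zero the whole thing explicitly first -- a minimal sketch, reusing the struct timeval example above (secs, usecs and utv are assumed to be in scope):

	struct timeval tv;

	memset(&tv, 0, sizeof(tv));	/* padding now has a defined value  */
	tv.tv_sec = secs;		/* member assignment may leave the  */
	tv.tv_usec = usecs;		/* padding alone, which is now fine */

	if (copy_to_user(utv, &tv, sizeof(tv)))
		return -EFAULT;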
Re: [PATCH] gcc-plugins: Add structleak for more stack initialization
On Tue, Jan 17, 2017 at 06:09:49PM +0100, PaX Team wrote: > On 17 Jan 2017 at 10:42, Dave P Martin wrote: > > > On Mon, Jan 16, 2017 at 08:22:24PM +0100, PaX Team wrote: > > > the 'issue' is that before C11 the standard didn't make it clear that in > > > case of a partial initializer list the compiler has to initialize not only > > > the remaining fields but also padding as well. > > > > Which version of the C11 spec clarifies this? > > > > The one I'm looking at (n1570, Apr 12 2011) says: > > > > "6.7.9 21 If there are fewer initializers in a brace-enclosed list than > > there are elements or members of an aggregate, or fewer characters in a > > string literal used to initialize an array of known size than there are > > elements in the array, the remainder of the aggregate shall be > > initialized implicitly the same as objects that have static storage > > duration." > > > > What is meant by "the remainder of the object" is unclear. > > it's less unclear if you also take into account the rest of the > sentence that says "[initialized] the same as objects that have > static storage duration" which in turn is described at 6.7.9p10 > where it says among others: > > "if it is an aggregate, every member is initialized (recursively) > according to these rules, and any padding is initialized to zero bits" > > the "and any padding is initialized to zero bits" was the addition > in C11 (which i'm not sure was meant as a new feature for C11 or > just official enshrinement of what had always been meant). > > > Is this just the tail of the object, or all uninitialised members -- > > which may be sparse in the case of an initialiser with designators? And > > is padding (and if so, which padding) considered to be part of (the > > remainder of) the object for the purpose of this paragraph? > > to me the quote about 'any padding' is for all kinds of padding. > uninitialized members are described in the same paragraph. > > > This can be read with the interpretation you suggest, but the wording > > doesn't seem rock-solid. For the kernel, I guess it's sufficient if > > GCC commits to this interpretation though. > > note that i'm not a language lawyer, merely an interpreter, so take > the above with a grain of salt. Me neither, but actually I think this interpretation doesn't make sense. The implication seems to be that padding initialisation of initialised aggregates with automatic storage is guaranteed _only_ if the initialiser is incomplete. Paragraph 19 always applies, but that only asserts "all subobjects that are not initialized explicitly shall be initialized implicitly the same as objects that have static storage duration" -- no statement about padding. Paragraph 21 guarantees initialisation of padding bytes, but applies _only_ if the initializer list is incomplete. As a random test, a recent-ish gcc might actually apply this interpretation...

tst.c
---8<---
struct foo {
	long x;
	char y;
	long z;
	int w;
};

void f(struct foo *);

void g(long *x, char *y, long *z, int *w)
{
	struct foo o = { .z = *z, .y = *y, .x = *x, .w = *w };

	f(&o);
}

void h(int *y, long *z)
{
	struct foo o = { .z = *z, .y = *y };

	f(&o);
}
--->8---

$ gcc (Debian 6.3.0-2) 6.3.0 20161229
$ gcc -Wall -Wextra -pedantic -O2 -std=c11 -c tst.c
$ objdump -d tst.o

tst.o:     file format elf64-littleaarch64

Disassembly of section .text:

0000000000000000 <g>:
   0:	a9bd7bfd 	stp	x29, x30, [sp, #-48]!
   4:	910003fd 	mov	x29, sp
   8:	39400024 	ldrb	w4, [x1]
   c:	f9400005 	ldr	x5, [x0]
  10:	910043a0 	add	x0, x29, #0x10
  14:	b9400061 	ldr	w1, [x3]
  18:	f9400042 	ldr	x2, [x2]
  1c:	390063a4 	strb	w4, [x29, #24]
  20:	f9000ba5 	str	x5, [x29, #16]
  24:	f90013a2 	str	x2, [x29, #32]
  28:	b9002ba1 	str	w1, [x29, #40]
  2c:	94000000 	bl	0 <f>
  30:	a8c37bfd 	ldp	x29, x30, [sp], #48
  34:	d65f03c0 	ret

0000000000000038 <h>:
  38:	a9bd7bfd 	stp	x29, x30, [sp, #-48]!
  3c:	910003fd 	mov	x29, sp
  40:	b9400002 	ldr	w2, [x0]
  44:	910043a0 	add	x0, x29, #0x10
  48:	f9400021 	ldr	x1, [x1]
  4c:	a9017fbf 	stp	xzr, xzr, [x29, #16]
  50:	a9027fbf 	stp	xzr, xzr, [x29, #32]
  54:	390063a2 	strb	w2, [x29, #24]
  58:	f90013a1 	str	x1, [x29, #32]
  5c:	94000000 	bl	0 <f>
  60:	a8c37bfd 	ldp	x29, x30, [sp], #48
  64:	d65f03c0 	ret

In g(), where the initialiser list is complete, there is exactly one store per named member and the padding is never zeroed. In h(), where the list is incomplete, gcc zeroes the whole object first (the two stp xzr, xzr pairs) before storing the initialised members -- so this gcc does appear to apply that interpretation.
Re: [PATCH] gcc-plugins: Add structleak for more stack initialization
On Tue, Jan 17, 2017 at 08:25:49PM +0100, PaX Team wrote: > On 17 Jan 2017 at 18:07, Dave P Martin wrote: > > > On Tue, Jan 17, 2017 at 06:09:49PM +0100, PaX Team wrote: > > > On 17 Jan 2017 at 10:42, Dave P Martin wrote: > > > > > > > This can be read with the interpretation you suggest, but the wording > > > > doesn't seem rock-solid. For the kernel, I guess it's sufficient if > > > > GCC commits to this interpretation though. > > > > > > note that i'm not a language lawyer, merely an interpreter, so take > > > the above with a grain of salt. > > > > Me neither, but actually I think this interpretation doesn't make sense. > > The implication seems to be that padding initialisation of initialised > > aggregates with automatic storage is guaranteed _only_ if the > > initialiser is incomplete. > > however illogical it sounds, that's actually what is implied by the standard > and gcc behaves this way. my test case with a bit more verbose output than > your example (gcc shows in comments what each assignment corresponds to): > > -- cut > gcc -O2 -x c -S -fverbose-asm -o - - <<EOF > struct s1 { > char c; > int l; > }; > > void f(struct s1*); > > void g(void) > { > struct s1 s1 = {}; > struct s1 s2 = { .c = 1}; > struct s1 s3 = { .l = 2}; > struct s1 s4 = { .c = 3, .l = 4}; > > f(&s1); > f(&s2); > f(&s3); > f(&s4); > } > EOF > -- cut > > > This may or may not be a bug, and may or may not have been fixed in a > > more recent version. > > i tested 4.5.4, 6.3 and 7.0 from about a month ago on amd64 and had > consistent results as above. i think it's not a bug but an intentional > optimization but i don't feel like digging it out from the gcc sources :). Interesting. So I guess we're probably stuck with it, good or bad. Do you know the rationale behind the inconsistency? > > > both examples would be handled by the plugin if the types involved already > > > have a __user annotated field (as neither foo2 nor tv are initialized, only > > > assigned to), otherwise it'd need manual annotation and/or more static > > > analysis. > > > > Do you see any false positives here? I would expect a fair number of > > structures that contain __user pointers but that are never copied > > wholesale to userspace, but I've not reviewed in detail. > > false positive depends on what you consider a goal ;). for me the structleak > plugin had to serve one specific purpose (initialize a local variable that > would otherwise leak unintended kernel stack content back to userland), everything > else didn't matter. now if you want to move the goalpost and initialize those > and only those variables that get copied to userland but aren't otherwise > guaranteed to be completely initialized then you'll need something much smarter > than the current structleak plugin as it doesn't serve that purpose and will > both overinitialize and fail to initialize affected variables... Sure -- the hope was that structleak might cover this sort of thing without too much extra effort, but it sounds less realistic now. Cheers ---Dave
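To make the trigger condition above concrete: the plugin keys off a __user-annotated member in the local's type, so (an illustrative, made-up example):

	/* A local of this type gets forced initialisation, because of the
	 * __user member: */
	struct example_req {
		void __user *ubuf;
		size_t len;
	};

	/* ...whereas struct timeval has no __user member, so the tv example
	 * earlier is not covered without manual annotation. */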
Re: Reply: Question about the sparse memory section size
On Wed, Sep 23, 2015 at 02:16:45AM +0100, Qijiwen wrote: > In fact the free_unused_memmap function is not invoked when > CONFIG_SPARSEMEM_VMEMMAP is enabled. > The reason is that the memmap region is mapped in units of 2MB, not in units of 4KB. > > Below is the source code in arch/arm64/mm/init.c: > void __init mem_init(void) > { > swiotlb_init(1); > > set_max_mapnr(pfn_to_page(max_pfn) - mem_map); > > #ifndef CONFIG_SPARSEMEM_VMEMMAP > free_unused_memmap(); > #endif Oops, you're right. But if we're not already doing so, it could still make sense to unmap useless backing memory from the vmemmap, or try harder to avoid mapping it in the first place. However, whether this is worth it depends on what fraction of memory is wasted on your platform. After all, every page struct is "wasted" memory, which means 1-2% of RAM is always "wasted" for a kernel with 4KB page size anyway. Cheers ---Dave [...] > > Best regards, > > Jiwen Qi > > -----Original Message----- > From: Dave Martin [mailto:dave.mar...@arm.com] > Sent: 22 September 2015 19:17 > To: Chenfeng (puck) > Cc: catalin.mari...@arm.com; will.dea...@arm.com; mark.rutl...@arm.com; > ard.biesheu...@linaro.org; lau...@codeaurora.org; > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org; Dan zhao; > Xuyiping; Suzhuangluan; Qijiwen; fujun (F); Panshilin (Peter) > Subject: Re: Question about the sparse memory section size > > On Tue, Sep 22, 2015 at 04:14:21PM +0800, chenfeng wrote: > > Hi all, > > The sparse memory section size, SECTION_SIZE_BITS, currently is 1GB > > for arm64 by default. However, it might generate wasted memmap memory > > space for those memory sections less than 1GB. e.g. > > > > for a 512MB memory section, still 14MB (sizeof(struct page) * > > PAGES_PER_SECTION) memmap needs to be reserved. The wasted memmap > > space could be eliminated by changing memory section size from 1GB to > > 512M, but still some questions to be answered, > > > > 1) why arm64 uses 1GB as default setting? > > 2) any risk to change section size from 1GB to 512MB? like, any impact > > to performance since memory section number is increased. > > For arm64 we have SPARSEMEM_VMEMMAP enabled by default, which enables much of > the wasted memmap backing memory to be reclaimed. > > Take a look at arch/arm64/mm/init.c:free_unused_memmap(). > > This should reduce the amount of actual memory wasted on unused parts of > memmap. The virtual space stays wasted as you describe, but that's plentiful > on 64-bit arches. > > You could try sticking some printks in there if you want to see how much of > the memmap the code successfully frees. > > Cheers > ---Dave >
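For a rough feel for the numbers in this thread (assuming 4KB pages, a 56-byte struct page -- which matches the quoted 14MB figure -- and the default SECTION_SIZE_BITS = 30; sizeof(struct page) varies with kernel config):

	/*
	 * PAGES_PER_SECTION  = 1GB / 4KB          = 262144
	 * memmap per section = 262144 * 56 bytes ~= 14MB
	 *
	 * A 512MB bank alone in a 1GB section therefore carries ~7MB of
	 * memmap for pages that don't exist, while the unavoidable cost
	 * of memmap itself is 56 / 4096 ~= 1.4% of RAM -- the "1-2%"
	 * figure above.
	 */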
Re: [PATCH 00/22] arm64: Consolidate CPU feature handling
On Wed, Sep 23, 2015 at 05:37:19PM +0100, Suzuki K. Poulose wrote: > On 23/09/15 16:58, Suzuki K. Poulose wrote: > > On 23/09/15 16:56, Dave P Martin wrote: > >> On Wed, Sep 16, 2015 at 03:20:58PM +0100, Suzuki K. Poulose wrote: > >>> From: "Suzuki K. Poulose" > >>> > >>> This is an updated reincarnation of my "arm64: Expose CPU feature > >>> registers" > >>> series [1], which does much more. > >> > >> [...] > >> > >> Do you get whitespace errors for some patches in this series? > >> > >> git am complains for several of the patches, but it's possible my mail > >> chewed them up. > >> > >> If the errors are real, please fix ;) > >> > > > > Thanks for reporting, let me take a look. > > The problems were with my patches. I have fixed them locally, and will send > them along in the next version. Thanks again for spotting. Ah, OK. Thanks ---Dave
Re: [PATCH 18/18] arm64: lto: Strengthen READ_ONCE() to acquire when CLANG_LTO=y
On Tue, Jun 30, 2020 at 06:37:34PM +0100, Will Deacon wrote: > When building with LTO, there is an increased risk of the compiler > converting an address dependency headed by a READ_ONCE() invocation > into a control dependency and consequently allowing for harmful > reordering by the CPU. > > Ensure that such transformations are harmless by overriding the generic > READ_ONCE() definition with one that provides acquire semantics when > building with LTO. > > Signed-off-by: Will Deacon > --- > arch/arm64/include/asm/rwonce.h | 63 +++ > arch/arm64/kernel/vdso/Makefile | 2 +- > arch/arm64/kernel/vdso32/Makefile | 2 +- > 3 files changed, 65 insertions(+), 2 deletions(-) > create mode 100644 arch/arm64/include/asm/rwonce.h > > diff --git a/arch/arm64/include/asm/rwonce.h b/arch/arm64/include/asm/rwonce.h > new file mode 100644 > index 000000000000..515e360b01a1 > --- /dev/null > +++ b/arch/arm64/include/asm/rwonce.h > @@ -0,0 +1,63 @@ > +/* SPDX-License-Identifier: GPL-2.0 */ > +/* > + * Copyright (C) 2020 Google LLC. > + */ > +#ifndef __ASM_RWONCE_H > +#define __ASM_RWONCE_H > + > +#ifdef CONFIG_CLANG_LTO Don't we have a generic option for LTO that's not specific to Clang? Also, can you illustrate code that can only be unsafe with Clang LTO? [...] Cheers ---Dave
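To make the second question concrete, the kind of transformation at issue looks something like this (hypothetical code, not taken from the patch):

	int a, b;
	int *ptr;

	int reader(void)
	{
		int *p = READ_ONCE(ptr);

		return *p;	/* address dependency orders this load */
	}

	/*
	 * With whole-program visibility, a compiler that can prove ptr only
	 * ever points at a or b might effectively rewrite this as:
	 */
	int reader_lto(void)
	{
		int *p = READ_ONCE(ptr);

		if (p == &a)		/* control dependency only: the CPU */
			return a;	/* may speculate these loads ahead  */
		return b;		/* of the load of ptr               */
	}

The address dependency in reader() is respected by the CPU, but the control dependency in reader_lto() orders only stores, not loads -- hence the desire for acquire semantics when such rewrites become likely.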
Re: [PATCH 4/8] arm64: Basic Branch Target Identification support
On Fri, May 24, 2019 at 06:19:10PM +0100, Mark Rutland wrote: > On Fri, May 24, 2019 at 05:12:40PM +0100, Dave Martin wrote: > > On Fri, May 24, 2019 at 04:38:48PM +0100, Mark Rutland wrote: > > > On Fri, May 24, 2019 at 03:53:06PM +0100, Dave Martin wrote: > > > > On Fri, May 24, 2019 at 02:02:17PM +0100, Mark Rutland wrote: > > > > > On Fri, May 24, 2019 at 11:25:29AM +0100, Dave Martin wrote: [...] > > > > > > diff --git a/arch/arm64/kernel/syscall.c > > > > > > b/arch/arm64/kernel/syscall.c > > > > > > index 5610ac0..85b456b 100644 > > > > > > --- a/arch/arm64/kernel/syscall.c > > > > > > +++ b/arch/arm64/kernel/syscall.c > > > > > > @@ -66,6 +66,7 @@ static void el0_svc_common(struct pt_regs *regs, > > > > > > int scno, int sc_nr, > > > > > > unsigned long flags = current_thread_info()->flags; > > > > > > > > > > > > regs->orig_x0 = regs->regs[0]; > > > > > > +regs->pstate &= ~(regs->pstate & PSR_BTYPE_MASK); > > > > > > > > > > Likewise: > > > > > > > > > > regs->pstate &= ~PSR_BTYPE_MASK; > > > > > > > > > > ... though I don't understand why that would matter to syscalls, nor > > > > > how > > > > > those bits could ever be set given we had to execute an SVC to get > > > > > here. > > > > > > > > > > What am I missing? > > > > > > > > The behaviour is counterintuitive here. The architecture guarantees to > > > > preserve BTYPE for traps, faults and asynchronous exceptions, but for a > > > > synchronous exception from normal architectural execution of an > > > > exception-generating instruction (SVC/HVC/SMC) the architecture leaves > > > > it IMP DEF whether BTYPE is preserved or zeroed in SPSR. > > > > > > I'm still missing something here. IIUC where BTYPE was non-zero, we > > > should take the BTI trap before executing the SVC/HVC/SMC, right? > > > > > > Otherwise, it would be possible to erroneously branch to an SVC/HVC/SMC, > > > which would logically violate the BTI protection. > > > > Only if the SVC (etc.) is in a BTI guarded page. Otherwise, we could > > have legitimately branched to the SVC insn directly and BTYPE would > > be nonzero, but no trap would occur. > > I agree that would be the case immediately before we execute the SVC, > but I think there's a subtlety here w.r.t. what exactly happens as an > SVC is executed. > > My understanding was that even for unguarded pages, the execution of any > (non branch/eret) instruction would zero PSTATE.BTYPE. > > For SVC it's not clear to me whether generating the SVC exception is > considered to be an effect of completing the execution of an SVC, > whether it's considered as preempting the execution of the SVC, or > whether that's IMPLEMENTATION DEFINED. > > Consequently it's not clear to me whether or not executing an SVC clears > PSTATE.BTYPE before the act of taking the exception samples PSTATE. I > would hope that it does, as this would be in keeping with the way the > ELR is updated. OTOH, the wording calls this case out quite explicitly. It seems odd to do that if the more general wording applies. I'll take another look and request clarification. > I think that we should try to clarify that before we commit ourselves to > the most painful interpretation here. Especially as similar would apply > to HVC and SMC, and I strongly suspect firmware in general is unlikely > to fix up the PSTATE.BTYPE of a caller. 
> > > We should still logically zero BTYPE across SVC in that case, because > > the SVC may itself branch: a signal could be delivered on return and > > we want the prevailing BTYPE then to be 0 for capture in the signal > > frame. Doing this zeroing in signal delivery because if the signal > > is not delivered in SVE then a nonzero BTYPE might be live, and we > > must then restore it properly on sigreturn. > > I'm not sure I follow this. > > If we deliver a signal, the kernel generates a pristine PSTATE for the > signal handler, and the interrupted context doesn't matter. > > Saving/restoring the state of the interrupted context is identical to > returning without delivering the signal, and we'd have a problem > regardless. My text looks garbled... since the point I was making was tangential, I won't elaborate on it for now. > > As you observe, this scenario should be impossible if the SVC insn > > is in a guarded page, unless userspace does something foolhardy like > > catching the SIGILL and fudging BTYPE or the return address. > > I think userspace gets to pick up the pieces in this case. Much like > signal delivery, it would need to generate a sensible PSTATE itself. Agreed, there is no way to hide this kind of thing from userspace code that messes with the signal frame -- so we shouldn't try. > [...] > > > (Now I come to think of it I also need to look at rseq etc., which is > > another magic kernel-mediated branch mechanism.) Meh. Cheers ---Dave
Re: [PATCH 8/9] RISC-V: User-facing API
On Wed, Jul 05, 2017 at 09:49:36AM -0700, Palmer Dabbelt wrote: > On Mon, 03 Jul 2017 16:06:39 PDT (-0700), james.ho...@imgtec.com wrote: > > On Thu, Jun 29, 2017 at 02:42:38PM -0700, Palmer Dabbelt wrote: > >> On Wed, 28 Jun 2017 15:42:37 PDT (-0700), james.ho...@imgtec.com wrote: > >> > On Wed, Jun 28, 2017 at 11:55:37AM -0700, Palmer Dabbelt wrote: > >> >> diff --git a/arch/riscv/include/uapi/asm/ucontext.h > >> >> b/arch/riscv/include/uapi/asm/ucontext.h > >> >> new file mode 100644 > >> >> index ..52eff9febcfd > >> >> --- /dev/null > >> >> +++ b/arch/riscv/include/uapi/asm/ucontext.h > >> > ... > >> >> +struct ucontext { > >> >> + unsigned long uc_flags; > >> >> + struct ucontext *uc_link; > >> >> + stack_t uc_stack; > >> >> + sigset_t uc_sigmask; > >> >> + /* glibc uses a 1024-bit sigset_t */ > >> >> + __u8 __unused[1024 / 8 - sizeof(sigset_t)]; > >> >> + /* last for future expansion */ > >> >> + struct sigcontext uc_mcontext; > >> >> +}; > >> > > >> > Any particular reason not to use the asm-generic ucontext? > >> > >> In the generic ucontext, 'uc_sigmask' is at the end of the structure so it > >> can > >> be expanded. Since we want our mcontext to be expandable as well, we > >> pre-allocate some expandable space for sigmask and then put mcontext at the > >> end. > >> > >> We stole this idea from arm64. > > > > Curious. __unused seems like overkill to be honest given that expanding > > the number of signals up to 128 causes other issues (as discovered on > > MIPS e.g. the waitpid() status, with stopsig not fitting below the exit > > code (shift 8) and core dump flag (bit 7)), but perhaps it could be > > carefully expanded by splitting the stopsig field. > > Sorry, I don't understand the intricacies of this in the slightest. In > general > we try to avoid surprises in software land in RISC-V, so whenever we do > something we go look at the most popular architectures (Intel and ARM) and try > to ensure we don't paint ourselves into any corners that they didn't. I think Catalin was concerned that putting uc_sigmask at the end breaks extensibility of sigcontext. [Catalin, please comment if I'm misquoting you ;) ] Generic ucontext seems broken in any case, when kernel/user sigset_t sizes differ: sigset_t oldmask, newmask; void handler(int n, siginfo_t *si, void *uc_) { ucontext_t *uc = uc_; oldmask = uc->uc_sigmask; uc->uc_sigmask = newmask; } With generic ucontext, this can overrun and corrupt memory if the user/ kernel sigset_t sizes differ. The only fix is to reserve space in ucontext for the larger of the two sigset sizes, which generic ucontext does not do. There's also the problem you comment on where only 7 bits of signal number are available in wait() status values: with 0x7f being magic and 0 not a valid signal number, that probably allows up to 126 signals. An arch could possibly have its own definitions to get beyond this, but it's all done by magic numbers and open-coding in core code today. i.e., the "extensibility" in generic ucontext may be bogus. So, you can commit to a sane maximum number of signals (say, 64) in your ABI, but this means that your libc sigset_t size probably needs to match and you can never grow beyond that without an ABI break. Or you can reserve enough space in ucontext for the userspace sigset_t. Using generic signal.h and ucontext.h effectively commits you to max 64 signals AFAICT. The extra space may be permanently wasted, but that's preferable to memory corruption. (Note, I don't know myself where the "1024" comes from. 
Are there any POSIXish systems implementing anywhere near that number of signals? Is there a real use case for it? Maybe it's just overzealous futureproofing?) > > Looks harmless here I suppose so I defer to others. If it is the > > preferred approach does it make sense to make it the "default" for new > > architectures at some point? > > Again, this isn't really my thing, but we chose this because we thought it was > the sane way to do it. Unless we're doing something silly, I don't see why it > wouldn't be a reasonable default. This is predicated on having expandable > architectural state, otherwise putting sigmask at the end seems sane. Note, the contents of sigcontext are nonportable but are nonetheless ABI, and some userspace software does expect to be able to poke about in there, modify the signal return state, etc. Additionally the whole signal frame cannot safely exceed MINSIGSTKSZ in size (which looks like 2K if you're relying on generic signal.h -- not a big number when compared against upcoming vector architectures). Going beyond MINSIGSTKSZ is a user ABI break, as is any non-probeable change to the contents of struct sigcontext, including changes to its size. We are burned by this on arm64 with SVE: arm64 has its own MINSIGSTKSZ (5K), which is not enough for the biggest possible SVE implementations.
Re: arm64: unhandled level 0 translation fault
On Thu, Dec 14, 2017 at 02:34:50PM +, Geert Uytterhoeven wrote: > Hi Catalin, Will, Dave, > > On Tue, Dec 12, 2017 at 11:20 AM, Geert Uytterhoeven > wrote: > > During userspace (Debian jessie NFS root) boot on arm64: > > > > rpcbind[1083]: unhandled level 0 translation fault (11) at 0x0008, > > esr 0x9204, in dash[adf77000+1a000] > > CPU: 0 PID: 1083 Comm: rpcbind Not tainted > > 4.15.0-rc3-arm64-renesas-02176-g14f9a1826e48e355 #51 > > Hardware name: Renesas Salvator-X 2nd version board based on r8a7795 ES2.0+ > > (DT) > > This is a quad Cortex A57. > > pstate: 8000 (Nzcv daif -PAN -UAO) > > pc : 0xadf8a51c > > lr : 0xadf8ac08 > > sp : cffeac00 > > x29: cffeac00 x28: adfa1000 > > x27: cffebf7c x26: cffead20 > > x25: cea1c5f0 x24: > > x23: adfa1000 x22: adfa1000 > > x21: x20: 0008 > > x19: x18: cffeb500 > > x17: a22babfc x16: adfa1ae8 > > x15: a2363588 x14: > > x13: 0020 x12: 0010 > > x11: 0101010101010101 x10: adfa1000 > > x9 : ff81 x8 : adfa2000 > > x7 : x6 : > > x5 : adfa2338 x4 : adfa2000 > > x3 : adfa2338 x2 : > > x1 : adfa28b0 x0 : adfa4c30 > > > > Sometimes it happens with other processes, but the main address, esr, and > > pstate values are always the same. > > > > I regularly run arm64/for-next/core (through bi-weekly renesas-drivers > > releases, so the last time was two weeks ago), but never saw the issue > > before until today, so probably v4.15-rc1 is OK. > > Unfortunately it doesn't happen during every boot, which makes it > > cumbersome to bisect. > > > > My first guess was UNMAP_KERNEL_AT_EL0, but even after disabling that, > > and even without today's arm64/for-next/core merged in, I still managed to > > reproduce the issue, so I believe it was introduced in v4.15-rc2 or > > v4.15-rc3. > > > > Once, when the kernel message above wasn't shown, I got an error from > > userspace, which may be related: > > *** Error in `/bin/sh': free(): invalid pointer: 0xdd970988 *** > > With more boots (10 instead of 6) to declare a kernel good, I bisected this > to commit 9de52a755cfb6da5 ("arm64: fpsimd: Fix failure to restore FPSIMD > state after signals"). > > Reverting that commit on top of v4.15-rc3 fixed the issue for me. Good work on the bisect -- I'll need to have a think about this... That patch fixes a genuine problem so we can't just revert it. What if you revert _just this function_ back to what it was in v4.14? Cheers ---Dave
Re: [RESEND PATCH] arm64: v8.4: Support for new floating point multiplication variant
On Sat, Dec 09, 2017 at 03:28:42PM +, Dongjiu Geng wrote: > ARM v8.4 extensions include support for new floating point > multiplication variant instructions to the AArch64 SIMD Do we have any human-readable description of what the new instructions do? Since the v8.4 spec itself only describes these as "New Floating Point Multiplication Variant", I wonder what "FHM" actually stands for. Maybe something like "widening half-precision floating-point multiply accumulate" is acceptable wording consistent with the existing architecture, but I just made that up, so it's not official ;) > instructions set. Let the userspace know about it via a > HWCAP bit and MRS emulation. > > Cc: Suzuki K Poulose > Signed-off-by: Dongjiu Geng > --- > My platform supports this feature, so I need to add it. > --- > Documentation/arm64/cpu-feature-registers.txt | 4 +++- > arch/arm64/include/asm/sysreg.h | 1 + > arch/arm64/include/uapi/asm/hwcap.h | 1 + > arch/arm64/kernel/cpufeature.c| 2 ++ > arch/arm64/kernel/cpuinfo.c | 1 + > 5 files changed, 8 insertions(+), 1 deletion(-) > > diff --git a/Documentation/arm64/cpu-feature-registers.txt > b/Documentation/arm64/cpu-feature-registers.txt > index bd9b3fa..a70090b 100644 > --- a/Documentation/arm64/cpu-feature-registers.txt > +++ b/Documentation/arm64/cpu-feature-registers.txt > @@ -110,7 +110,9 @@ infrastructure: > x--x > | Name | bits | visible | > |--| > - | RES0 | [63-48] |n| > + | RES0 | [63-52] |n| > + |--| > + | FHM | [51-48] |y| You also need to update Documentation/arm64/elf_hwcaps.txt. Otherwise, looks OK. Cheers ---Dave > |--| > | DP | [47-44] |y| > |--| > diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h > index 08cc885..1818077 100644 > --- a/arch/arm64/include/asm/sysreg.h > +++ b/arch/arm64/include/asm/sysreg.h > @@ -419,6 +419,7 @@ > #define SCTLR_EL1_CP15BEN(1 << 5) > > /* id_aa64isar0 */ > +#define ID_AA64ISAR0_FHM_SHIFT 48 > #define ID_AA64ISAR0_DP_SHIFT44 > #define ID_AA64ISAR0_SM4_SHIFT 40 > #define ID_AA64ISAR0_SM3_SHIFT 36 > diff --git a/arch/arm64/include/uapi/asm/hwcap.h > b/arch/arm64/include/uapi/asm/hwcap.h > index cda76fa..f018c3d 100644 > --- a/arch/arm64/include/uapi/asm/hwcap.h > +++ b/arch/arm64/include/uapi/asm/hwcap.h > @@ -43,5 +43,6 @@ > #define HWCAP_ASIMDDP(1 << 20) > #define HWCAP_SHA512 (1 << 21) > #define HWCAP_SVE(1 << 22) > +#define HWCAP_ASIMDFHM (1 << 23) > > #endif /* _UAPI__ASM_HWCAP_H */ > diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c > index c5ba009..bc7e707 100644 > --- a/arch/arm64/kernel/cpufeature.c > +++ b/arch/arm64/kernel/cpufeature.c > @@ -123,6 +123,7 @@ static int __init register_cpu_hwcaps_dumper(void) > * sync with the documentation of the CPU feature register ABI. 
*/ > static const struct arm64_ftr_bits ftr_id_aa64isar0[] = { > + ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, > ID_AA64ISAR0_FHM_SHIFT, 4, 0), > ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, > ID_AA64ISAR0_DP_SHIFT, 4, 0), > ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, > ID_AA64ISAR0_SM4_SHIFT, 4, 0), > ARM64_FTR_BITS(FTR_VISIBLE, FTR_STRICT, FTR_LOWER_SAFE, > ID_AA64ISAR0_SM3_SHIFT, 4, 0), > @@ -991,6 +992,7 @@ static bool has_no_fpsimd(const struct > arm64_cpu_capabilities *entry, int __unus > HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM3_SHIFT, FTR_UNSIGNED, > 1, CAP_HWCAP, HWCAP_SM3), > HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_SM4_SHIFT, FTR_UNSIGNED, > 1, CAP_HWCAP, HWCAP_SM4), > HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_DP_SHIFT, FTR_UNSIGNED, 1, > CAP_HWCAP, HWCAP_ASIMDDP), > + HWCAP_CAP(SYS_ID_AA64ISAR0_EL1, ID_AA64ISAR0_FHM_SHIFT, FTR_UNSIGNED, > 1, CAP_HWCAP, HWCAP_ASIMDFHM), > HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 0, > CAP_HWCAP, HWCAP_FP), > HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_FP_SHIFT, FTR_SIGNED, 1, > CAP_HWCAP, HWCAP_FPHP), > HWCAP_CAP(SYS_ID_AA64PFR0_EL1, ID_AA64PFR0_ASIMD_SHIFT, FTR_SIGNED, 0, > CAP_HWCAP, HWCAP_ASIMD), > diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c > index 1e25545..7f94623 100644 > --- a/arch/arm64/kernel/cpuinfo.c > +++ b/arch/arm64/kernel/cpuinfo.c > @@ -76,6 +76,7 @@ > "asimddp", > "sha512", > "sve", > + "asimdfhm", > NULL,
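For completeness, userspace would then detect the feature like this -- a minimal sketch (the FHM instructions themselves are the FMLAL/FMLSL widening multiply-accumulates):

	#include <stdio.h>
	#include <sys/auxv.h>

	#ifndef HWCAP_ASIMDFHM
	#define HWCAP_ASIMDFHM	(1 << 23)	/* value from the patch above */
	#endif

	int main(void)
	{
		if (getauxval(AT_HWCAP) & HWCAP_ASIMDFHM)
			puts("FMLAL/FMLSL (FHM) instructions supported");
		return 0;
	}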