Re: [PATCH 03/10] mfd / platform: cros_ec: Miscellaneous character device to talk with the EC
On Thu, 06 Jun 2019, Randy Dunlap wrote: > > On Thu, 2019-06-06 at 16:51 +0200, Greg Kroah-Hartman wrote: > >> Again, don't be noisy, it's not hard, and is how things have been > >> trending for many years now. > > Ack that. Not to say that this particular print is acceptable, but there are places where a low-level (dbg/info) print on successful probe is helpful. Driver initialisation is important! There's a big difference between drivers 'being noisy', spilling all sorts of information that may well be useful or interesting to a driver developer, but has little value to anyone else, and providing a single print to say that a device has been detected and successfully initialised/probed. I recently fell victim to a silent, but fully functional device. Successful device initialisation should not be silent when debugging has been set to the highest level IMHO. And yes, of course turning on debugging for Driver Core works, but is not practical for all cases and is certainly not the first port of call when figuring out why initialisation seems to be failing for a single particular device. Truly surplus churn should absolutely be removed from the boot log, or at the very least downgraded, leaving only truly useful information such as highlighting a newly detected device for example. If the user wants an even more silent boot log, they should turn the log level down a notch. That is why we have log levels after all. Simply removing all useful prints regardless of log-level is not the way to go IMHO. -- Lee Jones [李琼斯] Linaro Services Technical Lead Linaro.org │ Open source software for ARM SoCs Follow Linaro: Facebook | Twitter | Blog
[RESEND PATCH] Documentation/stackprotector: powerpc supports stack protector
powerpc architecture (both 64-bit and 32-bit) supports stack protector mechanism since some time now [see commit 06ec27aea9fc ("powerpc/64: add stack protector support")]. Update stackprotector arch support documentation to reflect the same. Cc: Jonathan Corbet Cc: Michael Ellerman Cc: linux-doc@vger.kernel.org Cc: linux-ker...@vger.kernel.org Cc: linuxppc-...@lists.ozlabs.org Signed-off-by: Bhupesh Sharma --- Resend, this time Cc'ing Jonathan and doc-list. Documentation/features/debug/stackprotector/arch-support.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/features/debug/stackprotector/arch-support.txt b/Documentation/features/debug/stackprotector/arch-support.txt index ea521f3e..32bbdfc64c32 100644 --- a/Documentation/features/debug/stackprotector/arch-support.txt +++ b/Documentation/features/debug/stackprotector/arch-support.txt @@ -22,7 +22,7 @@ | nios2: | TODO | |openrisc: | TODO | | parisc: | TODO | -| powerpc: | TODO | +| powerpc: | ok | | riscv: | TODO | |s390: | TODO | | sh: | ok | -- 2.7.4
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Fri, 2019-06-07 at 13:43 -0700, Andy Lutomirski wrote: > > On Jun 7, 2019, at 12:49 PM, Yu-cheng Yu wrote: > > > > On Fri, 2019-06-07 at 11:29 -0700, Andy Lutomirski wrote: > > > > On Jun 7, 2019, at 10:59 AM, Dave Hansen wrote: > > > > > > > > > On 6/7/19 10:43 AM, Peter Zijlstra wrote: > > > > > I've no idea what the kernel should do; since you failed to answer the > > > > > question what happens when you point this to garbage. > > > > > > > > > > Does it then fault or what? > > > > > > > > Yeah, I think you'll fault with a rather mysterious CR2 value since > > > > you'll go look at the instruction that faulted and not see any > > > > references to the CR2 value. > > > > > > > > I think this new MSR probably needs to get included in oops output when > > > > CET is enabled. > > > > > > This shouldn’t be able to OOPS because it only happens at CPL 3, > > > right? We > > > should put it into core dumps, though. > > > > > > > > > > > Why don't we require that a VMA be in place for the entire bitmap? > > > > Don't we need a "get" prctl function too in case something like a JIT is > > > > running and needs to find the location of this bitmap to set bits > > > > itself? > > > > > > > > Or, do we just go whole-hog and have the kernel manage the bitmap > > > > itself. Our interface here could be: > > > > > > > > prctl(PR_MARK_CODE_AS_LEGACY, start, size); > > > > > > > > and then have the kernel allocate and set the bitmap for those code > > > > locations. > > > > > > Given that the format depends on the VA size, this might be a good > > > idea. I > > > bet we can reuse the special mapping infrastructure for this — the VMA > > > could > > > be a MAP_PRIVATE special mapping named [cet_legacy_bitmap] or similar, and > > > we > > > can even make special rules to core dump it intelligently if needed. And > > > we > > > can make mremap() on it work correctly if anyone (CRIU?) cares. > > > > > > Hmm. Can we be creative and skip populating it with zeros? The CPU > > > should > > > only ever touch a page if we miss an ENDBR on it, so, in normal operation, > > > we > > > don’t need anything to be there. We could try to prevent anyone from > > > *reading* it outside of ENDBR tracking if we want to avoid people > > > accidentally > > > wasting lots of memory by forcing it to be fully populated when the read > > > it. > > > > > > The one downside is this forces it to be per-mm, but that seems like a > > > generally reasonable model anyway. > > > > > > This also gives us an excellent opportunity to make it read-only as seen > > > from > > > userspace to prevent exploits from just poking it full of ones before > > > redirecting execution. > > > > GLIBC sets bits only for legacy code, and then makes the bitmap read- > > only. That > > avoids most issues: > > How does glibc know the linear address space size? We don’t want LA64 to > break old binaries because the address calculation changed. When an application starts, its highest stack address is determined. It uses that as the maximum the bitmap needs to cover. Yu-cheng
Re: [PATCH v3 00/33] Convert files to ReST - part 1
On Sun, Jun 09, 2019 at 09:29:40AM -0300, Mauro Carvalho Chehab wrote: > Em Sun, 9 Jun 2019 11:16:43 +0200 > > Will there be a web page (e.g. kernel.org), which contains always the > > latest upstream version? > > Yes: > > https://www.kernel.org/doc/html/latest/ > > I guess this one is based on Linus tree. > > Jon also maintains a version at: > > https://static.lwn.net/kerneldoc/ > > I guess that one is based on docs-next branch from the Docs tree. > > Btw, if you want to build it for yourself, you could use: > > make htmldocs Thanks a lot! > > I can pick these up for s390. Or do you want to send the whole series > > in one go upstream? > > Yeah, feel free to pick them via the s390 tree. Ok, applied! :)
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Sat, 2019-06-08 at 22:52 +0200, Pavel Machek wrote: > Hi! > > > > I've no idea what the kernel should do; since you failed to answer the > > > question what happens when you point this to garbage. > > > > > > Does it then fault or what? > > > > Yeah, I think you'll fault with a rather mysterious CR2 value since > > you'll go look at the instruction that faulted and not see any > > references to the CR2 value. > > > > I think this new MSR probably needs to get included in oops output when > > CET is enabled. > > > > Why don't we require that a VMA be in place for the entire bitmap? > > Don't we need a "get" prctl function too in case something like a JIT is > > running and needs to find the location of this bitmap to set bits itself? > > > > Or, do we just go whole-hog and have the kernel manage the bitmap > > itself. Our interface here could be: > > > > prctl(PR_MARK_CODE_AS_LEGACY, start, size); > > > > and then have the kernel allocate and set the bitmap for those code > > locations. > > For the record, that sounds like a better interface than userspace knowing > about the bitmap formats... > Pavel Initially we implemented the bitmap that way. To manage the bitmap, every time the application issues a syscall for a .so it loads, and the kernel does copy_from_user() & copy_to_user() (or similar things). If a system has a few legacy .so files and every application does the same, it can take a long time to boot up. Yu-cheng
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Fri, 2019-06-07 at 15:27 -0700, Andy Lutomirski wrote: > > On Jun 7, 2019, at 2:09 PM, Dave Hansen wrote: > > > > On 6/7/19 1:06 PM, Yu-cheng Yu wrote: > > > > Huh, how does glibc know about all possible past and future legacy code > > > > in the application? > > > > > > When dlopen() gets a legacy binary and the policy allows that, it will > > > manage > > > the bitmap: > > > > > > If a bitmap has not been created, create one. > > > Set bits for the legacy code being loaded. > > > > I was thinking about code that doesn't go through GLIBC like JITs. > > CRIU is another consideration: it would be rather annoying if CET programs > can’t migrate between LA57 and normal machines. When a machine migrates, does its applications' addresses change? If no, then the bitmap should still work, right? Yu-cheng
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Fri, 2019-06-07 at 14:09 -0700, Dave Hansen wrote: > On 6/7/19 1:06 PM, Yu-cheng Yu wrote: > > > Huh, how does glibc know about all possible past and future legacy code > > > in the application? > > > > When dlopen() gets a legacy binary and the policy allows that, it will > > manage > > the bitmap: > > > > If a bitmap has not been created, create one. > > Set bits for the legacy code being loaded. > > I was thinking about code that doesn't go through GLIBC like JITs. If JIT manages the bitmap, it knows where it is. It can always read the bitmap again, right? Yu-cheng
Re: [PATCH v3 09/33] docs: fault-injection: convert docs to ReST and rename to *.rst
In data Sunday, June 9, 2019 4:26:59 AM CEST, Mauro Carvalho Chehab ha scritto: > The conversion is actually: > - add blank lines and identation in order to identify paragraphs; > - fix tables markups; > - add some lists markups; > - mark literal blocks; > - adjust title markups. > > At its new index.rst, let's add a :orphan: while this is not linked to > the main index.rst file, in order to avoid build warnings. > > Signed-off-by: Mauro Carvalho Chehab > --- > ...ault-injection.txt => fault-injection.rst} | 265 +- > Documentation/fault-injection/index.rst | 20 ++ > ...r-inject.txt => notifier-error-inject.rst} | 18 +- > ...injection.txt => nvme-fault-injection.rst} | 174 ++-- > ...rovoke-crashes.txt => provoke-crashes.rst} | 40 ++- > Documentation/process/4.Coding.rst| 2 +- > .../translations/it_IT/process/4.Coding.rst | 2 +- Limited to translations/it_IT Acked-by: Federico Vaga > .../translations/zh_CN/process/4.Coding.rst | 2 +- > drivers/misc/lkdtm/core.c | 2 +- > include/linux/fault-inject.h | 2 +- > lib/Kconfig.debug | 2 +- > tools/testing/fault-injection/failcmd.sh | 2 +- > 12 files changed, 290 insertions(+), 241 deletions(-) > rename Documentation/fault-injection/{fault-injection.txt => > fault-injection.rst} (68%) create mode 100644 > Documentation/fault-injection/index.rst > rename Documentation/fault-injection/{notifier-error-inject.txt => > notifier-error-inject.rst} (83%) rename > Documentation/fault-injection/{nvme-fault-injection.txt => > nvme-fault-injection.rst} (19%) rename > Documentation/fault-injection/{provoke-crashes.txt => provoke-crashes.rst} > (45%) >
Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
On Fri, 2019-06-07 at 19:01 +0100, Dave Martin wrote: > On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote: > > An ELF file's .note.gnu.property indicates features the executable file > > can support. For example, the property GNU_PROPERTY_X86_FEATURE_1_AND > > indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or > > GNU_PROPERTY_X86_FEATURE_1_SHSTK. > > > > With this patch, if an arch needs to setup features from ELF properties, > > it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific > > arch_setup_property(). > > > > For example, for X86_64: > > > > int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool inter) > > { > > int r; > > uint32_t property; > > > > r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND, > > &property); > > ... > > } > > Although this code works for the simple case, I have some concerns about > some aspects of the implementation here. There appear to be some bounds > checking / buffer overrun issues, and the code seems quite complex. > > Maybe this patch tries too hard to be compatible with toolchains that do > silly things such as embedding huge notes in an executable, or mixing > NT_GNU_PROPERTY_TYPE_0 in a single PT_NOTE with a load of junk not > relevant to the loader. I wonder whether Linux can dictate what > interpretation(s) of the ELF specs it is prepared to support, rather than > trying to support absolutely anything. To me, looking at PT_GNU_PROPERTY and not trying to support anything is a logical choice. And it breaks only a limited set of toolchains. I will simplify the parser and leave this patch as-is for anyone who wants to back-port. Are there any objections or concerns? Yu-cheng
Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
On Mon, Jun 10, 2019 at 09:29:04AM -0700, Yu-cheng Yu wrote: > On Fri, 2019-06-07 at 19:01 +0100, Dave Martin wrote: > > On Thu, Jun 06, 2019 at 01:06:41PM -0700, Yu-cheng Yu wrote: > > > An ELF file's .note.gnu.property indicates features the executable file > > > can support. For example, the property GNU_PROPERTY_X86_FEATURE_1_AND > > > indicates the file supports GNU_PROPERTY_X86_FEATURE_1_IBT and/or > > > GNU_PROPERTY_X86_FEATURE_1_SHSTK. > > > > > > With this patch, if an arch needs to setup features from ELF properties, > > > it needs CONFIG_ARCH_USE_GNU_PROPERTY to be set, and a specific > > > arch_setup_property(). > > > > > > For example, for X86_64: > > > > > > int arch_setup_property(void *ehdr, void *phdr, struct file *f, bool > > > inter) > > > { > > > int r; > > > uint32_t property; > > > > > > r = get_gnu_property(ehdr, phdr, f, GNU_PROPERTY_X86_FEATURE_1_AND, > > >&property); > > > ... > > > } > > > > Although this code works for the simple case, I have some concerns about > > some aspects of the implementation here. There appear to be some bounds > > checking / buffer overrun issues, and the code seems quite complex. > > > > Maybe this patch tries too hard to be compatible with toolchains that do > > silly things such as embedding huge notes in an executable, or mixing > > NT_GNU_PROPERTY_TYPE_0 in a single PT_NOTE with a load of junk not > > relevant to the loader. I wonder whether Linux can dictate what > > interpretation(s) of the ELF specs it is prepared to support, rather than > > trying to support absolutely anything. > > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a > logical choice. And it breaks only a limited set of toolchains. > > I will simplify the parser and leave this patch as-is for anyone who wants to > back-port. Are there any objections or concerns? No objection from me ;) But I'm biased. Hopefully this change should allow substantial simplification. For one thing, PT_GNU_PROPERTY tells its file offset and size directly in its phdrs entry. That should save us a lot of effort on the kernel side. Cheers ---Dave
Re: [PATCH v7 22/27] binfmt_elf: Extract .note.gnu.property from an ELF file
* Yu-cheng Yu: > To me, looking at PT_GNU_PROPERTY and not trying to support anything is a > logical choice. And it breaks only a limited set of toolchains. > > I will simplify the parser and leave this patch as-is for anyone who wants to > back-port. Are there any objections or concerns? Red Hat Enterprise Linux 8 does not use PT_GNU_PROPERTY and is probably the largest collection of CET-enabled binaries that exists today. My hope was that we would backport the upstream kernel patches for CET, port the glibc dynamic loader to the new kernel interface, and be ready to run with CET enabled in principle (except that porting userspace libraries such as OpenSSL has not really started upstream, so many processes where CET is particularly desirable will still run without it). I'm not sure if it is a good idea to port the legacy support if it's not part of the mainline kernel because it comes awfully close to creating our own private ABI. Thanks, Florian
Re: [PATCH v3 13/33] docs: infiniband: convert docs to ReST and rename to *.rst
On Sat, Jun 08, 2019 at 11:27:03PM -0300, Mauro Carvalho Chehab wrote: > The InfiniBand docs are plain text with no markups. > So, all we needed to do were to add the title markups and > some markup sequences in order to properly parse tables, > lists and literal blocks. > > At its new index.rst, let's add a :orphan: while this is not linked to > the main index.rst file, in order to avoid build warnings. > > Signed-off-by: Mauro Carvalho Chehab > --- > .../{core_locking.txt => core_locking.rst}| 64 ++- > Documentation/infiniband/index.rst| 23 > .../infiniband/{ipoib.txt => ipoib.rst} | 24 ++-- > .../infiniband/{opa_vnic.txt => opa_vnic.rst} | 108 +- > .../infiniband/{sysfs.txt => sysfs.rst} | 4 +- > .../{tag_matching.txt => tag_matching.rst}| 5 + > .../infiniband/{user_mad.txt => user_mad.rst} | 33 -- > .../{user_verbs.txt => user_verbs.rst}| 12 +- > drivers/infiniband/core/user_mad.c| 2 +- > drivers/infiniband/ulp/ipoib/Kconfig | 2 +- > 10 files changed, 174 insertions(+), 103 deletions(-) > rename Documentation/infiniband/{core_locking.txt => core_locking.rst} (78%) > create mode 100644 Documentation/infiniband/index.rst > rename Documentation/infiniband/{ipoib.txt => ipoib.rst} (90%) > rename Documentation/infiniband/{opa_vnic.txt => opa_vnic.rst} (63%) > rename Documentation/infiniband/{sysfs.txt => sysfs.rst} (69%) > rename Documentation/infiniband/{tag_matching.txt => tag_matching.rst} (98%) > rename Documentation/infiniband/{user_mad.txt => user_mad.rst} (90%) > rename Documentation/infiniband/{user_verbs.txt => user_verbs.rst} (93%) Looks OK to me, do you want to run these patches through the docs tree or through RDMA? Given that we've generally pushed doc updates through rdma, I think I'd prefer the latter? Jonathan? Thanks, Jason
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
* Yu-cheng Yu: > On Fri, 2019-06-07 at 14:09 -0700, Dave Hansen wrote: >> On 6/7/19 1:06 PM, Yu-cheng Yu wrote: >> > > Huh, how does glibc know about all possible past and future legacy code >> > > in the application? >> > >> > When dlopen() gets a legacy binary and the policy allows that, it will >> > manage >> > the bitmap: >> > >> > If a bitmap has not been created, create one. >> > Set bits for the legacy code being loaded. >> >> I was thinking about code that doesn't go through GLIBC like JITs. > > If JIT manages the bitmap, it knows where it is. > It can always read the bitmap again, right? The problem are JIT libraries without assembler code which can be marked non-CET, such as liborc. Our builds (e.g., orc-0.4.29-2.fc30.x86_64) currently carries the IBT and SHSTK flag, although the entry points into the generated code do not start with ENDBR, so that a jump to them will fault with the CET enabled. Thanks, Florian
Re: [PATCH v3 13/33] docs: infiniband: convert docs to ReST and rename to *.rst
On Mon, 10 Jun 2019 14:27:12 -0300 Jason Gunthorpe wrote: > Looks OK to me, do you want to run these patches through the docs tree > or through RDMA? > > Given that we've generally pushed doc updates through rdma, I think > I'd prefer the latter? Jonathan? Whichever works best for you is fine with me; go ahead and take them. jon
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 9:05 AM, Yu-cheng Yu wrote: > On Fri, 2019-06-07 at 14:09 -0700, Dave Hansen wrote: >> On 6/7/19 1:06 PM, Yu-cheng Yu wrote: Huh, how does glibc know about all possible past and future legacy code in the application? >>> When dlopen() gets a legacy binary and the policy allows that, it will >>> manage >>> the bitmap: >>> >>> If a bitmap has not been created, create one. >>> Set bits for the legacy code being loaded. >> I was thinking about code that doesn't go through GLIBC like JITs. > If JIT manages the bitmap, it knows where it is. > It can always read the bitmap again, right? Let's just be clear: The design proposed here is that all code mappers (anybody wanting to get legacy non-CET code into the address space): 1. Know about CET 2. Know where the bitmap is, and identify the part that needs to be changed 3. Be able to mprotect() the bitmap to be writable (undoing glibc's PROT_READ) 4. Set the bits in the bitmap for the legacy code 5. mprotect() the bitmap back to PROT_READ Do the non-glibc code mappers have glibc interfaces for this? Otherwise, how could a bunch of JITs in a big multi-threaded application possibly coordinate the mprotect()s? Won't they race with each other?
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 8:22 AM, Yu-cheng Yu wrote: >> How does glibc know the linear address space size? We don’t want LA64 to >> break old binaries because the address calculation changed. > When an application starts, its highest stack address is determined. > It uses that as the maximum the bitmap needs to cover. Huh, I didn't think we ran code from the stack. ;) Especially given the way that we implemented the new 5-level-paging address space, I don't think that expecting code to be below the stack is a good universal expectation.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, 2019-06-10 at 11:02 -0700, Dave Hansen wrote: > On 6/10/19 8:22 AM, Yu-cheng Yu wrote: > > > How does glibc know the linear address space size? We don’t want LA64 to > > > break old binaries because the address calculation changed. > > > > When an application starts, its highest stack address is determined. > > It uses that as the maximum the bitmap needs to cover. > > Huh, I didn't think we ran code from the stack. ;) > > Especially given the way that we implemented the new 5-level-paging > address space, I don't think that expecting code to be below the stack > is a good universal expectation. Yes, you make a good point. However, allowing the application manage the bitmap is the most efficient and flexible. If the loader finds a legacy lib is beyond the bitmap can cover, it can deal with the problem by moving the lib to a lower address; or re-allocate the bitmap. If the loader cannot allocate a big bitmap to cover all 5-level address space (the bitmap will be large), it can put all legacy lib's at lower address. We cannot do these easily in the kernel. Yu-cheng
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 12:38 PM, Yu-cheng Yu wrote: >>> When an application starts, its highest stack address is determined. >>> It uses that as the maximum the bitmap needs to cover. >> Huh, I didn't think we ran code from the stack. ;) >> >> Especially given the way that we implemented the new 5-level-paging >> address space, I don't think that expecting code to be below the stack >> is a good universal expectation. > Yes, you make a good point. However, allowing the application manage the > bitmap > is the most efficient and flexible. If the loader finds a legacy lib is > beyond > the bitmap can cover, it can deal with the problem by moving the lib to a > lower > address; or re-allocate the bitmap. How could the loader reallocate the bitmap and coordinate with other users of the bitmap? > If the loader cannot allocate a big bitmap to cover all 5-level > address space (the bitmap will be large), it can put all legacy lib's > at lower address. We cannot do these easily in the kernel. This is actually an argument to do it in the kernel. The kernel can always allocate the virtual space however it wants, no matter how large. If we hide the bitmap behind a kernel API then we can put it at high 5-level user addresses because we also don't have to worry about the high bits confusing userspace.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, Jun 10, 2019 at 12:52 PM Dave Hansen wrote: > > On 6/10/19 12:38 PM, Yu-cheng Yu wrote: > >>> When an application starts, its highest stack address is determined. > >>> It uses that as the maximum the bitmap needs to cover. > >> Huh, I didn't think we ran code from the stack. ;) > >> > >> Especially given the way that we implemented the new 5-level-paging > >> address space, I don't think that expecting code to be below the stack > >> is a good universal expectation. > > Yes, you make a good point. However, allowing the application manage the > > bitmap > > is the most efficient and flexible. If the loader finds a legacy lib is > > beyond > > the bitmap can cover, it can deal with the problem by moving the lib to a > > lower > > address; or re-allocate the bitmap. > > How could the loader reallocate the bitmap and coordinate with other > users of the bitmap? > > > If the loader cannot allocate a big bitmap to cover all 5-level > > address space (the bitmap will be large), it can put all legacy lib's > > at lower address. We cannot do these easily in the kernel. > > This is actually an argument to do it in the kernel. The kernel can > always allocate the virtual space however it wants, no matter how large. > If we hide the bitmap behind a kernel API then we can put it at high > 5-level user addresses because we also don't have to worry about the > high bits confusing userspace. > That's a fairly compelling argument. The bitmap is one bit per page, right? So it's smaller than the address space by a factor of 8*2^12 == 2^15. This means that, if we ever get full 64-bit linear addresses reserved entirely for userspace (which could happen if my perennial request to Intel to split user and kernel addresses completely happens), then we'll need 2^48 bytes for the bitmap, which simply does not fit in the address space of a legacy application.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, 2019-06-10 at 12:52 -0700, Dave Hansen wrote: > On 6/10/19 12:38 PM, Yu-cheng Yu wrote: > > > > When an application starts, its highest stack address is determined. > > > > It uses that as the maximum the bitmap needs to cover. > > > > > > Huh, I didn't think we ran code from the stack. ;) > > > > > > Especially given the way that we implemented the new 5-level-paging > > > address space, I don't think that expecting code to be below the stack > > > is a good universal expectation. > > > > Yes, you make a good point. However, allowing the application manage the > > bitmap > > is the most efficient and flexible. If the loader finds a legacy lib is > > beyond > > the bitmap can cover, it can deal with the problem by moving the lib to a > > lower > > address; or re-allocate the bitmap. > > How could the loader reallocate the bitmap and coordinate with other > users of the bitmap? Assuming the loader actually chooses to re-allocate, it can copy the old bitmap over to the new before doing the switch. But, I agree, the other choice is easier; the loader can simply put the lib at lower address. AFAIK, the loader does not request high address in mmap(). > > > If the loader cannot allocate a big bitmap to cover all 5-level > > address space (the bitmap will be large), it can put all legacy lib's > > at lower address. We cannot do these easily in the kernel. > > This is actually an argument to do it in the kernel. The kernel can > always allocate the virtual space however it wants, no matter how large. > If we hide the bitmap behind a kernel API then we can put it at high > 5-level user addresses because we also don't have to worry about the > high bits confusing userspace. We actually tried this. The kernel needs to reserve the bitmap space in the beginning for every CET-enabled app, regardless of actual needs. On each memory request, the kernel then must consider a percentage of allocated space in its calculation, and on systems with less memory this quickly becomes a problem.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 1:27 PM, Yu-cheng Yu wrote: >>> If the loader cannot allocate a big bitmap to cover all 5-level >>> address space (the bitmap will be large), it can put all legacy lib's >>> at lower address. We cannot do these easily in the kernel. >> This is actually an argument to do it in the kernel. The kernel can >> always allocate the virtual space however it wants, no matter how large. >> If we hide the bitmap behind a kernel API then we can put it at high >> 5-level user addresses because we also don't have to worry about the >> high bits confusing userspace. > We actually tried this. The kernel needs to reserve the bitmap space in the > beginning for every CET-enabled app, regardless of actual needs. I don't think this is a problem. In fact, I think reserving the space is actually the only sane behavior. If you don't reserve it, you fundamentally limit where future legacy instructions can go. One idea is that we always size the bitmap for the 48-bit addressing systems. Legacy code probably doesn't _need_ to go in the new address space, and if we do this we don't have to worry about the gigantic 57-bit address space bitmap. > On each memory request, the kernel then must consider a percentage of > allocated space in its calculation, and on systems with less memory > this quickly becomes a problem. I'm not sure what you're referring to here? Are you referring to our overcommit limits?
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, 2019-06-10 at 13:43 -0700, Dave Hansen wrote: > On 6/10/19 1:27 PM, Yu-cheng Yu wrote: > > > > If the loader cannot allocate a big bitmap to cover all 5-level > > > > address space (the bitmap will be large), it can put all legacy lib's > > > > at lower address. We cannot do these easily in the kernel. > > > > > > This is actually an argument to do it in the kernel. The kernel can > > > always allocate the virtual space however it wants, no matter how large. > > > If we hide the bitmap behind a kernel API then we can put it at high > > > 5-level user addresses because we also don't have to worry about the > > > high bits confusing userspace. > > > > We actually tried this. The kernel needs to reserve the bitmap space in the > > beginning for every CET-enabled app, regardless of actual needs. > > I don't think this is a problem. In fact, I think reserving the space > is actually the only sane behavior. If you don't reserve it, you > fundamentally limit where future legacy instructions can go. > > One idea is that we always size the bitmap for the 48-bit addressing > systems. Legacy code probably doesn't _need_ to go in the new address > space, and if we do this we don't have to worry about the gigantic > 57-bit address space bitmap. > > > On each memory request, the kernel then must consider a percentage of > > allocated space in its calculation, and on systems with less memory > > this quickly becomes a problem. > > I'm not sure what you're referring to here? Are you referring to our > overcommit limits? Yes.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 1:58 PM, Yu-cheng Yu wrote: >>> On each memory request, the kernel then must consider a percentage of >>> allocated space in its calculation, and on systems with less memory >>> this quickly becomes a problem. >> I'm not sure what you're referring to here? Are you referring to our >> overcommit limits? > Yes. My assumption has always been that these large, potentially sparse hardware tables *must* be mmap()'d with MAP_NORESERVE specified. That should keep them from being problematic with respect to overcommit.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, 2019-06-10 at 15:02 -0700, Dave Hansen wrote: > On 6/10/19 1:58 PM, Yu-cheng Yu wrote: > > > > On each memory request, the kernel then must consider a percentage of > > > > allocated space in its calculation, and on systems with less memory > > > > this quickly becomes a problem. > > > > > > I'm not sure what you're referring to here? Are you referring to our > > > overcommit limits? > > > > Yes. > > My assumption has always been that these large, potentially sparse > hardware tables *must* be mmap()'d with MAP_NORESERVE specified. That > should keep them from being problematic with respect to overcommit. Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and VM_DONTDUMP. The bitmap will cover only 48-bit address space. We then create PR_MARK_CODE_AS_LEGACY. The kernel will set the bitmap, but it is going to be slow. Perhaps we still let the app fill the bitmap? Yu-cheng
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 3:40 PM, Yu-cheng Yu wrote: > Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and > VM_DONTDUMP. The bitmap will cover only 48-bit address space. Could you make sure to discuss the downsides of only doing a 48-bit address space? What are the reasons behind and implications of VM_DONTDUMP? > We then create PR_MARK_CODE_AS_LEGACY. The kernel will set the bitmap, but it > is going to be slow. Slow compared to what? We're effectively adding one (quick) system call to a path that, today, has at *least* half a dozen syscalls and probably a bunch of page faults. Heck, we can probably avoid the actual page fault to populate the bitmap if we're careful. That alone would put a syscall on equal footing with any other approach. If the bit setting crossed a page boundary it would probably win. > Perhaps we still let the app fill the bitmap? I think I'd want to see some performance data on it first.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On Mon, Jun 10, 2019 at 3:59 PM Dave Hansen wrote: > > > We then create PR_MARK_CODE_AS_LEGACY. The kernel will set the bitmap, but > > it > > is going to be slow. > > Slow compared to what? We're effectively adding one (quick) system call > to a path that, today, has at *least* half a dozen syscalls and probably > a bunch of page faults. Heck, we can probably avoid the actual page > fault to populate the bitmap if we're careful. That alone would put a > syscall on equal footing with any other approach. If the bit setting > crossed a page boundary it would probably win. > > > Perhaps we still let the app fill the bitmap? > > I think I'd want to see some performance data on it first. Updating legacy bitmap in user space from kernel requires long q; get_user(q, ...); q |= mask; put_user(q, ...); instead of *p |= mask; get_user + put_user was quite slow when we tried before. -- H.J.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 4:20 PM, H.J. Lu wrote: >>> Perhaps we still let the app fill the bitmap? >> I think I'd want to see some performance data on it first. > Updating legacy bitmap in user space from kernel requires > > long q; > > get_user(q, ...); > q |= mask; > put_user(q, ...); > > instead of > > *p |= mask; > > get_user + put_user was quite slow when we tried before. Numbers, please. There are *lots* of ways to speed something like that up if you have actual issues with it. For instance, you can skip the get_user() for whole bytes. You can write bits with 0's for unallocated address space. You can do user_access_begin/end() to avoid bunches of STAC/CLACs... The list goes on and on. :)
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
> On Jun 10, 2019, at 3:59 PM, Dave Hansen wrote: > >> On 6/10/19 3:40 PM, Yu-cheng Yu wrote: >> Ok, we will go back to do_mmap() with MAP_PRIVATE, MAP_NORESERVE and >> VM_DONTDUMP. The bitmap will cover only 48-bit address space. > > Could you make sure to discuss the downsides of only doing a 48-bit > address space? > > What are the reasons behind and implications of VM_DONTDUMP? > >> We then create PR_MARK_CODE_AS_LEGACY. The kernel will set the bitmap, but >> it >> is going to be slow. > > Slow compared to what? We're effectively adding one (quick) system call > to a path that, today, has at *least* half a dozen syscalls and probably > a bunch of page faults. Heck, we can probably avoid the actual page > fault to populate the bitmap if we're careful. That alone would put a > syscall on equal footing with any other approach. If the bit setting > crossed a page boundary it would probably win. > >> Perhaps we still let the app fill the bitmap? > > I think I'd want to see some performance data on it first. Trying to summarize: If we manage the whole thing in user space, we are basically committing to only covering 48 bits — otherwise the whole model falls apart in quite a few ways. We gain some simplicity in the kernel. If we do it in the kernel, we still have to decide how much address space to cover. We get to play games like allocating the bitmap above 2^48, but then we might have CRIU issues if we migrate to a system with fewer BA bits. I doubt that the performance matters much one way or another. I just don’t expect any of this to be a bottleneck. Another benefit of kernel management: we could plausibly auto-clear the bits corresponding to munmapped regions. Is this worth it? And a maybe-silly benefit: if we manage it in the kernel, we could optimize the inevitable case where the bitmap contains pages that are all ones :). If it’s in userspace, KSM could do the, but that will be inefficient at best.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
On 6/10/19 4:54 PM, Andy Lutomirski wrote: > Another benefit of kernel management: we could plausibly auto-clear > the bits corresponding to munmapped regions. Is this worth it? I did it for MPX. I think I even went to the trouble of zapping the whole pages that got unused. But, MPX tables took 80% of the address space, worst-case. This takes 0.003% :) The only case it would really matter would be a task was long-running, used legacy executables/JITs, and was mapping/unmapping text all over the address space. That seems rather unlikely.
Re: [PATCH v7 03/14] x86/cet/ibt: Add IBT legacy code bitmap setup function
> On Jun 10, 2019, at 5:08 PM, Dave Hansen wrote: > >> On 6/10/19 4:54 PM, Andy Lutomirski wrote: >> Another benefit of kernel management: we could plausibly auto-clear >> the bits corresponding to munmapped regions. Is this worth it? > > I did it for MPX. I think I even went to the trouble of zapping the > whole pages that got unused. > > But, MPX tables took 80% of the address space, worst-case. This takes > 0.003% :) The only case it would really matter would be a task was > long-running, used legacy executables/JITs, and was mapping/unmapping > text all over the address space. That seems rather unlikely. Every wasted page still costs 4K plus page table overhead. The worst case is a JIT that doesn’t clean up and leaks legacy bitmap memory all over. We can blame the JIT, but the actual attribution could be complicated. It also matters when you unmap one thing, map something else, and are sad when the legacy bits are still set. Admittedly, it’s a bit hard to imagine the exploit that takes advantage of this.