On Tue, Sep 10, 2019 at 09:28:10AM -0500, Steve Wahl wrote:
> On Mon, Sep 09, 2019 at 11:14:14AM +0300, Kirill A. Shutemov wrote:
> > On Fri, Sep 06, 2019 at 04:29:50PM -0500, Steve Wahl wrote:
> > > ...
> > > The answer is to invalidate the pages of this table outside the
> > > address range occupied by the kernel before the page table is
> > > activated.  This patch has been validated to fix this problem on our
> > > hardware.
> >
> > If the goal is to avoid *any* mapping of the reserved region to stop
> > speculation, I don't think this patch will do the job. We still (likely)
> > have the same memory mapped as part of the identity mapping. And it
> > happens at least in two places: here and before on decompression stage.
>
> I imagine you are likely correct, ideally you would not map any
> reserved pages in these spaces.
>
> I've been reading the code to try to understand what you say above.
> For identity mappings in the kernel, I see level2_ident_pgt mapping
> the first 1G.

This is for the Xen case. Not sure how relevant it is for you.

> And I see early_dyanmic_pgts being set up with an identity mapping of
> the kernel that seems to be pretty well restricted to the range _text
> through _end.

Right, but rounded to 2M around the place the kernel was decompressed to.
Some of the reserved areas from the listing below are smaller than 2M or
not aligned to 2M.

> Within the decompression code, I see an identity mapping of the first
> 4G set up within the 32 bit code.  I believe we go past that to the
> startup_64 entry point.  (I don't know how common that path is, but I
> don't have a way to test it without figuring out how to force it.)

The kernel can start in 64-bit mode directly, and in this case we inherit
the page tables from the bootloader/BIOS. They are trusted to provide an
identity mapping that covers at least the kernel (plus some more essential
stuff), but they are free to map more.

> From a pragmatic standpoint, the guy who can verify this for me is on
> vacation, but I believe our BIOS won't ever place the halt-causing
> ranges in a space below 4GiB.  Which explains why this patch works for
> our hardware.  (We do have reserved regions below 4G, just not the
> ones that hardware causes a halt for accessing.)
>
> In case it helps you picture the situation, our hardware takes a small
> portion of RAM from the end of each NUMA node (or it might be pairs or
> quads of NUMA nodes, I'm not entirely clear on this at the moment) for
> its own purposes.
> Here's a section of our e820 table:
>
> [    0.000000] BIOS-e820: [mem 0x000000007c000000-0x000000008fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000f8000000-0x00000000fbffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x00000000fe000000-0x00000000fe010fff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000000100000000-0x0000002f7fffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000002f80000000-0x000000303fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000003040000000-0x0000005f7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000005f7c000000-0x000000603fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000006040000000-0x0000008f7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000008f7c000000-0x000000903fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000009040000000-0x000000bf7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x000000bf7c000000-0x000000c03fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000c040000000-0x000000ef7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x000000ef7c000000-0x000000f03fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x000000f040000000-0x0000011f7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000011f7c000000-0x000001203fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000012040000000-0x0000014f7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000014f7c000000-0x000001503fffffff] reserved
> [    0.000000] BIOS-e820: [mem 0x0000015040000000-0x0000017f7bffffff] usable
> [    0.000000] BIOS-e820: [mem 0x0000017f7c000000-0x000001803fffffff] reserved

It would be interesting to know which of them are problematic.

> Our problem occurs when KASLR (or kexec) places the kernel close
> enough to the end of one of the usable sections, and the 1G of 1:1
> mapped space includes a portion of the following reserved section, and
> speculation touches the reserved area.

Are you sure that speculative access is to blame? Speculative access
must not cause a change in architectural state.

-- 
 Kirill A. Shutemov
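
To make the granularity concern concrete, here is a rough userspace
sketch (not kernel code) that rounds an image range out to a mapping
granularity and checks it against a few reserved entries copied from
the listing above. Everything else is an assumption for illustration:
the struct, the helper names, and in particular the simplification that
the mapped window equals the image rounded out to 2M or 1G. The real
page-table setup may cover a different window.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SZ_2M (2ULL << 20)
#define SZ_1G (1ULL << 30)

/* Physical range with an inclusive end, matching the e820 output style. */
struct range {
	uint64_t start;
	uint64_t end;
};

/* Three reserved entries taken from the e820 listing quoted above. */
static const struct range reserved[] = {
	{ 0x00000000fe000000ULL, 0x00000000fe010fffULL },
	{ 0x0000002f80000000ULL, 0x000000303fffffffULL },
	{ 0x0000005f7c000000ULL, 0x000000603fffffffULL },
};

/* Expand r outward to 'align' boundaries; align must be a power of two. */
static struct range round_out(struct range r, uint64_t align)
{
	struct range out;

	assert(align && (align & (align - 1)) == 0);
	out.start = r.start & ~(align - 1);
	out.end = r.end | (align - 1);
	return out;
}

/* True if the two inclusive ranges share at least one byte. */
static bool overlaps(struct range a, struct range b)
{
	return a.start <= b.end && b.start <= a.end;
}

/*
 * Hypothetical helper: index of the first reserved entry touched by the
 * image once it is rounded out to 'align', or -1 if none is touched.
 */
static int mapping_hits_reserved(struct range image, uint64_t align)
{
	struct range m = round_out(image, align);
	size_t i;

	for (i = 0; i < sizeof(reserved) / sizeof(reserved[0]); i++)
		if (overlaps(m, reserved[i]))
			return (int)i;
	return -1;
}
```

With a made-up 32M image at 0x5f78000000-0x5f79ffffff, rounding out to
2M stays clear of the reserved range at 0x5f7c000000, while rounding
out to 1G reaches into it. The same check can be run against any other
placement or entry from the table.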