OK, I wasted :-) way too much time, but here's a text file that can be comment-ified or stored somewhere alongside the code or whatever...
(While drawing this I realized that there's at least one "wasted" page if the machine has .5 TB or less: we can just leave zero slots in the corresponding L4 direct-map entries. But that would require switching to the bcopy() method also mentioned below. Or indexing into vmspace0.vm_pmap.pm_pml4, which is basically the same thing.)

Chris

-----

There are six (or sometimes five) sets of pages allocated here at boot time to map physical memory in two ways. Note that each page, regardless of level, stores 512 PTEs (or PDEs or PDPs, but let's just use PTE here and prefix it with "level" as needed: 4, 3, 2, or 1).

There is one page for the top-level (L4) page table entries. Each L4 PTE maps 512 GB of space. No L4 PTE can be marked "stop here": each one is either marked "this address is invalid", or it points to one physically-addressed page full of L3 PTEs. Eventually, those L3 PTEs will map-or-reject half a terabyte. 512 entries, each mapping .5 TB, allow us to map 256 TB, which is as much as the hardware supports (there are, in effect, only 48 virtual address bits: the top 16 bits must all be copies of bit 47).

The L4 entry halfway down, at PML4PML4I, is set to point back to this page itself; that's the "recursive page table" for user space, which we do nothing else with at boot time.

We need (up to) NDMPML4E pages, each holding 512 L3 PTEs, for the direct map space. If the processor supports 1 GB pages, an L3 PTE can be marked "stop here", and these L3 PTEs each grant (or forbid) access to 1 GB of physical space at a time. A system with, say, 3 GB of RAM starting at 0 can map it all with three L3 PTEs: "address 0 is valid for 1GB", "address 1GB is valid for 1GB", "address 2GB is valid for 1GB". The remaining L3 PTEs are zero, making the remaining address space invalid.
If the processor does not support 1 GB pages, or if there is less than 1 GB of RAM "at the end" (e.g., if the system has 4.5 GB), the L3 PTEs may need to point to more pages holding L2 PTEs. These L2 PTEs always support 2 MB pages. Each page of L2 PTEs maps 1 GB. So a machine with 4.5 GB and 1 GB mappings needs one L3 page with four valid 1 GB L3 PTEs and then one L3 PTE pointing to one page of L2 PTEs. That one page of L2 PTEs is half-filled, containing 256 2MB-sized PTEs, mapping the remaining 512 MB. The other half of that page is zero, making the remaining addresses invalid.

Pictorially, and adding the names of the physical page(s), thus far we have this. (Note, the L4 PTE page is drawn more than twice as tall as the L3 and L2 pages, just to get space for arrows.)

              LEVEL 4:           LEVEL 3:           LEVEL 2:

                   _____
    KPML4phys     v     \
             +---------+ |
             |    0:   | |
             |---------| |
             |    1:   | |      DMPDPphys        DMPDphys
               ( ... )   |  .-> +---------+      +----------------+
             |  255:   | |  |   | 0: 0GB  |  .-> | 0: 4GB         |
             |---------| |  |   | 1: 1GB  |  |   | 1: 4GB+2MB     |
  PML4PML4I: | 256: *--|-'  |   | 2: 2GB  |  |   | 2: 4GB+4MB     |
             |---------|    |   | 3: 3GB  |  |     ( ... )
             |  257:   |    |   | 4:   *--|--'   | 255: 4.5GB-2MB |
             |   ...   |    |   | 5:      |      | 256:           |
    ________ |---------|    |     ( ... )        | 257:           |
   /DMPML4I: |      *--|----'   | 511:    |        ( ... )
   |         |---------|        +---------+      +----------------+
   |NDMPML4E |         |        +---------+
   |         |      *--|------> | 0:      |
   \________ |---------|        | 1:      |
             |         |        | 2:      |   (These are used only
             |---------|        | 3:      |    if the system has more
             |   ...   |          ( ... )      than 512 GB)
             |---------|        | 509:    |
 (see below) |  510:   |        | 510:    |
             |---------|        | 511:    |
 (see below) |  511:   |        +---------+
             +---------+

If the hardware supports 1GB pages, "ndm1g" is the number of gigabyte entries (4 in the example above). Otherwise it's just zero. Meanwhile "ndmpdp" is the number of gigabytes of RAM that need to be mapped, rounded up (in this case 5). Thus, if ndmpdp > ndm1g, we need ndmpdp-ndm1g pages to hold some L2 PTEs.
Now we get to the weirder case of the kernel itself (both its non-direct-mapped dynamically allocated virtual memory, and its text/data/bss). The branch offset limitations encourage the placement of the kernel's text, etc., in the last 2 GB of virtual space, i.e., starting at 0xffff.ffff.8000.0000. But we want a reasonable amount of room for dynamic VM. So we give the kernel at least 512 GB of VM (that's one L4 PTE), while making sure that the text snuggles up close to the end of the space, in that last 2 GB of the at-least-512-GB area. Meanwhile, the boot loader has loaded the kernel into relatively low physical memory addresses.

If KPML4I is 511 (and it actually is), this uses the final L4 slot to map the kernel. If we want to allow kernel VM to have more than 512 GB available, though, we need extra space below KPML4I, i.e., starting at KPML4BASE. So we allocate NKPML4E pages that we set up as L3 PTEs, and point the last NKPML4E slots in the L4 page table at them. If NKPML4E is 4, for instance, we will have this:

   last part of KPML4phys:

     ( ... )    .----> [page #0 of all-zero L3 PTEs]
   | DMPML4I |  |
     ( ... )    |  .--> [page #1 of all-zero L3 PTEs]
   |  507:   |  |  |
   | 508: *--|--'  |  .-> [page #2 of all-zero L3 PTEs]
   | 509: *--|-----'  |
   | 510: *--|--------'
   | 511: *--|---------> [page #3 of L3 PTEs, see below]
   +---------+

The reason for having those "empty" (all-zero) PTE pages is that whenever a new process is created, in pmap_pinit(), its (new) L4 PTE page is set up to point to the *same* physical pages that the kernel is using. Thus, if the kernel creates or destroys any level-3-or-below mapping by writing into any of the above four pages, that mapping is also created/destroyed in all processes. Similarly, the NDMPML4E pages starting at DMPDPphys are mapped identically in all processes. The kernel can therefore "borrow" a user pmap at any time, i.e., there's no need to reload the CPU's CR3 on entry to the kernel.
(If we used bcopy() to copy the kernel pmap's NKPML4E and NDMPML4E entries into the new pmap, the L3 pages would not have to be physically contiguous, but the KVA ones would still all have to exist. It's free to allocate physically contiguous pages here anyway, though.)

So, the last NKPML4E slots in KPML4phys point to the following page tables, which use all of L3, L2, and L1 style PTEs. (Note that we did not need any L1 PTEs for the direct map, which always uses 2MB or 1GB super-pages.)

         LEVEL 3:         LEVEL 2:              LEVEL 1:
   (assuming NKPML4E=4)                         (nkpt pages)

         KPDPphys                                KPTphys
        +---------+                          +---------------+
 page 0 |  0:     |                  .-----> | 0: 0 KB       |
        |  1:     |                  |       | 1: 4 KB       |
        |  2:     |                  |       | 2: 8 KB       |
        |  3:     |                  |       | 3: 12 KB      |
          ( ... )                    |           ( ... )
        |  509:   |                  |       | 509: 2MB-12KB |
        |  510:   |                  |       | 510: 2MB-8KB  |
        |  511:   |                  |       | 511: 2MB-4KB  |
        +---------+                  |       +---------------+
 page 1 |  0:     |                  | .---> | 0: 2 MB       |
        |  1:     |                  | |     | 1: 2MB+4KB    |
        |  2:     |                  | |         ( ... )
        |  3:     |                  | |     +---------------+
          ( ... )                    | | .->     ( ... )
        |  509:   |                  | | |       ( ... )
        |  510:   |                  | | |       ( ... )
        |  511:   |      KPDphys     | | |   +---------------+
        +---------+      +---------+ | | | ..( ... ... ... )
 page 2 |  0:     | .--> |  0:  *--|-' | | .  [etc]
        |  1:     | |    |  1:  *--|---' | .
        |  2:     | |    |  2:  *--|-----' .
        |  3:     | |    |  3:  *--|---.....
          ( ... )   |      ( ... )
        |  509:   | |    | 509: ...|
        |  510:   | |    | 510: ...|
        |  511:   | |    | 511: ...|
        +---------+ |    +---------+
 page 3 |  0:     | | .->|  0: ... |
        |  1:     | | |    ( ... )
        |  2:     | | |    ( ... )
        |  3:     | | |    ( ... )
          ( ... )   | |    ( ... )
        |  509:   | | |    ( ... )
        | 510: *--|-' |    ( ... )
        | 511: *--|---'  | 511:    |
        +---------+      +---------+

There are nkpdpe pages at KPDphys, where nkpdpe is either 1 or 2. Each page of L2 PTEs maps 1 GB, so the first page maps 1 GB and the second, if present, maps the remaining 1 GB. Remember that kernel text+data+bss lives in the final 2 GB of the virtual address space, so it cannot need more than 2 GB. These one or two pages map nkpt pages at KPTphys.
_______________________________________________
freebsd-hackers@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-hackers