On Thu, Mar 11, 2021 at 08:06:53PM +0100, David Hildenbrand wrote:
> This looks essentially good to me, except some parts in
> mhp_supports_memmap_on_memory()
> 
> > +bool mhp_supports_memmap_on_memory(unsigned long size)
> > +{
> > +	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
> > +	unsigned long remaining_mem = size - PMD_SIZE;
Hi David, thanks for the review!

> This looks weird. I think what you want to test is that
> 
> a) "nr_vmemmap_pages * sizeof(struct page)" spans complete PMDs (IOW, we
> won't map too much via the altmap when populating a PMD in the vmemmap)
> 
> b) "remaining = size - nr_vmemmap_pages * sizeof(struct page)" spans
> complete pageblock.

We do not know nr_vmemmap_pages at this point in time, although it is
easy to pre-compute: for every section we populate, we use PMD_SIZE of
vmemmap, so PMD_SIZE / PAGE_SIZE gives the nr_vmemmap_pages used for
populating a single section (on x86-64 that is 512 pages per 128 MiB
section; the arithmetic is worked through in the sketch appended at the
end of this mail).

But let me explain the reasoning I used in the current code.
I will enumerate the assumptions that must hold true in order to
support the feature, together with their checks:

- We span a single memory block:

	size == memory_block_size_bytes()

- The vmemmap pages span a complete PMD and no more than a PMD:

	!(PMD_SIZE % sizeof(struct page))

- The vmemmap pages and the pages exposed to the buddy have to cover
  full pageblocks:

	remaining_mem = size - PMD_SIZE;
	IS_ALIGNED(remaining_mem, pageblock_size)

  Although this check only covers the range without the vmemmap pages,
  one could argue that since we use only a PMD_SIZE at a time, and
  PMD_SIZE is pageblock-aligned, the vmemmap range is covered as well.

Now, I see how this might be confusing and rather incomplete, so I
guess a better and clearer way to write it would be:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
	unsigned long nr_vmemmap_pages = PMD_SIZE / PAGE_SIZE;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
	unsigned long remaining_size = size - vmemmap_size;

	return memmap_on_memory &&
	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
	       size == memory_block_size_bytes() &&
	       !(PMD_SIZE % vmemmap_size) &&
	       IS_ALIGNED(vmemmap_size, pageblock_size) &&
	       remaining_size &&
	       IS_ALIGNED(remaining_size, pageblock_size);
}

Note that the above check is only for a single section, but if the
assumptions hold true for a single section, they will hold for many as
well.

We could be orthodox and do:

bool mhp_supports_memmap_on_memory(unsigned long size)
{
	unsigned long pageblock_size = PFN_PHYS(pageblock_nr_pages);
	unsigned long nr_sections = memory_block_size_bytes() / (1UL << SECTION_SIZE_BITS);
	unsigned long nr_vmemmap_pages = (PMD_SIZE / PAGE_SIZE) * nr_sections;
	unsigned long vmemmap_size = nr_vmemmap_pages * sizeof(struct page);
	unsigned long remaining_size = size - vmemmap_size;

	return memmap_on_memory &&
	       IS_ENABLED(CONFIG_MHP_MEMMAP_ON_MEMORY) &&
	       size == memory_block_size_bytes() &&
	       !(PMD_SIZE % vmemmap_size) &&
	       IS_ALIGNED(vmemmap_size, pageblock_size) &&
	       remaining_size &&
	       IS_ALIGNED(remaining_size, pageblock_size);
}

to check all sections, but I do not think it is necessary.

What do you think?

> I suggest a restructuring, compressing the information like:
> 
> "
> Besides having arch support and the feature enabled at runtime, we need a
> few more assumptions to hold true:
> 
> a) We span a single memory block: memory onlining/offlining happens in
> memory block granularity. We don't want the vmemmap of online memory blocks
> to reside on offline memory blocks. In the future, we might want to support
> variable-sized memory blocks to make the feature more versatile.
> 
> b) The vmemmap pages span complete PMDs: We don't want vmemmap code to
> populate memory from the altmap for unrelated parts (i.e., other memory
> blocks).
> 
> c) The vmemmap pages (and thereby the pages that will be exposed to the
> buddy) have to cover full pageblocks: memory onlining/offlining code
> requires applicable ranges to be page-aligned, for example, to set the
> migratetypes properly.
> "

I am fine with the above; I have already added it, thanks.

> Do we have to special case / protect against the vmemmap optimization for
> hugetlb pages? Or is that already blocked somehow and I missed it?

Yes, the hugetlb-vmemmap feature disables PMD population of the vmemmap [1].

[1] https://patchwork.kernel.org/project/linux-mm/patch/20210308102807.59745-7-songmuc...@bytedance.com/

-- 
Oscar Salvador
SUSE L3
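
Appended sketch referenced above: a minimal userspace illustration of
the per-section arithmetic, assuming typical x86-64 values (4 KiB
pages, 2 MiB PMDs, one 128 MiB section per memory block, 2 MiB
pageblocks, a 64-byte struct page). These constants are assumptions for
illustration only; the in-kernel check of course uses the running
kernel's values.

/*
 * Userspace-only illustration; the constants below are assumed typical
 * x86-64 values, not taken from kernel headers.
 */
#include <stdbool.h>
#include <stdio.h>

#define PAGE_SIZE		(4UL << 10)	/* assumed 4 KiB pages */
#define PMD_SIZE		(2UL << 20)	/* assumed 2 MiB PMDs */
#define MEMORY_BLOCK_SIZE	(128UL << 20)	/* assumed one 128 MiB section per block */
#define PAGEBLOCK_SIZE		(2UL << 20)	/* assumed 512-page pageblocks */
#define STRUCT_PAGE_SIZE	64UL		/* assumed sizeof(struct page) */

#define IS_ALIGNED(x, a)	(((x) & ((a) - 1)) == 0)

int main(void)
{
	unsigned long size = MEMORY_BLOCK_SIZE;
	/* memmap needed to describe the hot-added range itself */
	unsigned long vmemmap_size = (size / PAGE_SIZE) * STRUCT_PAGE_SIZE;
	/* pages taken from the hot-added range to back that memmap */
	unsigned long nr_vmemmap_pages = vmemmap_size / PAGE_SIZE;
	unsigned long remaining_size = size - vmemmap_size;
	bool ok;

	printf("vmemmap_size     = %lu KiB (PMD_SIZE is %lu KiB)\n",
	       vmemmap_size >> 10, PMD_SIZE >> 10);
	printf("nr_vmemmap_pages = %lu\n", nr_vmemmap_pages);
	printf("remaining_size   = %lu MiB\n", remaining_size >> 20);

	ok = vmemmap_size == PMD_SIZE &&
	     IS_ALIGNED(vmemmap_size, PAGEBLOCK_SIZE) &&
	     remaining_size &&
	     IS_ALIGNED(remaining_size, PAGEBLOCK_SIZE);
	printf("assumptions hold: %s\n", ok ? "yes" : "no");

	return 0;
}

With these assumed values a memory block donates exactly one PMD's
worth of itself (512 pages, 2 MiB) to its own memmap and exposes the
remaining 126 MiB to the buddy, which is what makes the single-section
check above sufficient.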