On Mon, Apr 07, 2025 at 11:16:26AM -0300, Jason Gunthorpe wrote:
> On Sun, Apr 06, 2025 at 07:11:14PM +0300, Mike Rapoport wrote:
> > > > > We know what the future use case is for the folio preservation, all
> > > > > the drivers and the iommu are going to rely on this.
> > > >
> > > > We don't know how much of the preservation will be based on folios.
> > >
> > > I think almost all of it. Where else does memory come from for drivers?
> >
> > alloc_pages()? vmalloc()?
>
> alloc_pages is a 0 order "folio". vmalloc is an array of 0 order
> folios (?)
According to Matthew's current plan [1], vmalloc is misc memory :)

> > How about we find some less ambiguous term? Using "folio" for memory
> > returned from kmalloc is really confusing. And even alloc_pages() does not
> > treat all memory it returns as folios.
> >
> > How about we call them ranges? ;-)
>
> memdescs if you want to be forward looking. It is not ranges.
>
> The point very much is that they are well defined allocations from the
> buddy allocator that can be freed back to the buddy allocator. We
> provide an API sort of like alloc_pages/folio_alloc to get the pointer
> back out and that is the only way to use it.
>
> KHO needs to provide a way to give back an allocated struct page/folio
> that can be freed back to the buddy allocator, of the proper
> order. Whatever you call that function it belongs to KHO as it is
> KHO's primary responsibility to manage the buddy allocator and the
> struct pages.
>
> Today initializing the folio is the work required to do that.

Ok, let's stick with memdesc then. Name aside, it looks like we do agree
that KHO needs to provide a way to preserve memory allocated from the buddy
allocator along with some metadata describing that memory, like the order
of multi-order allocations.

The issue I see with bitmaps is that there's nothing except the order that
we can save. And if sometime later we'd have to recreate a memdesc for that
memory, that would mean allocating the correct data structure, i.e. struct
folio, struct slab, maybe struct vmalloc. I'm not sure we are going to
preserve slabs, at least in the foreseeable future, but vmalloc seems like
something we'd have to address.

> > I did an experiment with preserving 8G of memory allocated with randomly
> > chosen order. For each order (0 to 10) I've got roughly 1000 "folios". I
> > measured the time kho_mem_deserialize() takes with xarrays + bitmaps vs a
> > maple tree based implementation.
> > The maple tree outperformed by a factor of 10 and
> > its serialized data used 6 times less memory.
>
> That seems like it means most of your memory ended up contiguous and
> the maple tree didn't split nodes to preserve order. :\

I was cheating to some extent, but not that much. I preserved the order in
kho_mem_info_t, and if folios next to each other were of different orders
they were not merged into a single maple tree node. But in case all memory
is free and not fragmented, my understanding is that the buddy allocator
will place folios of the same order next to each other, so they could be
merged in the maple tree.

> Also the bitmap scanning to optimize the memblock reserve isn't
> implemented for xarray.. I don't think this is representative..

I believe that even with optimized bitmap scanning the maple tree would
perform much better when memory is not fragmented. And when memory is
fragmented, both will need to call memblock_reserve() a similar number of
times and there won't be a real difference. Of course, the maple tree will
consume much more memory in the worst case.

[1] https://kernelnewbies.org/MatthewWilcox/Memdescs

> Jason

--
Sincerely yours,
Mike.