On Mon, Apr 07, 2025 at 02:03:05PM -0300, Jason Gunthorpe wrote:
> On Mon, Apr 07, 2025 at 07:31:21PM +0300, Mike Rapoport wrote:
>
> > Ok, let's stick with memdesc then. Put aside the name it looks like we do
> > agree that KHO needs to provide a way to preserve memory allocated from
> > buddy along with some of the metadata describing that memory, like order
> > for multi-order allocations.
>
> +1
>
> > The issue I see with bitmaps is that there's nothing except the order that
> > we can save. And if sometime later we'd have to recreate memdesc for that
> > memory, that would mean allocating a correct data structure, i.e. struct
> > folio, struct slab, struct vmalloc maybe.
>
> Yes. The caller would have to take care of this using a caller
> specific serialization of any memdesc data. Like slab would have to
> presumably record the object size and the object allocation bitmap.
>
> > I'm not sure we are going to preserve slabs at least at the foreseeable
> > future, but vmalloc seems like something that we'd have to address.
>
> And I suspect vmalloc doesn't need to preserve any memdesc information?
> It can all be recreated
vmalloc does not have anything in memdesc now, just plain order-0 pages
from alloc_pages variants.

Now that we've settled on terminology, and given that currently memdesc ==
struct page, I think we need kho_preserve_folio(struct folio *folio) for
actual struct folios and, apparently, other high-order allocations, and
kho_preserve_pages(struct page *page, int nr) for memblock, vmalloc and
alloc_pages_exact.

On the restore path kho_restore_folio() will recreate the multi-order
thingy by doing parts of what prep_new_page() does, and
kho_restore_pages() will recreate order-0 pages as if they were allocated
from buddy. If the caller needs more in its memdesc, it is responsible
for filling in the missing bits.

> > > Also the bitmap scanning to optimize the memblock reserve isn't
> > > implemented for xarray.. I don't think this is representative..
> >
> > I believe that even with optimization of bitmap scanning maple tree would
> > perform much better when the memory is not fragmented.
>
> Hard to guess, bitmap scanning is not free, especially if there are
> lots of zeros, but memory allocating maple tree nodes and locking them
> is not free either so who knows where things cross over..
>
> > And when it is fragmented both will need to call memblock_reserve()
> > similar number of times and there won't be real difference. Of
> > course maple tree will consume much more memory in the worst case.
>
> Yes.
>
> bitmaps are bounded like the comment says, 512K for 16G of memory with
> arbitary order 0 fragmentation.
>
> Assuming absolute worst case fragmentation maple tree (@24 bytes per
> range, alternating allocated/freed pattern) would require around
> 50M. Then almost doubled since we have the maple tree and then the
> serialized copy.
>
> 100Mb vs 512k - I will pick the 512K :)

Nah, memory is cheap nowadays :)

Ok, let's start with bitmaps and then see what are the actual
bottlenecks we have to optimize.

> Jason

--
Sincerely yours,
Mike.