Hi Mike, On Fri, Apr 04 2025, Mike Rapoport wrote:
[...] > As for the optimizations of memblock reserve path, currently it what hurts > the most in my and Pratyush experiments. They are not very representative, > but still, preserving lots of pages/folios spread all over would have it's > toll on the mm initialization. And I don't think invasive changes to how > buddy and memory map initialization are the best way to move forward and > optimize that. Quite possibly we'd want to be able to minimize amount of > *ranges* that we preserve. > > So from the three alternatives we have now (xarrays + bitmaps, tables + > bitmaps and maple tree for ranges) maple tree seems to be the simplest and > efficient enough to start with. But you'd need to somehow serialize the maple tree ranges into some format. So you would either end up going back to the kho_mem ranges we had, or have to invent something more complex. The sample code you wrote is pretty much going back to having kho_mem ranges. And if you say that we should minimize the amount of ranges, the table + bitmaps is still a fairly good data structure. You can very well have a higher order table where your entire range is a handful of bits. This lets you track a small number of ranges fairly efficiently -- both in terms of memory and in terms of CPU. I think the only place where it doesn't work as well as a maple tree is if you want to merge or split a lot ranges quickly. But if you say that you only want to have a handful of ranges, does that really matter? Also, I think the allocation pattern depends on which use case you have in mind. For hypervisor live update, you might very well only have a handful of ranges. The use case I have in mind is for taking a userspace process, quickly checkpointing it by dumping its memory contents to a memfd, and restoring it after KHO. For that, the ability to do random sparse allocations quickly helps a lot. So IMO the table works well for both sparse and dense allocations. So why have a data structure that only solves one problem when we can have one that solves both? And honestly, I don't think the table is that much more complex either -- both in terms of understanding the idea and in terms of code -- the whole thing is like 200 lines. Also, I think changes to buddy initialization _is_ the way to optimize boot times. Having maple tree ranges and moving them around into memblock ranges does not really scale very well for anything other than a handful of ranges, and we shouldn't limit ourselves to that without good reason. > > Preserving folio orders with it is really straighforward and until we see > some real data of how the entire KHO machinery is used, I'd prefer simple > over anything else. -- Regards, Pratyush Yadav