On Thu, Apr 03, 2025 at 03:50:04PM +0000, Pratyush Yadav wrote:

> The patch currently has a limitation where it does not free any of the
> empty tables after a unpreserve operation. But Changyuan's patch also
> doesn't do it so at least it is not any worse off.

Why do we even have unpreserve? Just discard the entire KHO operation
in bulk.

> When working on this patch, I realized that kho_mem_deserialize() is
> currently _very_ slow. It takes over 2 seconds to make memblock
> reservations for 48 GiB of 0-order pages. I suppose this can later be
> optimized by teaching memblock_free_all() to skip preserved pages
> instead of making memblock reservations.

Yes, this was my prior point about not having actual data to know what
the actual hot spots are. This saves a few ms on an operation that
takes over 2 seconds :)

> +typedef unsigned long khomem_desc_t;

This should be more like:

  union {
        void *table;
        phys_addr_t table_phys;
  };

since we are not using the low bits right now and it is a lot cheaper
to convert from va to phys only once during the final step. __va is
not exactly fast.

> +#define PTRS_PER_LEVEL		(PAGE_SIZE / sizeof(unsigned long))
> +#define KHOMEM_L1_BITS		(PAGE_SIZE * BITS_PER_BYTE)
> +#define KHOMEM_L1_MASK		((1 << ilog2(KHOMEM_L1_BITS)) - 1)
> +#define KHOMEM_L1_SHIFT		(PAGE_SHIFT)
> +#define KHOMEM_L2_SHIFT		(KHOMEM_L1_SHIFT + ilog2(KHOMEM_L1_BITS))
> +#define KHOMEM_L3_SHIFT		(KHOMEM_L2_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_L4_SHIFT		(KHOMEM_L3_SHIFT + ilog2(PTRS_PER_LEVEL))
> +#define KHOMEM_PFN_MASK		PAGE_MASK

This all works better if you just use GENMASK and FIELD_GET

> +static int __khomem_table_alloc(khomem_desc_t *desc)
> +{
> +	if (khomem_desc_none(*desc)) {

Needs READ_ONCE

> +struct kho_mem_track {
> +	/* Points to L4 KHOMEM descriptor, each order gets its own table. */
> +	struct xarray orders;
> +};

I think it would be easy to add a 5th level and just use bits 63:57 as
a 6 bit order. Then you don't need all this stuff either.

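As a rough illustration of the GENMASK/FIELD_GET suggestion (a sketch
under stated assumptions, not the patch's code): it assumes 4 KiB pages
and 8-byte table entries, so the field boundaries line up with the
shifts in the quoted macros (12, 27, 36, 45). The KHOMEM_*_IDX names,
struct khomem_idx and khomem_split() are hypothetical, and the two
macros are userspace stand-ins for the kernel ones so the snippet
builds on its own. The order field from the 5th-level idea is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Userspace stand-ins for the kernel's GENMASK_ULL() (<linux/bits.h>) and
 * FIELD_GET() (<linux/bitfield.h>), so the sketch builds outside the kernel.
 */
#define GENMASK_ULL(h, l)	((~0ULL >> (63 - (h))) & (~0ULL << (l)))
#define FIELD_GET(mask, val)	(((val) & (mask)) / ((mask) & (~(mask) + 1)))

/* Hypothetical field layout, assuming PAGE_SHIFT == 12 and 8-byte entries. */
#define KHOMEM_L1_IDX	GENMASK_ULL(26, 12)	/* bit within an L1 bitmap page */
#define KHOMEM_L2_IDX	GENMASK_ULL(35, 27)	/* slot in an L2 table */
#define KHOMEM_L3_IDX	GENMASK_ULL(44, 36)	/* slot in an L3 table */
#define KHOMEM_L4_IDX	GENMASK_ULL(53, 45)	/* slot in an L4 table */

struct khomem_idx {
	unsigned int l4, l3, l2, l1;
};

/* Split a physical address into per-level table indices. */
static struct khomem_idx khomem_split(uint64_t phys)
{
	struct khomem_idx idx = {
		.l4 = (unsigned int)FIELD_GET(KHOMEM_L4_IDX, phys),
		.l3 = (unsigned int)FIELD_GET(KHOMEM_L3_IDX, phys),
		.l2 = (unsigned int)FIELD_GET(KHOMEM_L2_IDX, phys),
		.l1 = (unsigned int)FIELD_GET(KHOMEM_L1_IDX, phys),
	};

	return idx;
}
```

With phys = (3ULL << 45) | (5ULL << 36) | (7ULL << 27) | (0x1234ULL << 12),
khomem_split() gives l4 = 3, l3 = 5, l2 = 7 and l1 = 0x1234. Each field
is declared once as a mask, so the separate *_SHIFT and *_MASK
definitions are no longer needed.
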

> +int kho_preserve_folio(struct folio *folio)
> +{
> +	unsigned long pfn = folio_pfn(folio);
> +	unsigned int order = folio_order(folio);
> +	int err;
> +
> +	if (!kho_enable)
> +		return -EOPNOTSUPP;
> +
> +	down_read(&kho_out.tree_lock);

This lock still needs to go away

> +static void kho_mem_serialize(void)
> +{
> +	struct kho_mem_track *tracker = &kho_mem_track;
> +	khomem_desc_t *desc;
> +	unsigned long order;
> +
> +	xa_for_each(&tracker->orders, order, desc) {
> +		if (WARN_ON(order >= NR_PAGE_ORDERS))
> +			break;
> +		kho_out.mem_tables[order] = *desc;

Missing the virt_to_phys?

> +	nr_tables = min_t(unsigned int, len / sizeof(*tables), NR_PAGE_ORDERS);
> +	for (order = 0; order < nr_tables; order++)
> +		khomem_walk_preserved((khomem_desc_t *)&tables[order], order,

Missing phys_to_virt

Please don't remove the KHOSER stuff, and do use it with proper structs
and types. It is part of keeping this stuff understandable.

Jason