On 4 Feb 2022, at 8:56, Oscar Salvador wrote:

> On Wed, Jan 19, 2022 at 02:06:20PM -0500, Zi Yan wrote:
>> From: Zi Yan <z...@nvidia.com>
>>
>> alloc_contig_range() worked at MAX_ORDER-1 granularity to avoid merging
>> pageblocks with different migratetypes. It might unnecessarily convert
>> extra pageblocks at the beginning and at the end of the range. Change
>> alloc_contig_range() to work at pageblock granularity.
>>
>> It is done by restoring pageblock types and split >pageblock_order free
>> pages after isolating at MAX_ORDER-1 granularity and migrating pages
>> away at pageblock granularity. The reason for this process is that
>> during isolation, some pages, either free or in-use, might have >pageblock
>> sizes and isolating part of them can cause free accounting issues.
>> Restoring the migratetypes of the pageblocks not in the interesting
>> range later is much easier.
>
> Hi Zi Yan,
>
> Due to time constraints I only glanced over, so some comments below
> about stuff that caught my eye:
Thanks for looking at the patches!

>
>> +static inline void split_free_page_into_pageblocks(struct page *free_page,
>> +					int order, struct zone *zone)
>> +{
>> +	unsigned long pfn;
>> +
>> +	spin_lock(&zone->lock);
>> +	del_page_from_free_list(free_page, zone, order);
>> +	for (pfn = page_to_pfn(free_page);
>> +	     pfn < page_to_pfn(free_page) + (1UL << order);
>
> It might make sense to have an end_pfn variable so that it does not have to
> be constantly evaluated. Or maybe the compiler is clever enough to only
> evaluate it once.

Sure. Will add end_pfn.

>
>> +	     pfn += pageblock_nr_pages) {
>> +		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);
>> +
>> +		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
>> +				mt, FPI_NONE);
>> +	}
>> +	spin_unlock(&zone->lock);
>
> It is possible that free_page's order is already pageblock_order, so I
> would add a one-liner upfront to catch that case and return, otherwise
> we do the delete_from_freelist-and-free_it_again dance.

Makes sense. Will do. A rough sketch with both changes is appended after my
signature.

>
>> +	/* Save the migratetypes of the pageblocks before start and after end */
>> +	num_pageblock_to_save = (alloc_start - isolate_start) / pageblock_nr_pages
>> +				+ (isolate_end - alloc_end) / pageblock_nr_pages;
>> +	saved_mt =
>> +		kmalloc_array(num_pageblock_to_save,
>> +			      sizeof(unsigned char), GFP_KERNEL);
>> +	if (!saved_mt)
>> +		return -ENOMEM;
>> +
>> +	num = save_migratetypes(saved_mt, isolate_start, alloc_start);
>> +
>> +	num = save_migratetypes(&saved_mt[num], alloc_end, isolate_end);
>
> I really hope we can put all this magic within start_isolate_page_range,
> and the counterparts in undo_isolate_page_range.

That is my plan too.

> Also, I kinda dislike the &saved_mt thing. I thought about some other
> approaches but nothing that wasn't too specific for this case, and I
> guess we want that function to be as generic as possible.

I do not like it either. This whole save and restore thing should go away
once I make MIGRATE_ISOLATE a standalone bit, so that the original
migratetypes will not be overwritten during isolation. Hopefully, I can
work on it soon to get rid of this hunk completely.

>> +	/*
>> +	 * Split free page spanning [alloc_end, isolate_end) and put the
>> +	 * pageblocks in the right migratetype list
>> +	 */
>> +	for (outer_end = alloc_end; outer_end < isolate_end;) {
>> +		unsigned long begin_pfn = outer_end;
>> +
>> +		order = 0;
>> +		while (!PageBuddy(pfn_to_page(outer_end))) {
>> +			if (++order >= MAX_ORDER) {
>> +				outer_end = begin_pfn;
>> +				break;
>> +			}
>> +			outer_end &= ~0UL << order;
>> +		}
>> +
>> +		if (outer_end != begin_pfn) {
>> +			order = buddy_order(pfn_to_page(outer_end));
>> +
>> +			/*
>> +			 * split the free page that starts at outer_end and put
>> +			 * the pageblocks in the right migratetype list
>> +			 */
>> +			VM_BUG_ON(outer_end + (1UL << order) <= begin_pfn);
>
> How could this possibly happen?

Right. It is not possible. Will remove it.

>
>> +			{
>> +				struct page *free_page = pfn_to_page(outer_end);
>> +
>> +				split_free_page_into_pageblocks(free_page,
>> +								order, cc.zone);
>> +			}
>> +			outer_end += 1UL << order;
>> +		} else
>> +			outer_end = begin_pfn + 1;
>> 	}
>
> I think there are cases we could optimize for. If the page has already been
> split into pageblocks by the outer_start loop, we could skip this outer_end
> logic altogether.
>
> E.g.: An order-10 page is split in two pageblocks. There's nothing else
> to be done, right? We could skip this.

Yes. I will think about it more and do some optimization.

--
Best Regards,
Yan, Zi
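
For reference, here is a rough, untested sketch of how the helper could look
with the end_pfn variable and the early return folded in. It only rearranges
the hunk quoted above; whether the check should be == pageblock_order or
<= pageblock_order (and where exactly it sits relative to the lock) is still
to be decided:

static inline void split_free_page_into_pageblocks(struct page *free_page,
					int order, struct zone *zone)
{
	unsigned long pfn = page_to_pfn(free_page);
	unsigned long end_pfn = pfn + (1UL << order);

	/*
	 * Already no larger than a single pageblock: leave it on its
	 * current free list instead of deleting and re-freeing it.
	 */
	if (order <= pageblock_order)
		return;

	spin_lock(&zone->lock);
	del_page_from_free_list(free_page, zone, order);
	for (; pfn < end_pfn; pfn += pageblock_nr_pages) {
		int mt = get_pfnblock_migratetype(pfn_to_page(pfn), pfn);

		__free_one_page(pfn_to_page(pfn), pfn, zone, pageblock_order,
				mt, FPI_NONE);
	}
	spin_unlock(&zone->lock);
}

end_pfn is computed once before the loop, and the early return avoids the
delete-from-freelist-and-free-it-again dance you pointed out.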