On 13 Aug 2025, at 17:12, Wei Yang wrote: > On Tue, Aug 12, 2025 at 11:55:10AM -0400, Zi Yan wrote: > [...] >> +/* >> + * gather_folio_orders - scan through [vaddr_start, len) and record folio >> orders >> + * @vaddr_start: start vaddr >> + * @len: range length >> + * @pagemap_fd: file descriptor to /proc/<pid>/pagemap >> + * @kpageflags_fd: file descriptor to /proc/kpageflags >> + * @orders: output folio order array >> + * @nr_orders: folio order array size >> + * >> + * gather_folio_orders() scan through [vaddr_start, len) and check all >> folios >> + * within the range and record their orders. All order-0 pages will be >> recorded. > > I feel a little confused about the description here. Especially on the > behavior when the range is not aligned on folio boundary.
I was too ambitious on this function. It is intended to just check after split folio orders. I will move the function to split_huge_page_test.c and rename it to gather_after_split_folio_orders() and check_after_split_folio_orders(). > > See following code at 1) and 2). > >> + * Non-present vaddr is skipped. >> + * >> + * >> + * Return: 0 - no error, -1 - unhandled cases >> + */ >> +static int gather_folio_orders(char *vaddr_start, size_t len, >> + int pagemap_fd, int kpageflags_fd, >> + int orders[], int nr_orders) >> +{ >> + uint64_t page_flags = 0; >> + int cur_order = -1; >> + char *vaddr; >> + >> + if (!pagemap_fd || !kpageflags_fd) >> + return -1; > > If my understanding is correct, we use open() to get a file descriptor. > > On error it returns -1. And 0 is a possible valid value, but usually used by > stdin. The code may work in most cases, but seems not right. Will fix it to if (pagemap_fd == -1 || kpageflags_fd == -1) > >> + if (nr_orders <= 0) >> + return -1; >> + > > Maybe we want to check orders[] here too? > >> + for (vaddr = vaddr_start; vaddr < vaddr_start + len;) { >> + char *next_folio_vaddr; >> + int status; >> + >> + status = get_page_flags(vaddr, pagemap_fd, kpageflags_fd, >> + &page_flags); >> + if (status < 0) >> + return -1; >> + >> + /* skip non present vaddr */ >> + if (status == 1) { >> + vaddr += psize(); >> + continue; >> + } >> + >> + /* all order-0 pages with possible false postive (non folio) */ > > Do we still false positive case? Non-present page returns 1, which is handled > above. Any order-0 non folio will be counted, like GFP_KERNEL pages. > >> + if (!(page_flags & (KPF_COMPOUND_HEAD | KPF_COMPOUND_TAIL))) { >> + orders[0]++; >> + vaddr += psize(); >> + continue; >> + } >> + >> + /* skip non thp compound pages */ >> + if (!(page_flags & KPF_THP)) { >> + vaddr += psize(); >> + continue; >> + } >> + >> + /* vpn points to part of a THP at this point */ >> + if (page_flags & KPF_COMPOUND_HEAD) >> + cur_order = 1; >> + else { >> + /* not a head nor a tail in a THP? */ >> + if (!(page_flags & KPF_COMPOUND_TAIL)) >> + return -1; > > When reaches here, we know (page_flags & (KPF_COMPOUND_HEAD | > KPF_COMPOUND_TAIL)). > So we have at least one of it set. > > Looks not possible to hit it? Will remove it. > >> + >> + vaddr += psize(); >> + continue; > > 1) > > In case vaddr points to the middle of a large folio, this will skip this folio > and count from next one. > >> + } >> + >> + next_folio_vaddr = vaddr + (1UL << (cur_order + pshift())); >> + >> + if (next_folio_vaddr >= vaddr_start + len) >> + break; >> + >> + while ((status = get_page_flags(next_folio_vaddr, pagemap_fd, >> + kpageflags_fd, >> + &page_flags)) >= 0) { >> + /* >> + * non present vaddr, next compound head page, or >> + * order-0 page >> + */ >> + if (status == 1 || >> + (page_flags & KPF_COMPOUND_HEAD) || >> + !(page_flags & (KPF_COMPOUND_HEAD | >> KPF_COMPOUND_TAIL))) { >> + if (cur_order < nr_orders) { >> + orders[cur_order]++; >> + cur_order = -1; >> + vaddr = next_folio_vaddr; >> + } >> + break; >> + } >> + >> + /* not a head nor a tail in a THP? */ >> + if (!(page_flags & KPF_COMPOUND_TAIL)) >> + return -1; >> + >> + cur_order++; >> + next_folio_vaddr = vaddr + (1UL << (cur_order + >> pshift())); > > 2) > > If (vaddr_start + len) points to the middle of a large folio and folio is more > than order 1 size, we may continue the loop and still count this last folio. > Because we don't check next_folio_vaddr and (vaddr_start + len). > > A simple chart of these case. > > vaddr_start + len > | | > v v > +---------------------+ +-----------------+ > |folio 1 | |folio 2 | > +---------------------+ +-----------------+ > > folio 1 is not counted, but folio 2 is counted. > > So at 1) and 2) handles the boundary differently. Not sure this is designed > behavior. If so I think it would be better to record in document, otherwise > the behavior is not obvious to user. Will document it. > >> + } >> + >> + if (status < 0) >> + return status; >> + } >> + if (cur_order > 0 && cur_order < nr_orders) >> + orders[cur_order]++; > > Another boundary case here. > > If we come here because (next_folio_vaddr >= vaddr_start + len) in the for > loop instead of the while loop. This means we found the folio head at vaddr, > but the left range (vaddr_start + len - vaddr) is less than or equal to order > 1 page size. > > But we haven't detected the real end of this folio. If this folio is more than > order 1 size, we still count it an order 1 folio. Yes. Will document it. Thanks for the review. > >> + return 0; >> +} >> + >> +int check_folio_orders(char *vaddr_start, size_t len, int pagemap_fd, >> + int kpageflags_fd, int orders[], int nr_orders) >> +{ >> + int *vaddr_orders; >> + int status; >> + int i; >> + >> + vaddr_orders = (int *)malloc(sizeof(int) * nr_orders); >> + >> + if (!vaddr_orders) >> + ksft_exit_fail_msg("Cannot allocate memory for vaddr_orders"); >> + >> + memset(vaddr_orders, 0, sizeof(int) * nr_orders); >> + status = gather_folio_orders(vaddr_start, len, pagemap_fd, >> + kpageflags_fd, vaddr_orders, nr_orders); >> + if (status) >> + goto out; >> + >> + status = 0; >> + for (i = 0; i < nr_orders; i++) >> + if (vaddr_orders[i] != orders[i]) { >> + ksft_print_msg("order %d: expected: %d got %d\n", i, >> + orders[i], vaddr_orders[i]); >> + status = -1; >> + } >> + >> +out: >> + free(vaddr_orders); >> + return status; >> +} > > -- > Wei Yang > Help you, Help me Best Regards, Yan, Zi