From: Yijia Wang <[email protected]> check_file_mmap faults a page in the middle of a file mapping and expects the readahead window to have populated further pages *after* it. With a large base page size this is wrong and the test fails reliably, e.g. on arm64 with 64K pages:
# mincore_selftest.c:260:check_file_mmap:Expected ra_pages (0) > 0 (0) # mincore_selftest.c:261:check_file_mmap:No read-ahead pages found in memory not ok 4 global.check_file_mmap The read-around window is read_ahead_kb / page_size pages wide and is centred on the faulting page (mm/filemap.c): ra->start = max(0, pgoff - ra_pages / 2) With the default 128K budget that is 32 pages on 4K but only 2 pages on 64K, so the window becomes [pgoff-1, pgoff] and the single extra page lands *before* the faulted page. The forward-only scan finds nothing. This is not a huge-page (THP) effect; it is just a byte-sized readahead budget divided by a large page size plus a fault-centred window. Count readahead pages on both sides of the faulted page, and skip the readahead check up front when there is no usable read-around window: either it is narrower than two pages (readahead disabled, or the base page size so large that read_ahead_kb covers a single page), or the file has no backing block device whose read_ahead_kb can be read (e.g. on tmpfs). When the window is wide enough the EXPECT_GT(ra_pages, 0) assertion is kept, so a kernel that should read ahead but does not is still caught. Signed-off-by: Yijia Wang <[email protected]> --- How the problem was tracked down, step by step: 1. On arm64 with 64K base pages, check_file_mmap fails: # mincore_selftest.c:260:check_file_mmap:Expected ra_pages (0) > 0 (0) # mincore_selftest.c:261:check_file_mmap:No read-ahead pages found in memory not ok 4 global.check_file_mmap 2. The test faults the middle page and scans only *forward* for the readahead pages, so "ra_pages == 0" means nothing was read in after the faulted page. 3. Readahead is a byte budget (read_ahead_kb, default 128K) turned into a page count: ra_pages = read_ahead_kb / page_size. That is 32 pages on 4K but only 2 pages on 64K. 4. The mmap read-around window is centred on the fault: ra->start = max(0, pgoff - ra_pages / 2). With 2 pages the window is [pgoff-1, pgoff] - the one extra page lands *before* the fault. 5. Measured directly on the 64K box, varying read_ahead_kb, faulting page 32: read_ahead_kb=64 (1 page): resident: 32 (no neighbour) read_ahead_kb=128 (2 pages): resident: 31 32 (neighbour before) read_ahead_kb=192 (3 pages): resident: 31 32 33 (before and after) So the neighbour really is before the fault, and a 1-page window brings in no neighbour at all. This is not a THP effect. 6. A file with no backing block device (e.g. tmpfs) has no read_ahead_kb to read and no block-device readahead at all; faulting the middle page brings in only that page (before=0 after=0). The fix therefore skips when the window cannot be determined, not just when it is too narrow. Open question for the list: which fix do you prefer? (a) Move the fault towards the start of the mapping so the centred window always leaves a page *after* it, keeping the original forward-only scan. Minimal, but the page count after the fault is ceil(ra_pages/2) - 1, which is 0 whenever ra_pages <= 2; only a fault within the first ra_pages/2 pages (e.g. page 0) has a page after it. That changes the test's intent - it would exercise readahead at the boundary, not in the interior - and still fails when read_ahead_kb yields a 1-page window or on tmpfs. (b) Scan both sides of the faulted page (this patch), and skip up front when there is no usable read-around window. A little more code, but page-size-agnostic and it keeps the assertion whenever a neighbour is actually expected. I went with (b); happy to switch to (a) if that is preferred. An earlier posting of mine changed check_huge_pages instead - that was a misdiagnosis (on the 64K box that subtest merely SKIPs and was never the failure). Please disregard it; this version supersedes it. .../selftests/mincore/mincore_selftest.c | 82 ++++++++++++++++++- 1 file changed, 80 insertions(+), 2 deletions(-) diff --git a/tools/testing/selftests/mincore/mincore_selftest.c b/tools/testing/selftests/mincore/mincore_selftest.c index cdd022c1c..960851286 100644 --- a/tools/testing/selftests/mincore/mincore_selftest.c +++ b/tools/testing/selftests/mincore/mincore_selftest.c @@ -12,6 +12,8 @@ #include <unistd.h> #include <stdlib.h> #include <sys/mman.h> +#include <sys/stat.h> +#include <sys/sysmacros.h> #include <string.h> #include <fcntl.h> @@ -174,6 +176,48 @@ TEST(check_huge_pages) } +/* + * Return the size of the mmap read-around window, in pages, for the block + * device backing the file referred to by @fd, or -1 if it cannot be + * determined. The window size is the device's read_ahead_kb divided by the + * page size; the kernel centres this window on the faulting page, so it must + * be at least two pages wide for any neighbouring page to be read in. + */ +static long readahead_window_pages(int fd, long page_size) +{ + char path[64]; + struct stat st; + long ra_kb; + FILE *f; + + if (fstat(fd, &st)) + return -1; + + /* + * read_ahead_kb lives in the owning disk's queue/ directory. For a + * whole-disk device that is the device's own queue/; for a partition + * it is one level up ("..") at the parent disk. + */ + snprintf(path, sizeof(path), "/sys/dev/block/%u:%u/queue/read_ahead_kb", + major(st.st_dev), minor(st.st_dev)); + f = fopen(path, "r"); + if (!f) { + snprintf(path, sizeof(path), + "/sys/dev/block/%u:%u/../queue/read_ahead_kb", + major(st.st_dev), minor(st.st_dev)); + f = fopen(path, "r"); + if (!f) + return -1; + } + if (fscanf(f, "%ld", &ra_kb) != 1) { + fclose(f); + return -1; + } + fclose(f); + + return (ra_kb << 10) / page_size; +} + /* * Test mincore() behavior on a file-backed page. * No pages should be loaded into memory right after the mapping. Then, @@ -194,6 +238,7 @@ TEST(check_file_mmap) int fd; int i; int ra_pages = 0; + long ra_window; page_size = sysconf(_SC_PAGESIZE); vec_size = FILE_SIZE / page_size; @@ -224,6 +269,22 @@ TEST(check_file_mmap) SKIP(goto out_close, "fallocate not supported by filesystem."); } + /* + * mmap read-around brings in a window of pages centred on the + * faulting page. Its width is the backing device's read_ahead_kb + * divided by the page size. If that window is narrower than two + * pages - because readahead is disabled, or the base page size is so + * large that read_ahead_kb covers a single page - then no + * neighbouring page can ever be read in and the readahead part of + * this test does not apply, so skip it. The same goes for a file with + * no backing block device (e.g. on tmpfs), where the window cannot be + * determined and there is no block-device readahead to exercise. + */ + ra_window = readahead_window_pages(fd, page_size); + if (ra_window < 2) + SKIP(goto out_close, + "no usable readahead window for this configuration."); + /* * Map the whole file, the pages shouldn't be fetched yet. */ @@ -242,8 +303,11 @@ TEST(check_file_mmap) } /* - * Touch a page in the middle of the mapping. We expect the next - * few pages (the readahead window) to be populated too. + * Touch a page in the middle of the mapping. We expect the + * surrounding pages (the readahead window) to be populated too. + * The kernel centres the mmap read-around window on the faulting + * page, so with a large base page size the readahead pages may + * land before the touched page rather than after it. */ addr[FILE_SIZE / 2] = 1; retval = mincore(addr, FILE_SIZE, vec); @@ -252,11 +316,25 @@ TEST(check_file_mmap) TH_LOG("Page not found in memory after use"); } + /* Count readahead pages that landed before the touched page. */ + i = FILE_SIZE / 2 / page_size - 1; + while (i >= 0 && vec[i]) { + ra_pages++; + i--; + } + + /* Count readahead pages that landed after the touched page. */ i = FILE_SIZE / 2 / page_size + 1; while (i < vec_size && vec[i]) { ra_pages++; i++; } + + /* + * The readahead window is at least two pages wide here (narrow + * windows were skipped above), so the kernel must have brought in at + * least one neighbouring page on one side of the faulted page. + */ EXPECT_GT(ra_pages, 0) { TH_LOG("No read-ahead pages found in memory"); } -- 2.43.0

