On 26/08/2025 08:07, Dev Jain wrote:
> We observed uffd-stress selftest failure on arm64 and intermittent failures
> on x86 too:
>
> running ./uffd-stress hugetlb-private 128 32
>
> bounces: 17, mode: rnd read, ERROR: UFFDIO_COPY error: -12 (errno=12,
> @uffd-common.c:617) [FAIL]
> not ok 18 uffd-stress hugetlb-private 128 32 # exit=1
>
> For this particular case, the number of free hugepages from run_vmtests.sh
> will be 128, and the test will allocate 64 hugepages in the source
> location. The stress() function will start spawning threads which will
> operate on the destination location, triggering uffd operations like
> UFFDIO_COPY from src to dst, which means that we will require 64 more
> hugepages for the dst location.
>
> Let us observe the locking_thread() function. It will lock the mutex kept
> at dst, triggering uffd-copy. Suppose that 127 (64 for src and 63 for dst)
> hugepages have been reserved. In case of BOUNCE_RANDOM, it may happen that
> two threads trying to lock the mutex at dst do so at the same hugepage
> number. If one thread succeeds in reserving the last hugepage, the other
> thread may fail in alloc_hugetlb_folio(), returning -ENOMEM. I can confirm
> that this is indeed the case with this hacky patch:
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 753f99b4c718..39eb21d8a91b 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -6929,6 +6929,11 @@ int hugetlb_mfill_atomic_pte(pte_t *dst_pte,
>
>  	folio = alloc_hugetlb_folio(dst_vma, dst_addr, false);
>  	if (IS_ERR(folio)) {
> +		pte_t *actual_pte = hugetlb_walk(dst_vma, dst_addr, PMD_SIZE);
> +		if (actual_pte) {
> +			ret = -EEXIST;
> +			goto out;
> +		}
>  		ret = -ENOMEM;
>  		goto out;
>  	}
>
> This code path gets triggered, indicating that the PMD at which one thread
> is trying to map a hugepage gets filled by a racing thread.
>
> Therefore, instead of using freepgs to compute the amount of memory, use
> freepgs - 10, so that the test still has some extra hugepages to use.
> Note that, in case this value underflows, there is a check for the number
> of free hugepages in the test itself, which will fail, so we are safe.
>
> Signed-off-by: Dev Jain <dev.j...@arm.com>
> ---
>  tools/testing/selftests/mm/run_vmtests.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/testing/selftests/mm/run_vmtests.sh b/tools/testing/selftests/mm/run_vmtests.sh
> index 471e539d82b8..6a9f435be7a1 100755
> --- a/tools/testing/selftests/mm/run_vmtests.sh
> +++ b/tools/testing/selftests/mm/run_vmtests.sh
> @@ -326,7 +326,7 @@ CATEGORY="userfaultfd" run_test ${uffd_stress_bin} anon 20 16
>  # the size of the free pages we have, which is used for *each*.
>  # uffd-stress expects a region expressed in MiB, so we adjust
>  # half_ufd_size_MB accordingly.
> -half_ufd_size_MB=$(((freepgs * hpgsize_KB) / 1024 / 2))
> +half_ufd_size_MB=$((((freepgs - 10) * hpgsize_KB) / 1024 / 2))
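
Just to check my understanding of the race: the loser of a UFFDIO_COPY race
on the same dst hugepage would normally be expected to see -EEXIST in
uffdio_copy.copy and carry on, and the problem here is that with the
reservations exhausted it sees -ENOMEM instead. A minimal hypothetical
sketch of the tolerant pattern, purely for illustration; copy_one_page()
and its parameters are made-up names, not the actual uffd-common.c helper:

  #include <errno.h>
  #include <sys/ioctl.h>
  #include <linux/userfaultfd.h>

  /*
   * Illustrative sketch, not the selftest code: resolve one fault with
   * UFFDIO_COPY, tolerating a racing thread that mapped the page first.
   */
  static int copy_one_page(int uffd, unsigned long dst, unsigned long src,
  			 unsigned long len)
  {
  	struct uffdio_copy copy = {
  		.dst = dst,
  		.src = src,
  		.len = len,
  		.mode = 0,
  	};

  	if (ioctl(uffd, UFFDIO_COPY, &copy) == 0)
  		return 0;		/* we won the race and mapped the page */
  	if (copy.copy == -EEXIST)
  		return 0;		/* a racing thread mapped it first: fine */
  	return (int)copy.copy;		/* -12 (-ENOMEM) is the failure above */
  }
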
Why 10? I don't know much about uffd-stress, but the comment at the top says
it runs 3 threads per CPU, so does the number of potential races increase
with the number of CPUs? Perhaps this number needs to be a function of
nrcpu? (A rough sketch of what I mean is at the end of this mail.)

I tested it and it works though, so:

Tested-by: Ryan Roberts <ryan.robe...@arm.com>

> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb "$half_ufd_size_MB" 32
> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} hugetlb-private "$half_ufd_size_MB" 32
> CATEGORY="userfaultfd" run_test ${uffd_stress_bin} shmem 20 16
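
For the record, this is the kind of thing I had in mind for scaling the
slack; entirely untested, and nr_parallel is a made-up name (the
3-threads-per-CPU figure is from the uffd-stress comment):

  # Untested sketch: scale the hugepage slack with the number of
  # potentially racing threads instead of a fixed 10.
  nr_parallel=$(( $(nproc) * 3 ))
  half_ufd_size_MB=$(( ((freepgs - nr_parallel) * hpgsize_KB) / 1024 / 2 ))

That would keep at least one spare hugepage per potential racing thread, at
the cost of shrinking the test region a little on very large machines.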