On 4/28/2026 3:02 PM, Lorenzo Stoakes wrote:
> On Mon, Apr 27, 2026 at 09:43:30AM +0000, fujunjie wrote:
>> process_madvise() used to validate the advice while walking each
>> imported iovec. If the vector has zero total length, vector_madvise()
>> does not enter the loop and can return success without checking whether
>> the advice value is valid.
>>
>> For a local mm, such as process_madvise(PIDFD_SELF, ...), the remote-only
>> process_madvise_remote_valid() check is skipped. As a result, an invalid
>> advice can be reported as success when the vector has zero total length.
>> This differs from madvise(), which rejects an invalid advice before
>> returning success for a zero-length range.
> 
> Oops! :) Thanks for taking a look at this.
> 
>>
>> Validate the generic madvise behavior at the syscall-facing entry points
>> before any vector walk. In process_madvise(), do this before the
>> remote-only advice restriction so unsupported advice is rejected with the
>> same priority for local and remote mm. Then keep the per-range helper
>> focused on address/length validation, avoiding repeated behavior checks
>> for every iovec.
> 
> The whole thing is a little bit of a mess to be honest I think we could clean
> this up a bit more, see below (+ attached patch).
> 
>>
>> Valid zero-length requests remain no-ops and continue to return 0. Add a
> 
> NIT: 'no-ops' -> 'a noop'. What is a valid zero-length request? Surely it's
> never valid?
> 
>> selftest that covers invalid advice with a zero-length iovec and an empty
>> vector, while also checking that a valid zero-length request still
>> succeeds.
> 
> Thanks appreciate you adding a self-test!
> 
>>
>> Fixes: 021781b01275 ("mm/madvise: unrestrict process_madvise() for current 
>> process")
>> Signed-off-by: fujunjie <[email protected]>
>> ---
>> v2:
>> - Validate behavior at the syscall-facing entry points and leave the range
>>   helper for address/length checks, avoiding repeated behavior checks in the
>>   iovec loop.
>> - Put the generic process_madvise() behavior check before
>>   process_madvise_remote_valid(), as suggested by David.
>> - Keep the zero-length selftest coverage from v1.
>>
>> Testing:
>> Built bzImage and tools/testing/selftests/mm/process_madv. In QEMU, the
>> process_madv selftest reports 7/7 passed.
>>
>>  mm/madvise.c                              | 29 ++++++++++++++++-------------
>>  tools/testing/selftests/mm/process_madv.c | 29 +++++++++++++++++++++++++++++
>>  2 files changed, 45 insertions(+), 13 deletions(-)
>>
>> diff --git a/mm/madvise.c b/mm/madvise.c
>> index 69708e953cf56..ce238dd96f158 100644
>> --- a/mm/madvise.c
>> +++ b/mm/madvise.c
>> @@ -1834,13 +1834,10 @@ static void madvise_finish_tlb(struct 
>> madvise_behavior *madv_behavior)
>>              tlb_finish_mmu(madv_behavior->tlb);
>>  }
>>
>> -static bool is_valid_madvise(unsigned long start, size_t len_in, int 
>> behavior)
>> +static bool is_valid_madvise_range(unsigned long start, size_t len_in)
>>  {
>>      size_t len;
>>
>> -    if (!madvise_behavior_valid(behavior))
>> -            return false;
>> -
> 
> So taking this out makes sense, but I think we're now in a bit of a confused
> state where is_valid_madvise_range() checks:
> 
> * Whether start is page-aligned.
> * Whether len_in was (size_t)(small neg) aligned up -> 0
> * Whether start + len overflows
> 
> Returning a boolean
> 
> And madvise_should_skip():
> 
> * Calls is_valid_madvise_range()
> * Checks if range is empty
> 
> Returning a boolean, setting output variable *err to error if the former
> failed.
> 
> We also hamfistedly check "start + PAGE_ALIGN(len_in) == start" which, unless
> I'm much mistaken, is equivalent to !len_in, except for the small negative
> aligning to zero case which we already checked.
> 
> And to make it all more fun, both functions return opposite booleans :))
> 
> So I think we should put this all into one function while we're here, get rid 
> of
> the confusing output variable, return an error code, and have the callers
> manually check for !len_in also (compilers will optimise this into something
> sensible).
> 
> This makes it clear at the call sites we skip empty ranges, cleans stuff up 
> and
> makes clear what we check the input range for.
> 
> I've attached a patch that you can apply on top of yours to make it clear 
> what I
> mean here (no need for attribution!)
> 
> I checked and your test passes! :)
> 
> Cheers, Lorenzo
> 
>>      if (!PAGE_ALIGNED(start))
>>              return false;
>>      len = PAGE_ALIGN(len_in);
>> @@ -1859,17 +1856,15 @@ static bool is_valid_madvise(unsigned long start, 
>> size_t len_in, int behavior)
>>   * madvise_should_skip() - Return if the request is invalid or nothing.
>>   * @start:  Start address of madvise-requested address range.
>>   * @len_in: Length of madvise-requested address range.
>> - * @behavior:       Requested madvise behavior.
>>   * @err:    Pointer to store an error code from the check.
>>   *
>> - * If the specified behaviour is invalid or nothing would occur, we skip the
>> - * operation.  This function returns true in the cases, otherwise false.  In
>> - * the former case we store an error on @err.
>> + * If the specified range is invalid or nothing would occur, we skip the
>> + * operation.  This function returns true in these cases, otherwise false.  
>> In
>> + * the former case we store an error in @err.
> 
> OK looks good, thanks for fixing the typo :)
> 
>>   */
>> -static bool madvise_should_skip(unsigned long start, size_t len_in,
>> -            int behavior, int *err)
>> +static bool madvise_should_skip(unsigned long start, size_t len_in, int 
>> *err)
>>  {
>> -    if (!is_valid_madvise(start, len_in, behavior)) {
>> +    if (!is_valid_madvise_range(start, len_in)) {
>>              *err = -EINVAL;
>>              return true;
>>      }
>> @@ -2013,7 +2008,10 @@ int do_madvise(struct mm_struct *mm, unsigned long 
>> start, size_t len_in, int beh
>>              .tlb = &tlb,
>>      };
>>
>> -    if (madvise_should_skip(start, len_in, behavior, &error))
>> +    if (!madvise_behavior_valid(behavior))
>> +            return -EINVAL;
>> +
>> +    if (madvise_should_skip(start, len_in, &error))
>>              return error;
>>      error = madvise_lock(&madv_behavior);
>>      if (error)
>> @@ -2056,7 +2054,7 @@ static ssize_t vector_madvise(struct mm_struct *mm, 
>> struct iov_iter *iter,
>>              size_t len_in = iter_iov_len(iter);
>>              int error;
>>
>> -            if (madvise_should_skip(start, len_in, behavior, &error))
>> +            if (madvise_should_skip(start, len_in, &error))
>>                      ret = error;
>>              else
>>                      ret = madvise_do_behavior(start, len_in, 
>> &madv_behavior);
>> @@ -2131,6 +2129,11 @@ SYSCALL_DEFINE5(process_madvise, int, pidfd, const 
>> struct iovec __user *, vec,
>>              goto release_task;
>>      }
>>
>> +    if (!madvise_behavior_valid(behavior)) {
>> +            ret = -EINVAL;
>> +            goto release_mm;
>> +    }
>> +
>>      /*
>>       * We need only perform this check if we are attempting to manipulate a
>>       * remote process's address space.
>> diff --git a/tools/testing/selftests/mm/process_madv.c 
>> b/tools/testing/selftests/mm/process_madv.c
>> index cd4610baf5d7d..9a7e2788fcc50 100644
>> --- a/tools/testing/selftests/mm/process_madv.c
>> +++ b/tools/testing/selftests/mm/process_madv.c
>> @@ -309,6 +309,35 @@ TEST_F(process_madvise, invalid_vlen)
>>      ASSERT_EQ(munmap(map, pagesize), 0);
>>  }
>>
>> +/*
>> + * Test that invalid advice is rejected even when the iovec has zero total
>> + * length. A zero-length advice is a no-op for valid advice, but invalid
>> + * advice should still fail with EINVAL.
>> + */
>> +TEST_F(process_madvise, invalid_advice_zero_length)
>> +{
>> +    struct iovec vec = {
>> +            .iov_base = NULL,
>> +            .iov_len = 0,
>> +    };
>> +    int pidfd = self->pidfd;
>> +    ssize_t ret;
>> +
>> +    errno = 0;
>> +    ret = sys_process_madvise(pidfd, &vec, 1, -1, 0);
>> +    ASSERT_EQ(ret, -1);
>> +    ASSERT_EQ(errno, EINVAL);
>> +
>> +    errno = 0;
>> +    ret = sys_process_madvise(pidfd, &vec, 1, MADV_DONTNEED, 0);
>> +    ASSERT_EQ(ret, 0);
>> +
>> +    errno = 0;
>> +    ret = sys_process_madvise(pidfd, NULL, 0, -1, 0);
>> +    ASSERT_EQ(ret, -1);
>> +    ASSERT_EQ(errno, EINVAL);
>> +}
>> +
>>  /*
>>   * Test process_madvise() with an invalid flag value. Currently, only a flag
>>   * value of 0 is supported. This test is reserved for the future, e.g., if
>> base-commit: 1b55f8358e35a67bf3969339ea7b86988af92f66
>> --
>> 2.34.1
>>
> 
> ----8<----
> From 55ea8f619b9772d8c517e2dd15d7bb7558ce5da2 Mon Sep 17 00:00:00 2001
> From: Lorenzo Stoakes <[email protected]>
> Date: Tue, 28 Apr 2026 07:48:30 +0100
> Subject: [PATCH] fixups
> 
> Signed-off-by: Lorenzo Stoakes <[email protected]>
> ---
>  mm/madvise.c | 55 +++++++++++++++++++++-------------------------------
>  1 file changed, 22 insertions(+), 33 deletions(-)
> 
> diff --git a/mm/madvise.c b/mm/madvise.c
> index ce238dd96f15..865fe7fb3d81 100644
> --- a/mm/madvise.c
> +++ b/mm/madvise.c
> @@ -1834,45 +1834,31 @@ static void madvise_finish_tlb(struct 
> madvise_behavior *madv_behavior)
>               tlb_finish_mmu(madv_behavior->tlb);
>  }
> 
> -static bool is_valid_madvise_range(unsigned long start, size_t len_in)
> +/**
> + * check_input_range() - Check if the requested range is valid.
> + * @start:   Start address of madvise-requested address range.
> + * @len_in:  Length of madvise-requested address range.
> + *
> + * Returns: 0 if the input range is valid, otherwise an error code.
> + */
> +static int check_input_range(unsigned long start, size_t len_in)
>  {
>       size_t len;
> 
>       if (!PAGE_ALIGNED(start))
> -             return false;
> +             return -EINVAL;
> +
>       len = PAGE_ALIGN(len_in);
> 
>       /* Check to see whether len was rounded up from small -ve to zero */
>       if (len_in && !len)
> -             return false;
> +             return -EINVAL;
> 
> +     /* Overflow? */
>       if (start + len < start)
> -             return false;
> -
> -     return true;
> -}
> +             return -EINVAL;
> 
> -/*
> - * madvise_should_skip() - Return if the request is invalid or nothing.
> - * @start:   Start address of madvise-requested address range.
> - * @len_in:  Length of madvise-requested address range.
> - * @err:     Pointer to store an error code from the check.
> - *
> - * If the specified range is invalid or nothing would occur, we skip the
> - * operation.  This function returns true in these cases, otherwise false.  
> In
> - * the former case we store an error in @err.
> - */
> -static bool madvise_should_skip(unsigned long start, size_t len_in, int *err)
> -{
> -     if (!is_valid_madvise_range(start, len_in)) {
> -             *err = -EINVAL;
> -             return true;
> -     }
> -     if (start + PAGE_ALIGN(len_in) == start) {
> -             *err = 0;
> -             return true;
> -     }
> -     return false;
> +     return 0;
>  }
> 
>  static bool is_madvise_populate(struct madvise_behavior *madv_behavior)
> @@ -2010,12 +1996,13 @@ int do_madvise(struct mm_struct *mm, unsigned long 
> start, size_t len_in, int beh
> 
>       if (!madvise_behavior_valid(behavior))
>               return -EINVAL;
> -
> -     if (madvise_should_skip(start, len_in, &error))
> +     error = check_input_range(start, len_in);
> +     if (error || !len_in)
>               return error;
>       error = madvise_lock(&madv_behavior);
>       if (error)
>               return error;
> +
>       madvise_init_tlb(&madv_behavior);
>       error = madvise_do_behavior(start, len_in, &madv_behavior);
>       madvise_finish_tlb(&madv_behavior);
> @@ -2054,10 +2041,12 @@ static ssize_t vector_madvise(struct mm_struct *mm, 
> struct iov_iter *iter,
>               size_t len_in = iter_iov_len(iter);
>               int error;
> 
> -             if (madvise_should_skip(start, len_in, &error))
> -                     ret = error;
> -             else
> +             error = check_input_range(start, len_in);
> +             if (len_in && !error)
>                       ret = madvise_do_behavior(start, len_in, 
> &madv_behavior);
> +             else
> +                     ret = error;
> +
>               /*
>                * An madvise operation is attempting to restart the syscall,
>                * but we cannot proceed as it would not be correct to repeat
> --
> 2.54.0

Thanks, Lorenzo. That looks clearer to me too.

Since this is already in mm-unstable, I'll wait for Andrew's preference.
If a respin is needed, I'll fold this in together with SJ's comments.

Best regards,
fujunjie


Reply via email to