On 6/24/25 16:41, Christoph Berg wrote:
> Re: Bertrand Drouvot
>> Yes, I think compat_uptr_t usage is missing in do_pages_stat() (while it's 
>> used
>> in do_pages_move()).
> 
> I was also reading the kernel source around that place but you spotted
> the problem before me. This patch resolves the issue here:
> 
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 8cf0f9c9599..542c81ec3ed 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -2444,7 +2444,13 @@ static int do_pages_stat(struct mm_struct *mm, 
> unsigned long nr_pages,
>               if (copy_to_user(status, chunk_status, chunk_nr * 
> sizeof(*status)))
>                       break;
> 
> -             pages += chunk_nr;
> +             if (in_compat_syscall()) {
> +                     compat_uptr_t __user *pages32 = (compat_uptr_t __user 
> *)pages;
> +
> +                     pages32 += chunk_nr;
> +                     pages = (const void __user * __user *) pages32;
> +             } else
> +                     pages += chunk_nr;
>               status += chunk_nr;
>               nr_pages -= chunk_nr;
>       }
> 
> 
>> Having a chunk size <= DO_PAGES_STAT_CHUNK_NR ensures we are not affected
>> by the wrong pointer arithmetic.
> 
> Good idea. Buggy kernels will be around for some time.
> 

If it's a reliable fix, then I guess we can do it like this. But won't
that be a performance penalty on everyone? Or does the system split the
array into 16-element chunks anyway, so this makes no difference?

Anyway, maybe we should start by reporting this to the kernel people. Do
you want me to do that, or shall one of you take care of that? I suppose
that'd be better, as you already wrote a fix / know the code better.

>> +               #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= 
>> DO_PAGES_STAT_CHUNK_NR (do_pages_stat())*/
>> +
>> +               for (uint64 chunk_start = 0; chunk_start < 
>> shm_ent_page_count; chunk_start += NUMA_QUERY_CHUNK_SIZE) {
> 
> Perhaps optimize it to this:
> 
> #if sizeof(void *) == 4
> #define NUMA_QUERY_CHUNK_SIZE 16  /* has to be <= DO_PAGES_STAT_CHUNK_NR 
> (do_pages_stat())*/
> #else
> #define NUMA_QUERY_CHUNK_SIZE 1024
> #endif
> 
> ... or some other bigger number.
> 

Hmm, maybe. I guess that'd hurt only fully 32-bit systems, but that also
seems like a non-issue.

> The loop could also include CHECK_FOR_INTERRUPTS();
> 
>>> I don't have much
>>> experience with the kernel code, so don't want to rely too much on my
>>> interpretation of it.
>>
>> I don't have that much experience too but I think the issue is in 
>> do_pages_stat()
>> and that "pages += chunk_nr" should be advanced by sizeof(compat_uptr_t) 
>> instead.
> 
> Me neither, but I'll try submit this fix.
> 

+1

Thanks to both of you for the report and the investigation.


regards

-- 
Tomas Vondra



Reply via email to