On Mon, Jun 08, 2026 at 04:34:06AM -0400, Michael S. Tsirkin wrote:
> The NUMA interleave index was computed as two separate terms:
>
>     *ilx += vma->vm_pgoff >> order;
>     *ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
>
> This has two problems:
>
> 1. When vm_start is not aligned to the folio size, the
>    subtraction before the shift lets low bits affect the
>    result via borrows.

This feels really vague. Do you have examples of where the calculation has
been impacted like this?

How is this a problem that affects things in practice?

>
> 2. For file-backed VMAs, shifting vm_pgoff and the VMA
>    offset independently loses carries between them, giving
>    wrong chunk indices when vm_pgoff is not aligned to order.

Similar comments to the above. An example would be helpful here.

>
> Combine into a single expression that adds vm_pgoff and

Combining this kind of thing into a single expression is the complete
opposite of what we want to do when refactoring code. No thanks.

> the page-granularity VMA offset first, then shifts once:
>
>     *ilx += (vma->vm_pgoff +
>             (addr >> PAGE_SHIFT) -
>             (vma->vm_start >> PAGE_SHIFT)) >> order;
>
> For anonymous VMAs, vm_pgoff equals vm_start >> PAGE_SHIFT,

This is completely incorrect.

For anonymous VMAs:

vm_pgoff = vm_start_at_first_fault >> PAGE_SHIFT.

So if you remap a _faulted_ VMA, the vm_start changes, vm_pgoff does
not. The two terms can be COMPLETELY independent.

> so the vm_pgoff and vm_start terms cancel and the result

No, they do not.

> reduces to addr >> (PAGE_SHIFT + order), same as before.

No, it doesn't.

>
> For file-backed VMAs, the sum vm_pgoff + (addr >> PAGE_SHIFT)
> - (vm_start >> PAGE_SHIFT) gives the file page offset of addr.

That's the page offset in a VMA for both anon and file-backed?

(addr - vm_start) >> PAGE_SHIFT is the page offset into a VMA (canonically
determined by linear_page_index())

> Shifting by order gives the correct file chunk index.
>
> Signed-off-by: Michael S. Tsirkin <[email protected]>

You're claiming we have an incorrect calculation here, but are not
providing a Fixes patch or Cc: stable or sending this separately as a fix?

> Assisted-by: Claude:claude-opus-4-6
> Reviewed-by: Gregory Price <[email protected]>
> ---
>  mm/mempolicy.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
>
> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 4e4421b22b59..d139b074a599 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -2048,8 +2048,9 @@ struct mempolicy *get_vma_policy(struct vm_area_struct 
> *vma,
>               pol = get_task_policy(current);
>       if (pol->mode == MPOL_INTERLEAVE ||
>           pol->mode == MPOL_WEIGHTED_INTERLEAVE) {
> -             *ilx += vma->vm_pgoff >> order;
> -             *ilx += (addr - vma->vm_start) >> (PAGE_SHIFT + order);
> +             *ilx += (vma->vm_pgoff +
> +                     (addr >> PAGE_SHIFT) -
> +                     (vma->vm_start >> PAGE_SHIFT)) >> order;

This is horrible code. Not only are you doing everything in a single
expression for some reason, you're also making the parens confusing and not
explaining what you're doing here at all.

The code before was at least tractable, this is objectively making it
worse.

And anyway, the canonical way to find the page offset into a VMA is
linear_page_index():

static inline pgoff_t linear_page_index(const struct vm_area_struct *vma,
                                        const unsigned long address)
{
        pgoff_t pgoff;
        pgoff = (address - vma->vm_start) >> PAGE_SHIFT;
        pgoff += vma->vm_pgoff;
        return pgoff;
}

So isn't a far better solution therefore:

        const pgoff_t index = linear_page_index(vma, addr);

        *ilx += index >> order;

Which has the benefit of being readable, uses the canonical method for
determining page offset in the VMA + eliminates the open-coded stuff?

>       }
>       return pol;
>  }
> --
> MST
>

Thanks, Lorenzo

Reply via email to