On Fri, Jul 18, 2025 at 01:04:30PM +0200, David Hildenbrand wrote:
> >
> > Yeah sorry I was in 'what locks do we need' mode and hadn't shifted back 
> > here,
> > but I guess the intent is that the caller _must_ hold this lock.
> >
> > I know it's nitty and annoying (sorry!) but as asserting seems to not be a
> > possibility here, could we spell these out as a series of points like:
> >
> > /*
> >   * The caller MUST hold the following locks:
> >   *
> >   * - Leaf page table lock
> >   * - Appropriate VMA lock to keep VMA stable
> >   */
> >
> > I don't _actually_ think you need the rmap lock then, as none of the page 
> > tables
> > you access would be impacted by any rmap action afaict, with these locks 
> > held.
>
> I don't enjoy wrong comments ;)
>
> This can be called from rmap code when doing a vm_normal_page() while
> holding the PTL.

OK so this is just underlining the confusion here (not your fault! I mean in
general with page table code, really).

Would this possibly work better then?:

/*
 * The caller MUST prevent page table teardown (by holding mmap, vma or rmap 
lock)
 * and MUST hold the leaf page table lock.
 */

>
> Really, I think we are over-thinking a helper that is triggered in specific
> context when the world is about to collide.

Indeed, but I think it's important to be clear as to what is required for this
to work (even though, as I understand it, we're already in trouble and if page
tables are corrupted or something like this we may even NULL ptr deref anyway so
it's best effort).

>
> This is not your general-purpose API.
>
> Maybe I should have never added a comment. Maybe I should just not have done
> this patch, because I really don't want to do more than the bare minimum to
> print_bad_page_map().

No no David no :P come back to the light sir...

This is a good patch I don't mean to dissuade you, I just want to make things
clear in a confusing bit of the kernel as we in mm as usual make life hard for
ourselves...

I think the locking comment _is_ important, as for far too long in mm we've had
_implicit_ understanding of where the locks should be at any given time, which
constantly blows up underneath us.

I'd rather keep things as clear as possible so even the intellectually
challenged such as myself are subject to less confusion.

Anyway TL;DR if you do something similar to suggestion above it's all good. Just
needs clarification.

>
> Because I deeply detest it, and no comments we will add will change that.

So do I! *clinks glass* but yes, it's horrid. But there's value in improving
quality of horrid code and refactoring to the least-worst version.

And we can attack it in a more serious way later.

[snip]

> > > > > +static void print_bad_page_map(struct vm_area_struct *vma,
> > > > > +             unsigned long addr, unsigned long long entry, struct 
> > > > > page *page)
> > > > > +{
> > > > > +     struct address_space *mapping;
> > > > > +     pgoff_t index;
> > > > > +
> > > > > +     if (is_bad_page_map_ratelimited())
> > > > > +             return;
> > > > >
> > > > >       mapping = vma->vm_file ? vma->vm_file->f_mapping : NULL;
> > > > >       index = linear_page_index(vma, addr);
> > > > >
> > > > > -     pr_alert("BUG: Bad page map in process %s  pte:%08llx 
> > > > > pmd:%08llx\n",
> > > > > -              current->comm,
> > > > > -              (long long)pte_val(pte), (long long)pmd_val(*pmd));
> > > > > +     pr_alert("BUG: Bad page map in process %s  entry:%08llx", 
> > > > > current->comm, entry);
> > > >
> > > > Sort of wonder if this is even useful if you don't know what the 'entry'
> > > > is? But I guess the dump below will tell you.
> > >
> > > You probably missed in the patch description:
> > >
> > > "Whether it is a PTE or something else will usually become obvious from 
> > > the
> > > page table dump or from the dumped stack. If ever required in the future, 
> > > we
> > > could pass the entry level type similar to "enum rmap_level". For now, 
> > > let's
> > > keep it simple."
> >
> > Yeah sorry I glossed over the commit msg, and now I pay for it ;) OK this
> > is fine then.
>
> Let me play with indicating the page table level, but it's the kind of stuff
> I wouldn't want to do in this series here.

Sure understood. I don't mean to hold this up with nits. Am happy to be
flexible, just thinking out loud generally.

>
> > >
> > > >
> > > > Then we have VM_IO, which strictly must not have an associated page 
> > > > right?
> > >
> > > VM_IO just means read/write side-effects, I think you could have ones with
> > > an memmap easily ... e.g., memory section (128MiB) spanning both memory 
> > > and
> > > MMIO regions.
> >
> > Hmm, but why not have two separate VMAs? I guess I need to look into more
> > what this flag actually effects.
>
> Oh, I meant, that we might have a "struct page" for MMIO memory (pfn_valid()
> == true).
>
> In a MIXEDMAP that will get refcounted. Not sure if there are users that use
> VM_IO in a MIXEDMAP, I would assume so but didn't check.
>
> So VM_IO doesn't really interact with vm_normal_page(), really. It's all
> about PFNMAP and MIXEDMAP.

Thanks, yeah am thinking more broadly about VM_SPECIAL here. May go off and do
some work on that... VM_UNMERGEABLE might be better.

>
> --
> Cheers,
>
> David / dhildenb
>

Cheers, Lorenzo

Reply via email to