Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Andi Kleen
> - Quickly identify all user processes mapping a given page. Can be done with /proc/*/pagemap today. Maybe it's not "quick" because it won't use the rmap chains, but is that a serious issue? > - Determine if and where the kernel maps the page, which is also > important given the opportunity to r
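Andi's first point is that the user-space half of the question ("which processes map this page?") can already be answered through /proc/*/pagemap. Below is a minimal C sketch of that approach. It is not part of the Page Detective patches, and the function names are hypothetical. It assumes the entry layout documented in Documentation/admin-guide/mm/pagemap.rst: one 64-bit entry per virtual page, bit 63 set if the page is present, bits 0-54 holding the PFN. Since Linux 4.2 the PFN field reads as zero without CAP_SYS_ADMIN, so the scan is only meaningful when run as root. It also shows Andi's caveat. Instead of following the rmap, it walks every mapped virtual page of every process, and it only sees user page tables, so it cannot answer the kernel-mapping question raised in the same message.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <dirent.h>
#include <fcntl.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

#define PM_PRESENT  (1ULL << 63)        /* bit 63: page present in RAM */
#define PM_PFN_MASK ((1ULL << 55) - 1)  /* bits 0-54: PFN when present */

/* Hypothetical helper: decode one pagemap entry; true + PFN if present. */
bool pagemap_entry_pfn(uint64_t entry, uint64_t *pfn)
{
    assert(pfn != NULL);
    if (!(entry & PM_PRESENT))
        return false;              /* not present (unmapped or swapped) */
    *pfn = entry & PM_PFN_MASK;    /* drop soft-dirty/exclusive/etc. flags */
    return true;
}

/* Hypothetical helper: count virtual pages of @pid that map @target_pfn.
 * Returns -1 if the process's maps/pagemap files cannot be opened. */
long count_mappings(pid_t pid, uint64_t target_pfn)
{
    char path[64];
    const uintptr_t page_size = (uintptr_t)sysconf(_SC_PAGESIZE);
    char *line = NULL;
    size_t cap = 0;
    long hits = 0;

    snprintf(path, sizeof(path), "/proc/%d/maps", (int)pid);
    FILE *maps = fopen(path, "r");
    if (!maps)
        return -1;

    snprintf(path, sizeof(path), "/proc/%d/pagemap", (int)pid);
    int fd = open(path, O_RDONLY);
    if (fd < 0) {
        fclose(maps);
        return -1;
    }

    /* Each maps line starts with "start-end" in hex; walk it page by page. */
    while (getline(&line, &cap, maps) != -1) {
        uintptr_t start, end;
        if (sscanf(line, "%" SCNxPTR "-%" SCNxPTR, &start, &end) != 2)
            continue;
        for (uintptr_t va = start; va < end; va += page_size) {
            uint64_t entry, pfn;
            /* pagemap is indexed by virtual page number, 8 bytes per entry. */
            off_t off = (off_t)(va / page_size) * (off_t)sizeof(entry);
            if (pread(fd, &entry, sizeof(entry), off) != (ssize_t)sizeof(entry))
                break;
            if (pagemap_entry_pfn(entry, &pfn) && pfn == target_pfn)
                hits++;
        }
    }
    free(line);
    close(fd);
    fclose(maps);
    return hits;
}

/* Hypothetical helper: print every PID under /proc that maps @target_pfn. */
void scan_all_processes(uint64_t target_pfn)
{
    DIR *proc = opendir("/proc");
    struct dirent *de;

    if (!proc)
        return;
    while ((de = readdir(proc)) != NULL) {
        char *endp;
        long pid = strtol(de->d_name, &endp, 10);
        if (*endp != '\0' || pid <= 0)
            continue;              /* skip non-PID entries such as "self" */
        long hits = count_mappings((pid_t)pid, target_pfn);
        if (hits > 0)
            printf("pid %ld maps pfn 0x%" PRIx64 " (%ld page(s))\n",
                   pid, target_pfn, hits);
    }
    closedir(proc);
}
```

A small root-run main() would call something like scan_all_processes(pfn) for the PFN of interest. Issuing one pread() per page keeps the sketch short. Reading a whole VMA's worth of entries per call would be much faster on large address spaces, but it still does not approach an rmap walk.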

Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Yosry Ahmed
On Wed, Nov 20, 2024 at 8:14 AM Pasha Tatashin wrote: > > On Tue, Nov 19, 2024 at 2:36 PM Yosry Ahmed wrote: > > > > On Tue, Nov 19, 2024 at 11:30 AM Pasha Tatashin > > wrote: > > > > > > On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin > > > wrote: > > > > > > > > On Tue, Nov 19, 2024 at 10:08:

Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Pasha Tatashin
> > /* Use static buffer, for the caller is holding oom_lock. */ > > static char buf[PAGE_SIZE]; > > > > seq_buf_init(&s, buf, sizeof(buf)); > > memory_stat_format(memcg, &s); > > seq_buf_do_printk(&s, KERN_INFO); > > } > > > > This is a callosal

Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Pasha Tatashin
On Wed, Nov 20, 2024 at 10:29 AM Andi Kleen wrote: > > Pasha Tatashin writes: > > > Page Detective is a new kernel debugging tool that provides detailed > > information about the usage and mapping of physical memory pages. > > > > It is often known that a particular page is corrupted, but it is h

Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Pasha Tatashin
On Tue, Nov 19, 2024 at 2:36 PM Yosry Ahmed wrote: > > On Tue, Nov 19, 2024 at 11:30 AM Pasha Tatashin > wrote: > > > > On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin > > wrote: > > > > > > On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote: > > > > On Mon, Nov 18, 2024 at 8:09 PM G

Re: [RFCv1 0/6] Page Detective

2024-11-20 Thread Andi Kleen
Pasha Tatashin writes: > Page Detective is a new kernel debugging tool that provides detailed > information about the usage and mapping of physical memory pages. > > It is often known that a particular page is corrupted, but it is hard to > extract more information about such a page from live sys

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Roman Gushchin
On Tue, Nov 19, 2024 at 11:35:47AM -0800, Yosry Ahmed wrote: > On Tue, Nov 19, 2024 at 11:30 AM Pasha Tatashin > wrote: > > > > On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin > > wrote: > > > > > > On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote: > > > > On Mon, Nov 18, 2024 at 8:

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Yosry Ahmed
On Tue, Nov 19, 2024 at 11:30 AM Pasha Tatashin wrote: > > On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin > wrote: > > > > On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote: > > > On Mon, Nov 18, 2024 at 8:09 PM Greg KH > > > wrote: > > > > > > > > On Mon, Nov 18, 2024 at 05:08:42

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Pasha Tatashin
On Tue, Nov 19, 2024 at 1:23 PM Roman Gushchin wrote: > > On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote: > > On Mon, Nov 18, 2024 at 8:09 PM Greg KH wrote: > > > > > > On Mon, Nov 18, 2024 at 05:08:42PM -0500, Pasha Tatashin wrote: > > > > Additionally, using crash/drgn is not fe

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Roman Gushchin
On Tue, Nov 19, 2024 at 10:08:36AM -0500, Pasha Tatashin wrote: > On Mon, Nov 18, 2024 at 8:09 PM Greg KH wrote: > > > > On Mon, Nov 18, 2024 at 05:08:42PM -0500, Pasha Tatashin wrote: > > > Additionally, using crash/drgn is not feasible for us at this time, it > > > requires keeping external tool

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Matthew Wilcox
On Tue, Nov 19, 2024 at 01:52:00PM +0100, Jann Horn wrote: > > I will take reference, as we already do that for memcg purpose, but > > have not included dump_page(). > > Note that taking a reference on the page does not make all of > dump_page() fine; in particular, my understanding is that > foli

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Jann Horn
On Tue, Nov 19, 2024 at 4:14 PM Pasha Tatashin wrote: > On Tue, Nov 19, 2024 at 7:52 AM Jann Horn wrote: > > On Tue, Nov 19, 2024 at 2:30 AM Pasha Tatashin > > wrote: > > > > Can you point me to where a refcounted reference to the page comes > > > > from when page_detective_metadata() calls dump

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Pasha Tatashin
On Mon, Nov 18, 2024 at 8:09 PM Greg KH wrote: > > On Mon, Nov 18, 2024 at 05:08:42PM -0500, Pasha Tatashin wrote: > > Additionally, using crash/drgn is not feasible for us at this time, it > > requires keeping external tools on our hosts, also it requires > > approval and a security review for ea

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Pasha Tatashin
On Tue, Nov 19, 2024 at 7:52 AM Jann Horn wrote: > > On Tue, Nov 19, 2024 at 2:30 AM Pasha Tatashin > wrote: > > > Can you point me to where a refcounted reference to the page comes > > > from when page_detective_metadata() calls dump_page_lvl()? > > > > I am sorry, I remembered incorrectly, we a

Re: [RFCv1 0/6] Page Detective

2024-11-19 Thread Jann Horn
On Tue, Nov 19, 2024 at 2:30 AM Pasha Tatashin wrote: > > Can you point me to where a refcounted reference to the page comes > > from when page_detective_metadata() calls dump_page_lvl()? > > I am sorry, I remembered incorrectly, we are getting reference right > after dump_page_lvl() in page_detec

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Pasha Tatashin
> Can you point me to where a refcounted reference to the page comes > from when page_detective_metadata() calls dump_page_lvl()? I am sorry, I remembered incorrectly, we are getting reference right after dump_page_lvl() in page_detective_memcg() -> folio_try_get(); I will move the folio_try_get()

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Greg KH
On Mon, Nov 18, 2024 at 05:08:42PM -0500, Pasha Tatashin wrote: > Additionally, using crash/drgn is not feasible for us at this time, it > requires keeping external tools on our hosts, also it requires > approval and a security review for each script before deployment in > our fleet. So it's ok to

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Jann Horn
On Mon, Nov 18, 2024 at 11:24 PM Pasha Tatashin wrote: > On Mon, Nov 18, 2024 at 7:54 AM Jann Horn wrote: > > > > On Mon, Nov 18, 2024 at 12:17 PM Lorenzo Stoakes > > wrote: > > > On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > > > > It operates through the Linux debugfs interf

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Pasha Tatashin
On Mon, Nov 18, 2024 at 7:54 AM Jann Horn wrote: > > On Mon, Nov 18, 2024 at 12:17 PM Lorenzo Stoakes > wrote: > > On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > > > It operates through the Linux debugfs interface, with two files: "virt" > > > and "phys". > > > > > > The "virt"

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Pasha Tatashin
On Mon, Nov 18, 2024 at 2:11 PM Roman Gushchin wrote: > > On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > > Page Detective is a new kernel debugging tool that provides detailed > > information about the usage and mapping of physical memory pages. > > > > It is often known that a

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Roman Gushchin
On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > Page Detective is a new kernel debugging tool that provides detailed > information about the usage and mapping of physical memory pages. > > It is often known that a particular page is corrupted, but it is hard to > extract more inf

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Jann Horn
On Mon, Nov 18, 2024 at 12:17 PM Lorenzo Stoakes wrote: > On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > > It operates through the Linux debugfs interface, with two files: "virt" > > and "phys". > > > > The "virt" file takes a virtual address and PID and outputs information > >

Re: [RFCv1 0/6] Page Detective

2024-11-18 Thread Lorenzo Stoakes
On Sat, Nov 16, 2024 at 05:59:16PM +, Pasha Tatashin wrote: > Page Detective is a new kernel debugging tool that provides detailed > information about the usage and mapping of physical memory pages. > > It is often known that a particular page is corrupted, but it is hard to > extract more info

[RFCv1 0/6] Page Detective

2024-11-16 Thread Pasha Tatashin
Page Detective is a new kernel debugging tool that provides detailed information about the usage and mapping of physical memory pages. It is often known that a particular page is corrupted, but it is hard to extract more information about such a page from live system. Examples are: - Checksum fai