On 12/11/20 at 04:16pm, Rahul Gopakumar wrote:
> Hi Baoquan,
>
> We re-evaluated your last patch and it seems to be fixing the
> initial performance bug reported. During our previous testing,
> we did not apply the patch rightly hence it was reporting
> some issues.
>
> Here is the dmesg log confirming no delay in the draft patch.
>
> Vanilla (5.10 rc3)
> ------------------
>
> [ 0.024011] On node 2 totalpages: 89391104
> [ 0.024012] Normal zone: 1445888 pages used for memmap
> [ 0.024012] Normal zone: 89391104 pages, LIFO batch:63
> [ 2.054646] ACPI: PM-Timer IO Port: 0x448 --------------> 2 secs delay
>
> Patch
> ------
>
> [ 0.024166] On node 2 totalpages: 89391104
> [ 0.024167] Normal zone: 1445888 pages used for memmap
> [ 0.024167] Normal zone: 89391104 pages, LIFO batch:63
> [ 0.026694] ACPI: PM-Timer IO Port: 0x448 --------------> No delay
>
> Attached dmesg logs. Let me know if anything is needed from our end.

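For reference, the delay marked in the log above can be read straight off the dmesg timestamps: vanilla goes from 0.024012 to 2.054646 (about 2.03 s), while the patched kernel goes from 0.024167 to 0.026694 (about 2.5 ms). Below is a minimal C sketch of that check. It assumes each log line carries the usual "[ seconds]" prefix; parse_timestamp() and largest_gap() are hypothetical helper names, not part of any kernel or util-linux tooling.

```c
#include <stddef.h>
#include <stdio.h>

/*
 * Parse the "[ seconds.micros]" prefix of a dmesg line.
 * Returns 1 and stores the timestamp in *out on success, 0 otherwise.
 */
int parse_timestamp(const char *line, double *out)
{
	return sscanf(line, " [%lf", out) == 1;
}

/*
 * Return the largest gap (in seconds) between consecutive timestamped
 * lines. If where is non-NULL, it receives the index of the line that
 * follows the gap. Lines without a timestamp are skipped. Returns 0.0
 * when fewer than two timestamps are found.
 */
double largest_gap(const char *const *lines, size_t n, size_t *where)
{
	double prev = 0.0, best = 0.0;
	int have_prev = 0;

	for (size_t i = 0; i < n; i++) {
		double t;

		if (!parse_timestamp(lines[i], &t))
			continue;
		if (have_prev && t - prev > best) {
			best = t - prev;
			if (where)
				*where = i;
		}
		prev = t;
		have_prev = 1;
	}
	return best;
}
```

Fed the four vanilla lines quoted above, largest_gap() should point at the "ACPI: PM-Timer IO Port" line, which is where the deferred page init delay shows up.
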
I have posted a formal patchset to fix this issue. Patch 1 contains the fix
and is almost the same as the draft v2 patch I attached in this thread.
Please feel free to help test it and add your Tested-by: tag in the patch
thread if possible.

>
>
>
> From: Rahul Gopakumar <gopakum...@vmware.com>
> Sent: 24 November 2020 8:33 PM
> To: b...@redhat.com <b...@redhat.com>
> Cc: linux...@kvack.org <linux...@kvack.org>; linux-kernel@vger.kernel.org
> <linux-kernel@vger.kernel.org>; a...@linux-foundation.org
> <a...@linux-foundation.org>; natechancel...@gmail.com
> <natechancel...@gmail.com>; ndesaulni...@google.com
> <ndesaulni...@google.com>; clang-built-li...@googlegroups.com
> <clang-built-li...@googlegroups.com>; rost...@goodmis.org
> <rost...@goodmis.org>; Rajender M <ma...@vmware.com>; Yiu Cho Lau
> <lauyi...@vmware.com>; Peter Jonasson <pjonas...@vmware.com>; Venkatesh
> Rajaram <rajar...@vmware.com>
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> Hi Baoquan,
>
> We applied the new patch to 5.10 rc3 and tested it. We are still
> observing the same page corruption issue which we saw with the
> old patch. This is causing 3 secs delay in boot time.
>
> Attached dmesg log from the new patch and also from vanilla
> 5.10 rc3 kernel.
>
> There are multiple lines like below in the dmesg log of the
> new patch.
>
> "BUG: Bad page state in process swapper pfn:ab08001"
>
> ________________________________________
> From: b...@redhat.com <b...@redhat.com>
> Sent: 22 November 2020 6:38 AM
> To: Rahul Gopakumar
> Cc: linux...@kvack.org; linux-kernel@vger.kernel.org;
> a...@linux-foundation.org; natechancel...@gmail.com; ndesaulni...@google.com;
> clang-built-li...@googlegroups.com; rost...@goodmis.org; Rajender M; Yiu Cho
> Lau; Peter Jonasson; Venkatesh Rajaram
> Subject: Re: Performance regressions in "boot_time" tests in Linux 5.8 Kernel
>
> On 11/20/20 at 03:11am, Rahul Gopakumar wrote:
> > Hi Baoquan,
> >
> > To which commit should we apply the draft patch. We tried applying
> > the patch to the commit 3e4fb4346c781068610d03c12b16c0cfb0fd24a3
> > (the one we used for applying the previous patch) but it fails.
>
> I tested on 5.10-rc3+. You can append below change to the old patch in
> your testing kernel.
>
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index fa6076e1a840..5e5b74e88d69 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -448,6 +448,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn)
>  	if (end_pfn < pgdat_end_pfn(NODE_DATA(nid)))
>  		return false;
>
> +	if (NODE_DATA(nid)->first_deferred_pfn != ULONG_MAX)
> +		return true;
>  	/*
>  	 * We start only with one section of pages, more pages are added as
>  	 * needed until the rest of deferred pages are initialized.