On Wed, Dec 23, 2020 at 10:50 AM Jakub Kicinski <k...@kernel.org> wrote:
>
> On Wed, 23 Dec 2020 02:21:09 -0600 Lijun Pan wrote:
> > On Tue, Dec 22, 2020 at 8:48 PM Jakub Kicinski <k...@kernel.org> wrote:
> > > On Sat, 19 Dec 2020 15:40:34 -0600 Lijun Pan wrote:
> > > > Commit f9c6cea0b385 ("ibmvnic: Skip fatal error reset after passive 
> > > > init")
> > > > says "If the passive
> > > > CRQ initialization occurs before the FATAL reset task is processed,
> > > > the FATAL error reset task would try to access a CRQ message queue
> > > > that was freed, causing an oops. The problem may be most likely to
> > > > occur during DLPAR add vNIC with a non-default MTU, because the DLPAR
> > > > process will automatically issue a change MTU request.
> > > > Fix this by not processing fatal error reset if CRQ is passively
> > > > initialized after client-driven CRQ initialization fails."
> > > >
> > > > Even with this commit, we still see similar kernel crashes. In order
> > > > to completely solve this problem, we'd better continue the fatal error
> > > > reset, capture the kernel crash, and try to fix it from that end.
> > >
> > > This basically reverts the quoted fix. Does the quoted fix make things
> > > worse? Otherwise we should leave the code be until proper fix is found.
> >
> > Yes, I think the quoted commit makes things worse. It skips the specific
> > reset condition, but that does not fix the problem it claims to fix.
>
> Okay, let's make sure the commit message explains how it makes things
> worse.

I will reword the commit message.

>
> > The effective fix is upstream SHA 0e435befaea4 and a0faaa27c716. So I
> > think reverting it to the original "else" condition is the right thing to 
> > do.
>
> Hm. So the problem is fixed? But the commit message says "we still see
> similar kernel crashes", that's present tense suggesting that crashes
> are seen on current net/master. Are you saying that's not the case and
> after 0e435befaea4 and a0faaa27c716 there are no more crashes?

This patch was formed before I submitted 0e435befaea4 and a0faaa27c716, so
I used the wording "we still see similar kernel crashes". I will modify
the commit message before I submit v2 of this patch.
After 0e435befaea4 and a0faaa27c716, I don't see any crashes as described
in this quoted commit even without this quoted commit.
That's why I am sure this quoted commit does not fix the described problem
and I want to revert it.

Reply via email to