On Thu, Dec 10, 2020 at 09:16:24AM +0200, Martin Zaharinov wrote: > And one other > From other mailing I see you send patch to Denys Fedoryshchenko this patch is > : > > diff --git a/drivers/net/ppp/ppp_generic.c > b/drivers/net/ppp/ppp_generic.c > > index 255a5def56e9..2acf4b0eabd1 100644 > --- a/drivers/net/ppp/ppp_generic.c > +++ b/drivers/net/ppp/ppp_generic.c > @@ -3161,6 +3161,15 @@ ppp_connect_channel(struct channel *pch, int > unit) > > goto outl; > > ppp_lock(ppp); > + spin_lock_bh(>downl); > + if (!pch->chan) { > + /* Don't connect unregistered channels */ > + ppp_unlock(ppp); > + spin_unlock_bh(>downl); > + ret = -ENOTCONN; > + goto outl; > + } > + spin_unlock_bh(>downl); > if (pch->file.hdrlen > ppp->file.hdrlen) > ppp->file.hdrlen = pch->file.hdrlen; > hdrlen = pch->file.hdrlen + 2; /* for protocol bytes */
This was a quick untested patch that I sent to help debugging Denys' problem. It has a lock inversion problem that I fixed before I formally submitted it upstream. I even warned about it in the original thread: https://lore.kernel.org/netdev/20180302174328.gd1...@alphalink.fr/ > But in official stable kernel three In ppp_generic.c is this : > > spin_lock_bh(&pch->downl); > if (!pch->chan) { > /* Don't connect unregistered channels */ > spin_unlock_bh(&pch->downl); > ppp_unlock(ppp); > ret = -ENOTCONN; > goto outl; } > spin_unlock_bh(&pch->downl); This one is correct. > It is normal to unlock ppp after spin_unlock ? > shouldn't it be as you wrote it? > In your patch first : > > + ppp_unlock(ppp); > + spin_unlock_bh(>downl); No, nested locks have to be released in the reverse order they were acquired. > But in stable kernel is : > > spin_unlock_bh(&pch->downl); > ppp_unlock(ppp); This is correct, and has been correctly backported to 4.14-stable. > > On 9 Dec 2020, at 20:10, Guillaume Nault <gna...@redhat.com> wrote: > > > > On Wed, Dec 09, 2020 at 06:57:44PM +0200, Martin Zaharinov wrote: > >>> On 9 Dec 2020, at 18:40, Guillaume Nault <gna...@redhat.com> wrote: > >>> On Wed, Dec 09, 2020 at 04:47:52PM +0200, Martin Zaharinov wrote: > >>>> Hi All > >>>> > >>>> I have problem with latest kernel release > >>>> And the problem is base on this late problem : > >>>> > >>>> > >>>> https://www.mail-archive.com/search?l=netdev@vger.kernel.org&q=subject:%22Re%5C%3A+ppp%5C%2Fpppoe%2C+still+panic+4.15.3+in+ppp_push%22&o=newest&f=1 > >>>> > >>>> I have same problem in kernel 5.6 > now I use kernel 5.9.13 and have > >>>> same problem. > >>>> > >>>> > >>>> In kernel 5.9.13 now don’t have any crashes in dimes but in one moment > >>>> accel service stop with defunct and in log have many of this line : > >>>> > >>>> > >>>> error: vlan608: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> error: vlan617: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> error: vlan679: ioctl(PPPIOCCONNECT): Transport endpoint is not connected > >>>> > >>>> In one moment connected user bump double or triple and after that > >>>> service defunct and need wait to drop all session to start . > >>>> > >>>> I talk with accel-ppp team and they said this is kernel related problem > >>>> and to back to kernel 4.14 there is not this problem. > >>>> > >>>> Problem is come after kernel 4.15 > and not have solution to this moment. > >>> > >>> I'm sorry, I don't understand. > >>> Do you mean that v4.14 worked fine (no crash, no ioctl() error)? > >>> Did the problem start appearing in v4.15? Or did v4.15 work and the > >>> problem appeared in v4.16? > >> > >> In Telegram group I talk with Sergey and Dimka and told my the problem is > >> come after changes from 4.14 to 4.15 > >> Sergey write this : "as I know, there was a similar issue in kernel 4.15 > >> so maybe it is still not fixed" > > > > Ok, but what is your experience? Do you have a kernel version where > > accel-ppp reports no ioctl() error and doesn't crash the kernel? > > > > There wasn't a lot of changes between 4.14 and 4.15 for PPP. > > The only PPP patch I can see that might have been risky is commit > > 0171c4183559 ("ppp: unlock all_ppp_mutex before registering device"). > > > >> I don’t have options to test with this old kernel 4.14.xxx i don’t have > >> support for them. > >> > >> > >>> > >>>> Please help to find the problem. > >>>> > >>>> Last time in link I see is make changes in ppp_generic.c > >>>> > >>>> ppp_lock(ppp); > >>>> spin_lock_bh(&pch->downl); > >>>> if (!pch->chan) { > >>>> /* Don't connect unregistered channels */ > >>>> spin_unlock_bh(&pch->downl); > >>>> ppp_unlock(ppp); > >>>> ret = -ENOTCONN; > >>>> goto outl; > >>>> } > >>>> spin_unlock_bh(&pch->downl); > >>>> > >>>> > >>>> But this fix only to don’t display error and freeze system > >>>> The problem is stay and is to big. > >>> > >>> Do you use accel-ppp's unit-cache option? Does the problem go away if > >>> you stop using it? > >>> > >> > >> No I don’t use unit-cache , if I set unit-cache accel-ppp defunct same but > >> user Is connect and disconnet more fast. > >> > >> The problem is same with unit and without . > >> Only after this patch I don’t see error in dimes but this is not solution. > > > > Soryy, what's "in dimes"? > > Do you mean that reverting commit 77f840e3e5f0 ("ppp: prevent > > unregistered channels from connecting to PPP units") fixes your problem? > > > >> In network have customer what have power cut problem, when drop 600 user > >> and back Is normal but in this moment kernel is locking and start to make > >> this : > >> sessions: > >> starting: 4235 > >> active: 3882 > >> finishing: 378 > >> The problem is starting session is not real user normal user in this > >> server is ~4k customers . > > > > What type of session is it? L2TP, PPPoE, PPTP? > > > >> I use pppd_compat . > >> > >> Any idea ? > >> > >>>> > >>>> Please help to fix. > >> Martin >