[Adding Simon to the discussion]

On Mon, Nov 30, 2015 at 04:03:37PM +0100, Guillaume Nault wrote:
> On Mon, Nov 30, 2015 at 12:05:13AM +0200, Andrew wrote:
> > 26.11.2015 18:44, Guillaume Nault wrote:
> > >On Wed, Nov 25, 2015 at 04:58:54PM +0200, Andrew wrote:
> > >>25.11.2015 16:10, Guillaume Nault wrote:
> > >>>On Wed, Nov 25, 2015 at 12:59:52AM +0200, Andrew wrote:
> > >>>>Hi.
> > >>>>
> > >>>>I tried to reproduce the errors in a virtual environment (some VMs on
> > >>>>my notebook).
> > >>>>
> > >>>>I've tried to create 1000 client PPPoE sessions from this box via this
> > >>>>script:
> > >>>>for i in `seq 1 1000`; do pppd plugin rp-pppoe.so user test password test nodefaultroute maxfail 0 persist nodefaultroute holdoff 1 noauth eth0; done
> > >>>>
> > >>>I've tried to reproduce the bug with your script, but couldn't get
> > >>>anything to crash (VM is Debian Jessie i386 running on KVM with upstream
> > >>>kernel 4.1.12). Does the crash happen before all sessions get
> > >>>established?
> > >>Yes, the crash happens even before all daemon instances are started.
> > >>Sessions don't get established because the BRAS is configured to reject
> > >>them (so a lot of concurrent connection retries happen); I still haven't
> > >>created an account for the test user on it.
> > >>
> > >Ok, I got the crash too. In fact I had misunderstood your previous
> > >message: the crash happens when PPP sessions don't get established
> > >(authentication failures in my case).
> > >
> > >I'll investigate and let you know.
> >
> > It seems the bug appears when many ppp devices are removed at once (I
> > planned to use this test environment to reproduce periodic BRAS crashes,
> > but unexpectedly got crashes on the test client).
> >
> > I've checked it with several kernels: it's present in 4.3.0, but not in
> > 3.10.57. I'll try to build 3.14/3.18 kernels to see how they behave in
> > this case.
>
> Yes, it most likely was introduced by 287f3a943fef ("pppoe: Use
> workqueue to die properly when a PADT is received"). I still have to
> figure out why.

I confirm the bug comes from this commit. It happens if pppoe_connect()
reinitialises po->proto.pppoe.padt_work after pppoe_disc_rcv() has added
it to the system's work queue, but before the worker has run it. When the
worker thread eventually picks it up, it processes a corrupted work
structure and crashes.

I'm going to work on a patch.

--
To unsubscribe from this list: send the line "unsubscribe netdev" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html