Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-17 Thread Hans Petter Selasky
Hi, I'm going to run some tests tomorrow. Meanwhile I've uploaded a patch here: https://reviews.freebsd.org/D4605 --HPS ___ freebsd-net@freebsd.org mailing list https://lists.freebsd.org/mailman/listinfo/freebsd-net To unsubscribe, send any mail to "f

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-12 Thread Hans Petter Selasky
On 12/12/15 00:26, Randall Stewart wrote: Hans: After talking with Gleb he tells me part of your test is to kldunload a module. Now I think that is the source of the problem. Probably the cleanup code failed to stop the timer and did the remove.. thus when the timer expires it blows up. This

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Randall Stewart via freebsd-net
Hans: After talking with Gleb he tells me part of your test is to kldunload a module. Now I think that is the source of the problem. Probably the cleanup code failed to stop the timer and did the remove.. thus when the timer expires it blows up. This is not a callout issue.. I think you need to
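
The teardown ordering described in the message above can be illustrated with a small userland model. This is only a sketch: `toy_entry`, `toy_timer_stop()` and the other names are hypothetical and are not kernel code. It assumes the failure is a timer left armed on an entry that has already been released.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a link-layer entry that owns a timer. */
struct toy_entry {
    bool timer_armed;   /* a timer still references this entry */
    bool released;      /* the entry has been handed back to the allocator */
};

/* Cancel the timer; returns true if it had been armed. */
static bool toy_timer_stop(struct toy_entry *e)
{
    bool was_armed = e->timer_armed;

    e->timer_armed = false;
    return was_armed;
}

/* Safe teardown: stop the timer first, then release the entry. */
static void toy_entry_remove_safe(struct toy_entry *e)
{
    (void)toy_timer_stop(e);
    assert(!e->timer_armed);
    e->released = true;
}

/* Buggy teardown (illustrative): releases without stopping the timer. */
static void toy_entry_remove_buggy(struct toy_entry *e)
{
    e->released = true;
}

/* True if a timer could still expire against a released entry. */
static bool toy_timer_would_fire_on_released(const struct toy_entry *e)
{
    return e->timer_armed && e->released;
}
```

In this model only the buggy path leaves a timer armed on a released entry, which is the situation the message suspects behind the panic.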

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Randall Stewart via freebsd-net
Hans: I don’t think you are getting a 1 back from the callout_reset(). If the pending bit is set, you get a 1 back. But if you have a race where the arp-timer is blocked on the lock (held by arp resolve) you're going to have the pending bit off, since before calling the function the callout code
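
A toy userland model may make the pending-bit argument above easier to follow. It is a sketch, not the kernel's callout implementation: `toy_callout` and its functions are hypothetical. It also assumes, as one reading of the truncated sentence, that the pending bit is cleared before the handler is invoked.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the two bits of state discussed in the thread. */
struct toy_callout {
    bool pending;          /* scheduled, handler not yet dispatched */
    bool handler_blocked;  /* handler dispatched but waiting on the entry lock */
};

/* Arm the timer for the first time. */
static void toy_schedule(struct toy_callout *c)
{
    c->pending = true;
    c->handler_blocked = false;
}

/*
 * Dispatch the handler. Assumption: pending is cleared before the handler
 * runs, and the handler then blocks on a lock held by the resolver.
 */
static void toy_dispatch_blocked(struct toy_callout *c)
{
    assert(c->pending);
    c->pending = false;
    c->handler_blocked = true;
}

/* Re-arm: returns 1 if a pending callout was rescheduled, otherwise 0. */
static int toy_callout_reset(struct toy_callout *c)
{
    int was_pending = c->pending ? 1 : 0;

    c->pending = true;
    return was_pending;
}
```

In this model a reset issued while the handler is blocked on the lock returns 0, so a caller that only checks for 1 cannot tell that a handler invocation is still outstanding.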

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Hans Petter Selasky
On 12/11/15 12:16, Hans Petter Selasky wrote: On 12/11/15 11:12, Alexander V. Chernikov wrote: 11.12.2015, 12:15, "Hans Petter Selasky" : Hi, Pulling the nail out of the haystack hopefully. Any ideas on where next to look? Adrian: In your dump as well I see: la_flags = 1 That means ther

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Hans Petter Selasky
On 12/11/15 11:12, Alexander V. Chernikov wrote: 11.12.2015, 12:15, "Hans Petter Selasky" : Hi, Pulling the nail out of the haystack hopefully. Any ideas on where next to look? Adrian: In your dump as well I see: la_flags = 1 That means there was a race calling arptimer() and removing th

Re: Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Alexander V . Chernikov
11.12.2015, 12:15, "Hans Petter Selasky" : > Hi, > > Pulling the nail out of the haystack hopefully. > >>>  Any ideas on where next to look? > > Adrian: In your dump as well I see: > > la_flags = 1 > > That means there was a race calling arptimer() and removing the "lle". Yes. The interesting part h

Race between arptimer() and lle removal [WAS: panic in arptimer in r289937]

2015-12-11 Thread Hans Petter Selasky
Hi, Pulling the nail out of the haystack hopefully. Any ideas on where next to look? Adrian: In your dump as well I see: la_flags = 1 That means there was a race calling arptimer() and removing the "lle". Alexander: Can you comment on the following patch: > Index: netinet/if_ether.c >

Re: panic in arptimer in r289937

2015-12-10 Thread Hans Petter Selasky
On 12/10/15 16:35, Randall Stewart wrote: If you did that it would change the KPI a bit meaning lots of thrashing in the code. Hi, There are only 5 consumers of the callout_reset() return code in the FreeBSD 11-current kernel from what I can see: grep -r "= callout_reset" sys/ | wc -l

Re: panic in arptimer in r289937

2015-12-10 Thread Hans Petter Selasky
On 12/10/15 16:25, Randall Stewart wrote: For callout_stop/drain you get -1 == Callout has already gone off or is not running (usually the latter) else the caller is not using locking properly or did not lock and check the active() value (which would have returned not active so no s

Re: panic in arptimer in r289937

2015-12-10 Thread Randall Stewart via freebsd-net
If you did that it would change the KPI a bit meaning lots of thrashing in the code. And on top of that you now would no longer return 0.. You would get 1 it was restarted or -1 it was not running but is now started. Makes no sense to me sorry.. R On Dec 10, 2015, at 7:35 AM, Hans Petter Selask

Re: panic in arptimer in r289937

2015-12-10 Thread Randall Stewart via freebsd-net
For callout_stop/drain you get -1 == Callout has already gone off or is not running (usually the latter) else the caller is not using locking properly or did not lock and check the active() value (which would have returned not active so no stop would have been needed); 0 == w

Re: panic in arptimer in r289937

2015-12-10 Thread Randall Stewart via freebsd-net
Hans: Though it would not hurt to add your patch, it's not possible for callout_reset() to return anything but 1 or 0. Only callout_stop(), callout_drain(), callout_async_drain() can return -1. So I don’t think that this will fix it. R On Dec 4, 2015, at 11:34 AM, Hans Petter Selasky wrote: >>
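
The return-code split described above can be sketched as a userland model. The names are hypothetical. Only the -1 case for the stop family and the 0/1 range for reset come from the thread. The meanings given here to 1 and 0 from `toy_stop()` are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model; not the kernel's struct callout. */
struct toy_callout {
    bool pending;  /* scheduled, not yet dispatched */
    bool running;  /* handler currently executing */
};

/* Reset-style call: per the thread it can only yield 0 or 1. */
static int toy_reset(struct toy_callout *c)
{
    int rc = c->pending ? 1 : 0;

    c->pending = true;
    assert(rc == 0 || rc == 1);
    return rc;
}

/* Stop-style call: the only one of the two that can yield -1. */
static int toy_stop(struct toy_callout *c)
{
    if (c->pending) {
        c->pending = false;
        return 1;   /* assumption: a pending callout was cancelled */
    }
    if (c->running)
        return 0;   /* assumption: handler is executing, cannot be stopped */
    return -1;      /* per the thread: already gone off or not running */
}
```

A check for a negative value from the reset path can therefore never be true in this model, which matches the objection that the proposed patch would not change behaviour.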

Re: panic in arptimer in r289937

2015-12-10 Thread Hans Petter Selasky
Hi, Here is the backtrace for a reproducible panic seen with arptimer(): #0 doadump (textdump=0) at pcpu.h:221 #1 0x80385afb in db_dump (dummy=, dummy2=false, dummy3=0, dummy4=0x0) at /usr/img/freebsd/sys/ddb/db_command.c:533 #2 0x803858ee in db_command (cmd_table=0x0) at /

Re: panic in arptimer in r289937

2015-12-04 Thread Hans Petter Selasky
On 12/04/15 20:34, Hans Petter Selasky wrote: Hi Adrian, On 10/31/15 16:01, Alexander V. Chernikov wrote: 31.10.2015, 16:46, "Adrian Chadd" : On 31 October 2015 at 09:34, Alexander V. Chernikov wrote: 31.10.2015, 05:32, "Adrian Chadd" : Hiya, Here's a panic from arptimer: Hi Adr

Re: panic in arptimer in r289937

2015-12-04 Thread Hans Petter Selasky
Hi Adrian, On 10/31/15 16:01, Alexander V. Chernikov wrote: 31.10.2015, 16:46, "Adrian Chadd" : On 31 October 2015 at 09:34, Alexander V. Chernikov wrote: 31.10.2015, 05:32, "Adrian Chadd" : Hiya, Here's a panic from arptimer: Hi Adrian, As far as I see, line 205 in if_ether.c

Re: panic in arptimer in r289937

2015-10-31 Thread Alexander V . Chernikov
31.10.2015, 16:46, "Adrian Chadd" : > On 31 October 2015 at 09:34, Alexander V. Chernikov > wrote: >>  31.10.2015, 05:32, "Adrian Chadd" : >>>  Hiya, >>> >>>  Here's a panic from arptimer: >>  Hi Adrian, >> >>  As far as I see, line 205 in if_ether.c is IF_AFDATA_LOCK(ifp) which >> happens afte

Re: panic in arptimer in r289937

2015-10-31 Thread Adrian Chadd
On 31 October 2015 at 09:34, Alexander V. Chernikov wrote: > > > 31.10.2015, 05:32, "Adrian Chadd" : >> Hiya, >> >> Here's a panic from arptimer: > Hi Adrian, > > As far as I see, line 205 in if_ether.c is IF_AFDATA_LOCK(ifp) which happens > after LLE_WUNLOCK(). > So, it looks like (pre-cached) i

Re: panic in arptimer in r289937

2015-10-31 Thread Alexander V . Chernikov
31.10.2015, 05:32, "Adrian Chadd" : > Hiya, > > Here's a panic from arptimer: Hi Adrian, As far as I see, line 205 in if_ether.c is IF_AFDATA_LOCK(ifp) which happens after LLE_WUNLOCK(). So, it looks like (pre-cached) ifp had been freed before locking ifdata. Do you have any more details on tha

panic in arptimer in r289937

2015-10-30 Thread Adrian Chadd
Hiya, Here's a panic from arptimer: (kgdb) bt #0 doadump (textdump=0) at pcpu.h:221 #1 0x803666b6 in db_fncall (dummy1=, dummy2=, dummy3=, dummy4=) at /usr/home/adrian/work/freebsd/head/src/sys/ddb/db_command.c:568 #2 0x8036614e in db_command (cmd_table=0x0) at /usr/home/adrian