On 12/11/15 11:12, Alexander V. Chernikov wrote:
11.12.2015, 12:15, "Hans Petter Selasky" <h...@selasky.org>:
Hi,

Pulling the nail out of the haystack hopefully.

  Any ideas on where next to look?

Adrian: In your dump aswell I see:

la_flags = 1

That means there was a race calling arptimer() and removing the "lle".
Yes. The interesting part here is why lle is removed. There are quite a few 
reasons: either interface address deleted or interface going down, or explicit 
delete request.
That's why I asked Adrian about interface stuff (and haven't got a reply).

Alexander: Can you comment on the following patch:

  > Index: netinet/if_ether.c
  > ===================================================================
  > --- netinet/if_ether.c (revision 291256)
  > +++ netinet/if_ether.c (working copy)
  > @@ -185,7 +185,13 @@
  > LLE_WUNLOCK(lle);
  > return;
  > }
  > - ifp = lle->lle_tbl->llt_ifp;
  > + if (lle->la_flags & LLE_LINKED) {
  > + ifp = lle->lle_tbl->llt_ifp;
  > + } else {
  > + /* XXX RACE entry has been freed */
  > + llentry_free(lle);
  > + return;
  > + }
  > CURVNET_SET(ifp->if_vnet);
  >
  > if ((lle->la_flags & LLE_DELETED) == 0) {

We need a check in arptimer() that the lle is still linked before
Yes, I had exactly that approach in mind. (And nd6_llinfo_timer() needs the 
same fix).
So, would you commit it or should I?
proceeding, in there from what I can see. Because the callback is not
protected by a mutex, it is not atomically stopped by callout_stop().

Hi,

Talking to Randall offlist, I see some more trouble. Let's get everything straight before making a fix. There is one more race I see:

The start of the arptimer() callback looks like this:

> static void
> arptimer(void *arg)
> {
POINT0
>         LLE_WLOCK(lle);
>         if (callout_pending(&lle->lle_timer)) {
POINT1
>                 LLE_WUNLOCK(lle);
>                 return;
>         }

The code starting the callback looks like this:

>                 LLE_ADDREF(la);
>                 la->la_expire = time_uptime;
> canceled = callout_reset(&la->lle_timer, hz * V_arpt_down,
>                     arptimer, la);
>                 if (canceled)
>                         LLE_REMREF(la);

Which can be written like this:

> la->la_expire = time_uptime;
> canceled = callout_reset(&la->lle_timer, hz * V_arpt_down,
>     arptimer, la);
> if (canceled == 0)
>     LLE_ADDREF(la);

In case we are at POINT0 in arptimer, callout_reset() will not be able to cancel the callout and will return 0. We should also drop one ref at POINT1, or rewrite the code a bit, which might need Randall's help in the callout subsystem area.

--HPS
_______________________________________________
freebsd-net@freebsd.org mailing list
https://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Reply via email to