> On 29 Apr 2021, at 01:38, Alexandr Nedvedicky
> <[email protected]> wrote:
>
> Hello,
>
> </snip>
>> Such a diff is below. I will test extensively towmorrow. If anyone
>> wants to try now, be my guest.
>>
>> Is the comment correct?
>
> I think comment is not quite right.
>
> </snip>
>
> the packet gets inserted into la->la_mq via arpresolve(), which
> is not protected by KERNEL_LOCK().
>
> arpresolve() runs as a reader on netlock, right?
>
>
>> Index: netinet/if_ether.c
>> ===================================================================
>> RCS file: /cvs/src/sys/netinet/if_ether.c,v
>> retrieving revision 1.248
>> diff -u -p -r1.248 if_ether.c
>> --- netinet/if_ether.c 28 Apr 2021 21:21:44 -0000 1.248
>> +++ netinet/if_ether.c 28 Apr 2021 21:44:22 -0000
>> @@ -689,12 +689,15 @@ arpcache(struct ifnet *ifp, struct ether
>> len = ml_len(&ml);
>> while ((m = ml_dequeue(&ml)) != NULL)
>> ifp->if_output(ifp, m, rt_key(rt), rt);
>> - /* XXXSMP we discard if other CPU enqueues */
>> - if (mq_len(&la->la_mq) > 0) {
>> - /* mbuf is back in queue. Discard. */
>> - atomic_sub_int(&la_hold_total, len + mq_purge(&la->la_mq));
>> - } else
>> - atomic_sub_int(&la_hold_total, len);
>> + atomic_sub_int(&la_hold_total, len);
>> +
>> + /*
>> + * This function is protected by kernel lock. No other CPU can insert
>> + * to la_mq while we feed the packets to if_output(). If a packet
>> + * is reinserted from there we have a loop. This should not happen
>> + * and we want to find such cases.
>> + */
>> + KASSERT(mq_len(&la->la_mq) == 0);
>
> tripping the assert here means the ARP cache entry either
> expired or got deleted, while we were dispatching queued packets
> to wire in a while() loop above.
>
> we enter dispatch loop above after ARP successfully resolves IP address.
> if ARP entry gets lost while we dispatching packets to wire we end up
> queuing packets into la->la_mq queue. This happens in arpresolve().
> arpresolve() is called on behalf of ifp->if_output()
>
> given arpcache() runs under KERNEL_LOCK() we can rule out case user
> runs 'arp -a -d' command, which would delete ARP cache. the 'arp -a -d'
> is excluded by KERNEL_LOCK().
>
> so if the assert fires something very strange must be happening.
This time arpcache() is only called by netisr() with both kernel and
exclusive net locks held. RTM_DELETE message processing will also call
ifp->if_rtrequest() with exclusive netlock held.
So this while() loop within arpcache() can’t be broken by “arp -a -d”.
Also netlock serializes arpcache() and arpresolve().
>
> I would suggest comment below:
>
> /*
> * If la_mq is not empty, then something very strange has happened.
> * We expect la_mq to be empty, because while() loop above dispatches
> * queued packets to wire after ARP resolves IP address.
> *
> * la_mq not being empty means arpresolve() called on behalf of
> * ifp->if_output() could not find ARP entry we've just put to
> * cache. Quite unexpected given we run under KERNEL_LOCK().
> */
>
> thanks and
> regards
> sashan