Socketcan reordered packets and packet stalls

Mårten Svanfeldt Fri, 03 Mar 2023 00:48:15 -0800

Hello,


We are using NuttX and its SocketCAN support (on top of iMXRT hardware), but 
have run into problems with packet reordering and possible soft-locks/stalls 
on the RX side.


The symptom is that the upper layer, an ISO-TP implementation, receives 
packets out of order and sometimes does not receive the last packet in a 
burst. Out-of-order delivery affects a few packets per 10k, and the missing 
last packet occurs at about the same rate (a few per 10k bursts).


Our analysis so far is that the issue comes mostly from the fact that 
SocketCAN, unlike for example all other network drivers, ingests packets 
without holding the net_lock(). This leaves a window in which can_recvmsg() 
holds the lock and has checked the read-ahead buffer, but has not yet set up 
the connection callback and started waiting. If a packet arrives within this 
window, the low-level driver calls can_callback() from the ISR, which fails 
the net_trylock() and puts the newly received packet into the read-ahead 
buffer. can_recvmsg() then sets up the callback, and the next received packet 
is delivered directly to the application, ahead of the one in the read-ahead 
buffer, resulting in reordering. If the packet that is put into the 
read-ahead buffer is the last in a burst (each burst has to be acked before 
the sender will send more packets), then the callback is never called and 
can_recvmsg() waits forever.
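
To make the interleaving described above concrete, here is a small 
single-threaded model of it. This is only a sketch: none of the names below 
(model_conn, model_isr_rx, model_replay_race) are NuttX APIs, the lock and 
callback are reduced to two flags, and frames are represented by their 
sequence numbers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the receive path; these are NOT NuttX APIs. */

#define MODEL_RA_DEPTH 8

struct model_conn
{
  int    readahead[MODEL_RA_DEPTH]; /* Frames parked by the "ISR" */
  size_t ra_count;                  /* Number of parked frames */
  bool   lock_held;                 /* Receiver currently holds the lock */
  bool   cb_armed;                  /* Receiver has set up its callback */
  int    delivered;                 /* Frame handed to the receiver, -1 if none */
};

/* Models the ISR path: deliver through the callback only if the lock can
 * be taken and a callback is armed, otherwise park the frame in read-ahead.
 */

static void model_isr_rx(struct model_conn *c, int frame)
{
  if (!c->lock_held && c->cb_armed)
    {
      c->delivered = frame;
      c->cb_armed  = false;
    }
  else if (c->ra_count < MODEL_RA_DEPTH)
    {
      c->readahead[c->ra_count++] = frame;
    }
}

/* Replays the problematic interleaving.  Returns the sequence number the
 * application sees first, or -1 if the receiver is still blocked.
 */

static int model_replay_race(struct model_conn *c, bool more_frames)
{
  c->delivered = -1;

  c->lock_held = true;        /* Receiver takes the lock ... */
  assert(c->ra_count == 0);   /* ... and finds read-ahead empty */

  model_isr_rx(c, 1);         /* Frame 1 arrives inside the window: parked */

  c->cb_armed  = true;        /* Receiver sets up the callback ... */
  c->lock_held = false;       /* ... and releases the lock while waiting */

  if (more_frames)
    {
      model_isr_rx(c, 2);     /* Frame 2 goes straight to the receiver */
    }

  return c->delivered;
}
```

With more_frames set, the model returns frame 2 while frame 1 is still 
sitting in read-ahead (the reordering). Without it, the model returns -1 
with frame 1 parked, which corresponds to the stall on the last packet of a 
burst.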


We have solved the out-of-order delivery with a local patch (basically, once 
we get a callback we process the read-ahead buffer again before returning the 
newly received packet), but it is not that clean and does not solve the 
last-packet-in-burst problem. I am looking into ways to solve this properly 
and then contribute the fix to NuttX upstream, so I am looking for input on 
what would be considered an acceptable solution. Right now I have two 
possible solutions:


  1.  Use our current way to deal with reordering, and then add an enqueued 
work function that calls the callback in the case where net_trylock() fails 
(so data is immediately put on the read-ahead queue, and a work item later 
processes the callback to notify the application). This will probably work, 
but results in two notification chains and two places where read-ahead is 
handled.

  2.  Change SocketCAN to use the same pattern as all other network drivers 
and always ingest packets in a work function. The ISR disables interrupts and 
enqueues a work function that then reads the packets out of the hardware 
mailboxes and sends them into the network layer (under net_lock()). This 
solves both problems cleanly; however, it could be an issue for hardware that 
has very few mailboxes and sees high packet rates. In our case with iMXRT we 
potentially have up to 60 hardware mailboxes, so it is not an issue, but on 
other hardware the driver might need to implement a software FIFO. This is 
right now my preferred solution.
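
For hardware with few mailboxes, the software FIFO mentioned in option 2 
could look roughly like the single-producer/single-consumer ring below, with 
the ISR as the only producer and the work function as the only consumer. 
This is a sketch under assumptions: the names (swfifo, swfifo_frame, 
swfifo_push, swfifo_pop), the depth, and the frame layout are all 
hypothetical, and it uses C11 atomics rather than any NuttX primitive.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical ISR-to-worker FIFO; names and sizes are illustrative only. */

#define SWFIFO_DEPTH 16u

static_assert((SWFIFO_DEPTH & (SWFIFO_DEPTH - 1u)) == 0u,
              "SWFIFO_DEPTH must be a power of two");

struct swfifo_frame
{
  uint32_t id;       /* CAN identifier */
  uint8_t  dlc;      /* Number of valid bytes in data[] */
  uint8_t  data[8];  /* Classic CAN payload */
};

struct swfifo
{
  struct swfifo_frame slots[SWFIFO_DEPTH];
  atomic_uint head;  /* Written only by the producer (ISR) */
  atomic_uint tail;  /* Written only by the consumer (work function) */
};

/* Called from the ISR.  Returns false if the FIFO is full, in which case
 * the caller would have to count an overrun and drop the frame.
 */

static bool swfifo_push(struct swfifo *f, const struct swfifo_frame *frame)
{
  unsigned head = atomic_load_explicit(&f->head, memory_order_relaxed);
  unsigned tail = atomic_load_explicit(&f->tail, memory_order_acquire);

  if (head - tail == SWFIFO_DEPTH)
    {
      return false;
    }

  f->slots[head % SWFIFO_DEPTH] = *frame;
  atomic_store_explicit(&f->head, head + 1u, memory_order_release);
  return true;
}

/* Called from the work function.  Returns false if the FIFO is empty. */

static bool swfifo_pop(struct swfifo *f, struct swfifo_frame *frame)
{
  unsigned tail = atomic_load_explicit(&f->tail, memory_order_relaxed);
  unsigned head = atomic_load_explicit(&f->head, memory_order_acquire);

  if (head == tail)
    {
      return false;
    }

  *frame = f->slots[tail % SWFIFO_DEPTH];
  atomic_store_explicit(&f->tail, tail + 1u, memory_order_release);
  return true;
}
```

The work function would call swfifo_pop() in a loop and pass each frame into 
the network layer under net_lock(), so frames reach the application in the 
order the ISR queued them.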

Any input from anyone involved in the original implementation is greatly 
appreciated.

Regards
Marten Svanfeldt
