Hello,
We are using Nuttx and its SocketCAN support (on top of iMXRT hardware), but have run into some problems related to packet reordering and also possible soft-locks/stalls in the rx side. The symptoms we have is that the upper layer, which is a isotp implementation, receives packets out of order and sometimes does not receive the last packet in a burst of packets. The out of order problem is on the order of a few packets per 10k, and not receiving the last about the same prevalence (few per 10k bursts). Our analysis so far is that the issue is mostly related to that SocketCAN, opposed to for example all other network drivers, does packet ingestion without holding the net_lock(). This results in that there is a window when can_recvmsg() hold the lock, checks the readahead buffer but not yet setup the connection callback and waiting, that if a packet arrives within this window the low-level driver will call can_callback() from the ISR, it fails the net_trylock() and puts the newly received packet into the read-ahead buffer. can_recvmsg will then setup the callback, and the next received packet will be delivered to the application resulting in reordering. If the packet that is put into the read-ahead buffer is the last in a burst (each burst has to be acked before the sender will send more packets), then the callback will never be called an dcan_recvmsg wait forever. The out-of-order delivery we have solved with a local patch (basically, once we get a callback we process read-ahead buffer again before returning the newly received packet), but it is not that clean and does not solve the last-packet-in-burst problem. I am looking into ways to solve this properly and then contribute it up to Nuttx upstream, so I am looking for some input as to what would be considered an acceptable solution. Right now I have two possible solutions 1. Use our current way to deal with reordering, and then add a enqueued work-function that calls the callback in the case where net_trylock() fails (so data is immediately put on readahead queue, and then a work item later processes the callback to notify the application). This will probably work, but results in two notification chains and two places where read-ahead is handled. 2. Change SocketCAN to use the same pattern as all other network drivers, always do the ingestion of packets in a work-function. The ISR disables interrupts and enqueues a work function that then does the read-out of packets from the hardware mailboxes and sends them into the network layer (under net_lock()). This solves both problems cleanly, however there could be an issue for hardware that has very few mailboxes available with high packet rates. In our case with iMXRT we potentially have up to 60 hardware mailboxes, so it is no issue but for other hardware it could be that the driver needs to implement a software FIFO. This is right now my preferred solution. Any input from anyone involved in the original implementation is greatly appreciated. Regards Marten Svanfeldt