Hi Adam, I saw your post and was thinking about linking to it. From what I understand so far, it happens only if there are multiple connections running. Bursts of ARP messages or flood pings will eventually break things too; it is just a matter of timing. The whole collapse starts with a "no free buffer" condition (the network warning and error messages need to be activated in the config to see it), after which the DMA is locked up.
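
In case it helps, this is roughly what we enable in the defconfig to get those messages (option names as in the NuttX Kconfig of our tree; they may differ slightly between versions):

    # Network debug output (needed to see the buffer warnings/errors)
    CONFIG_DEBUG_FEATURES=y
    CONFIG_DEBUG_ERROR=y
    CONFIG_DEBUG_WARN=y
    CONFIG_DEBUG_NET=y
    CONFIG_DEBUG_NET_ERROR=y
    CONFIG_DEBUG_NET_WARN=y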
rgds,
Reto

On 2020/02/19 22:18:51, Adam Feuer <a...@starcat.io> wrote:
> Reto,
>
> This sounds a lot like the problem I'm having with the SAMA5D36 Gigabit
> ethernet... I'm running into some kind of deadlock on long transfers that
> send packets very quickly. NuttX seems to run out of IOBs and then can't
> send or respond to network packets.
>
> I tried increasing the low priority worker threads to 2 (and also 3) but
> neither of them solved the problem.
>
> I'll look at the net_lock() to see if there's a way to release it.
>
> If you find a solution, I would love to know it! If I find one, I'll post
> it here.
>
> cheers
> adam
>
> On Wed, Feb 19, 2020 at 6:49 AM Gregory Nutt <spudan...@gmail.com> wrote:
>
> > Hi, Reto,
> >
> > > I am working for Hexagon Mining in Baar (some of our people were on
> > > the last NuttX summit). Our project involves NuttX and runs on an
> > > STM32H743ZI. Our NuttX version was recently (mid January) updated.
> > >
> > > We recently recognised that the Ethernet on our platform dies after
> > > some time. After some investigation it looks like the DMA of the
> > > Ethernet ends up in a deadlock, since no further descriptors are
> > > available and none are freed. It all starts with a packet which is not
> > > processed right away but in the next interrupt handling cycle, after
> > > the next ETH_DMACSR_RI appears. It then either collapses or slows down
> > > massively. In case of a collapse everything stops (incl. ping) since
> > > the chain is stopped around the DMA.
> > >
> > > So far it has only happened during longer data transfers. The device
> > > works fine over days just sitting there and responding to the
> > > broadcasts on the network. To verify the problem we also used the
> > > TCPblaster example with a minimal code base on our side. There, it
> > > only happens if multiple threads are used; a single thread seems to be
> > > handled fine. The TCPblaster worked fine on the STM32F7.
> > >
> > > I am wondering if anyone has had this or a similar problem before. If
> > > you need more information please let me know.
> >
> > I don't know anything about your specific deadlock, but errors like
> > these have occurred and been fixed in the past. The deadlock normally
> > occurs like this:
> >
> > 1. Some network activity is started and runs on the low priority work
> >    queue. Most networking occurs FIFO on the low priority worker thread.
> > 2. That network task takes the network lock (net_lock()), giving it
> >    exclusive access to the network.
> > 3. Then it waits for some event or resource with the network locked.
> > 4. The task that will provide the event or resource also requires the
> >    network lock --> deadlock.
> >
> > With IOBs, there are other, related kinds of deadlocks that are possible:
> >
> > 1. Some network activity is started and runs on the low priority work
> >    queue.
> > 2. The network task needs IOBs but we are out of IOBs, so that network
> >    task unlocks the network (allowing network activity) but blocks in
> >    the low priority work queue waiting for a free IOB.
> > 3. The task that will release the IOB is also queued for execution on
> >    the low priority work queue. But since the queue is blocked because
> >    the network task is waiting on the work queue, the IOB cannot be
> >    released --> deadlock.
> >
> > There are a couple of solutions to this latter IOB case: First, you can
> > analyze the deadlock and find the culprit, then modify the design so
> > that the deadlock cannot occur.
> >
> > If this is the situation, then a really simple fix is to increase the
> > number of low priority worker threads. By default there is only one, so
> > the FIFO nature of the single work queue tends to deadlock. But if you
> > increase the number of threads to two, there is much less likelihood of
> > deadlocking in this way.
> >
> > Greg
>
> --
> Adam Feuer <a...@starcat.io>
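
Coming back to the IOB starvation Greg describes above: apart from finding the real culprit, one stopgap is to give the stack a larger IOB pool so a burst is less likely to drain it. In our defconfig that would be something like the following (the numbers are only what we experiment with, not tuned recommendations):

    # IOB pool sizing; larger pool makes starvation less likely under bursts
    CONFIG_IOB_NBUFFERS=64
    CONFIG_IOB_BUFSIZE=196
    CONFIG_IOB_THROTTLE=8

That only makes the window smaller, of course; it does not remove the underlying deadlock.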
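
For reference, the worker-thread fix Greg mentions at the end maps to these options in our tree (Adam already tried two and three threads above without luck, so on its own it may not be enough):

    # More than one low priority worker thread, so the FIFO queue cannot self-block
    CONFIG_SCHED_LPWORK=y
    CONFIG_SCHED_LPNTHREADS=2
    CONFIG_SCHED_LPWORKSTACKSIZE=2048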