Here is an example fix for a recent IOB deadlock:

commit 2d0baa779d997f39b8121f5965f8125184e80d71
Author: chao.an <anc...@xiaomi.com>
Date:   Thu Jan 16 14:20:09 2020 -0300

    net/udp: break the network lock to avoid deadlock

    Author: chao.an <anc...@xiaomi.com>

        net/udp: break the network lock to avoid deadlock

          network deadlock when udp sendto() storm is coming

        net/close: force wait tx drain to complete

          atomic send() and close() will causes data to be discarded directly

    Signed-off-by: chao.an <anc...@xiaomi.com>

diff --git a/net/udp/udp_psock_sendto_buffered.c b/net/udp/udp_psock_sendto_buffered.c
index 0dcf892759..fbc44de0f1 100644
--- a/net/udp/udp_psock_sendto_buffered.c
+++ b/net/udp/udp_psock_sendto_buffered.c
@@ -75,6 +75,7 @@
 #include "neighbor/neighbor.h"
 #include "udp/udp.h"
 #include "devif/devif.h"
+#include "utils/utils.h"

 /****************************************************************************
  * Pre-processor Definitions
@@ -713,8 +714,21 @@ ssize_t psock_udp_sendto(FAR struct socket *psock, FAR const void *buf,
         }
       else
         {
+          unsigned int count;
+          int blresult;
+
+          /* iob_copyin might wait for buffers to be freed, but if
+           * network is locked this might never happen, since network
+           * driver is also locked, therefore we need to break the lock
+           */
+
+          blresult = net_breaklock(&count);
           ret = iob_copyin(wrb->wb_iob, (FAR uint8_t *)buf, len, 0, false,
                            IOBUSER_NET_SOCK_UDP);
+          if (blresult >= 0)
+            {
+              net_restorelock(count);
+            }
         }

       if (ret < 0)

The problem is that iob_copyin() may allocate more IOB buffers
internally and will wait if no IOB is available.
The old code called it without breaking the network lock, so the other
path could not take the lock to finish sending the pending IOBs and
return them to the pool.
I hope this case gives you some hints.
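
The same break/restore pattern can be sketched outside NuttX with plain
pthreads. This is only an illustration under stated assumptions: every
name below (demo_lock, demo_breaklock, demo_restorelock, pool_get,
pool_put, demo_send) is a hypothetical stand-in, not the NuttX API, and
the "IOB pool" is reduced to a counter. The sender holds a recursive lock
twice, drops it completely before waiting for a buffer, and re-takes it to
the same depth afterwards, so the "driver" thread can get the lock and
free a buffer.

```c
#define _XOPEN_SOURCE 700               /* for PTHREAD_MUTEX_RECURSIVE */
#include <assert.h>
#include <pthread.h>

/* All names below are hypothetical stand-ins, not the NuttX API. */

static pthread_mutex_t g_lock;          /* recursive "network lock" */
static unsigned int g_depth;            /* nesting depth, guarded by g_lock */

static pthread_mutex_t g_pool_mtx = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t g_pool_cv = PTHREAD_COND_INITIALIZER;
static int g_free_bufs;                 /* free buffers in the "IOB pool" */

static void demo_lock(void)   { pthread_mutex_lock(&g_lock); g_depth++; }
static void demo_unlock(void) { g_depth--; pthread_mutex_unlock(&g_lock); }

/* Release the lock completely; the caller must currently hold it. */
static int demo_breaklock(unsigned int *count)
{
  unsigned int n = g_depth;
  if (n == 0) return -1;                /* nothing to break */
  *count = n;
  while (n-- > 0) demo_unlock();
  return 0;
}

/* Re-take the lock to the depth saved by demo_breaklock(). */
static void demo_restorelock(unsigned int count)
{
  while (count-- > 0) demo_lock();
}

static void pool_put(void)              /* return one buffer to the pool */
{
  pthread_mutex_lock(&g_pool_mtx);
  g_free_bufs++;
  pthread_cond_signal(&g_pool_cv);
  pthread_mutex_unlock(&g_pool_mtx);
}

static void pool_get(void)              /* wait until a buffer is free */
{
  pthread_mutex_lock(&g_pool_mtx);
  while (g_free_bufs == 0) pthread_cond_wait(&g_pool_cv, &g_pool_mtx);
  g_free_bufs--;
  pthread_mutex_unlock(&g_pool_mtx);
}

/* "Driver": needs the network lock before it can free a buffer. */
static void *demo_driver(void *arg)
{
  (void)arg;
  demo_lock();
  pool_put();
  demo_unlock();
  return NULL;
}

/* "Sender": returns 0 if it got a buffer and the lock depth was restored. */
int demo_send(void)
{
  pthread_mutexattr_t attr;
  pthread_t driver;
  unsigned int count = 0;
  unsigned int depth;
  int ret;

  pthread_mutexattr_init(&attr);
  pthread_mutexattr_settype(&attr, PTHREAD_MUTEX_RECURSIVE);
  pthread_mutex_init(&g_lock, &attr);
  pthread_mutexattr_destroy(&attr);
  g_depth = 0;
  g_free_bufs = 0;                      /* pool starts out exhausted */

  demo_lock();
  demo_lock();                          /* nested, as in a socket call */
  pthread_create(&driver, NULL, demo_driver, NULL);

  ret = demo_breaklock(&count);         /* let the driver run */
  pool_get();                           /* would wait forever otherwise */
  if (ret >= 0) demo_restorelock(count);

  assert(count == 2);
  depth = g_depth;
  demo_unlock();
  demo_unlock();
  pthread_join(driver, NULL);
  pthread_mutex_destroy(&g_lock);
  return depth == 2 ? 0 : -1;
}
```

If the demo_breaklock()/demo_restorelock() pair is removed from
demo_send(), pool_get() waits while the lock is still held and the driver
thread can never free a buffer, which is the same shape as the deadlock
described above.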

Thanks
Xiang

On Thu, Feb 20, 2020 at 6:50 AM Gregory Nutt <spudan...@gmail.com> wrote:
>
>
> > This sounds a lot like the problem I'm having with the SAMA5D36 Gigabit
> > ethernet... I'm running into some kind of deadlock on long transfers that
> > send packets very quickly. NuttX seems to run out of IOBs and then can't
> > send or respond to network packets.
> >
> > I tried increasing the low priority worker threads to 2 (and also 3) but
> > neither of them solved the problem.
> >
> > I'll look at the net_lock() to see if there's a way to release it.
> >
> > If you find a solution, I would love to know it! If I find one, I'll post
> > it here.
>
> The first step in debugging a deadlock is to find what is stuck waiting
> for what resource.
>
> Then find the logic that provides the resource that is being waited on.
>
> Then figure out why that logic is not running.  Most likely, it would be
> waiting on the low priority work queue.
>
> I have had to solve lots of problems like this.  It is not really so
> difficult once you understand the above things.
>
>
>
