Xiang,

Thanks for the concrete example of how to break an IOB deadlock. If that
is what's causing my problem, I will try it out.
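
To check my understanding of the pattern in the patch, here is a toy,
single-threaded model of it. This is only a sketch under my own
assumptions: none of the toy_* names are NuttX APIs, they are made-up
stand-ins for the network lock, net_breaklock() and net_restorelock(),
and the "driver" is a plain function call instead of a real thread.

```c
#include <assert.h>
#include <stdbool.h>

enum { NOBODY = 0, SENDER = 1, DRIVER = 2 };   /* hypothetical owner ids */

/* Toy recursive lock: who holds it and how deeply it is nested. */
struct toy_lock { int owner; unsigned int depth; };

static bool toy_trylock(struct toy_lock *l, int who)
{
  if (l->depth > 0 && l->owner != who)
    {
      return false;                  /* held by someone else */
    }

  l->owner = who;
  l->depth++;
  return true;
}

static void toy_unlock(struct toy_lock *l)
{
  assert(l->depth > 0);
  if (--l->depth == 0)
    {
      l->owner = NOBODY;
    }
}

/* Drop every nesting level the caller holds and report how many. */
static unsigned int toy_breaklock(struct toy_lock *l, int who)
{
  unsigned int count = 0;

  while (l->depth > 0 && l->owner == who)
    {
      toy_unlock(l);
      count++;
    }

  return count;
}

/* Re-take the lock as many times as it was held before the break. */
static void toy_restorelock(struct toy_lock *l, int who, unsigned int count)
{
  while (count-- > 0)
    {
      bool ok = toy_trylock(l, who);
      assert(ok);
      (void)ok;
    }
}

/* One driver pass: it needs the lock before it can return a buffer. */
static bool driver_step(struct toy_lock *l, int *free_bufs)
{
  if (!toy_trylock(l, DRIVER))
    {
      return false;
    }

  (*free_bufs)++;
  toy_unlock(l);
  return true;
}

/* Returns true if a buffer became available while the sender waited. */
static bool sender_gets_buffer(bool use_break)
{
  struct toy_lock lock = { NOBODY, 0 };
  int free_bufs = 0;                 /* pool starts out empty */
  unsigned int count = 0;

  (void)toy_trylock(&lock, SENDER);  /* nested twice, like a socket call */
  (void)toy_trylock(&lock, SENDER);

  if (use_break)
    {
      count = toy_breaklock(&lock, SENDER);
    }

  (void)driver_step(&lock, &free_bufs);  /* stands in for the blocking wait */
  toy_restorelock(&lock, SENDER, count);

  assert(lock.owner == SENDER && lock.depth == 2);
  return free_bufs > 0;
}
```

In this model sender_gets_buffer(true) lets the driver return a buffer,
while sender_gets_buffer(false) leaves the pool empty, which is the
situation where a real blocking wait would never finish.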

cheers
adam

On Wed, Feb 19, 2020 at 6:11 PM Xiang Xiao <xiaoxiang781...@gmail.com>
wrote:

> Here is a demo fix for one of the recent IOB deadlocks:
>
> commit 2d0baa779d997f39b8121f5965f8125184e80d71
> Author: chao.an <anc...@xiaomi.com>
> Date:   Thu Jan 16 14:20:09 2020 -0300
>
>     net/udp: break the network lock to avoid deadlock
>
>     Author: chao.an <anc...@xiaomi.com>
>
>         net/udp: break the network lock to avoid deadlock
>
>           network deadlock when udp sendto() storm is coming
>
>         net/close: force wait tx drain to complete
>
>           atomic send() and close() will causes data to be discarded
> directly
>
>     Signed-off-by: chao.an <anc...@xiaomi.com>
>
> diff --git a/net/udp/udp_psock_sendto_buffered.c b/net/udp/udp_psock_sendto_buffered.c
> index 0dcf892759..fbc44de0f1 100644
> --- a/net/udp/udp_psock_sendto_buffered.c
> +++ b/net/udp/udp_psock_sendto_buffered.c
> @@ -75,6 +75,7 @@
>  #include "neighbor/neighbor.h"
>  #include "udp/udp.h"
>  #include "devif/devif.h"
> +#include "utils/utils.h"
>
>
>  /****************************************************************************
>   * Pre-processor Definitions
> @@ -713,8 +714,21 @@ ssize_t psock_udp_sendto(FAR struct socket *psock, FAR const void *buf,
>          }
>        else
>          {
> +          unsigned int count;
> +          int blresult;
> +
> +          /* iob_copyin might wait for buffers to be freed, but if
> +           * network is locked this might never happen, since network
> +           * driver is also locked, therefore we need to break the lock
> +           */
> +
> +          blresult = net_breaklock(&count);
>            ret = iob_copyin(wrb->wb_iob, (FAR uint8_t *)buf, len, 0, false,
>                             IOBUSER_NET_SOCK_UDP);
> +          if (blresult >= 0)
> +            {
> +              net_restorelock(count);
> +            }
>          }
>
>        if (ret < 0)
>
> The problem is that iob_copyin may allocate more IOB buffers
> internally and will wait if no IOB is available.
> The old code called it without breaking the network lock, so the other
> path could not take the network lock again to finish sending the
> pending IOBs and return them to the pool.
> I hope this case gives you some tips.
>
> Thanks
> Xiang
>
> On Thu, Feb 20, 2020 at 6:50 AM Gregory Nutt <spudan...@gmail.com> wrote:
> >
> >
> > > > This sounds a lot like the problem I'm having with the SAMA5D36
> > > > Gigabit ethernet... I'm running into some kind of deadlock on long
> > > > transfers that send packets very quickly. NuttX seems to run out of
> > > > IOBs and then can't send or respond to network packets.
> > > >
> > > > I tried increasing the low priority worker threads to 2 (and also 3)
> > > > but neither of them solved the problem.
> > > >
> > > > I'll look at the net_lock() to see if there's a way to release it.
> > > >
> > > > If you find a solution, I would love to know it! If I find one, I'll
> > > > post it here.
> >
> > The first step in debugging a deadlock is to find what is stuck waiting
> > for what resource.
> >
> > Then find the logic that provides the resource that is being waited on.
> >
> > Then figure out why that logic is not running.  Most likely, it would be
> > waiting on the low priority work queue.
> >
> > I have had to solve lots of problems like this.  It is not really so
> > difficult once you understand the above things.
> >
> >
> >
>


-- 
Adam Feuer <a...@starcat.io>
