Re: igb driver RX (was TX) hangs when out of mbuf clusters

Michael Tuexen Wed, 09 Feb 2011 02:37:20 -0800

Hi Jack,

I could recreate the problem. When the problem occurs, we see


rx_nxt_check = n
rx_nxt_refresh = n + 1

(This was also reported in a mail from Karim)

This means that the *whole* receive ring has no buffers anymore. This can
occur if, for some amount of time, no clusters are available.

Now outside of the driver, at some point of time, clusters are freed.
I don't think that igb_refresh_mbufs() gets called, since it only gets
called from igb_rxeof(), which gets called when a packet has been received,
which can not happen since the receive ring is empty. So how can the driver
know? I have no idea. Maybe we can periodically check for such an event
and call igb_refresh_mbufs().

Does this make sense to you?

Best regards
Michael


On Feb 9, 2011, at 8:32 AM, Jack Vogel wrote:

> Hmmm, well so much for that theory :)
> 
> Jack
> 
> 
> On Tue, Feb 8, 2011 at 4:06 PM, Karim Fodil-Lemelin 
> <fodillemlinka...@gmail.com> wrote:
> 
> 
> 2011/2/8 Jack Vogel <jfvo...@gmail.com>
> 
> 
> I have been following this, and thinking about it. I still am working from a 
> theoretical
> standpoint, but based on a patch I got quite a long time back and never quite 
> groked,
> I believe now that I might have a solution.
> 
> The original PR and patch was kern/150516 from Beezar Liu,  I was never quite 
> comfortable
> with the code changes, nor convinced that it was a real issue and not a 
> misunderstanding.
> However I think now that this very report might be behind what we are seeing 
> today. I have
> a slightly different approach to solving it, of course it remains to be seen 
> if it handles it 
> properly. 
> 
> Please try the patch I've attached, I'm open to further correction or 
> polishing of the 
> changes. And thanks to Beezar for his original report and changes, this is 
> not for em,
> but if this eliminates the problem its clearly needed in all drivers. 
> 
> Jack
> 
> 
> Hi Jack,
> 
> Thanks for your help. I tried your patch and it didn't work so I added a 
> couple of printf to see if the added code was getting hit:
> 
> --- a/freebsd/sys/dev/e1000/if_igb.c
> --More--(byte 1253)+++ b/freebsd/sys/dev/e1000/if_igb.c
> @@ -612,7 +612,7 @@ igb_attach(device_t dev)
>             device_get_nameunit(dev));
>  
>         INIT_DEBUGOUT("igb_attach: end");
> -
> +       printf("this driver has a patch from Jack Vogel\n");
>         return (0);
>  
>  err_late:
> @@ -4131,6 +4131,7 @@ igb_rxeof(struct igb_queue *que, int count, int *done)
>                 struct mbuf             *sendmp, *mh, *mp;
>                 struct igb_rx_buf       *rxbuf;
>                 u16                     hlen, plen, hdr, vtag;
> +               int                     commit;
>                 bool                    eop = FALSE;
>   
>                 cur = &rxr->rx_base[i];
> @@ -4255,10 +4256,23 @@ next_desc:
>                 bus_dmamap_sync(rxr->rxdma.dma_tag, rxr->rxdma.dma_map,
>                     BUS_DMASYNC_PREREAD | BUS_DMASYNC_PREWRITE);
>  
> +               commit = i;     /* capture the old index */
> +
>                 /* Advance our pointers to the next descriptor. */
>                 if (++i == adapter->num_rx_desc)
>                         i = 0;
>                 /*
> +               ** Sanity test for ring full, if this
> +               ** happens we need to refresh immediately
> +               ** or refresh may deadlock.
> +               */
> +               if (i == rxr->next_to_refresh) {
> +                       igb_refresh_mbufs(rxr, commit);
> +                       printf("igb_refresh_mbufs called with commit %d\n", 
> commit);
> +                       processed = 0;
> +               }
> +
> +               /*
>                 ** Send to the stack or LRO
>                 */
>                 if (sendmp != NULL) {
> 
> Here is the results:
> 
> # dmesg | grep Vogel
> this driver has a patch from Jack Vogel
> this driver has a patch from Jack Vogel
> 
> # netstat -m
> 60453/52707/113160 mbufs in use (current/cache/total)
> 48416/51584/100000/100000 mbuf clusters in use (current/cache/total/max)
> 2894/690 mbuf+clusters out of packet secondary zone in use (current/cache)
> 11946/854/12800/12800 4k (page size) jumbo clusters in use 
> (current/cache/total/max)
> 0/0/0/6400 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/3200 16k jumbo clusters in use (current/cache/total/max)
> 164834K/119760K/284595K bytes allocated to network (current/cache/total)
> 0/339/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters denied (4k/9k/16k)
> 0/4/6656 sfbufs in use (current/peak/max)
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 0 requests for I/O initiated by sendfile
> 0 calls to protocol drain routines
> # dmesg | grep commit
> 
> At this point RX has hung.
> 
> Somehow the check (i == rxr->next_to_refresh) is never true in this case. 
> Also, I did read kern/150516 and couldn't wrap my head around the patch for 
> the em driver that Beezar Liu suggested.
> 
> Regards,
> 
> Karim.
> 
> 

_______________________________________________
freebsd-net@freebsd.org mailing list
http://lists.freebsd.org/mailman/listinfo/freebsd-net
To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

Re: igb driver RX (was TX) hangs when out of mbuf clusters

Reply via email to