Is it not the same issue as PR 211990? Can you try turning off jumbo frames?
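If you want a quick test, a rough sketch of what I mean (the interface name mlxen1 is taken from the sysctls quoted below, and the nmbjumbo9 value is only illustrative, twice your current limit). Dropping the MTU back to 1500 should make the driver use regular 2k clusters instead of the 9k jumbo zone that is showing those FAIL counts:

    # Runtime test, no reboot needed.  mlxen1 is the interface name from
    # your sysctl output; adjust to whatever carries the iSCSI traffic.
    ifconfig mlxen1 mtu 1500

    # Or, if you prefer to keep jumbo frames, you could instead try raising
    # the 9k jumbo cluster limit (value below is just 2x your current
    # 603712) and watch the FAIL column again afterwards:
    sysctl kern.ipc.nmbjumbo9=1207424
    vmstat -z | grep mbuf_jumbo_9k

To make either change persistent you would put "mtu 1500" in your ifconfig_mlxen1 line in /etc/rc.conf, or the sysctl in /etc/sysctl.conf.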
On Tue, Jan 03, 2017 at 06:27:15AM +0000, Meny Yossefi wrote:
> 
> ________________________________________
> From: owner-freebsd-net@freebsd.org On Behalf Of Ben RUBSON
> Sent: Monday, January 2, 2017 11:09:15 AM (UTC+00:00) Monrovia, Reykjavik
> To: freebsd-net@freebsd.org
> Cc: Meny Yossefi; Yuval Bason; Hans Petter Selasky
> Subject: Re: iSCSI failing, MLX rx_ring errors ?
> 
> Hi Meny,
> 
> Thank you very much for your feedback.
> 
> I think you are right, this could be a mbufs issue.
> Here are some more numbers :
> 
> # vmstat -z | grep -v "0, 0$"
> ITEM            SIZE   LIMIT   USED   FREE         REQ      FAIL  SLEEP
> 4 Bucket:         32,      0,  2673, 28327,   88449799,    17317,     0
> 8 Bucket:         64,      0,   449, 15609,   13926386,     4871,     0
> 12 Bucket:        96,      0,   335,  5323,   10293892,   142872,     0
> 16 Bucket:       128,      0,   533,  6070,    7618615,   472647,     0
> 32 Bucket:       256,      0,  8317, 22133,   36020376,   563479,     0
> 64 Bucket:       512,      0,  1238,  3298,   20138111, 11430742,     0
> 128 Bucket:     1024,      0,  1865,  2963,   21162182,   158752,     0
> 256 Bucket:     2048,      0,  1626,   450,   80253784,  4890164,     0
> mbuf_jumbo_9k:  9216, 603712, 16400,  8744, 4128521064,     2661,     0
> 
> # netstat -m
> 32801/18814/51615 mbufs in use (current/cache/total)
> 16400/9810/26210/4075058 mbuf clusters in use (current/cache/total/max)
> 16400/9659 mbuf+clusters out of packet secondary zone in use (current/cache)
> 0/8647/8647/2037529 4k (page size) jumbo clusters in use (current/cache/total/max)
> 16400/8744/25144/603712 9k jumbo clusters in use (current/cache/total/max)
> 0/0/0/339588 16k jumbo clusters in use (current/cache/total/max)
> 188600K/137607K/326207K bytes allocated to network (current/cache/total)
> 0/0/0 requests for mbufs denied (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for mbufs delayed (mbufs/clusters/mbuf+clusters)
> 0/0/0 requests for jumbo clusters delayed (4k/9k/16k)
> 0/2661/0 requests for jumbo clusters denied (4k/9k/16k)
> 0 sendfile syscalls
> 0 sendfile syscalls completed without I/O request
> 0 requests for I/O initiated by sendfile
> 0 pages read by sendfile as part of a request
> 0 pages were valid at time of a sendfile request
> 0 pages were requested for read ahead by applications
> 0 pages were read ahead by sendfile
> 0 times sendfile encountered an already busy page
> 0 requests for sfbufs denied
> 0 requests for sfbufs delayed
> 
> I did not perform any mbufs tuning, numbers above are from FreeBSD itself.
> 
> This server has 64GB of memory.
> It has a ZFS pool for which I limit ARC memory impact with :
> vfs.zfs.arc_max=64424509440 #60G
> 
> The only thing I did is some TCP tuning to improve throughput over
> high-latency long-distance private links :
> kern.ipc.maxsockbuf=7372800
> net.inet.tcp.sendbuf_max=6553600
> net.inet.tcp.recvbuf_max=6553600
> net.inet.tcp.sendspace=65536
> net.inet.tcp.recvspace=65536
> net.inet.tcp.sendbuf_inc=65536
> net.inet.tcp.recvbuf_inc=65536
> net.inet.tcp.cc.algorithm=htcp
> 
> Here are some graphs of memory & ARC usage when issue occurs.
> Crosshair (vertical red line) is at the timestamp where I get iSCSI
> disconnections.
> https://postimg.org/gallery/1kkekrc4e/
> What is strange is that each time issue occurs there is around 1GB of free
> memory.
> So FreeBSD should still be able to allocate some more mbufs ?
> Unfortunately I do not have graphs about mbufs.
> 
> What should I ideally do ?
> 
> >> Have you tried increasing the mbufs limit?
> >> (sysctl) kern.ipc.nmbufs (Maximum number of mbufs allowed)
> 
> Thank you again,
> 
> Best regards,
> 
> Ben
> 
> > On 01 Jan 2017, at 09:16, Meny Yossefi <me...@mellanox.com> wrote:
> > 
> > Hi Ben,
> > 
> > Those are not HW errors, note that:
> > 
> > hw.mlxen1.stat.rx_dropped: 0
> > hw.mlxen1.stat.rx_errors: 0
> > 
> > It seems to be triggered when you are failing to allocate a replacement
> > buffer.
> > Any chance you ran out of mbufs in the system?
> > 
> > en_rx.c:
> > 
> > mlx4_en_process_rx_cq():
> > 
> >     mb = mlx4_en_rx_mb(priv, rx_desc, mb_list, length);
> >     if (!mb) {
> >             ring->errors++;
> >             goto next;
> >     }
> > 
> > mlx4_en_rx_mb() -> mlx4_en_complete_rx_desc():
> > 
> >     /* Allocate a replacement page */
> >     if (mlx4_en_alloc_buf(priv, rx_desc, mb_list, nr))
> >             goto fail;
> > 
> > -Meny
> _______________________________________________
> freebsd-net@freebsd.org mailing list
> https://lists.freebsd.org/mailman/listinfo/freebsd-net
> To unsubscribe, send any mail to "freebsd-net-unsubscr...@freebsd.org"

-- 
Julien Cigar
Belgian Biodiversity Platform (http://www.biodiversity.be)
PGP fingerprint: EEF9 F697 4B68 D275 7B11 6A25 B2BB 3710 A204 23C0

No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.