https://bugs.dpdk.org/show_bug.cgi?id=440
Bug ID: 440
Summary: net/mlx5: Read of "out_of_buffer" using fopen/fread/fclose causing TLB shootdowns due to mmap/munmap
Product: DPDK
Version: 19.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: normal
Priority: Normal
Component: ethdev
Assignee: dev@dpdk.org
Reporter: mohsinsha...@niometrics.com
Target Milestone: ---

Created attachment 86
--> https://bugs.dpdk.org/attachment.cgi?id=86&action=edit
Potential fix patch

Setup: HPE ProLiant DL560 Gen10 server with 4 x "Intel(R) Xeon(R) Platinum 8180M CPU @ 2.50GHz" and 6 TB of memory (1 TB reserved as huge pages), with 2 x "Mellanox Technologies ConnectX®-5 EN network interface card, 100GbE single-port QSFP28, PCIe3.0 x16, tall bracket; MCX515A-CCAT". The OS is CentOS 7 with kernel 3.10.0-957.27.2.el7.x86_64.

While testing, I noticed that the rx cores of the DPDK app were periodically hit by TLB shootdowns. After debugging, I found that the stats thread of the DPDK app was responsible for them, specifically the read of the "out_of_buffer" stat:

static inline void
mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name,
		  uint64_t *stat)
{
	FILE *file;

	if (priv->sh) {
		MKSTR(path, "%s/ports/%d/hw_counters/%s",
		      priv->sh->ibdev_path, priv->ibv_port, ctr_name);

		file = fopen(path, "rb");
		if (file) {
			int n = fscanf(file, "%" SCNu64, stat); <----- Calls mmap().
			fclose(file); <----- Calls munmap(), which leads to a TLB shootdown.
			if (n == 1)
				return;
		}
	}
	*stat = 0;
}

NOTE: We almost exclusively use huge pages in the DPDK app for allocations.
Stack traces:

(gdb) where
#0  __mmap (addr=addr@entry=0x0, len=4096, prot=prot@entry=3, flags=flags@entry=34, fd=fd@entry=-1, offset=offset@entry=0) at ../sysdeps/unix/sysv/linux/wordsize-64/mmap.c:32
#1  0x00007fffdeaf8ed1 in __GI__IO_file_doallocate (fp=0x2b52263ae000) at filedoalloc.c:127
#2  0x00007fffdeb07d07 in __GI__IO_doallocbuf (fp=fp@entry=0x2b52263ae000) at genops.c:399
#3  0x00007fffdeb06c8c in _IO_new_file_underflow (fp=0x2b52263ae000) at fileops.c:557
#4  0x00007fffdeb07dd2 in __GI__IO_default_uflow (fp=0x2b52263ae000) at genops.c:414
#5  0x00007fffdeae7efa in _IO_vfscanf_internal (s=<optimized out>, format=<optimized out>, argptr=argptr@entry=0x7fff51f07498, errp=errp@entry=0x0) at vfscanf.c:600
#6  0x00007fffdeaed617 in ___vfscanf (s=<optimized out>, format=<optimized out>, argptr=argptr@entry=0x7fff51f07498) at vfscanf.c:2942
#7  0x00007fffdeaf6247 in __fscanf (stream=<optimized out>, format=<optimized out>) at fscanf.c:31
#8  0x00007fffe632ee9f in mlx5_stats_get () from /opt/nio/dpdk/lib/librte_pmd_mlx5.so.20.0
#9  0x00007fffeee6b468 in rte_eth_stats_get () from /opt/nio/dpdk/lib/librte_ethdev.so.20.0

Breakpoint 2, munmap () at ../sysdeps/unix/syscall-template.S:81
81      T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) where
#0  munmap () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007fffdeb07cb2 in __GI__IO_setb (f=f@entry=0x2b52263ae000, b=b@entry=0x0, eb=eb@entry=0x0, a=a@entry=0) at genops.c:383
#2  0x00007fffdeb06090 in _IO_new_file_close_it (fp=fp@entry=0x2b52263ae000) at fileops.c:192
#3  0x00007fffdeaf9138 in _IO_new_fclose (fp=0x2b52263ae000) at iofclose.c:58
#4  0x00007fffe632eeaa in mlx5_stats_get () from /opt/nio/dpdk/lib/librte_pmd_mlx5.so.20.0
#5  0x00007fffeee6b468 in rte_eth_stats_get () from /opt/nio/dpdk/lib/librte_ethdev.so.20.0

Output from kernel tracing (using trace-cmd):

<...>-120694 [000] 502442.376421: function: native_flush_tlb_others
<...>-120694 [000] 502442.376423: kernel_stack: <stack trace>
=> tlb_flush_mmu.part.76 (ffffffffa89e57d7)
=> tlb_finish_mmu (ffffffffa89e70d5)
=> unmap_region (ffffffffa89f0754)
=> do_munmap (ffffffffa89f2d45)
=> vm_munmap (ffffffffa89f2f85)
=> SyS_munmap (ffffffffa89f4212)
=> tracesys (ffffffffa8f7706b)
<...>-120324 [002] 502442.382250: function: flush_tlb_func
<...>-120323 [001] 502442.382250: function: flush_tlb_func
<...>-120325 [003] 502442.382251: function: flush_tlb_func
<...>-120327 [005] 502442.382251: function: flush_tlb_func
<...>-120326 [004] 502442.382251: function: flush_tlb_func
<...>-120332 [010] 502442.382251: function: flush_tlb_func
<...>-120324 [002] 502442.382252: kernel_stack: <stack trace>
=> generic_smp_call_function_single_interrupt (ffffffffa8912ea3)
=> smp_call_function_interrupt (ffffffffa885737d)
=> call_function_interrupt (ffffffffa8f79382)

The following patch seems to fix the issue:

=========================================================================
commit e4e93bd3ffb4e75b2178b0fdfe12341bfddf171d
Author: Mohsin Shaikh <mohsinsha...@niometrics.com>
Date:   Tue Mar 31 14:28:41 2020 +0800

    net/mlx5: Use open/read/close for reading ib stat

diff --git a/drivers/net/mlx5/mlx5_stats.c b/drivers/net/mlx5/mlx5_stats.c
index 205e4fe..769f080 100644
--- a/drivers/net/mlx5/mlx5_stats.c
+++ b/drivers/net/mlx5/mlx5_stats.c
@@ -8,6 +8,8 @@
 #include <linux/ethtool.h>
 #include <stdint.h>
 #include <stdio.h>
+#include <unistd.h>
+#include <fcntl.h>
 
 #include <rte_ethdev_driver.h>
 #include <rte_common.h>
@@ -139,20 +141,22 @@ static inline void
 mlx5_read_ib_stat(struct mlx5_priv *priv, const char *ctr_name,
 		  uint64_t *stat)
 {
-	FILE *file;
+	int fd;
 
 	if (priv->sh) {
 		MKSTR(path, "%s/ports/%d/hw_counters/%s",
 		      priv->sh->ibdev_path, priv->ibv_port, ctr_name);
 
-		file = fopen(path, "rb");
-		if (file) {
-			int n = fscanf(file, "%" SCNu64, stat);
-
-			fclose(file);
-			if (n == 1)
+		fd = open(path, O_RDONLY);
+		if (fd != -1) {
+			char buf[32];
+			ssize_t n = read(fd, buf, sizeof(buf));
+
+			close(fd);
+			if (n != -1) {
+				sscanf(buf, "%lu", stat);
 				return;
+			}
 		}
 	}
 	*stat = 0;
=========================================================================

Patch attached to bug.

These TLB shootdowns lead to packets being dropped sporadically by the mlx5 NICs, with the "rx_discards_phy" counter being incremented. The above patch significantly reduced the drops in our app.

--
You are receiving this mail because:
You are the assignee for the bug.