On Tue, 19 Apr 2022 22:27:32 +0200 Michał Krawczyk <m...@semihalf.com> wrote:
> On Tue, 19 Apr 2022 at 17:01, Stephen Hemminger
> <step...@networkplumber.org> wrote:
> >
> > On Tue, 19 Apr 2022 14:10:23 +0200
> > Michał Krawczyk <m...@semihalf.com> wrote:
> >
> > > On Mon, 18 Apr 2022 at 17:19, Amiya Mohakud
> > > <amoha...@paloaltonetworks.com> wrote:
> > > >
> > > > + Megha, Sharad and Eswar.
> > > >
> > > > On Mon, Apr 18, 2022 at 2:03 PM Amiya Mohakud
> > > > <amoha...@paloaltonetworks.com> wrote:
> > > >>
> > > >> Hi Michal/DPDK-Experts,
> > > >>
> > > >> I am facing an issue in the net/ena driver while fetching extended
> > > >> stats (xstats). DPDK seems to segfault with the backtrace below.
> > > >>
> > > >> DPDK Version: 20.11.1
> > > >> ENA version: 2.2.1
> > > >>
> > > >> Using host libthread_db library "/lib64/libthread_db.so.1".
> > > >>
> > > >> Core was generated by `/opt/dpfs/usr/local/bin/brdagent'.
> > > >>
> > > >> Program terminated with signal SIGSEGV, Segmentation fault.
> > > >>
> > > >> #0  __memmove_avx_unaligned_erms () at
> > > >>     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >> 232         VMOVU   %VEC(0), (%rdi)
> > > >>
> > > >> [Current thread is 1 (Thread 0x7fffed93a400 (LWP 5060))]
> > > >>
> > > >> Thread 1 (Thread 0x7fffed93a400 (LWP 5060)):
> > > >> #0  __memmove_avx_unaligned_erms () at
> > > >>     ../sysdeps/x86_64/multiarch/memmove-vec-unaligned-erms.S:232
> > > >> #1  0x00007ffff3c246df in ena_com_handle_admin_completion () from
> > > >>     ../lib64/../../lib64/libdpdk.so.20
> > > >> #2  0x00007ffff3c1e7f5 in ena_interrupt_handler_rte () from
> > > >>     ../lib64/../../lib64/libdpdk.so.20
> > > >> #3  0x00007ffff3519902 in eal_intr_thread_main () from
> > > >>     /../lib64/../../lib64/libdpdk.so.20
> > > >> #4  0x00007ffff510714a in start_thread (arg=<optimized out>) at
> > > >>     pthread_create.c:479
> > > >> #5  0x00007ffff561ff23 in clone () at
> > > >>     ../sysdeps/unix/sysv/linux/x86_64/clone.S:95
> > > >>
> > > >> Background:
> > > >>
> > > >> This used to work fine with DPDK 19.11.3 (no crash was observed with
> > > >> that version), but after upgrading to DPDK 20.11.1 it crashes with
> > > >> the trace above.
> > > >> It looks to me like a DPDK issue.
> > > >> I can see multiple fixes/patches in the net/ena area, but I am not
> > > >> able to identify which patch would fix this exact issue.
> > > >>
> > > >> For example:
> > > >> http://git.dpdk.org/dpdk/diff/?h=releases&id=aab58857330bb4bd03f6699bf1ee716f72993774
> > > >> https://inbox.dpdk.org/dev/20210430125725.28796-6...@semihalf.com/T/#me99457c706718bb236d1fd8006ee7a0319ce76fc
> > > >>
> > > >> Could you please help here and let me know which patch could fix
> > > >> this issue?
> > >
> > > + Shai Brandes and ena-dev
> > >
> > > Hi Amiya,
> > >
> > > Thanks for reaching out. Could you please provide us with more
> > > details regarding the reproduction? I cannot reproduce this on my
> > > setup for DPDK v20.11.1 when using testpmd and probing for the xstats.
> > >
> > > =======================================================================
> > > [ec2-user@<removed> dpdk]$ sudo ./build/app/dpdk-testpmd -- -i
> > > EAL: Detected 8 lcore(s)
> > > EAL: Detected 1 NUMA nodes
> > > EAL: Detected static linkage of DPDK
> > > EAL: Multi-process socket /var/run/dpdk/rte/mp_socket
> > > EAL: Selected IOVA mode 'PA'
> > > EAL: No available hugepages reported in hugepages-1048576kB
> > > EAL: Probing VFIO support...
> > > EAL: Invalid NUMA socket, default to 0
> > > EAL: Invalid NUMA socket, default to 0
> > > EAL: Probe PCI driver: net_ena (1d0f:ec20) device: 0000:00:06.0 (socket 0)
> > > EAL: No legacy callbacks, legacy socket not created
> > > Interactive-mode selected
> > > ena_mtu_set(): Set MTU: 1500
> > >
> > > testpmd: create a new mbuf pool <mb_pool_0>: n=203456, size=2176, socket=0
> > > testpmd: preferred mempool ops selected: ring_mp_mc
> > >
> > > Warning! port-topology=paired and odd forward ports number, the last
> > > port will pair with itself.
> > >
> > > Configuring Port 0 (socket 0)
> > > Port 0: <removed>
> > > Checking link statuses...
> > > Done
> > > Error during enabling promiscuous mode for port 0: Operation not
> > > supported - ignore
> > > testpmd> start
> > > io packet forwarding - ports=1 - cores=1 - streams=1 - NUMA support
> > > enabled, MP allocation mode: native
> > > Logical Core 1 (socket 0) forwards packets on 1 streams:
> > >   RX P=0/Q=0 (socket 0) -> TX P=0/Q=0 (socket 0) peer=02:00:00:00:00:00
> > >
> > >   io packet forwarding packets/burst=32
> > >   nb forwarding cores=1 - nb forwarding ports=1
> > >   port 0: RX queue number: 1 Tx queue number: 1
> > >     Rx offloads=0x0 Tx offloads=0x0
> > >     RX queue: 0
> > >       RX desc=0 - RX free threshold=0
> > >       RX threshold registers: pthresh=0 hthresh=0 wthresh=0
> > >       RX Offloads=0x0
> > >     TX queue: 0
> > >       TX desc=0 - TX free threshold=0
> > >       TX threshold registers: pthresh=0 hthresh=0 wthresh=0
> > >       TX offloads=0x0 - TX RS bit threshold=0
> > > testpmd> show port xstats 0
> > > ###### NIC extended statistics for port 0
> > > rx_good_packets: 1
> > > tx_good_packets: 1
> > > rx_good_bytes: 42
> > > tx_good_bytes: 42
> > > rx_missed_errors: 0
> > > rx_errors: 0
> > > tx_errors: 0
> > > rx_mbuf_allocation_errors: 0
> > > rx_q0_packets: 1
> > > rx_q0_bytes: 42
> > > rx_q0_errors: 0
> > > tx_q0_packets: 1
> > > tx_q0_bytes: 42
> > > wd_expired: 0
> > > dev_start: 1
> > > dev_stop: 0
> > > tx_drops: 0
> > > bw_in_allowance_exceeded: 0
> > > bw_out_allowance_exceeded: 0
> > > pps_allowance_exceeded: 0
> > > conntrack_allowance_exceeded: 0
> > > linklocal_allowance_exceeded: 0
> > > rx_q0_cnt: 1
> > > rx_q0_bytes: 42
> > > rx_q0_refill_partial: 0
> > > rx_q0_bad_csum: 0
> > > rx_q0_mbuf_alloc_fail: 0
> > > rx_q0_bad_desc_num: 0
> > > rx_q0_bad_req_id: 0
> > > tx_q0_cnt: 1
> > > tx_q0_bytes: 42
> > > tx_q0_prepare_ctx_err: 0
> > > tx_q0_linearize: 0
> > > tx_q0_linearize_failed: 0
> > > tx_q0_tx_poll: 1
> > > tx_q0_doorbells: 1
> > > tx_q0_bad_req_id: 0
> > > tx_q0_available_desc: 1022
> > > =======================================================================
> > >
> > > I think you are seeing the regression because of the new xstats (ENI
> > > limiters), which were added after DPDK v19.11 (mainline commit
> > > 45718ada5fa12619db4821646ba964a2df365c68), but I'm not sure yet why
> > > it triggers for you.
> > >
> > > In particular, I've got a few questions below.
> > >
> > > 1. Is the application you're using single-process or multi-process?
> > >    If multi-process, from which process are you probing for the xstats?
> > > 2. Have you tried running the latest DPDK v20.11 LTS?
> > > 3. What kernel module are you using (igb_uio/vfio-pci)?
> > > 4. On what AWS instance type was it reproduced?
> > > 5. Does the segfault happen the first time you call for the xstats?
> > >
> > > If you've got any other information which could be useful, please
> > > share it; it will help us resolve the cause of the issue.
> > >
> > > Thanks,
> > > Michal
> > >
> > > >> Regards
> > > >> Amiya
>
> > Try getting xstats in a secondary process.
> > I think that is where the bug was found.
>
> Thanks Stephen, indeed the issue reproduces in the secondary process.
>
> Basically, ENA v2.2.1 is not MP aware, meaning it cannot be used safely
> from a secondary process. The main obstacle is the admin queue, which is
> used for processing hardware requests and can be used safely only from
> the primary process. It's not strictly a bug, as we weren't exposing
> 'MP Awareness' in the PMD features list; it's more a lack of proper MP
> support.

The driver should report an error, not crash. Could you fix that?
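
For anyone who wants to reproduce this outside of the original application,
a minimal secondary-process snippet along these lines should exercise the
same path. This is only an untested sketch using the public ethdev API; it
assumes a primary process (e.g. testpmd) is already running and that the ENA
device is port 0:

=======================================================================
/* Untested reproduction sketch: build against DPDK and run it with
 * --proc-type=secondary while a primary process drives the ENA port. */
#include <inttypes.h>
#include <stdio.h>

#include <rte_eal.h>
#include <rte_ethdev.h>

int main(int argc, char **argv)
{
	uint16_t port_id = 0;	/* assumes the ENA device is port 0 */
	int i, n;

	if (rte_eal_init(argc, argv) < 0)
		return 1;

	/* First call only asks how many xstats the port exposes. */
	n = rte_eth_xstats_get(port_id, NULL, 0);
	if (n <= 0)
		return 1;

	struct rte_eth_xstat xstats[n];
	struct rte_eth_xstat_name names[n];

	/* On ENA v2.2.1 reading the stats ends up issuing admin-queue
	 * requests, which is not safe from a secondary process. */
	if (rte_eth_xstats_get_names(port_id, names, n) != n ||
	    rte_eth_xstats_get(port_id, xstats, n) != n)
		return 1;

	for (i = 0; i < n; i++)
		printf("%s: %" PRIu64 "\n", names[i].name, xstats[i].value);

	rte_eal_cleanup();
	return 0;
}
=======================================================================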
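
And to be concrete about what "report an error" could look like: guarding
the admin-queue-backed callbacks on the process type would be enough for
now. Below is only a rough sketch; the function name and log call are
illustrative, not the actual net/ena code:

=======================================================================
/* Sketch only -- illustrative names, not the real net/ena implementation. */
#include <errno.h>

#include <rte_common.h>
#include <rte_eal.h>
#include <rte_ethdev.h>
#include <rte_log.h>

static int
ena_xstats_get_guarded(struct rte_eth_dev *dev,
		       struct rte_eth_xstat *xstats, unsigned int n)
{
	/* The ENA admin queue is only safe to use from the primary process,
	 * so refuse the request instead of crashing on it. */
	if (rte_eal_process_type() != RTE_PROC_PRIMARY) {
		RTE_LOG(ERR, PMD,
			"port %u: xstats not supported in secondary process\n",
			dev->data->port_id);
		return -ENOTSUP;
	}

	/* Primary-only path that actually talks to the admin queue. */
	RTE_SET_USED(xstats);
	RTE_SET_USED(n);
	return 0;
}
=======================================================================

With a check like that, a secondary-process caller would get -ENOTSUP back
from rte_eth_xstats_get() instead of a segfault, until proper MP support is
added.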