On 2/15/2022 4:24 PM, Vipul Ashri wrote:
> On 11/22/2021 3:53 PM, Gaëtan Rivet wrote:
>> Could describe in more detail the execution?
>> In particular, setting the EAL log-level to debug with the option:
>> ' --log-level pmd.net.failsafe:debug '
>> for example while using testpmd or your DPDK app.
>> It should show ethdev level accesses to the sub-devices, and error values.
>>
>> Best regards,
>
> Hi Gaetan
>
> Sorry for very late reply, we were busy working on 21.11 integration.
>
> Although we have adopted this code internally for us but I am sharing the
> patch to opensource for community benefit.
>
> This is specific case of AZURE setup with our very customized complex
> environment.
>
> Let me share the logs with trace-back first
> ==================================================================================================================
> SECONDARY PROCESS
> timestamp=1633598184
> TCZ0.0.0 Cycle 152 (Build 1832)
> signal 11 (Segmentation fault), address is 0x31117bbce6c8 from 0x47d08b1
>
> [bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv (+ 0xf4) - sp =
> 0x7fffef3fd110, ip = 0x3acdc54
> [bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv (+ 0x159) - sp =
> 0x7fffef3fdc20, ip = 0x3acdf29
> [bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+ 0x104) - sp =
> 0x7fffef3fdf00, ip = 0x274d4c4
> [bt]: ( 4) _L_unlock_18 (+ 0x2c) - sp =
> 0x7fffef3fdf80, ip = 0x7ffff7bce630
> [bt]: ( 5) rte_eth_dev_attach_secondary (+ 0x21) - sp =
> 0x7fffef3fec50, ip = 0x47d08b1
> [bt]: ( 6) rte_eth_from_ring (+ 0x3438) - sp =
> 0x7fffef3fec80, ip = 0x4e49da8
> [bt]: ( 7) _init (+ 0xa1b8) - sp =
> 0x7fffef3feec0, ip = 0x12e0368
> [bt]: ( 8) local_dev_probe (+ 0xac) - sp =
> 0x7fffef3feef0, ip = 0x478fd2c
> [bt]: ( 9) rte_uuid_unparse (+ 0x274) - sp =
> 0x7fffef3fef30, ip = 0x47a3e94
> [bt]: (10) rte_eal_vfio_get_vf_token (+ 0xd7) - sp =
> 0x7fffef3ff110, ip = 0x47b04b7
> [bt]: (11) eal_hugepage_info_read (+ 0x602) - sp =
> 0x7fffef3ff170, ip = 0x47b2cd2
> [bt]: (12) start_thread (+ 0xc5) - sp =
> 0x7fffef3ff220, ip = 0x7ffff7bc6ea5
> [bt]: (13) clone (+ 0x6d) - sp =
> 0x7fffef3ff2c0, ip = 0x7ffff004096d
> EAL: Fail to recv reply for request
> /var/run/dpdk/oracusbc/mp_socket:eal_dev_mp_request
> EAL: Cannot send request to primary
> EAL: Failed to send hotplug request to primary
> net_failsafe: Failed to probe devargs net_tap_vsc0
> EAL: Fail to recv reply for request
> /var/run/dpdk/oracusbc/mp_socket:eal_dev_mp_request
> EAL: Cannot send request to primary
> EAL: Failed to send hotplug request to primary
> net_failsafe: Failed to probe devargs net_tap_vsc1
> EAL: No legacy callbacks, legacy socket not created
> EAL: Drop mp reply: eal_dev_mp_request
> ==================================================================================================================
> PRIMARY PROCESS
> timestamp=1633598196
> TCZ0.0.0 Cycle 152 (Build 1832)
> signal 11 (Segmentation fault), address is 0x38 from 0x9d8fbe
>
> [bt]: ( 1) _Z18snprintf_backtraceRPciiP9siginfo_tPv (+ 0xf4) - sp =
> 0x7fffecf41150, ip = 0x100dd44
> [bt]: ( 2) _Z13crit_err_hdlriP9siginfo_tPv (+ 0x159) - sp =
> 0x7fffecf41c60, ip = 0x100e019
> [bt]: ( 3) _ZN13SignalAdapter12handleSignalEiP9siginfo_tPv (+ 0x104) - sp =
> 0x7fffecf41f40, ip = 0xff4894
> [bt]: ( 4) _L_unlock_18 (+ 0x2c) - sp =
> 0x7fffecf41fc0, ip = 0x7ffff61d9630
> [bt]: ( 5) failsafe_eth_dev_close (+ 0x65e) - sp =
> 0x7fffecf42c90, ip = 0x9d8fbe
> [bt]: ( 6) rte_eth_link_get_nowait (+ 0x6a) - sp =
> 0x7fffecf42cf0, ip = 0x62fa0a
> [bt]: ( 7) _ZN11StatsThread9statsLoopEP10CustomObject (+ 0x33e) - sp =
> 0x7fffecf42d20, ip = 0xedea2e
> [bt]: ( 8) _ZN11StatsThread9statsLoopEP10CustomObject (+ 0x8dc) - sp =
> 0x7fffecf42d90, ip = 0xedefcc
> [bt]: ( 9) ThreadFunction (+ 0xe6) - sp =
> 0x7fffecf42db0, ip = 0x7ffff6b477e6
> [bt]: (10) start_thread (+ 0xc5) - sp =
> 0x7fffecf42de0, ip = 0x7ffff61d1ea5
> [bt]: (11) clone (+ 0x6d) - sp =
> 0x7fffecf42e80, ip = 0x7ffff0a6b96d
>
> ==================================================================================================================
> DPDK 20.11.2
> core mask is 00000000000000000000000000004000
> DPDK Custom Process initialized with 2 ports
> the min max TxQ is maxTxQueues 16
> Using 1 RxQs for port 0 (# F-core=1)
> Using 1 RxQs for port 3 (# F-core=1)
> Core 14 (port=0, rxQ=0) kni_ring=(nil)
> Core 14 (port=3, rxQ=0) kni_ring=(nil)
> Core 14 txN = 0
> Thread for core 14 using ring from usbc of 0x31117b29bb00
> Ring size must be powers of 2, adjusting from 8196 to 16384
> Thread for core 14 using ring from MEDIA of 0x31117b27b840
> Encaps Memory Zone= 48044 sizeof encaps = 60
> Trace Memory Zone= 272
> Policy Memory Zone= 8196 sizeof policy = 240
> link status for port 0 is 1
> link status for port 3 is 1
> PORT 0 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe,
> driver_type = 16)
> PORT 0 is polling for link-change, interrupts disabled
> [DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File
> exists
> [DPDK] net_failsafe: Failed to create flow on sub_device 1
> add_flow(): create() fails for port 0; Reason: overlapping rules or Kernel
> too old for flower support
> Error adding broadcast flow
> PORT 3 supports 16 rx queues and 16 tx queues (driver_name = net_failsafe,
> driver_type = 16)
> PORT 3 is polling for link-change, interrupts disabled
> [DPDK] EAL: Failed to hotplug add device on primary
> [DPDK] tap_flow_create(): Kernel refused TC filter rule creation (17): File
> exists
> [DPDK] net_failsafe: Failed to create flow on sub_device 1
> add_flow(): create() fails for port 3; Reason: overlapping rules or Kernel
> too old for flower support
> Error adding broadcast flow
> Cmd Thread is available
> Capture object initialized
> init :Stats Thread is available
> ifLinkUpdate: Sending OperStatus for port=0 stat=1
> ifLinkUpdate: Port 0 Link Change - speed 40000 Mbps - full-duplex
> [DPDK] EAL: Fail to recv reply for request
> /var/run/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request
> [DPDK] EAL: rte_mp_request_sync failed
> [DPDK] EAL: Failed to send hotplug request to secondary
> [DPDK] EAL: Fail to recv reply for request
> /var/run/dpdk/oracusbc/mp_socket_2934_298e9db8d1:eal_dev_mp_request
> [DPDK] EAL: rte_mp_request_sync failed
> [DPDK] EAL: Failed to hotplug add device on primary
> [DPDK] Invalid port_id=2
> [DPDK] net_failsafe: Operation rte_eth_stats_get failed for sub_device 1 with
> error -19
>
> There is some race at secondary process and primary got crashed because its
> data-structures and partially filled.
> Let me know if you need GDB analysis, I can share with next reply if you are
> still unsatisfied. GDB analysis will be bigger.
> Thanks!
>
Hi Gaëtan,

This is a very old patch. I don't know whether it is still valid or whether Vipul is still pursuing the issue, but do you need more data, or do you have any comments on how to proceed?