Hi Juraj,

Could you please try the attached patch?

Thanks.

-----Original Message-----
From: Juraj Linkeš <juraj.lin...@pantheon.tech>
Sent: December 4, 2019 18:12
To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>; Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com>; Lijian Zhang (Arm Technology China) <lijian.zh...@arm.com>; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
Subject: RE: CSIT - performance tests failing on Taishan

Hi Ben, Lijian, Honnappa,

The issue is reproducible after the second invocation of show pci:

DBGvpp# show pci
Address      Sock VID:PID     Link Speed   Driver     Product Name                    Vital Product Data
0000:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
0000:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
0002:f9:00.0   0  15b3:1015   8.0 GT/s x8  mlx5_core  CX4121A - ConnectX-4 LX SFP28   PN: MCX4121A-ACAT_C12
                                                                                      EC: A1
                                                                                      SN: MT1745K13032
                                                                                      V0: 0x 50 43 49 65 47 65 6e 33 ...
                                                                                      RV: 0x ba
0002:f9:00.1   0  15b3:1015   8.0 GT/s x8  mlx5_core  CX4121A - ConnectX-4 LX SFP28   PN: MCX4121A-ACAT_C12
                                                                                      EC: A1
                                                                                      SN: MT1745K13032
                                                                                      V0: 0x 50 43 49 65 47 65 6e 33 ...
                                                                                      RV: 0x ba
DBGvpp# show pci
Address      Sock VID:PID     Link Speed   Driver     Product Name                    Vital Product Data
0000:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
0000:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
Aborted
Makefile:546: recipe for target 'run' failed
make: *** [run] Error 134

I've tried to do some debugging with a debug build:

(gdb) bt
...
#5  0x0000ffffbe775000 in format_vlib_pci_vpd (s=0xffff7efa9e80 "0002:f9:00.0 0 15b3:1015 8.0 GT/s x8 mlx5_core CX4121A - ConnectX-4 LX SFP28", args=0xffff7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
...
(gdb) frame 5
#5  0x0000ffffbe775000 in format_vlib_pci_vpd (s=0xffff7efa9e80 "0002:f9:00.0 0 15b3:1015 8.0 GT/s x8 mlx5_core CX4121A - ConnectX-4 LX SFP28", args=0xffff7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
230           else if (*(u16 *) & data[p] == *(u16 *) id)
(gdb) info locals
data = 0xffff7efa9cd0 "PN\025MCX4121A-ACAT_C12 EC\002A1SN\030MT1745K13032", ' ' <repeats 12 times>, "V0\023PCIeGen3 x8 RV\001\272"
id = 0xaaa8000000000000 <error: Cannot access memory at address 0xaaa8000000000000>
indent = 91
string_types = {0xffffbe7b7950 "PN", 0xffffbe7b7958 "EC", 0xffffbe7b7960 "SN", 0xffffbe7b7968 "MN", 0x0}
p = 0
first_line = 1

Looks like something went wrong with the 'id' variable. More is attached.
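
A note for readers following the gdb output above: the `data` buffer appears to be a sequence of fields, each a two-byte keyword, a one-byte length, then the payload (for example `EC\002A1` is keyword EC, length 2, payload A1). The crash is at a two-byte comparison through `id`, which gdb reports as an inaccessible address. The following is a minimal sketch, not the VPP code and not the attached patch, of walking such a buffer with bounds checks and `memcmp` instead of a `u16` cast. The function name `vpd_find_keyword` and its signature are hypothetical. Note that a NULL check cannot catch a non-NULL garbage pointer such as the one shown here; the sketch only illustrates the layout and a defensive walk.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not part of VPP): look up a two-character keyword
 * such as "PN" or "SN" in a buffer laid out as repeated
 * [2-byte keyword][1-byte length][payload] fields.
 * On success returns a pointer to the payload and stores its length in
 * *len_out. Returns NULL if the keyword is absent, the buffer is
 * truncated, or an argument is invalid. */
static const uint8_t *
vpd_find_keyword (const uint8_t *data, size_t data_len, const char *id,
                  size_t *len_out)
{
  size_t p = 0;

  if (data == NULL || id == NULL || len_out == NULL)
    return NULL;
  if (id[0] == '\0' || id[1] == '\0')
    return NULL;                /* keyword must be two characters */

  /* Each field needs at least 3 header bytes: keyword (2) + length (1). */
  while (p + 3 <= data_len)
    {
      size_t field_len = data[p + 2];

      if (field_len > data_len - (p + 3))
        return NULL;            /* payload would run past the buffer */

      /* memcmp avoids the unaligned u16 load and the pointer cast. */
      if (memcmp (&data[p], id, 2) == 0)
        {
          *len_out = field_len;
          return &data[p + 3];
        }
      p += 3 + field_len;
    }
  return NULL;
}
```

A caller would pass the raw buffer and its length, e.g. `vpd_find_keyword (buf, buf_len, "SN", &len)`, and treat a NULL result as "keyword not present".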

As a temporary workaround (until we fix this), we're going to replace show pci with something else in CSIT:
https://gerrit.fd.io/r/c/csit/+/23785

Juraj

-----Original Message-----
From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>
Sent: Tuesday, December 3, 2019 3:58 PM
To: Juraj Linkeš <juraj.lin...@pantheon.tech>; Benoit Ganne (bganne) <bga...@cisco.com>; Maciek Konstantynowicz (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
Subject: RE: CSIT - performance tests failing on Taishan

Latest update is that Benoit has no access over VPN so he did try to replicate in local lab (assuming x86).
I will do quick fix in CSIT. I will disable MLX driver on Taishan.

Peter Mikus
Engineer - Software
Cisco Systems Limited

> -----Original Message-----
> From: Juraj Linkeš <juraj.lin...@pantheon.tech>
> Sent: Tuesday, December 3, 2019 3:09 PM
> To: Benoit Ganne (bganne) <bga...@cisco.com>; Peter Mikus -X (pmikus -
> PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; Maciek Konstantynowicz
> (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>; csit-
> d...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
>
> Hi Benoit,
>
> Do you have access to FD.io lab? The Taishan servers are in it.
>
> Juraj
>
> -----Original Message-----
> From: Benoit Ganne (bganne) <bga...@cisco.com>
> Sent: Friday, November 29, 2019 4:03 PM
> To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> <pmi...@cisco.com>; Juraj Linkeš <juraj.lin...@pantheon.tech>; Maciek
> Konstantynowicz (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-
> d...@lists.fd.io>; csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
>
> Hi Peter, can I get access to the setup to investigate?
>
> Best
> ben
>
> > -----Original Message-----
> > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > <pmi...@cisco.com>
> > Sent: Friday, November 29, 2019 11:08
> > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>;
> > csit-...@lists.fd.io
> > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > lijian.zh...@arm.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>
> > Subject: RE: CSIT - performance tests failing on Taishan
> >
> > +dev lists
> >
> > Peter Mikus
> > Engineer - Software
> > Cisco Systems Limited
> >
> > > -----Original Message-----
> > > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > > Sent: Friday, November 29, 2019 11:06 AM
> > > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > > <mkons...@cisco.com>
> > > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > > lijian.zh...@arm.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > > Subject: CSIT - performance tests failing on Taishan
> > >
> > > Hello all,
> > >
> > > In CSIT we are observing the issue with Taishan boxes where
> > > performance tests are failing.
> > > There has been long misleading discussion about the potential
> > > issue, root cause and what workaround to apply.
> > >
> > > Issue
> > > =====
> > > VPP is being restarted after an attempt to read "show pci" over
> > > the socket on '/run/vpp/cli.sock'
> > > in a loop. This loop test is executed in CSIT towards VPP with
> > > default startup configuration via command below to check if VPP is
> > > really UP and responding.
> > >
> > > How to reproduce
> > > ================
> > > for i in $(seq 1 120); do echo "show pci" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> > >
> > > The same can be reproduced using vppctl:
> > >
> > > for i in $(seq 1 120); do echo "show pci" | sudo vppctl; sudo netstat -ap | grep vpp; done
> > >
> > > To eliminate the issue with test itself I used "show version"
> > > for i in $(seq 1 120); do echo "show version" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> > >
> > > This test is passing with "show version" and VPP is not restarted.
> > >
> > >
> > > Root cause
> > > ==========
> > > The root cause seems to be:
> > >
> > > Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> > > 0x0000ffffbeb4f3d0 in format_vlib_pci_vpd (
> > >     s=0xffff7fabe830 "0002:f9:00.0 0 15b3:1015 8.0 GT/s x8 mlx5_core CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
> > >     at /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:230
> > > 230     /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c: No such file or directory.
> > > (gdb)
> > > Continuing.
> > >
> > > Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> > > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > 51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > > (gdb)
> > >
> > >
> > > Issue started after MLX was installed into Taishan.
> > >
> > >
> > > @Benoit Ganne (bganne) can you please help fixing the root cause?
> > >
> > > Thank you.
> > >
> > > Peter Mikus
> > > Engineer - Software
> > > Cisco Systems Limited

Attachment: show-pci.diff