The latest update is that Benoit has no access over VPN, so he tried to replicate the issue in a local lab (x86, I assume). I will do a quick fix in CSIT. I will disable the MLX driver on Taishan.
Peter Mikus
Engineer - Software
Cisco Systems Limited

> -----Original Message-----
> From: Juraj Linkeš <juraj.lin...@pantheon.tech>
> Sent: Tuesday, December 3, 2019 3:09 PM
> To: Benoit Ganne (bganne) <bga...@cisco.com>; Peter Mikus -X (pmikus -
> PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; Maciek Konstantynowicz
> (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>;
> csit-d...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
>
> Hi Benoit,
>
> Do you have access to FD.io lab? The Taishan servers are in it.
>
> Juraj
>
> -----Original Message-----
> From: Benoit Ganne (bganne) <bga...@cisco.com>
> Sent: Friday, November 29, 2019 4:03 PM
> To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> <pmi...@cisco.com>; Juraj Linkeš <juraj.lin...@pantheon.tech>; Maciek
> Konstantynowicz (mkonstan) <mkons...@cisco.com>; vpp-dev
> <vpp-d...@lists.fd.io>; csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
>
> Hi Peter, can I get access to the setup to investigate?
>
> Best
> ben
>
> > -----Original Message-----
> > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > <pmi...@cisco.com>
> > Sent: Friday, November 29, 2019 11:08
> > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>;
> > csit-...@lists.fd.io
> > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > lijian.zh...@arm.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>
> > Subject: RE: CSIT - performance tests failing on Taishan
> >
> > +dev lists
> >
> > Peter Mikus
> > Engineer - Software
> > Cisco Systems Limited
> >
> > > -----Original Message-----
> > > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > > Sent: Friday, November 29, 2019 11:06 AM
> > > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > > <mkons...@cisco.com>
> > > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > > lijian.zh...@arm.com; Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>
> > > Subject: CSIT - performance tests failing on Taishan
> > >
> > > Hello all,
> > >
> > > In CSIT we are observing an issue with the Taishan boxes where
> > > performance tests are failing.
> > > There has been a long, misleading discussion about the potential
> > > issue, the root cause, and what workaround to apply.
> > >
> > > Issue
> > > =====
> > > VPP is restarted after an attempt to read "show pci" over the
> > > socket at '/run/vpp/cli.sock' in a loop. CSIT runs this loop test
> > > against VPP with the default startup configuration, using the
> > > command below, to check that VPP is really up and responding.
> > >
> > > How to reproduce
> > > ================
> > > for i in $(seq 1 120); do
> > >     echo "show pci" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
> > >     sudo netstat -ap | grep vpp
> > > done
> > >
> > > The same can be reproduced using vppctl:
> > >
> > > for i in $(seq 1 120); do
> > >     echo "show pci" | sudo vppctl
> > >     sudo netstat -ap | grep vpp
> > > done
> > >
> > > To rule out an issue with the test itself, I used "show version":
> > >
> > > for i in $(seq 1 120); do
> > >     echo "show version" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
> > >     sudo netstat -ap | grep vpp
> > > done
> > >
> > > This test passes with "show version" and VPP is not restarted.
> > >
> > >
> > > Root cause
> > > ==========
> > > The root cause seems to be:
> > >
> > > Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> > > 0x0000ffffbeb4f3d0 in format_vlib_pci_vpd (
> > >     s=0xffff7fabe830 "0002:f9:00.0 0 15b3:1015 8.0 GT/s x8 mlx5_core CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
> > >     at /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:230
> > > 230 /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c: No such file or directory.
> > > (gdb)
> > > Continuing.
> > >
> > > Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> > > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > > (gdb)
> > >
> > >
> > > The issue started after MLX was installed in Taishan.
> > >
> > >
> > > @Benoit Ganne (bganne) can you please help fix the root cause?
> > >
> > > Thank you.
> > >
> > > Peter Mikus
> > > Engineer - Software
> > > Cisco Systems Limited
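
For anyone re-running the reproduction loop quoted above, here is a minimal sketch of a variant that stops as soon as the VPP process ID changes, instead of relying on reading the netstat output by eye. It is only an illustration under assumptions: the helper names (get_vpp_pid, run_cli, detect_restart) are hypothetical and not part of CSIT or VPP, and it assumes the VPP binary is named "vpp" so that `pidof vpp` finds it. The socat invocation is the same one used in the original report.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not CSIT or VPP code.

# Assumption: the VPP process can be located with `pidof vpp`.
get_vpp_pid() {
    pidof vpp
}

# Send one CLI command over the VPP CLI socket (same command as in the report).
run_cli() {
    echo "$1" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock
}

# Run a CLI command repeatedly and report whether the VPP PID changed.
# Returns 0 if VPP kept the same PID, 1 if it changed, 2 if VPP was not found.
detect_restart() {
    cmd=$1
    iterations=$2
    start_pid=$(get_vpp_pid)
    if [ -z "$start_pid" ]; then
        echo "VPP is not running" >&2
        return 2
    fi
    i=1
    while [ "$i" -le "$iterations" ]; do
        run_cli "$cmd" >/dev/null 2>&1
        now_pid=$(get_vpp_pid)
        if [ "$now_pid" != "$start_pid" ]; then
            echo "VPP restarted at iteration $i (pid $start_pid -> ${now_pid:-none})"
            return 1
        fi
        i=$((i + 1))
    done
    echo "VPP survived $iterations iterations of '$cmd'"
    return 0
}
```

Usage would be along the lines of `detect_restart "show pci" 120` for the failing case and `detect_restart "show version" 120` as the control run.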