Hi Peter,

Can I get access to the setup to investigate?

Best
ben

> -----Original Message-----
> From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>
> Sent: Friday, November 29, 2019 11:08
> To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
>     <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
>     <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>; csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
>     <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
>     lijian.zh...@arm.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
>
> +dev lists
>
> Peter Mikus
> Engineer - Software
> Cisco Systems Limited
>
> > -----Original Message-----
> > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > Sent: Friday, November 29, 2019 11:06 AM
> > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> >     <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> >     <mkons...@cisco.com>
> > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> >     <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> >     lijian.zh...@arm.com; Honnappa Nagarahalli <honnappa.nagaraha...@arm.com>
> > Subject: CSIT - performance tests failing on Taishan
> >
> > Hello all,
> >
> > In CSIT we are observing an issue with Taishan boxes where performance
> > tests are failing.
> > There has been a long, misleading discussion about the potential issue,
> > root cause, and which workaround to apply.
> >
> > Issue
> > =====
> > VPP restarts after an attempt to read "show pci" over the socket
> > '/run/vpp/cli.sock' in a loop. CSIT runs this loop against VPP with the
> > default startup configuration, using the command below, to check that
> > VPP is really up and responding.
> >
> > How to reproduce
> > ================
> > for i in $(seq 1 120); do echo "show pci" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> >
> > The same can be reproduced using vppctl:
> >
> > for i in $(seq 1 120); do echo "show pci" | sudo vppctl; sudo netstat -ap | grep vpp; done
> >
> > To rule out an issue with the test itself, I used "show version":
> >
> > for i in $(seq 1 120); do echo "show version" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> >
> > This test passes with "show version" and VPP is not restarted.
> >
> > Root cause
> > ==========
> > The root cause seems to be:
> >
> > Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> > 0x0000ffffbeb4f3d0 in format_vlib_pci_vpd (
> >     s=0xffff7fabe830 "0002:f9:00.0 0 15b3:1015 8.0 GT/s x8 mlx5_core CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
> >     at /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:230
> > 230 /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c: No such file or directory.
> > (gdb)
> > Continuing.
> >
> > Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > 51 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > (gdb)
> >
> > The issue started after MLX was installed in the Taishan.
> >
> > @Benoit Ganne (bganne) can you please help fix the root cause?
> >
> > Thank you.
> >
> > Peter Mikus
> > Engineer - Software
> > Cisco Systems Limited