The latest update is that Benoit has no VPN access, so he tried to replicate
the issue in his local lab (assuming x86).
As a quick fix in CSIT, I will disable the MLX driver on Taishan.

Peter Mikus
Engineer - Software
Cisco Systems Limited

> -----Original Message-----
> From: Juraj Linkeš <juraj.lin...@pantheon.tech>
> Sent: Tuesday, December 3, 2019 3:09 PM
> To: Benoit Ganne (bganne) <bga...@cisco.com>; Peter Mikus -X (pmikus -
> PANTHEON TECH SRO at Cisco) <pmi...@cisco.com>; Maciek Konstantynowicz
> (mkonstan) <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>;
> csit-d...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
> 
> Hi Benoit,
> 
> Do you have access to the FD.io lab? The Taishan servers are located there.
> 
> Juraj
> 
> -----Original Message-----
> From: Benoit Ganne (bganne) <bga...@cisco.com>
> Sent: Friday, November 29, 2019 4:03 PM
> To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> <pmi...@cisco.com>; Juraj Linkeš <juraj.lin...@pantheon.tech>; Maciek
> Konstantynowicz (mkonstan) <mkons...@cisco.com>;
> vpp-dev <vpp-d...@lists.fd.io>; csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> <vrpo...@cisco.com>; lijian.zh...@arm.com; Honnappa Nagarahalli
> <honnappa.nagaraha...@arm.com>
> Subject: RE: CSIT - performance tests failing on Taishan
> 
> Hi Peter, can I get access to the setup to investigate?
> 
> Best
> ben
> 
> > -----Original Message-----
> > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > <pmi...@cisco.com>
> > Sent: Friday, November 29, 2019 11:08
> > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > <mkons...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>;
> > csit-...@lists.fd.io
> > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > lijian.zh...@arm.com; Honnappa Nagarahalli
> > <honnappa.nagaraha...@arm.com>
> > Subject: RE: CSIT - performance tests failing on Taishan
> >
> > +dev lists
> >
> > Peter Mikus
> > Engineer - Software
> > Cisco Systems Limited
> >
> > > -----Original Message-----
> > > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > > Sent: Friday, November 29, 2019 11:06 AM
> > > To: Benoit Ganne (bganne) <bga...@cisco.com>; Juraj Linkeš
> > > <juraj.lin...@pantheon.tech>; Maciek Konstantynowicz (mkonstan)
> > > <mkons...@cisco.com>
> > > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco)
> > > <vrpo...@cisco.com>; Benoit Ganne (bganne) <bga...@cisco.com>;
> > > lijian.zh...@arm.com; Honnappa Nagarahalli
> > > <honnappa.nagaraha...@arm.com>
> > > Subject: CSIT - performance tests failing on Taishan
> > >
> > > Hello all,
> > >
> > > In CSIT we are observing an issue on the Taishan boxes where
> > > performance tests are failing.
> > > There has been a long and misleading discussion about the potential
> > > issue, its root cause, and which workaround to apply.
> > >
> > > Issue
> > > =====
> > > VPP is restarted after "show pci" is repeatedly sent in a loop over
> > > the CLI socket '/run/vpp/cli.sock'. CSIT runs this loop against VPP
> > > with the default startup configuration, using the command below, to
> > > check that VPP is really up and responding.
> > >
> > > How to reproduce
> > > ================
> > > for i in $(seq 1 120); do \
> > >   echo "show pci" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; \
> > >   sudo netstat -ap | grep vpp; \
> > > done
> > >
> > > The same can be reproduced using vppctl:
> > >
> > > for i in $(seq 1 120); do \
> > >   echo "show pci" | sudo vppctl; \
> > >   sudo netstat -ap | grep vpp; \
> > > done
> > >
> > > To rule out a problem with the test itself, I ran the same loop with
> > > "show version":
> > > for i in $(seq 1 120); do \
> > >   echo "show version" | sudo socat - UNIX-CONNECT:/run/vpp/cli.sock; \
> > >   sudo netstat -ap | grep vpp; \
> > > done
> > >
> > > With "show version" the test passes and VPP is not restarted.
> > >
> > >
> > > Root cause
> > > ==========
> > > The root cause seems to be a crash in format_vlib_pci_vpd():
> > >
> > > Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> > > 0x0000ffffbeb4f3d0 in format_vlib_pci_vpd (
> > >     s=0xffff7fabe830 "0002:f9:00.0   0  15b3:1015   8.0 GT/s x8   mlx5_core       CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
> > >     at /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:230
> > > 230     /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c: No such file or directory.
> > > (gdb)
> > > Continuing.
> > >
> > > Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> > > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > > 51      ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > > (gdb)
> > >
> > >
> > > The issue started after the MLX NIC (ConnectX-4 Lx) was installed in
> > > Taishan.
> > >
> > >
> > > @Benoit Ganne (bganne) can you please help fix the root cause?
> > >
> > > Thank you.
> > >
> > > Peter Mikus
> > > Engineer - Software
> > > Cisco Systems Limited
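The reproduction loops in the original report print netstat output and leave it to the reader to notice that VPP was restarted. A minimal sketch of a stricter check follows, assuming bash, `pidof` and `vppctl` are installed and VPP runs as a process named `vpp`. Instead of inspecting the CLI socket with netstat, the hypothetical helper `check_vpp_restart` (not part of CSIT) records the VPP PID before the loop and reports the first iteration after which that PID has changed or disappeared. A crash followed by a restart would show up this way.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of CSIT): send a CLI command to VPP repeatedly
# and report whether the vpp process was restarted in the meantime.
# Assumes bash, pidof and vppctl are installed and VPP runs as process "vpp".
# Return codes: 0 = no restart, 1 = restart detected, 2 = vpp not running.
check_vpp_restart() {
  local cmd="$1"
  local iterations="${2:-120}"
  local before after i

  # Remember the PID of the running VPP instance; give up if there is none.
  before=$(pidof vpp) || { echo "vpp is not running"; return 2; }

  for i in $(seq 1 "$iterations"); do
    # Same CLI path as the vppctl loop in the report; output is discarded.
    echo "$cmd" | sudo vppctl > /dev/null 2>&1
    # A crash followed by a restart shows up as a missing or different PID.
    after=$(pidof vpp)
    if [ "$after" != "$before" ]; then
      echo "VPP restarted after iteration $i (pid $before -> ${after:-none})"
      return 1
    fi
  done

  echo "VPP still running as pid $before after $iterations iterations"
  return 0
}
```

Based on the behaviour described in the report, `check_vpp_restart "show pci" 120` would be expected to return 1 on the affected Taishan setup, while `check_vpp_restart "show version" 120` should return 0. Comparing PIDs, rather than grepping netstat output, turns the check into a pass/fail exit code that a script can act on.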
