[vpp-dev] Port mirroring support in vpp

2018-01-17 Thread Juraj Linkeš
Hi VPP devs,

I'm trying to figure out whether it's possible to set up port mirroring on a 
vhost-user port in VPP. The case I'm trying to make work is simple: I have 
traffic between two vms (using vhost-user ports) and I want to listen to that 
traffic, replicate it and send it somewhere else (to an interface, but 
preferably an ip).

I've looked into what's available in VPP and there is some support for SPAN,
but it doesn't seem to work with vhost-user interfaces (I wasn't able to
configure it). In fact, it only seems to be configurable on physical
interfaces. Is this accurate?

Then there are CLIs for lawful intercept (set li), but the configuration
doesn't seem to do anything. Is this supported?

Is there some other way to achieve port mirroring on vhost-user interfaces in 
case the two above are not supported? It can be any unwieldy/hacky way (maybe 
setting something up with multicast?).

Thanks,
Juraj
___
vpp-dev mailing list
vpp-dev@lists.fd.io
https://lists.fd.io/mailman/listinfo/vpp-dev

Re: [vpp-dev] Port mirroring support in vpp

2018-01-18 Thread Juraj Linkeš
Hi John,

Thanks for the summary. I was using 1710 when I wrote the e-mail, but I've
tried 1801 and I could configure SPAN on a veth interface (that's my setup for
now). However, I didn't see any traffic on the destination port (I tried a
loopback BVI and L2 and L3 physical interfaces as destinations) - nothing in
show trace, and the interface counters didn't go up. How do I verify that the
traffic is mirrored onto the destination port? Is there some constraint on
what the destination port can be?

Thanks,
Juraj

From: John Lo (loj) [mailto:l...@cisco.com]
Sent: Thursday, January 18, 2018 3:20 AM
To: Damjan Marion (damarion) ; Juraj Linkeš 

Cc: vpp-dev@lists.fd.io
Subject: RE: [vpp-dev] Port mirroring support in vpp

For VPP 18.01 and master, SPAN has been enhanced to allow port mirroring for 
interfaces in L2 mode, such as ones in bridge domains. There is an “l2” argument 
added to the SPAN CLI/API which allows any interface, including vHost, to have 
packets replicated on its L2 input and/or output paths and sent to the 
specified destination interface.

The CLI syntax for SPAN is now:
DBGvpp# set int span ?
  set interface span  set interface span <if-name> [l2] {disable | destination <if-name> [both|rx|tx]}

If you specify the “l2” keyword, packet replication will be performed on L2 
input and/or output packets on the specified interface. It should work for any 
interface in any bridge domain except BVI. For the BVI, SPAN can only replicate 
L2 input (and not output) packets.
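As a concrete illustration, a minimal L2 SPAN check might look like the
transcript below. The interface names and bridge-domain id are illustrative,
not from the thread, and the commands reflect the 18.01-era CLI described
above - verify them with `?` in your build:

```
DBGvpp# set interface l2 bridge VirtualEthernet0/0/0 13
DBGvpp# set interface l2 bridge VirtualEthernet0/0/1 13
DBGvpp# set interface span VirtualEthernet0/0/0 l2 destination GigabitEthernet0/8/0 both
DBGvpp# show interface span
DBGvpp# trace add vhost-user-input 10
... send traffic between the two VMs ...
DBGvpp# show trace
DBGvpp# show interface
```

Mirrored packets should then show up both in the trace and in the destination
interface's tx counters. If they don't, check that the source interface really
takes the L2 path (e.g. it is in a bridge domain), since the l2 keyword only
replicates on the L2 input/output feature paths.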

Regards,
John

From: vpp-dev-boun...@lists.fd.io On Behalf Of Damjan Marion (damarion)
Sent: Wednesday, January 17, 2018 8:17 PM
To: Juraj Linkeš <juraj.lin...@pantheon.tech>
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Port mirroring support in vpp

Have you tried with SPAN?

On 17 Jan 2018, at 10:07, Juraj Linkeš 
<juraj.lin...@pantheon.tech> wrote:

Hi VPP devs,

I’m trying to figure out whether it’s possible to set up port mirroring on a 
vhost-user port in VPP. The case I’m trying to make work is simple: I have 
traffic between two vms (using vhost-user ports) and I want to listen to that 
traffic, replicate it and send it somewhere else (to an interface, but 
preferably an ip).

I’ve looked into what’s available in VPP and there is some support for SPAN,
but it doesn’t seem to work with vhost-user interfaces (I wasn’t able to
configure it). In fact, it only seems to be configurable on physical
interfaces. Is this accurate?

Then there are CLIs for lawful intercept (set li), but the configuration
doesn’t seem to do anything. Is this supported?

Is there some other way to achieve port mirroring on vhost-user interfaces in 
case the two above are not supported? It can be any unwieldy/hacky way (maybe 
setting something up with multicast?).

Thanks,
Juraj

Re: [vpp-dev] VPP tap interface issue on Arm servers

2019-06-20 Thread Juraj Linkeš
I can't speak for Lijian, but I ran into this issue on a server with vhost-net 
support:
jlinkes@s27-t13-sut1:~/fdio/csit$ grep VHOST /boot/config-4.15.0-46-generic
CONFIG_VHOST_RING=m
CONFIG_VHOST_NET=m
CONFIG_VHOST_SCSI=m
CONFIG_VHOST_VSOCK=m
CONFIG_VHOST=m
# CONFIG_VHOST_CROSS_ENDIAN_LEGACY is not set

Then in my test container (it's also reproducible outside of container):
root@3f230f2af45f:~/download_dir# lsmod | grep vhost
vhost_net  24576  1
vhost  61440  1 vhost_net
tap28672  1 vhost_net

The server is running a standard Ubuntu 18.04 distro and I think Lijian is
using something similar, if not the same.

Juraj

-Original Message-
From: Benoit Ganne (bganne)  
Sent: Wednesday, June 19, 2019 10:04 AM
To: Lijian Zhang ; Dave Barach (dbarach) 
; Damjan Marion 
Cc: vpp-dev@lists.fd.io; Juraj Linkeš 
Subject: RE: [vpp-dev] VPP tap interface issue on Arm servers

The VPP TAP code uses virtio rings (vhost-net) for the datapath, not
read/write on an fd as your code does.
This is for performance reasons: issuing a syscall for every packet is very
slow.
As mentioned by Damjan, did you check if vhost-net is supported on your kernel?

ben
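One way to answer that question from a kernel config file, in the spirit of
Juraj's `grep VHOST /boot/config-...` output below - the helper function and
the sample file path are hypothetical, not from the thread:

```shell
# Hypothetical helper: classify how CONFIG_VHOST_NET is set in a
# kernel config file ("y" = built in, "m" = module, else absent).
vhost_net_status() {
  case "$(grep -E '^CONFIG_VHOST_NET=' "$1" | cut -d= -f2)" in
    y) echo builtin ;;
    m) echo module ;;
    *) echo absent ;;
  esac
}

# Exercise it on a sample fragment like the one in Juraj's reply:
printf 'CONFIG_VHOST=m\nCONFIG_VHOST_NET=m\n' > /tmp/sample_config
vhost_net_status /tmp/sample_config   # prints "module"
```

On a live system the same question is answered by checking that
/dev/vhost-net exists and that `lsmod` lists vhost_net, as Juraj's output
below shows.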

> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Lijian 
> Zhang
> Sent: Wednesday, June 19, 2019 09:13
> To: Dave Barach (dbarach) ; Damjan Marion 
> 
> Cc: vpp-dev@lists.fd.io; Juraj Linkeš 
> Subject: Re: [vpp-dev] VPP tap interface issue on Arm servers
> 
> Thanks Dave and Damjan for your time investigating this issue.
> 
> We can reproduce the issue on several Arm servers, ThunderX, 
> ThunderX2, Qualcomm, etc.
> 
> 
> 
> On ThunderX2 (Ubuntu 18.04, kernel-4.18.0), I verified tap/tun with the 
> attached code and the commands below, and it seems tap/tun is working on Arm 
> servers, but there’s still the issue with the tap interface in VPP.
> 
> We are trying to understand the tap code in VPP.
> 
> 
> 
> Any suggestions on helping root cause the issue are appreciated.
> 
> 
> 
> gcc taptun.c -o taptun
> 
> sudo ./taptun
> 
> 
> 
> sudo ip a add 16.0.10.1/24 dev tun0
> 
> sudo ip link set tun0 up
> 
> ping 16.0.10.2
> 
> 
> 
> Thanks.
> 
> 
> 
> From: Dave Barach (dbarach) 
> Sent: June 18, 2019 23:54
> To: Damjan Marion 
> Cc: Lijian Zhang (Arm Technology China) ; vpp- 
> d...@lists.fd.io
> Subject: RE: [vpp-dev] VPP tap interface issue on Arm servers
> 
> 
> 
> Ack.
> 
> 
> 
> On the [single remaining] 18.04 LTS ThunderX system – 10.30.51.65 - 
> vpp manages to create the Linux interface, configures its IP address, 
> and creates plausible-looking linux-side routing table entries.
> 
> 
> 
> Clue #1: pinging an IP address on the correct subnet from Linux results in 
> zero packets transmitted on the vhost interface.
> 
> 
> 
> This isn’t a vpp problem AFAICT.
> 
> 
> 
> HTH... Dave
> 
> 
> 
> From: Damjan Marion <dmar...@me.com>
> Sent: Tuesday, June 18, 2019 11:35 AM
> To: Dave Barach (dbarach) <dbar...@cisco.com>
> Cc: Lijian Zhang <lijian.zh...@arm.com>; vpp-dev@lists.fd.io
> Subject: Re: [vpp-dev] VPP tap interface issue on Arm servers
> 
> 
> 
> 
> 
> 
> 
> The vhost-net kernel module has been around for years, so I will not be 
> surprised if it is simply disabled in the custom kernel built for mcbin.
> 
> —
> 
> Damjan
> 
> 
> On Jun 18, 2019, at 4:41 PM, Dave Barach via Lists.Fd.Io 
> <dbarach=cisco@lists.fd.io> wrote:
> 
>   Dear Lijian,
> 
> 
> 
>   The aarch64 development resources in the LF data center are in need 
> of cleanup. I tried to use fdio-cavium5 @ 10.30.51.66 (Ubuntu1804), 
> but I find that the login credentials have been changed.
> 
> 
> 
>   I finally managed to gain access to fdio-mcbin3 @ 10.30.51.43 
> (Ubuntu1604). It is in fact running Ubuntu 16.04, which is unsupported 
> at this point. The Linux 4.4 kernel can be expected to cause trouble.
> 
> 
> 
>   I had difficulty building a master/latest vpp image due to several 
> warnings not seen elsewhere. Since we validate every patch for 
> aarch64, it’s likely a case of tool chain bit rot. Anyhow, I finally 
> managed to build an image.
> 
> 
> 
>   Here’s what I see:
> 
> 
> 
>   DBGvpp# create tap host-if-name lstack host-ip4-addr 192.168.10.2/24
> 
>   create tap: ioctl(VHOST_NET_SET_BACKEND): Bad address
> 
> 
> 
>   This error comes from virtio_vring_init (...), and seems completely 
> consistent with running over an old kernel, instead of a 4.15 kernel. 
> This may or may not have anything to 

[vpp-dev] VPP Cross compilation (x86_64 -> aarch64)

2019-08-02 Thread Juraj Linkeš
Hi VPP Devs,

I'm trying to understand how to implement support for cross compilation for 
aarch64. I've put together some changes that are finally cross compiling DPDK 
(https://gerrit.fd.io/r/#/c/vpp/+/21035/), but the compilation is failing when 
it doesn't find openssl headers:
In file included from 
/home/jlinkes/vpp/build-root/build-aarch64-aarch64/external/dpdk-19.05/drivers/crypto/qat/qat_sym.c:5:0:
/usr/include/openssl/evp.h:13:11: fatal error: openssl/opensslconf.h: No such 
file or directory
# include <openssl/opensslconf.h>
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I've added locally cross compiled numalib to cflags and ldflags just to see how 
far I'd get with compilation. It seems that I'll need to do the same for not 
only openssl, but also the other dpdk dependencies (if I understand it 
correctly, I'll need at least ipsec-mb and nasm) and possibly VPP dependencies.

I've seen some e-mails suggesting that I should produce my own 
platforms/<platform>.mk file, but not much about any other steps. I also see 
the CROSS_TOOLS var in build-root/Makefile (and the corresponding 
$(PLATFORM)_cross_tools), but I wasn't able to figure out how to use those - is 
that something I should look into? What are the actual dependencies that *need* 
to be cross compiled before attempting to cross compile vpp/dpdk? I'd like to 
understand a bit more about this before I attempt to possibly fit a circle into 
a square-shaped hole.

The goal here is to build VPP and DPDK (I didn't see an option in the root 
Makefile that would build just VPP without DPDK, but we want to support both) 
for a generic armv8 Linux target.

I'd appreciate it if someone looked at my WIP patch and told me what I'm doing 
wrong and what's missing.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#13655): https://lists.fd.io/g/vpp-dev/message/13655
Mute This Topic: https://lists.fd.io/mt/32691486/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] VPP Cross compilation (x86_64 -> aarch64)

2019-08-12 Thread Juraj Linkeš
With some help from Dave, I was able to make the cross compilation work: 
https://gerrit.fd.io/r/#/c/vpp/+/21035/

I changed a bit how the PLATFORM variable is used. I don't know how it was 
supposed to work in the past, but it seems broken to me with the current cmake 
system, so I adapted it to be more suitable to the current environment.

I also disabled the basic_system package installation, because it seemed like 
legacy code that was supplanted by make install-dep.

Let me know if any of these changes is off the mark - or anything else in the patch :)

Juraj

From: Juraj Linkeš
Sent: Friday, August 2, 2019 5:05 PM
To: vpp-dev@lists.fd.io
Subject: VPP Cross compilation (x86_64 -> aarch64)

Hi VPP Devs,

I'm trying to understand how to implement support for cross compilation for 
aarch64. I've put together some changes that are finally cross compiling DPDK 
(https://gerrit.fd.io/r/#/c/vpp/+/21035/), but the compilation is failing when 
it doesn't find openssl headers:
In file included from 
/home/jlinkes/vpp/build-root/build-aarch64-aarch64/external/dpdk-19.05/drivers/crypto/qat/qat_sym.c:5:0:
/usr/include/openssl/evp.h:13:11: fatal error: openssl/opensslconf.h: No such 
file or directory
# include <openssl/opensslconf.h>
          ^~~~~~~~~~~~~~~~~~~~~~~
compilation terminated.

I've added locally cross compiled numalib to cflags and ldflags just to see how 
far I'd get with compilation. It seems that I'll need to do the same for not 
only openssl, but also the other dpdk dependencies (if I understand it 
correctly, I'll need at least ipsec-mb and nasm) and possibly VPP dependencies.

I've seen some e-mails suggesting that I should produce my own 
platforms/<platform>.mk file, but not much about any other steps. I also see 
the CROSS_TOOLS var in build-root/Makefile (and the corresponding 
$(PLATFORM)_cross_tools), but I wasn't able to figure out how to use those - is 
that something I should look into? What are the actual dependencies that *need* 
to be cross compiled before attempting to cross compile vpp/dpdk? I'd like to 
understand a bit more about this before I attempt to possibly fit a circle into 
a square-shaped hole.

The goal here is to build VPP and DPDK (I didn't see an option in the root 
Makefile that would build just VPP without DPDK, but we want to support both) 
for a generic armv8 Linux target.

I'd appreciate it if someone looked at my WIP patch and told me what I'm doing 
wrong and what's missing.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#13710): https://lists.fd.io/g/vpp-dev/message/13710
Mute This Topic: https://lists.fd.io/mt/32691486/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

2019-10-30 Thread Juraj Linkeš
I'm all for a more generic solution. I haven't heard of using cross-arch build 
containers, so we can look into it. Any pointers would be welcome ☺

Thanks,
Juraj
From: Damjan Marion via Lists.Fd.Io 
Sent: Wednesday, October 30, 2019 2:40 PM
To: Stanislav Chlebec 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

(resending with bogus email addresses removed, added vpp-dev)

Honestly, I don't see a lot of value in this kind of cross-compilation support. 
VPP today is linked against a lot of shared libraries provided by the current 
distro, so effectively you can cross-compile only for the same distro and 
version, just a different target cpu.

What about using cross-arch build containers instead?



On 30 Oct 2019, at 14:10, Stanislav Chlebec 
<stanislav.chle...@pantheon.tech> wrote:

Hello Dave

I miss in the building procedure make install-ext-deps.
It could have been a reason why it failed.

Prerequisites:
-You are on some x86_64 system
-Uninstall previous versions of vpp-ext-deps (they may have been 
compiled for the wrong platform, x86_64)
-Clean the VPP git repo (make clean -qfx)
-git checkout 6be55648334308d4eaa4a02143b968720bb62078
-git fetch "https://gerrit.fd.io/r/vpp" refs/changes/35/21035/23 && git 
checkout FETCH_HEAD

Then please try:
make PLATFORM=aarch64-generic install-dep
make PLATFORM=aarch64-generic install-ext-deps
make PLATFORM=aarch64-generic pkg-deb

I repeated the procedure at my system – here is logs (if it helps...)
https://gist.github.com/stanislav-chlebec/3042a0eeb56819aea8217dfaf5e60647

Thanks
Stanislav

From: Dave Barach (Code Review) <ger...@fd.io>
Sent: Wednesday, October 30, 2019 1:00 PM
To: Juraj Linkeš <juraj.lin...@pantheon.tech>; Stanislav Chlebec 
<stanislav.chle...@pantheon.tech>
Cc: fd.io JJB <jobbuil...@projectrotterdam.info>; Nitin Saxena 
<nsax...@marvell.com>; Vratko Polak <vrpo...@cisco.com>; Ed Kern 
<e...@cisco.com>
Subject: Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

Downloaded the patch, tried "make install-dep" followed by "make 
PLATFORM=aarch64-generic build". First, is that the right way to cross-compile 
an aarch64 debug binary?
Aside: "make PLATFORM=aarch64-generic install-dep" fails:

.8 kB]
Ign:27 http://security.ubuntu.com/ubuntu bionic-security/universe arm64 Packages
Ign:30 http://security.ubuntu.com/ubuntu bionic-security/multiverse arm64 
Packages
Err:23 http://security.ubuntu.com/ubuntu bionic-security/main arm64 Packages
  404  Not Found [IP: 91.189.88.149 80]
Ign:24 http://security.ubuntu.com/ubuntu bionic-security/restricted arm64 
Packages
Ign:27 http://security.ubuntu.com/ubuntu bionic-security/universe arm64 Packages
Ign:30 http://security.ubuntu.com/ubuntu bionic-security/multiverse arm64 
Packages
Fetched 53.5 kB in 4s (14.2 kB/s)
Reading package lists... Done
E: Failed to fetch 
http://us.archive.ubuntu.com/ubuntu/dists/bionic/main/binary-arm64/Packages  
404  Not Found [IP: 91.189.91.14 80]
E: Failed to fetch 
http://us.archive.ubuntu.com/ubuntu/dists/bionic-updates/main/binary-arm64/Packages
  404  Not Found [IP: 91.189.91.14 80]
E: Failed to fetch 
http://security.ubuntu.com/ubuntu/dists/bionic-security/main/binary-arm64/Packages
  404  Not Found [IP: 91.189.88.149 80]
E: Failed to fetch 
http://us.archive.ubuntu.com/ubuntu/dists/bionic-backports/main/binary-arm64/Packages
  404  Not Found [IP: 91.189.91.14 80]
E: Some index files failed to download. They have been ignored, or old ones 
used instead.
Makefile:310: recipe for target 'install-dep' failed
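The 404s above are what apt returns when it requests arm64 package indexes
from the amd64 mirrors; Ubuntu serves arm64 binaries from ports.ubuntu.com. A
hedged sketch of the extra apt configuration an x86_64 Ubuntu 18.04 host would
need before arm64 dependencies can resolve (the file path and suite list are
assumptions, not from the thread):

```
# /etc/apt/sources.list.d/arm64-ports.list (illustrative)
deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports bionic main universe
deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports bionic-updates main universe
deb [arch=arm64] http://ports.ubuntu.com/ubuntu-ports bionic-security main universe
```

followed by `sudo dpkg --add-architecture arm64` and `sudo apt-get update`.
The existing `deb` lines pointing at archive.ubuntu.com would also need an
`[arch=amd64]` qualifier so apt stops asking those mirrors for arm64 indexes.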
The vpp compile attempt fails:

-- Performing Test HAVE_MEMFD_CREATE - Success
-- Performing Test HAVE_GETCPU
-- Performing Test HAVE_GETCPU - Failed
CMake Error at cmake/misc.cmake:27 (_message):
  Could NOT find OpenSSL, try to set the path to OpenSSL root folder
  in the system variable OPENSSL_ROOT_DIR (missing: OPENSSL_CRYPTO_LIBRARY)
  (found version "1.1.1d")
Enough said.
View Change<https://gerrit.fd.io/r/c/vpp/+/21035>
To view, visit change 21035<https://gerrit.fd.io/r/c/vpp/+/21035>. To 
unsubscribe, or for help writing mail filters, visit 
settings<https://gerrit.fd.io/r/settings>.
Gerrit-Project: vpp
Gerrit-Branch: master
Gerrit-Change-Id: I66cb57f60d1488a459a74964ea65f2502e4633f6
Gerrit-Change-Number: 21035
Gerrit-PatchSet: 23
Gerrit-Owner: Juraj Linkeš <juraj.lin...@pantheon.tech>
Gerrit-Assignee: Damjan Marion <dmar...@me.com>
Gerrit-Reviewer: Damjan Marion <dmar...@me.com>
Gerrit-Reviewer: Dave Barach <open...@barachs.net>
Gerrit-Reviewer: Ed Kern <e...@cisco.com>
Gerrit-Reviewer: Juraj Linkeš <juraj.lin...@pantheon.tech>
Gerrit-R

Re: [vpp-dev] CSIT - performance tests failing on Taishan

2019-12-03 Thread Juraj Linkeš
Hi Benoit,

Do you have access to FD.io lab? The Taishan servers are in it.

Juraj

-Original Message-
From: Benoit Ganne (bganne)  
Sent: Friday, November 29, 2019 4:03 PM
To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) ; 
Juraj Linkeš ; Maciek Konstantynowicz (mkonstan) 
; vpp-dev ; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
lijian.zh...@arm.com; Honnappa Nagarahalli 
Subject: RE: CSIT - performance tests failing on Taishan

Hi Peter, can I get access to the setup to investigate?

Best
ben

> -Original Message-
> From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) 
> 
> Sent: Friday, November 29, 2019 11:08
> To: Benoit Ganne (bganne) ; Juraj Linkeš 
> ; Maciek Konstantynowicz (mkonstan) 
> ; vpp-dev ; 
> csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) 
> ; Benoit Ganne (bganne) ; 
> lijian.zh...@arm.com; Honnappa Nagarahalli 
> 
> Subject: RE: CSIT - performance tests failing on Taishan
> 
> +dev lists
> 
> Peter Mikus
> Engineer - Software
> Cisco Systems Limited
> 
> > -Original Message-
> > From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)
> > Sent: Friday, November 29, 2019 11:06 AM
> > To: Benoit Ganne (bganne) ; Juraj Linkeš 
> > ; Maciek Konstantynowicz (mkonstan) 
> > 
> > Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) 
> > ; Benoit Ganne (bganne) ; 
> > lijian.zh...@arm.com; Honnappa Nagarahalli
> 
> > Subject: CSIT - performance tests failing on Taishan
> >
> > Hello all,
> >
> > In CSIT we are observing an issue on the Taishan boxes where
> > performance tests are failing.
> > There has been a long, misleading discussion about the potential issue,
> > its root cause, and what workaround to apply.
> >
> > Issue
> > =
> > VPP is being restarted after an attempt to read "show pci" over the 
> > socket on '/run/vpp/cli.sock'
> > in a loop. This loop test is executed in CSIT towards VPP with 
> > default startup configuration via command below to check if VPP is 
> > really UP and responding.
> >
> > How to reproduce
> > 
> > for i in $(seq 1 120); do echo "show pci" | sudo socat - \
> > UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> >
> > The same can be reproduced using vppctl:
> >
> > for i in $(seq 1 120); do echo "show pci" | sudo vppctl; \
> > sudo netstat -ap | grep vpp; done
> >
> > To eliminate the issue with the test itself I used "show version":
> > for i in $(seq 1 120); do echo "show version" | sudo socat - \
> > UNIX-CONNECT:/run/vpp/cli.sock; sudo netstat -ap | grep vpp; done
> >
> > This test is passing with "show version" and VPP is not restarted.
> >
> >
> > Root cause
> > ==
> > The root cause seems to be:
> >
> > Thread 1 "vpp_main" received signal SIGSEGV, Segmentation fault.
> > 0xbeb4f3d0 in format_vlib_pci_vpd (
> > s=0x7fabe830 "0002:f9:00.0   0  15b3:1015   8.0 GT/s x8
> > mlx5_core   CX4121A - ConnectX-4 LX SFP28", args=<optimized out>)
> > at /w/workspace/vpp-arm-merge-master-
> > ubuntu1804/src/vlib/pci/pci.c:230
> > 230 /w/workspace/vpp-arm-merge-master-ubuntu1804/src/vlib/pci/pci.c:
> > No such file or directory.
> > (gdb)
> > Continuing.
> >
> > Thread 1 "vpp_main" received signal SIGABRT, Aborted.
> > __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:51
> > 51  ../sysdeps/unix/sysv/linux/raise.c: No such file or directory.
> > (gdb)
> >
> >
> > Issue started after MLX was installed into Taishan.
> >
> >
> > @Benoit Ganne (bganne) can you please help fix the root cause?
> >
> > Thank you.
> >
> > Peter Mikus
> > Engineer - Software
> > Cisco Systems Limited

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#14764): https://lists.fd.io/g/vpp-dev/message/14764
Mute This Topic: https://lists.fd.io/mt/64332740/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] CSIT - performance tests failing on Taishan

2019-12-04 Thread Juraj Linkeš
Hi Ben, Lijian, Honnappa,

The issue is reproducible after the second invocation of show pci:
DBGvpp# show pci
Address  Sock VID:PID Link Speed   Driver  Product Name 
   Vital Product Data
0000:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
   
0000:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
   
0002:f9:00.0   0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 
LX SFP28   PN: MCX4121A-ACAT_C12

   EC: A1

   SN: MT1745K13032

   V0: 0x 50 43 49 65 47 65 6e 33 ...

   RV: 0x ba
0002:f9:00.1   0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 
LX SFP28   PN: MCX4121A-ACAT_C12

   EC: A1

   SN: MT1745K13032

   V0: 0x 50 43 49 65 47 65 6e 33 ...

   RV: 0x ba
DBGvpp# show pci
Address  Sock VID:PID Link Speed   Driver  Product Name 
   Vital Product Data
0000:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
   
0000:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
   
Aborted
Makefile:546: recipe for target 'run' failed
make: *** [run] Error 134

I've tried to do some debugging with a debug build:
(gdb) bt
...
#5  0xbe775000 in format_vlib_pci_vpd (s=0x7efa9e80 "0002:f9:00.0   
0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 LX SFP28", 
args=0x7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
...
(gdb) frame 5
#5  0xbe775000 in format_vlib_pci_vpd (s=0x7efa9e80 "0002:f9:00.0   
0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 LX SFP28", 
args=0x7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
230   else if (*(u16 *) & data[p] == *(u16 *) id)
(gdb) info locals
data = 0x7efa9cd0 "PN\025MCX4121A-ACAT_C12EC\002A1SN\030MT1745K13032", 
' ' , "V0\023PCIeGen3 x8RV\001\272"
id = 0xaaa8 
indent = 91
string_types = {0xbe7b7950 "PN", 0xbe7b7958 "EC", 0xbe7b7960 "SN", 
0xbe7b7968 "MN", 0x0}
p = 0
first_line = 1

Looks like something went wrong with the 'id' variable. More is attached.

As a temporary workaround (until we fix this), we're going to replace show pci 
with something else in CSIT: https://gerrit.fd.io/r/c/csit/+/23785

Juraj

-Original Message-
From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco)  
Sent: Tuesday, December 3, 2019 3:58 PM
To: Juraj Linkeš ; Benoit Ganne (bganne) 
; Maciek Konstantynowicz (mkonstan) ; 
vpp-dev ; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
lijian.zh...@arm.com; Honnappa Nagarahalli 
Subject: RE: CSIT - performance tests failing on Taishan

The latest update is that Benoit has no access over VPN, so he tried to 
replicate the issue in his local lab (assuming x86).
I will do a quick fix in CSIT: I will disable the MLX driver on Taishan.

Peter Mikus
Engineer - Software
Cisco Systems Limited

> -Original Message-
> From: Juraj Linkeš 
> Sent: Tuesday, December 3, 2019 3:09 PM
> To: Benoit Ganne (bganne) ; Peter Mikus -X (pmikus - 
> PANTHEON TECH SRO at Cisco) ; Maciek Konstantynowicz
> (mkonstan) ; vpp-dev ; csit- 
> d...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) 
> ; lijian.zh...@arm.com; Honnappa Nagarahalli 
> 
> Subject: RE: CSIT - performance tests failing on Taishan
> 
> Hi Benoit,
> 
> Do you have access to FD.io lab? The Taishan servers are in it.
> 
> Juraj
> 
> -Original Message-
> From: Benoit Ganne (bganne) 
> Sent: Friday, November 29, 2019 4:03 PM
> To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) 
> ; Juraj Linkeš ; Maciek 
> Konstantynowicz (mkonstan) ; vpp-dev  d...@lists.fd.io>; csit-...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) 
> ; lijian.zh...@arm.com; Honnappa Nagarahalli 
> 
> Subject: RE: CSIT - performance tests failing on Taishan
> 
> Hi Peter, can I get access to the setup to investigate?
> 
> Best
> ben
> 
> > ---

Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

2019-12-04 Thread Juraj Linkeš
I looked into this and there are some problems.

The first problem is the inability to fine-tune any parameters we might want 
for the target cpu/microarchitecture (for Arm, that would be building packages 
with specifics for ThunderX, McBin, Raspberry Pi, etc.). I'm not sure how Qemu 
does the emulation, but I doubt there's extensive support for a myriad of 
cpus. And on top of that, it's slow.

The other thing is using the x86 version of aarch64-linux-gnu-gcc. In that 
case it's just regular cross-compilation (the x86 version of 
aarch64-linux-gnu-gcc is the cross-compiler) and what we gain from these 
target platform containers is having the proper libraries that the VPP build 
depends on. For Ubuntu I've found that it's actually not worth it - it's 
easier to have an x86 container and install arm dependency packages than to 
install an x86 cross compiler inside an aarch64 container. Another thing I've 
observed (in htop) is that even when using the x86 version of 
aarch64-linux-gnu-gcc inside the aarch64 container, it was still going through 
the emulator, but maybe I did something wrong.

Damjan, you mentioned that my current patch doesn't solve anything. It 
certainly isn't a comprehensive solution, but it does one thing: it allows 
users to specify platform-specific config args (well, at least some of the 
ones supported in build-data/platforms/<platform>.mk), which then get 
propagated to all parts of the build, making cross compilation possible given 
that the environment has already been set up. It modifies the current ebuild 
system, which might not be the appropriate place to do this, but I don't know 
how else we would do it.

I'm not sure what all of this means, but the docker solution is certainly 
incomplete, if not outright unsuitable. Maybe we could use containers just for 
environment setup (e.g. for Ubuntu, installing both host and target packages) 
and then run cross-compilation in them with a solution that does something 
like my patch (i.e. cross-compile DPDK and VPP with config args defined in one 
file).

Thoughts?
Juraj

From: Damjan Marion via Lists.Fd.Io 
Sent: Thursday, October 31, 2019 1:49 PM
To: Benoit Ganne (bganne) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support



> On 31 Oct 2019, at 13:18, Benoit Ganne (bganne) 
> <bga...@cisco.com> wrote:
>
>> I was going to remain silent, but since there's now multiple people saying
>> this sounds good -- I think this sounds horrible. :)
>> To wit, it seems too complex and too much setup/overhead. I'll try and
>> look closer at this soon to see if I can feed back our local changes that
>> seem to be working.
>
> It is not that bad in my opinion [1] :
> 1) add support for multiarch (must be done once after reboot)
> ~# docker run --rm --privileged multiarch/qemu-user-static --reset 
> --persistent yes --credential yes
> 2) create your chroot (must be done once - I am sharing my homedir with my 
> chroot and same UID/GID)
> ~# docker run --name aarch64_u1804 --privileged --net host -v $HOME:$HOME -v 
> /dev:/dev -v/lib/modules:/lib/modules/host:ro -td arm64v8/ubuntu:18.04 
> /bin/bash
> ~# docker container exec aarch64_u1804 sh -c "apt -qy update && apt 
> dist-upgrade -qy && apt install -qy vim sudo make git && groupadd -g $(id 
> -rg) $USER && useradd -u $(id -ru) -g $(id -rg) -M -d $HOME -s /bin/bash 
> $USER && echo '$USER ALL=(ALL) NOPASSWD:ALL' >> /etc/sudoers && echo 
> aarch64_u1804 > /etc/debian_chroot"
> 3) compile vpp (I already checked out VPP in $HOME/src/vpp but you can 
> checkout it there too if you prefer)
> ~# docker container exec aarch64_u1804 su "$USER" -l -c "UNATTENTED=y make -C 
> src/vpp install-dep install-ext-deps pkg-deb"
> [...]
> dpkg-deb: building package 'libvppinfra-dev' in 
> '../libvppinfra-dev_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp-dbg' in 
> '../vpp-dbg_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'libvppinfra' in 
> '../libvppinfra_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp-api-python' in 
> '../vpp-api-python_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp' in 
> '../vpp_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp-plugin-dpdk' in 
> '../vpp-plugin-dpdk_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'python3-vpp-api' in 
> '../python3-vpp-api_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp-dev' in 
> '../vpp-dev_20.01-rc0~538-gbb41ee925_arm64.deb'.
> dpkg-deb: building package 'vpp-plugin-core' in 
> '../vpp-plugin-core_20.01-rc0~538-gbb41ee925_arm64.deb'.
> make[2]: Leaving directory 
> '/home/bganne/src/vpp/build-root/build-vpp-native/vpp'
> dpkg-genbuildinfo --build=binary
> dpkg-genchanges --build=binary >../vpp_20.01-rc0~538-gbb41ee925_arm64.changes
> dpkg-genchanges: info: binary-only upload (no source code included)
> dpkg-source -

Re: [vpp-dev] CSIT - performance tests failing on Taishan

2019-12-05 Thread Juraj Linkeš
Hi Lijian,

The patch helped, I can't reproduce the issue now.

Thanks,
Juraj

-Original Message-
From: Lijian Zhang (Arm Technology China)  
Sent: Thursday, December 5, 2019 7:16 AM
To: Juraj Linkeš ; Peter Mikus -X (pmikus - 
PANTHEON TECH SRO at Cisco) ; Benoit Ganne (bganne) 
; Maciek Konstantynowicz (mkonstan) ; 
vpp-dev ; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
Honnappa Nagarahalli 
Subject: RE: CSIT - performance tests failing on Taishan

Hi Juraj,
Could you please try the attached patch?
Thanks.
-Original Message-
From: Juraj Linkeš 
Sent: 2019年12月4日 18:12
To: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) ; 
Benoit Ganne (bganne) ; Maciek Konstantynowicz (mkonstan) 
; vpp-dev ; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
Lijian Zhang (Arm Technology China) ; Honnappa 
Nagarahalli 
Subject: RE: CSIT - performance tests failing on Taishan

Hi Ben, Lijian, Honnappa,

The issue is reproducible after the second invocation of show pci:
DBGvpp# show pci
Address  Sock VID:PID Link Speed   Driver  Product Name 
   Vital Product Data
:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
0002:f9:00.0   0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 
LX SFP28   PN: MCX4121A-ACAT_C12

   EC: A1

   SN: MT1745K13032

   V0: 0x 50 43 49 65 47 65 6e 33 ...

   RV: 0x ba
0002:f9:00.1   0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 
LX SFP28   PN: MCX4121A-ACAT_C12

   EC: A1

   SN: MT1745K13032

   V0: 0x 50 43 49 65 47 65 6e 33 ...

   RV: 0x ba DBGvpp# show pci
Address  Sock VID:PID Link Speed   Driver  Product Name 
   Vital Product Data
:11:00.0   2  8086:10fb   5.0 GT/s x8  ixgbe
:11:00.1   2  8086:10fb   5.0 GT/s x8  ixgbe
Aborted
Makefile:546: recipe for target 'run' failed
make: *** [run] Error 134

I've tried to do some debugging with a debug build:
(gdb) bt
...
#5  0xbe775000 in format_vlib_pci_vpd (s=0x7efa9e80 "0002:f9:00.0   
0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 LX SFP28", 
args=0x7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
...
(gdb) frame 5
#5  0xbe775000 in format_vlib_pci_vpd (s=0x7efa9e80 "0002:f9:00.0   
0  15b3:1015   8.0 GT/s x8  mlx5_core   CX4121A - ConnectX-4 LX SFP28", 
args=0x7ef729b0) at /home/testuser/vpp/src/vlib/pci/pci.c:230
230   else if (*(u16 *) & data[p] == *(u16 *) id)
(gdb) info locals
data = 0x7efa9cd0 "PN\025MCX4121A-ACAT_C12EC\002A1SN\030MT1745K13032", 
' ' , "V0\023PCIeGen3 x8RV\001\272"
id = 0xaaa8  indent = 91 string_types = {0xbe7b7950 "PN", 
0xbe7b7958 "EC", 0xbe7b7960 "SN", 0xbe7b7968 "MN", 0x0} p = 0 
first_line = 1

Looks like something went wrong with the 'id' variable. More is attached.

As a temporary workaround (until we fix this), we're going to replace show pci 
with something else in CSIT: https://gerrit.fd.io/r/c/csit/+/23785

Juraj
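
For reference, the VPD read-only data being walked is a sequence of two-character keywords (PN, EC, SN, ...), each followed by a one-byte length and that many value bytes. The sketch below is a minimal Python rendition of that keyword walk, not VPP's implementation, and the sample buffer is hypothetical; it illustrates that extracting the keyword as a two-byte slice sidesteps the unaligned u16 pointer cast the backtrace points at in format_vlib_pci_vpd:

```python
def parse_vpd(data: bytes) -> dict:
    """Walk PCI VPD read-only fields: 2-byte keyword, 1-byte length, value."""
    fields = {}
    p = 0
    while p + 3 <= len(data):
        # Take the keyword as a 2-byte slice instead of casting a possibly
        # unaligned pointer to u16.
        key = data[p:p + 2].decode("ascii", errors="replace")
        length = data[p + 2]
        fields[key] = data[p + 3:p + 3 + length].decode(
            "ascii", errors="replace").rstrip()
        p += 3 + length
    return fields
```

With a hypothetical buffer b"PN\x04ABCDEC\x02A1" this yields {'PN': 'ABCD', 'EC': 'A1'}.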

-----Original Message-
From: Peter Mikus -X (pmikus - PANTHEON TECH SRO at Cisco) 
Sent: Tuesday, December 3, 2019 3:58 PM
To: Juraj Linkeš ; Benoit Ganne (bganne) 
; Maciek Konstantynowicz (mkonstan) ; 
vpp-dev ; csit-...@lists.fd.io
Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) ; 
lijian.zh...@arm.com; Honnappa Nagarahalli 
Subject: RE: CSIT - performance tests failing on Taishan

Latest update is that Benoit has no access over VPN so he did try to replicate 
in local lab (assuming x86).
I will do quick fix in CSIT. I will disable MLX driver on Taishan.

Peter Mikus
Engineer - Software
Cisco Systems Limited

> -Original Message-
> From: Juraj Linkeš 
> Sent: Tuesday, December 3, 2019 3:09 PM
> To: Benoit Ganne (bganne) ; Peter Mikus -X (pmikus - 
> PANTHEON TECH SRO at Cisco) ; Maciek Konstantynowicz
> (mkonstan) ; vpp-dev ; csit- 
> d...@lists.fd.io
> Cc: Vratko Polak -X (vrpolak - PANTHEON TECH SRO at Cisco) 
> ; lijian.zh...@arm.com; Honnappa Nagarahalli 
> 
> Subject: RE: 

Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

2019-12-17 Thread Juraj Linkeš
I tried a more basic thing in my previous patch: 
https://gerrit.fd.io/r/c/vpp/+/21035

The patch has one file with all cross-compile arguments/parameters: 
build-data/platforms/aarch64-generic.mk.

It also modifies the ebuild system to propagate those into both VPP and 
external (dpdk, rdma and such) builds. If we don't want to modify the 
ebuild system in this way, I can create separate make targets like Damjan did 
here: https://gerrit.fd.io/r/c/vpp/+/23153

If anyone's interested in trying it out, there's a script that builds aarch64 
dpdk and other external libs along with VPP libraries: 
build-root/scripts/aarch64-crossbuild.sh. The binaries should be in the same 
repository.

What do you think about this approach? That is, splitting the cross compilation 
from the environment setup. It seems like a good first step if we're not going 
to do the emulation builds (see my previous e-mail for reasons why I think it's 
inadequate).

Thanks,
Juraj

From: Juraj Linkeš 
Sent: Wednesday, December 4, 2019 3:36 PM
To: dmar...@me.com; Benoit Ganne (bganne) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

I looked into this and there are some problems.

The first problem is the inability to fine-tune parameters we might want for the 
target cpu/microarchitecture (for arm, that would be building packages with 
specifics for ThunderX, McBin, Raspberry PI etc.). I'm not sure how Qemu does 
the emulation, but I doubt there's extensive support for a myriad of cpus. And 
on top of that, it's slow.

The other thing is with using the x86 version of aarch64-linux-gnu-gcc. In that 
case it's just regular cross-compilation (x86 version of aarch64-linux-gnu-gcc 
is the cross-compiler) and what we gain from these target platform containers 
is having the proper libraries that VPP build depends on. For Ubuntu I've found 
that it's actually not worth it - it's easier to have an x86 container and 
install arm dependency packages than to install an x86 cross compiler inside an 
aarch64 container. Another thing I've observed (in htop) is that even when 
using the x86 version of aarch64-linux-gnu-gcc inside the aarch64 container, it 
was still going through the emulator, but maybe I did something wrong.

Damjan, you mentioned that my current patch doesn't solve anything. It 
certainly isn't a comprehensive solution, but it does one thing and that is it 
allows users to specify platform specific config args (well, at least some of 
the supported ones in build-data/platforms/.mk) which then get 
propagated to all parts of the build and it's possible to do cross compilation 
given that the environment has already been set up. It modifies the current 
ebuild system, but that might not be the appropriate place to do that. However, 
I don't know how else we would do this.

I'm not sure what all of this means, but the docker solution is certainly 
incomplete, if not outright unsuitable. Maybe we could use containers for just 
environment setup (e.g. for Ubuntu, installing both host and target packages) 
and then could run cross-compilation in them with a solution that would do 
something like my patch (i.e. cross-compile DPDK and VPP with config args 
defined in one file).

Thoughts?
Juraj

From: Damjan Marion via Lists.Fd.Io 
mailto:dmarion=me@lists.fd.io>>
Sent: Thursday, October 31, 2019 1:49 PM
To: Benoit Ganne (bganne) mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support



> On 31 Oct 2019, at 13:18, Benoit Ganne (bganne) 
> mailto:bga...@cisco.com>> wrote:
>
>> I was going to remain silent, but since there's now multiple people saying
>> this sounds good -- I think this sounds horrible. :)
>> To wit, it seems too complex and too much setup/overhead. I'll try and
>> look closer at this soon to see if I can feed back our local changes that
>> seem to be working.
>
> It is not that bad in my opinion [1] :
> 1) add support for multiarch (must be done once after reboot)
> ~# docker run --rm --privileged multiarch/qemu-user-static --reset 
> --persistent yes --credential yes
> 2) create your chroot (must be done once - I am sharing my homedir with my 
> chroot and same UID/GID)
> ~# docker run --name aarch64_u1804 --privileged --net host -v $HOME:$HOME -v 
> /dev:/dev -v/lib/modules:/lib/modules/host:ro -td arm64v8/ubuntu:18.04 
> /bin/bash
> ~# docker container exec aarch64_u1804 sh -c "apt -qy update && apt 
> dist-upgrade -qy && apt install -qy vim sudo make git && groupadd -g $(id 
> -rg) $USER && useradd -u $(id -ru) -g $(id -rg) -M -d $HOME -s /bin/bash 
> $USER && echo '$USER AL

Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

2020-02-10 Thread Juraj Linkeš
Hi Dave,

This discussion about cross-compilation didn't go anywhere. We're thinking of 
talking about the cross compilation in the monthly VPP call tomorrow, now that 
the latest release is done. I want to get some agreement on how to properly do the 
cross-compilation - how feasible the qemu emulation is, how much of the current 
ebuild system should (or shouldn't) we use or do we need to do something 
completely different (that would presumably be a lot simpler).

Thanks,
Juraj

From: Juraj Linkeš 
Sent: Tuesday, December 17, 2019 10:25 AM
To: Juraj Linkeš ; dmar...@me.com; Benoit Ganne 
(bganne) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

I tried a more basic thing in my previous patch: 
https://gerrit.fd.io/r/c/vpp/+/21035

The patch has one file with all cross-compile arguments/parameters: 
build-data/platforms/aarch64-generic.mk.

It also modifies the ebuild system to propagate those into both VPP and 
external (dpdk, rdma and such) builds. If we don't want to modify the 
ebuild system in this way, I can create separate make targets like Damjan did 
here: https://gerrit.fd.io/r/c/vpp/+/23153

If anyone's interested in trying it out, there's a script that builds aarch64 
dpdk and other external libs along with VPP libraries: 
build-root/scripts/aarch64-crossbuild.sh. The binaries should be in the same 
repository.

What do you think about this approach? That is, splitting the cross compilation 
from the environment setup. It seems like a good first step if we're not going 
to do the emulation builds (see my previous e-mail for reasons why I think it's 
inadequate).

Thanks,
Juraj

From: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>
Sent: Wednesday, December 4, 2019 3:36 PM
To: dmar...@me.com<mailto:dmar...@me.com>; Benoit Ganne (bganne) 
mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

I looked into this and there are some problems.

The first problem is the inability to fine-tune parameters we might want for the 
target cpu/microarchitecture (for arm, that would be building packages with 
specifics for ThunderX, McBin, Raspberry PI etc.). I'm not sure how Qemu does 
the emulation, but I doubt there's extensive support for a myriad of cpus. And 
on top of that, it's slow.

The other thing is with using the x86 version of aarch64-linux-gnu-gcc. In that 
case it's just regular cross-compilation (x86 version of aarch64-linux-gnu-gcc 
is the cross-compiler) and what we gain from these target platform containers 
is having the proper libraries that VPP build depends on. For Ubuntu I've found 
that it's actually not worth it - it's easier to have an x86 container and 
install arm dependency packages than to install an x86 cross compiler inside an 
aarch64 container. Another thing I've observed (in htop) is that even when 
using the x86 version of aarch64-linux-gnu-gcc inside the aarch64 container, it 
was still going through the emulator, but maybe I did something wrong.

Damjan, you mentioned that my current patch doesn't solve anything. It 
certainly isn't a comprehensive solution, but it does one thing and that is it 
allows users to specify platform specific config args (well, at least some of 
the supported ones in build-data/platforms/.mk) which then get 
propagated to all parts of the build and it's possible to do cross compilation 
given that the environment has already been set up. It modifies the current 
ebuild system, but that might not be the appropriate place to do that. However, 
I don't know how else we would do this.

I'm not sure what all of this means, but the docker solution is certainly 
incomplete, if not outright unsuitable. Maybe we could use containers for just 
environment setup (e.g. for Ubuntu, installing both host and target packages) 
and then could run cross-compilation in them with a solution that would do 
something like my patch (i.e. cross-compile DPDK and VPP with config args 
defined in one file).

Thoughts?
Juraj

From: Damjan Marion via Lists.Fd.Io 
mailto:dmarion=me@lists.fd.io>>
Sent: Thursday, October 31, 2019 1:49 PM
To: Benoit Ganne (bganne) mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support



> On 31 Oct 2019, at 13:18, Benoit Ganne (bganne) 
> mailto:bga...@cisco.com>> wrote:
>
>> I was going to remain silent, but since there's now multiple people saying
>> this sounds good -- I think this sounds horrible. :)
>> To wit, it seems too complex and too much setup/overhead. I'll try and
>> look closer at this soon to see if I can feed back our l

Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 Ubuntu support

2020-02-10 Thread Juraj Linkeš
I forgot to actually ask to add this item to the agenda - can we talk about 
this tomorrow? :)

From: Juraj Linkeš 
Sent: Monday, February 10, 2020 11:12 AM
To: Dave Barach (dbarach) 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

Hi Dave,

This discussion about cross-compilation didn't go anywhere. We're thinking of 
talking about the cross compilation in the monthly VPP call tomorrow, now that 
the latest release is done. I want to get some agreement on how to properly do the 
cross-compilation - how feasible the qemu emulation is, how much of the current 
ebuild system should (or shouldn't) we use or do we need to do something 
completely different (that would presumably be a lot simpler).

Thanks,
Juraj

From: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>
Sent: Tuesday, December 17, 2019 10:25 AM
To: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>; 
dmar...@me.com<mailto:dmar...@me.com>; Benoit Ganne (bganne) 
mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

I tried a more basic thing in my previous patch: 
https://gerrit.fd.io/r/c/vpp/+/21035

The patch has one file with all cross-compile arguments/parameters: 
build-data/platforms/aarch64-generic.mk.

It also modifies the ebuild system to propagate those into both VPP and 
external (dpdk, rdma and such) builds. If we don't want to modify the 
ebuild system in this way, I can create separate make targets like Damjan did 
here: https://gerrit.fd.io/r/c/vpp/+/23153

If anyone's interested in trying it out, there's a script that builds aarch64 
dpdk and other external libs along with VPP libraries: 
build-root/scripts/aarch64-crossbuild.sh. The binaries should be in the same 
repository.

What do you think about this approach? That is, splitting the cross compilation 
from the environment setup. It seems like a good first step if we're not going 
to do the emulation builds (see my previous e-mail for reasons why I think it's 
inadequate).

Thanks,
Juraj

From: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>
Sent: Wednesday, December 4, 2019 3:36 PM
To: dmar...@me.com<mailto:dmar...@me.com>; Benoit Ganne (bganne) 
mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Change in vpp[master]: ebuild: Cross compilation aarch64 
Ubuntu support

I looked into this and there are some problems.

The first problem is the inability to fine-tune parameters we might want for the 
target cpu/microarchitecture (for arm, that would be building packages with 
specifics for ThunderX, McBin, Raspberry PI etc.). I'm not sure how Qemu does 
the emulation, but I doubt there's extensive support for a myriad of cpus. And 
on top of that, it's slow.

The other thing is with using the x86 version of aarch64-linux-gnu-gcc. In that 
case it's just regular cross-compilation (x86 version of aarch64-linux-gnu-gcc 
is the cross-compiler) and what we gain from these target platform containers 
is having the proper libraries that VPP build depends on. For Ubuntu I've found 
that it's actually not worth it - it's easier to have an x86 container and 
install arm dependency packages than to install an x86 cross compiler inside an 
aarch64 container. Another thing I've observed (in htop) is that even when 
using the x86 version of aarch64-linux-gnu-gcc inside the aarch64 container, it 
was still going through the emulator, but maybe I did something wrong.

Damjan, you mentioned that my current patch doesn't solve anything. It 
certainly isn't a comprehensive solution, but it does one thing and that is it 
allows users to specify platform specific config args (well, at least some of 
the supported ones in build-data/platforms/.mk) which then get 
propagated to all parts of the build and it's possible to do cross compilation 
given that the environment has already been set up. It modifies the current 
ebuild system, but that might not be the appropriate place to do that. However, 
I don't know how else we would do this.

I'm not sure what all of this means, but the docker solution is certainly 
incomplete, if not outright unsuitable. Maybe we could use containers for just 
environment setup (e.g. for Ubuntu, installing both host and target packages) 
and then could run cross-compilation in them with a solution that would do 
something like my patch (i.e. cross-compile DPDK and VPP with config args 
defined in one file).

Thoughts?
Juraj

From: Damjan Marion via Lists.Fd.Io 
mailto:dmarion=me@lists.fd.io>>
Sent: Thursday, October 31, 2019 1:49 PM
To: Benoit Ganne (bganne) mailto:bga...@cisco.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-

Re: [vpp-dev] Help with creating patch

2020-05-14 Thread Juraj Linkeš
Hi folks,

I've run into the same issue when trying to submit a CSIT patch after upgrading 
my Debian distro (which also updated git review). It turns out that my git 
review (version 1.25.0) is using improper default gerrit namespace. From git 
review 1.27.0 release notes [0]:
Update default gerrit namespace for newer gerrit. According to Gerrit 
documentation for 2.15.3, refs/for/’branch’ should be used when pushing changes 
to Gerrit instead of refs/publish/’branch’.

I didn't figure out how to configure the default namespace for git review 
(maybe it could be done in .gitreview, but the docs don't mention that) so I 
just submitted the change manually:
git push ssh://juraj.lin...@gerrit.fd.io:29418/csit.git 
HEAD:refs/for/master%wip -o topic=1n-tx2_doc_update

Regards,
Juraj

[0] 
https://docs.openstack.org/infra/git-review/releasenotes.html#relnotes-1-27-0

> -Original Message-
> From: Govindarajan Mohandoss 
> Sent: Thursday, May 14, 2020 4:31 AM
> To: Luke, Chris ; vpp-dev 
> Cc: nd ; nd 
> Subject: Re: [vpp-dev] Help with creating patch
> 
> Hi Chris,
>   I didn't create a local branch.  Thanks !!
>   I didn’t change the subject thinking that it could be related to code 
> freeze.
> Sorry for that.
> 
> Thanks
> Govind
> 
> > -Original Message-
> > From: Luke, Chris 
> > Sent: Wednesday, May 13, 2020 9:12 PM
> > To: Govindarajan Mohandoss ; vpp-
> dev
> > 
> > Cc: nd 
> > Subject: RE: [vpp-dev] Help with creating patch
> >
> > Govind,
> >
> > Did you create a branch locally before making a commit? It looks like
> > you tried to push to master which won't work. A typical workflow
> > involves creating a local branch, making some changes and commits and
> > then pushing to Gerrit.
> >
> > Also, I changed the email subject; you should really have started a
> > new thread instead of replying to an existing thread with something 
> > unrelated.
> >
> > Chris.
> >
> > -Original Message-
> > From: vpp-dev@lists.fd.io  On Behalf Of
> > Govindarajan Mohandoss
> > Sent: Wednesday, May 13, 2020 19:24
> > To: ayour...@gmail.com; vpp-dev 
> > Cc: nd 
> > Subject: [EXTERNAL] Re: [vpp-dev] VPP 20.05 RC1 milestone is complete!
> > RC2
> > - on Wednesday 20th May
> >
> > Hello Maintainers,
> >  I am doing the patch submission for the first time.
> >  I am following the page
> > https://urldefense.com/v3/__https://wiki.fd.io/view/VPP/Pulling,_Build
> > ing,_
> > Running,_Hacking_and_Pushing_VPP_Code*Pulling__;Iw!!CQl3mcHX2A!X_Yl
> > Df6H02w8Ew6AQDrBpiMP7UZ5XJeDWGNgAaY0wqMSqos0VyWPgbGH8cP27P
> > ol6w$  and getting the error below. Can you please help to fix this ?
> >
> > #:~/vpp_external/vpp$ git review
> > remote: error: branch refs/publish/master:
> > remote: You need 'Create' rights to create new references.
> > remote: User: mgovind
> > remote: Contact an administrator to fix the permissions
> > remote:
> > remote: Processing changes: refs: 1
> > remote: Processing changes: refs: 1, done To ssh://gerrit.fd.io:29418/vpp
> >  ! [remote rejected] HEAD -> refs/publish/master (prohibited by Gerrit: 
> > not
> > permitted: create)
> > error: failed to push some refs to 'ssh://mgov...@gerrit.fd.io:29418/vpp'
> >
> > Thanks
> > Govind
> >
> > > -Original Message-
> > > From: vpp-dev@lists.fd.io  On Behalf Of Andrew
> > > Yourtchenko via lists.fd.io
> > > Sent: Wednesday, May 13, 2020 6:05 PM
> > > To: vpp-dev 
> > > Subject: [vpp-dev] VPP 20.05 RC1 milestone is complete! RC2 - on
> > > Wednesday 20th May
> > >
> > > Hi all,
> > >
> > > This is to announce that the VPP 20.05 RC1 milestone is complete!
> > >
> > > The newly created stable/2005 branch is ready for your fixes in
> > > preparation for the RC2 milestone.
> > >
> > > They need to have a Jira ticket for the issue, and to avoid
> > > forgetting adding them to master, where practical, *should* be first
> > > merged there and then cherry-picked into the stable/2005 branch -
> > > but as soon as the Jira ticket is mentioned in the commit message
> > > and the fix ends up in both master and
> > > stable/2005 (and if it is important/urgent - maybe earlier
> > > branches), then either order is fine.
> > >
> > > The installation packages for the RC1 for Ubuntu 18.04 and Centos 7
> > > from the new branch are available on
> > > https://urldefense.com/v3/__https://packagecloud.io/fdio/2005/__;!!C
> > > Ql
> > >
> > 3mcHX2A!X_YlDf6H02w8Ew6AQDrBpiMP7UZ5XJeDWGNgAaY0wqMSqos0Vy
> > WPgbGH8cM1me
> > > QxkA$
> > >
> > > The master branch is open for all commits.
> > >
> > > Our next milestone for VPP 20.05 is RC2, happening next Wednesday
> > > 20th May.
> > >
> > > Thanks a lot to Vanessa Valderrama, Dave Wallace and Ed Warnicke for
> > > the help!
> > >
> > > --a
> > > /* Your friendly 2005 release manager */
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#16377): https://lists.fd.io/g/vpp-dev/message/16377
Mute This Topic: https://lists.fd.io/mt/74197290/21656
Group Owner: vpp-dev+ow

Re: [vpp-dev] vpp_papi parsing failure while running make test on a Taishan ARM board

2018-06-27 Thread Juraj Linkeš
Hi vpp-devs,

I think I've at least partially uncovered what the problem is. On the Taishan 
board, the api json files are processed in a different order than on other 
platforms - 
./build-root/install-vpp-native/vpp/share/vpp/api/core/stats.api.json is 
processed first, but it doesn't contain the type definition for 
vl_api_fib_path_t, which causes the crash (since vl_api_fib_path_t is not yet 
known).

I've created https://jira.fd.io/browse/CSIT-1148 to track this and I've 
attached apifiles_taishan.txt and apifiles_cavium.txt for comparison of how the 
order of processed apifiles looks on a platform where the failure doesn't occur.

@Ole: could you please verify that this is accurate?

Thanks,
Juraj
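
The order sensitivity comes from resolving each type as its file is read; a fixpoint pass that re-queues types whose referenced vl_api_* types aren't defined yet would make the result independent of file order. Below is a rough Python sketch of that idea, not the vpp_papi code, and the dependency dict in the usage note is illustrative:

```python
def resolve(defs):
    """defs maps each type name to the list of type names it references."""
    resolved = set()
    pending = set(defs)
    while pending:
        # A type is ready once every vl_api_* type it references is resolved;
        # base types like u64 need no definition.
        ready = {n for n in pending
                 if all(d in resolved
                        for d in defs[n] if d.startswith("vl_api_"))}
        if not ready:
            # Mirrors vpp_papi's failure mode when a referenced type is
            # never defined in any of the .api.json files.
            raise ValueError("Unresolved type definitions: %s" % sorted(pending))
        resolved |= ready
        pending -= ready
    return resolved
```

With this approach, {"vl_api_fib_path_t": ["u64"], "vl_api_bier_neighbor_counter_t": ["vl_api_fib_path_t", "u64"]} resolves fully in either file order, while a genuinely missing definition still raises.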

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Monday, June 25, 2018 2:38 PM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] vpp_papi parsing failure while running make test on a 
Taishan ARM board

Hi vpp-dev,

I'm trying to build VPP and run tests on Huawei TaiShan 2280 ARM board and 
while I can build VPP, the test execution fails:
Traceback (most recent call last):
  File "sanity_run_vpp.py", line 21, in 
tc.setUpClass()
  File "/home/testuser/vpp/test/framework.py", line 345, in setUpClass
cls.vapi = VppPapiProvider(cls.shm_prefix, cls.shm_prefix, cls)
  File "/home/testuser/vpp/test/vpp_papi_provider.py", line 73, in __init__
self.vpp = VPP(jsonfiles, logger=test_class.logger, read_timeout=5)
  File "build/bdist.linux-aarch64/egg/vpp_papi/vpp_papi.py", line 220, in 
__init__
  File "build/bdist.linux-aarch64/egg/vpp_papi/vpp_papi.py", line 160, in 
process_json_file
ValueError: Unresolved type definitions {u'vl_api_bier_neighbor_counter_t': 
{'data': [u'vl_api_bier_neighbor_counter_t', [u'vl_api_bier_table_id_t', 
u'tbl_id'], [u'vl_api_fib_path_t', u'path'], [u'u64', u'packets'], [u'u64', 
u'bytes'], {u'crc': u'0x91fe1748'}], 'type': 'type'}}

I attached the whole log. It seems that the failure is happening because 
vl_api_bier_neighbor_counter_t is not a recognized type. I tried it on a 
different ARM board, Cavium Thunderx and it does work there (and it obviously 
works on x86). Does anyone have any pointers on how to debug this further?

I observed the failure on Ubuntu 17.10 and Ubuntu 16.04.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9688): https://lists.fd.io/g/vpp-dev/message/9688
Mute This Topic: https://lists.fd.io/mt/22675255/899915
-=-=-=-=-=-=-=-=-=-=-=-
['/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/sr_mpls.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/vxlan_gpe.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/qos.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/interface.api.json',
 
'/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/dhcp6_pd_client_cp.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/cop.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/vpe.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/geneve.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/lisp_gpe.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/punt.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/l2.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/classify.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/span.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/bond.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/bier.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/dhcp.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/mpls.api.json',
 '/vpp/build-root/install-vpp-native/vpp/share/vpp/api/core/session.api.json',

[vpp-dev] Parallel test execution in VPP Test Framework

2018-07-19 Thread Juraj Linkeš
Hi VPP devs,

I'm implementing parallel test execution of tests in VPP Test Framework (the 
patch is here https://gerrit.fd.io/r/#/c/13491/) and the last big outstanding 
question is how scalable the parallelization actually is. The tests are 
spawning one VPP instance per each VPPTestCase class and the question is - how 
do the required compute resources per each VPP instance (cpu, ram, shm) scale 
and how much resources do we need with increasing number of VPP instances 
running in parallel (in the context of VPP Test Framework tests)?

The second question would be a generic "is there anything else I need to be 
aware of when trying to run VPPs in parallel?".

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9886): https://lists.fd.io/g/vpp-dev/message/9886
Mute This Topic: https://lists.fd.io/mt/23744978/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Parallel test execution in VPP Test Framework

2018-07-27 Thread Juraj Linkeš
Hi Maciek and vpp-devs,

I've run into a significant problem regarding VPP assignment to cores. All VPPs 
that are spawned are assigned to core 1. I looked at 
https://wiki.fd.io/view/VPP/Command-line_Arguments and I guess it's because 
that's the default behavior of VPP when the dpdk coremask is not configured; the 
wiki notes that "the 'main' thread always occupies the lowest core-id specified 
in the DPDK [process-level] coremask."

Is my reading of the config options accurate?

Obviously, all VPP instances running on the same core goes against running the 
tests on multiple cores. There are a couple of solutions that come to mind:

- Assign VPP instances to cores manually. With multiple jobs possibly 
running on a given host, this creates a situation where the different jobs 
don't know cores are already occupied (and by how many VPP instances) and thus 
introduces additional challenges to solve.

- Add an option to override this default behavior and let the Linux CFS 
scheduler assign VPPs to cores, or something similar where VPPs would land on 
different cores.

Is there some other solution?

Vpp-devs, what do you think about the second solution? Would it be possible?

Thanks,
Juraj

From: Maciek Konstantynowicz (mkonstan) [mailto:mkons...@cisco.com]
Sent: Wednesday, July 25, 2018 1:10 PM
To: Juraj Linkeš 
Cc: vpp-dev@lists.fd.io; csit-dev 
Subject: Re: [vpp-dev] Parallel test execution in VPP Test Framework




On 19 Jul 2018, at 15:44, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi VPP devs,

I'm implementing parallel test execution of tests in VPP Test Framework (the 
patch is here https://gerrit.fd.io/r/#/c/13491/) and the last big outstanding 
question is how scalable the parallelization actually is.

That’s a good question. What do the tests say? :)


The tests are spawning one VPP instance per each VPPTestCase class

How many VPP instances are spawned and run in parallel? Cause assuming
there is at least one VPPTestCase class per test_, that’s 70 VPP
instances ..



and the question is - how do the required compute resources per each VPP 
instance (cpu, ram, shm) scale and how much resources do we need with 
increasing number of VPP instances running in parallel (in the context of VPP 
Test Framework tests)?

I guess this will vary between tests too. FIB scale tests will require
more fib heap memory. What do the tests say? :)

-Maciek



The second question would be a generic "is there anything else I need to be 
aware of when trying to run VPPs in parallel?".

Thanks,
Juraj

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#9950): https://lists.fd.io/g/vpp-dev/message/9950
Mute This Topic: https://lists.fd.io/mt/23744978/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework

2018-07-30 Thread Juraj Linkeš
Hi,

A couple of corrections/additions:

Python spawns processes with proper CFS scheduling (I've tested this), so it's 
VPP that's overriding CFS scheduling.

Damjan, assigning cpus to VPPs is not the problem. The problem is when multiple 
make test frameworks in different Jenkins slaves try to do the same thing - 
e.g. when a framework assigns any n cores to different VPP instances, the other 
framework instances running in other Jenkins slaves don't know which cores are 
currently assigned to how many VPPs. I guess I could parse this from all 
running VPP pids, then look at their affinity and assign cores based on 
that, but I wanted to know about other approaches. I'll look into this in the 
meantime.

The VPP Test Framework doesn't load the dpdk plugin - does it make sense to use 
the CFS scheduler by default when it isn't loaded? Or maybe use the CFS scheduler 
by default only when the dpdk plugin is not loaded and no workers are used?

Are there plans for running multiple workers in make test? I don't see that in 
the framework at the moment, but maybe I'm missing something.

Thanks,
Juraj

From: Damjan Marion [mailto:dmar...@me.com]
Sent: Saturday, July 28, 2018 1:28 PM
To: Peter Mikus -X (pmikus - PANTHEON TECHNOLOGIES at Cisco) 
Cc: Maciek Konstantynowicz ; Alec Hothan (ahothan) 
; Juraj Linkeš ; 
vpp-dev@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework


Dear All,

My personal preference is that make test  framework implements cpu assignment 
code.
It shouldn't be rocket science to parse /sys/devices/system/cpu/online and give 
one cpu to each instance.
It will also help the test framework to understand how many parallel jobs it can 
run...
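A sketch of that parsing, assuming the standard Linux cpu-list syntax (e.g. '0-3' or '0,2-5'):

```python
def parse_cpu_list(spec):
    """Parse a Linux cpu-list string such as '0-3,8,10-11' into a list of ints."""
    cpus = []
    for part in spec.strip().split(','):
        if '-' in part:
            lo, hi = part.split('-')
            cpus.extend(range(int(lo), int(hi) + 1))
        else:
            cpus.append(int(part))
    return cpus

if __name__ == '__main__':
    # the online file holds a single cpu-list line, e.g. "0-95" on a ThunderX
    with open('/sys/devices/system/cpu/online') as f:
        print(parse_cpu_list(f.read()))
```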

Enforcing single cpu assignment in vpp is done intentionally, to avoid cross 
numa memory allocation.
If main-core is not specified, vpp simply uses cpu core 1 (unless only 0 
exists).
While adding something like "cpu { main-core any }" should be quite straightforward, 
it will have broken behaviour when dpdk is loaded and will just confuse people. 
Also, we will need to come back to the drawing board when we decide to run 
multiple workers in make test, as the logic there is more complex and will 
likely require rework of the thread placement code.

--
Damjan


On 27 Jul 2018, at 20:46, Peter Mikus via Lists.Fd.Io 
mailto:pmikus=cisco@lists.fd.io>> wrote:

Hello,

>  What is the “significant problem” you’re running into?

The problem can be better described as: when Python spawns N instances of the 
VPP process, all of the processes are for an unknown reason placed with affinity 
0x2 (binary 10). This can be verified with taskset -p . CFS then places all 
the VPP processes on the same core, making this inefficient on a multi-core 
Jenkins slave container.
The default vpp startup.conf is not modified, so there is no input telling VPP 
where to pin its threads. One might think that this is related to the Python 
multiprocessing/subprocess.Popen code hard-setting the affinity mask to 0x2.
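The same check can be scripted without taskset by reading the affinity bitmask straight out of procfs (a Linux-only sketch; a mask of 0x2 means every thread is confined to core 1):

```python
import os

def cpus_allowed_mask(pid):
    """Return a process's CPU affinity as an integer bitmask, read from the
    Cpus_allowed line of /proc/<pid>/status (the same data taskset -p shows)."""
    with open('/proc/%d/status' % pid) as f:
        for line in f:
            if line.startswith('Cpus_allowed:'):
                # the value may be printed as comma-separated 32-bit words
                return int(line.split()[1].replace(',', ''), 16)
    raise ValueError('no Cpus_allowed line for pid %d' % pid)
```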

There are multiple workarounds, proposed by Juraj and Maciek, but 
none of them answers why this is happening.

Peter Mikus
Engineer – Software
Cisco Systems Limited

From: csit-...@lists.fd.io<mailto:csit-...@lists.fd.io> 
[mailto:csit-...@lists.fd.io] On Behalf Of Maciek Konstantynowicz (mkonstan) 
via Lists.Fd.Io
Sent: Friday, July 27, 2018 6:53 PM
To: Alec Hothan (ahothan) mailto:ahot...@cisco.com>>; Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>
Cc: csit-...@lists.fd.io<mailto:csit-...@lists.fd.io>
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework

Alec, This is about make test and not real packet forwarding. Per Juraj’s patch 
[1]

Juraj, My understanding is that if you’re starting VPP without specifying core 
placement in startup.conf [2] cpu {..}, then Linux CFS will be placing the 
threads onto available cpu core resources. If you’re saying this is not the 
case, and indeed the wiki comment indicates this, then the way to address it is 
to specify a different core for the main thread of each vpp instance.
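For illustration, that per-instance placement would be a startup.conf fragment along these lines (the core number here is just an example - each parallel instance would need a distinct one):

```
cpu {
  main-core 2
}
```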

What is the “significant problem” you’re running into? Are tests not executing 
in parallel using python multiprocessing, are VPPs having issues, or something 
else? Could you describe it a bit more?

-Maciek

[1] https://gerrit.fd.io/r/#/c/13491/
[2] https://git.fd.io/vpp/tree/src/vpp/conf/startup.conf



On 27 Jul 2018, at 17:23, Alec Hothan (ahothan) 
mailto:ahot...@cisco.com>> wrote:

Hi Juraj,
How many instances and what level of performance are you looking at?
Even if you assign different cores to each VPP instance, results can be skewed 
due to interference at the LLC and PCIe/NIC level (this can be somewhat 
mitigated by running on separate sockets)

   Alec


From: mailto:vpp-dev@lists.fd.io>> on behalf of Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>
Date: Friday, July 27, 2018 at 7

[vpp-dev] Large memory spike during make verify on ARM machine ThunderX

2018-08-01 Thread Juraj Linkeš
Hi vpp-devs,

I noticed that during a specific portion of make verify build on an ARM 
ThunderX machine the build consumes a lot of memory - around 25GB. I can 
identify the spot in the logs:
Jul 31 03:12:48   CXX  gbp_contract.lo

25GB memory hog

Jul 31 03:16:13   CXXLDlibvom.la

but not much else. I created a ticket<https://jira.fd.io/browse/VPP-1371> which 
contains some more information. I didn't see this memory spike when trying to 
reproduce the behavior on my x86 laptop. Does anyone have any idea what could 
be the cause or how to debug this?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10008): https://lists.fd.io/g/vpp-dev/message/10008
Mute This Topic: https://lists.fd.io/mt/24005970/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Large memory spike during make verify on ARM machine ThunderX

2018-08-02 Thread Juraj Linkeš
Hi Neale,

I'm not specifying -j, but I see a lot of processes running in parallel when 
the spike is happening. The processes are attached. They utilized most of the 96 
available cores and most of them used more than 400MB - is that how much they 
should be using?

Also, here's the gcc version on the box:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/5/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
5.4.0-6ubuntu1~16.04.4' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs 
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-5 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-libquadmath --enable-plugin --with-system-zlib 
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home 
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 
--with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror 
--enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu 
--target=aarch64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4)

Thanks,
Juraj

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Wednesday, August 1, 2018 5:09 PM
To: Juraj Linkeš ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Large memory spike during make verify on ARM machine 
ThunderX

Hi Juraj,

How many parallel compiles do you have? What’s the j factor

/neale



From: mailto:vpp-dev@lists.fd.io>> on behalf of Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>
Date: Wednesday, 1 August 2018 at 16:59
To: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: [vpp-dev] Large memory spike during make verify on ARM machine ThunderX

Hi vpp-devs,

I noticed that during a specific portion of make verify build on an ARM 
ThunderX machine the build consumes a lot of memory - around 25GB. I can 
identify the spot in the logs:
Jul 31 03:12:48   CXX  gbp_contract.lo

25GB memory hog

Jul 31 03:16:13   CXXLDlibvom.la

but not much else. I created a ticket<https://jira.fd.io/browse/VPP-1371> which 
contains some more information. I didn't see this memory spike when trying to 
reproduce the behavior on my x86 laptop. Does anyone have any idea what could 
be the cause or how to debug this?

Thanks,
Juraj
jlinkes  93644 98.9  0.1 468356 453468 pts/0   R+   00:34   0:42 
/usr/lib/gcc/aarch64-linux-gnu/5/cc1plus -fpreprocessed 
/home/jlinkes/2nd_vpp/build-root/.ccache/tmp/arp_proxy_.stdout.fdio-cavium6.92196.sx3jpw.ii
 -quiet -dumpbase arp_proxy_.stdout.fdio-cavium6.92196.sx3jpw.ii 
-mlittle-endian -mabi=lp64 -auxbase-strip .libs/arp_proxy_binding.o -g -O2 
-Wall -Werror -std=gnu++11 -fstack-protector -fPIC -Wformat-security -o 
/tmp/ccGYhYng.s
jlinkes  93650 98.9  0.1 450552 436712 pts/0   R+   00:34   0:42 
/usr/lib/gcc/aarch64-linux-gnu/5/cc1plus -fpreprocessed 
/home/jlinkes/2nd_vpp/build-root/.ccache/tmp/arp_proxy_.stdout.fdio-cavium6.91620.2Fcr7a.ii
 -quiet -dumpbase arp_proxy_.stdout.fdio-cavium6.91620.2Fcr7a.ii 
-mlittle-endian -mabi=lp64 -auxbase-strip .libs/arp_proxy_binding_cmds.o -g -O2 
-Wall -Werror -std=gnu++11 -fstack-protector -fPIC -Wformat-security -o 
/tmp/ccC7i5Ug.s
jlinkes  93659 98.9  0.1 448416 432828 pts/0   R+   00:34   0:42 
/usr/lib/gcc/aarch64-linux-gnu/5/cc1plus -fpreprocessed 
/home/jlinkes/2nd_vpp/build-root/.ccache/tmp/interface_.stdout.fdio-cavium6.92528.uoidGE.ii
 -quiet -dumpbase interface_.stdout.fdio-cavium6.92528.uoidGE.ii 
-mlittle-endian -mabi=lp64 -auxbase-strip .libs/interface_ip6_nd_cmds.o -g -O2 
-Wall -Werror -std=gnu++11 -fstack-protector -fPIC -Wformat-security -o 
/tmp/cc9IMdNf.s
jlinkes  93665 98.9  0.1 434236 424820 pts/0   R+   00:34   0:42 
/usr/lib/gcc/aarch64-linux-gnu/5/cc1plus -fpreprocessed 
/home/jlinkes/2nd_vpp/build-root/.ccache/tmp/bridge_dom.stdout.fdio-cavium6.92180.cNFjqv.ii
 -quiet -dumpbase bridge_dom.stdout.fdio-cavium6.92180.cNFjqv.ii 
-mlittle-endian -mabi=lp64 -auxbase-strip .libs/bridge_domain_arp_entry.o -g 
-O2 -Wall -Werror -std=gnu++11 -fstack-protector -fPIC -Wformat-security -o 
/tmp/ccd4Ddoi.s
jlinkes  93676 98.8  0.1 472524 457256 pts/0   R+   00:34   0:42 
/usr/lib/gcc/aarch64-linux-gnu/5/cc1plus -fpreprocessed 
/home/jlinkes/2nd_vpp/build-root/.ccache/tmp/vxlan_tunn.stdout.fdio-cavium6.93314.4EcYD1.ii
 -quiet -dumpbase vxlan_tunn.stdout.fdio-cavium6.93314.4EcYD1.ii 
-mli

Re: [vpp-dev] Large memory spike during make verify on ARM machine ThunderX

2018-08-03 Thread Juraj Linkeš
Hi Neale,

Yea they do require a lot of memory - the same is true for x86. Is there a way 
to specify the max number of these? Or is that done with -j?

Would it be worthwhile to investigate if it's possible to reduce the memory 
requirements of these?

Is there a way to clear the cache so that I could run make verify back to back 
without deleting and recloning the vpp repo? ccache -C didn't work for me.
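A hedged workaround sketch, assuming the issue is that `ccache -C` operated on the default cache while the build uses the per-tree directory visible in the cc1plus invocations earlier in this thread (`build-root/.ccache`); the helper name and the exact path are my assumptions:

```python
import os
import shutil

def clear_tree_ccache(vpp_root):
    """Remove the per-tree ccache directory so a subsequent make verify
    recompiles everything from scratch (path taken from the cc1plus
    command lines above; adjust if your tree differs)."""
    cache_dir = os.path.join(vpp_root, 'build-root', '.ccache')
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
    return cache_dir
```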

Thanks,
Juraj

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Thursday, August 2, 2018 11:11 AM
To: Juraj Linkeš ; vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] Large memory spike during make verify on ARM machine 
ThunderX

Hi Juraj,

I couldn’t say how much each compile ‘should’ use, but it has been noted in the 
past that these template-heavy C++ files do require a lot of memory to compile. 
With the many cores you have, that’s a lot in total.
‘make wipe’ does not clear the ccache, so any subsequent builds will require 
less memory because the compile is skipped.

/neale

From: mailto:vpp-dev@lists.fd.io>> on behalf of Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>
Date: Thursday, 2 August 2018 at 10:10
To: "Neale Ranns (nranns)" mailto:nra...@cisco.com>>, 
"vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: Re: [vpp-dev] Large memory spike during make verify on ARM machine 
ThunderX

Hi Neale,

I'm not specifying -j, but I see a lot of processes running in parallel when 
the spike is happening. The processes are attached. They utilized most of the 96 
available cores and most of them used more than 400MB - is that how much they 
should be using?

Also, here's the gcc version on the box:
gcc -v
Using built-in specs.
COLLECT_GCC=gcc
COLLECT_LTO_WRAPPER=/usr/lib/gcc/aarch64-linux-gnu/5/lto-wrapper
Target: aarch64-linux-gnu
Configured with: ../src/configure -v --with-pkgversion='Ubuntu/Linaro 
5.4.0-6ubuntu1~16.04.4' --with-bugurl=file:///usr/share/doc/gcc-5/README.Bugs 
--enable-languages=c,ada,c++,java,go,d,fortran,objc,obj-c++ --prefix=/usr 
--program-suffix=-5 --enable-shared --enable-linker-build-id 
--libexecdir=/usr/lib --without-included-gettext --enable-threads=posix 
--libdir=/usr/lib --enable-nls --with-sysroot=/ --enable-clocale=gnu 
--enable-libstdcxx-debug --enable-libstdcxx-time=yes 
--with-default-libstdcxx-abi=new --enable-gnu-unique-object 
--disable-libquadmath --enable-plugin --with-system-zlib 
--disable-browser-plugin --enable-java-awt=gtk --enable-gtk-cairo 
--with-java-home=/usr/lib/jvm/java-1.5.0-gcj-5-arm64/jre --enable-java-home 
--with-jvm-root-dir=/usr/lib/jvm/java-1.5.0-gcj-5-arm64 
--with-jvm-jar-dir=/usr/lib/jvm-exports/java-1.5.0-gcj-5-arm64 
--with-arch-directory=aarch64 --with-ecj-jar=/usr/share/java/eclipse-ecj.jar 
--enable-multiarch --enable-fix-cortex-a53-843419 --disable-werror 
--enable-checking=release --build=aarch64-linux-gnu --host=aarch64-linux-gnu 
--target=aarch64-linux-gnu
Thread model: posix
gcc version 5.4.0 20160609 (Ubuntu/Linaro 5.4.0-6ubuntu1~16.04.4)

Thanks,
Juraj

From: Neale Ranns (nranns) [mailto:nra...@cisco.com]
Sent: Wednesday, August 1, 2018 5:09 PM
To: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>; 
vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Large memory spike during make verify on ARM machine 
ThunderX

Hi Juraj,

How many parallel compiles do you have? What’s the j factor

/neale



From: mailto:vpp-dev@lists.fd.io>> on behalf of Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>
Date: Wednesday, 1 August 2018 at 16:59
To: "vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>" 
mailto:vpp-dev@lists.fd.io>>
Subject: [vpp-dev] Large memory spike during make verify on ARM machine ThunderX

Hi vpp-devs,

I noticed that during a specific portion of make verify build on an ARM 
ThunderX machine the build consumes a lot of memory - around 25GB. I can 
identify the spot in the logs:
Jul 31 03:12:48   CXX  gbp_contract.lo

25GB memory hog

Jul 31 03:16:13   CXXLDlibvom.la

but not much else. I created a ticket<https://jira.fd.io/browse/VPP-1371> which 
contains some more information. I didn't see this memory spike when trying to 
reproduce the behavior on my x86 laptop. Does anyone have any idea what could 
be the cause or how to debug this?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10029): https://lists.fd.io/g/vpp-dev/message/10029
Mute This Topic: https://lists.fd.io/mt/24005970/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] vpp-arm-verify-master-ubuntu1604 failing

2018-09-12 Thread Juraj Linkeš
Hi Matt,

As you might've noticed, the job got reverted, which was the right thing to do.

But nonetheless, the issue you reported is a known issue and I'll be looking 
into figuring out what's going on. The tests work fine on x86, but not on ARM, 
and there isn't an apparent reason why that's the case, since the tests worked 
fine before.

Juraj

From: Ed Kern (ejk) [mailto:e...@cisco.com]
Sent: Thursday, September 6, 2018 11:54 PM
To: Matthew Smith 
Cc: vpp-dev ; Juraj Linkeš 
Subject: Re: [vpp-dev] vpp-arm-verify-master-ubuntu1604 failing




On Sep 6, 2018, at 3:42 PM, Matthew Smith 
mailto:mgsm...@netgate.com>> wrote:


Hi,

The jenkins job vpp-arm-verify-master-ubuntu1604 seems to have failed every 
time it has run over the last 36 hours or so. Is that a known issue?

Yes.

It's actually been failing for about a week now, since the change was merged to 
have the arm jobs run make test.

Having said that….voting for the job has been off for a few days so it 
shouldn’t be impacting from a gerrit voting perspective.



https://jenkins.fd.io/job/vpp-arm-verify-master-ubuntu1604/buildTimeTrend

Is vpp-dev the appropriate place to report issues like this?

Well it works… I'm honestly not sure if there is a “better” one. I'll let others 
answer that.



Or is there some other email alias that will go directly to whomever might need 
to kick jenkins?


While I have a deep-seated loathing for jenkins (on par with folks who think vi 
is a real editor) this is not a jenkins issue.
Jenkins has 99 problems but this…..patch….ain't one.

I'll cc Juraj directly so he can relay status on either arm make test working or 
a revert.

Ed




Thanks,
-Matt

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10416): https://lists.fd.io/g/vpp-dev/message/10416
Mute This Topic: https://lists.fd.io/mt/25265335/675649
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[e...@cisco.com<mailto:e...@cisco.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10470): https://lists.fd.io/g/vpp-dev/message/10470
Mute This Topic: https://lists.fd.io/mt/25265335/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Make test failures on ARM

2018-09-24 Thread Juraj Linkeš
Hi vpp-devs,

Especially ARM vpp devs :)

We're experiencing a number of failures on Cavium ThunderX and we'd like to fix 
the issues. I've created a number of Jira tickets:

*GRE crash

*SCTP failure/crash

o   Marco and I resolved a similar issue in the past, but this could be 
something different

*SPAN crash

*IP4 failures

o   These are multiple failures and I'm not sure that grouping them together is 
correct

*L2 failures/crash

o   As in IP4, these are multiple failures and I'm not sure that grouping them 
together is correct

*ECMP failure

*Multicast failure

*ACL failure

o   I'm already working with Andrew on fixing this

The reason I didn't reach out to all authors individually is that I wanted 
someone to look at the issues and assess whether there's an overlap (or I 
grouped the failures improperly), since some of the failures look similar.

Then there's the issue of hardware availability - if anyone willing to help has 
access to fd.io lab, I can setup access to a Cavium ThunderX, otherwise we 
could set up a call if further debugging is needed.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10628): https://lists.fd.io/g/vpp-dev/message/10628
Mute This Topic: https://lists.fd.io/mt/26201433/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Make test failures on ARM - IP4, L2, ECMP, Multicast, GRE, SCTP, SPAN, ACL

2018-09-25 Thread Juraj Linkeš
I created the new tickets under CSIT, which is an oversight, but I fixed it and 
now the tickets are under VPP:

*GRE crash<https://jira.fd.io/browse/VPP-1429>

*SCTP failure/crash<https://jira.fd.io/browse/VPP-1430>

o   Marco and I resolved a similar issue in the past, but this could be 
something different

*SPAN crash<https://jira.fd.io/browse/VPP-1434>

*IP4 failures<https://jira.fd.io/browse/VPP-1433>

o   These are multiple failures and I'm not sure that grouping them together is 
correct

*L2 failures/crash<https://jira.fd.io/browse/VPP-1432>

o   As in IP4, these are multiple failures and I'm not sure that grouping them 
together is correct

*ECMP failure<https://jira.fd.io/browse/VPP-1431>

*Multicast failure<https://jira.fd.io/browse/VPP-1428>

*ACL failure<https://jira.fd.io/browse/VPP-1418>

o   I'm already working with Andrew on fixing this

There seem to be a lot of people who touched the code. I would like to ask the 
authors to tell me who to turn to (at least for IP and L2).

Regards,
Juraj

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Monday, September 24, 2018 6:26 PM
To: vpp-dev 
Cc: csit-dev 
Subject: [vpp-dev] Make test failures on ARM

Hi vpp-devs,

Especially ARM vpp devs :)

We're experiencing a number of failures on Cavium ThunderX and we'd like to fix 
the issues. I've created a number of Jira tickets:

*GRE crash<https://jira.fd.io/browse/CSIT-1307>

*SCTP failure/crash<https://jira.fd.io/browse/CSIT-1313>

o   Marco and I resolved a similar issue in the past, but this could be 
something different

*SPAN crash<https://jira.fd.io/browse/CSIT-1309>

*IP4 failures<https://jira.fd.io/browse/CSIT-1310>

o   These are multiple failures and I'm not sure that grouping them together is 
correct

*L2 failures/crash<https://jira.fd.io/browse/CSIT-1308>

o   As in IP4, these are multiple failures and I'm not sure that grouping them 
together is correct

*ECMP failure<https://jira.fd.io/browse/CSIT-1311>

*Multicast failure<https://jira.fd.io/browse/CSIT-1312>

*ACL failure<https://jira.fd.io/browse/VPP-1418>

o   I'm already working with Andrew on fixing this

The reason I didn't reach out to all authors individually is that I wanted 
someone to look at the issues and assess whether there's an overlap (or I 
grouped the failures improperly), since some of the failures look similar.

Then there's the issue of hardware availability - if anyone willing to help has 
access to fd.io lab, I can setup access to a Cavium ThunderX, otherwise we 
could set up a call if further debugging is needed.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10636): https://lists.fd.io/g/vpp-dev/message/10636
Mute This Topic: https://lists.fd.io/mt/26218436/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Make test failures on ARM - IP4, L2, ECMP, Multicast, GRE, SCTP, SPAN, ACL

2018-09-27 Thread Juraj Linkeš
Hi Neale,

I had a debugging session with Andrew about failing ACL testcases and he 
uncovered that the root cause is in l2 and ip4:

1) the timeout and big files

for some reason, in the bridged setup done by a test case, VPP reinjects the 
packet being sent onto one of the interfaces of the bridge, in a loop.
The following crude diff eliminates the problem and the tests pass: 
https://paste.ubuntu.com/p/CSMYjXsZyX/

2) there is a failure of a mac acl testcase in the routed scenario, where the 
ip lookup picks up an incorrect next index:

The following shows the problem for the properly and improperly routed packet:

https://paste.ubuntu.com/p/wTWWNhwSKY/
Could you advise on the first issue (Andrew wasn't sure the diff is a proper 
fix) and help debug the other issue (or, most likely related, issues 
https://jira.fd.io/browse/VPP-1432 and https://jira.fd.io/browse/VPP-1433)? If 
not, could you suggest someone so I can ask them?

Thanks,
Juraj

From: Juraj Linkeš
Sent: Tuesday, September 25, 2018 10:07 AM
To: 'Juraj Linkeš' ; vpp-dev 
Cc: csit-dev 
Subject: RE: Make test failures on ARM - IP4, L2, ECMP, Multicast, GRE, SCTP, 
SPAN, ACL

I created the new tickets under CSIT, which is an oversight, but I fixed it and 
now the tickets are under VPP:

*GRE crash<https://jira.fd.io/browse/VPP-1429>

*SCTP failure/crash<https://jira.fd.io/browse/VPP-1430>

o   Marco and I resolved a similar issue in the past, but this could be 
something different

*SPAN crash<https://jira.fd.io/browse/VPP-1434>

*IP4 failures<https://jira.fd.io/browse/VPP-1433>

o   These are multiple failures and I'm not sure that grouping them together is 
correct

*L2 failures/crash<https://jira.fd.io/browse/VPP-1432>

o   As in IP4, these are multiple failures and I'm not sure that grouping them 
together is correct

*ECMP failure<https://jira.fd.io/browse/VPP-1431>

*Multicast failure<https://jira.fd.io/browse/VPP-1428>

*ACL failure<https://jira.fd.io/browse/VPP-1418>

o   I'm already working with Andrew on fixing this

There seem to be a lot of people who touched the code. I would like to ask the 
authors to tell me who to turn to (at least for IP and L2).

Regards,
Juraj

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Monday, September 24, 2018 6:26 PM
To: vpp-dev mailto:vpp-dev@lists.fd.io>>
Cc: csit-dev mailto:csit-...@lists.fd.io>>
Subject: [vpp-dev] Make test failures on ARM

Hi vpp-devs,

Especially ARM vpp devs :)

We're experiencing a number of failures on Cavium ThunderX and we'd like to fix 
the issues. I've created a number of Jira tickets:

*GRE crash<https://jira.fd.io/browse/CSIT-1307>

*SCTP failure/crash<https://jira.fd.io/browse/CSIT-1313>

o   Marco and I resolved a similar issue in the past, but this could be 
something different

*SPAN crash<https://jira.fd.io/browse/CSIT-1309>

*IP4 failures<https://jira.fd.io/browse/CSIT-1310>

o   These are multiple failures and I'm not sure that grouping them together is 
correct

*L2 failures/crash<https://jira.fd.io/browse/CSIT-1308>

o   As in IP4, these are multiple failures and I'm not sure that grouping them 
together is correct

*ECMP failure<https://jira.fd.io/browse/CSIT-1311>

*Multicast failure<https://jira.fd.io/browse/CSIT-1312>

*ACL failure<https://jira.fd.io/browse/VPP-1418>

o   I'm already working with Andrew on fixing this

The reason I didn't reach out to all authors individually is that I wanted 
someone to look at the issues and assess whether there's an overlap (or I 
grouped the failures improperly), since some of the failures look similar.

Then there's the issue of hardware availability - if anyone willing to help has 
access to fd.io lab, I can setup access to a Cavium ThunderX, otherwise we 
could set up a call if further debugging is needed.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#10680): https://lists.fd.io/g/vpp-dev/message/10680
Mute This Topic: https://lists.fd.io/mt/26218436/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Faulty java version check for JVPP GEN

2018-11-05 Thread Juraj Linkeš
Hi vpp-devs,

I'm trying to get the VPP build (actually make verify, for CI purposes) to run on 
Ubuntu1804 and I've hit an issue with Java:
checking /usr/lib/jvm/java-11-openjdk-arm64 for Java 8 compiler... found 
version 11
...
/vpp/build-data/../extras/japi/java/jvpp-core/io/fd/vpp/jvpp/core/examples/L2AclExample.java:37:
 error: package javax.xml.bind does not exist
import javax.xml.bind.DatatypeConverter;
 ^
/vpp/build-data/../extras/japi/java/jvpp-core/io/fd/vpp/jvpp/core/examples/L2AclExample.java:123:
 error: cannot find symbol
System.out.println("Mask hex: " + 
DatatypeConverter.printHexBinary(reply.mask));
  ^
  symbol:   variable DatatypeConverter
  location: class L2AclExample

The package in question was deprecated in Java 9 and removed in Java 11, so 
that's the reason for the failure.

Java 8 is present on the system, and Java 11 gets installed as part of the dependency 
installation of make verify (default-jdk-headless), so a workaround along the 
lines of removing Java 11 or reconfiguring Java is not really on the table.

If we want to install default-jdk-headless as part of make install-dep, we 
should be prepared for Java 11 being installed. What's the right way to fix 
this? Use a package other than javax.xml.bind or modify how the build searches 
for Java compilers?
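To illustrate the first option, here is a hypothetical plain-JDK replacement for the removed `DatatypeConverter.printHexBinary` (the class name `HexUtil` is mine; this is a sketch, not an agreed fix):

```java
// Hypothetical stand-in for javax.xml.bind.DatatypeConverter.printHexBinary,
// which is gone from the default class path in Java 11.
public class HexUtil {
    public static String printHexBinary(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length * 2);
        for (byte b : data) {
            // mask to avoid sign extension of negative bytes
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints "Mask hex: FF007F"
        System.out.println("Mask hex: " + printHexBinary(new byte[]{(byte) 0xFF, 0x00, 0x7F}));
    }
}
```

The call site in L2AclExample would then swap `DatatypeConverter.printHexBinary(reply.mask)` for such a helper, keeping Java 8 compatibility without the removed module.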

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#0): https://lists.fd.io/g/vpp-dev/message/0
Mute This Topic: https://lists.fd.io/mt/27860462/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] VPP unit test failures (l2fib, ip4 reassemble) on environments other than x86 Ubuntu 1804

2018-11-07 Thread Juraj Linkeš
Hi vpp-devs,

There are some failures in unit tests that have cropped up in master on ARM:

*https://jira.fd.io/browse/VPP-1475

*https://jira.fd.io/browse/VPP-1476

I tried Ubuntu1604 and 1804 on ARM and the issues are reproducible on both. 
I've noticed that we switched testing from ubuntu1604 to 1804 on x86 in CI, so 
I've also tried some other OS's on x86 and sure enough, I saw both issues on 
ubuntu1604 and Debian9.

Could someone look into these?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11142): https://lists.fd.io/g/vpp-dev/message/11142
Mute This Topic: https://lists.fd.io/mt/28023151/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Faulty java version check for JVPP GEN

2018-11-07 Thread Juraj Linkeš
There's actually a JIRA ticket opened for this, so it seems I'm not the only 
one who's looking for a solution: https://jira.fd.io/browse/VPP-1477

Juraj

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Monday, November 5, 2018 5:36 PM
To: vpp-dev 
Cc: sirshak@arm.com
Subject: [vpp-dev] Faulty java version check for JVPP GEN

Hi vpp-devs,

I'm trying to get the VPP build (actually make verify, for CI purposes) to run on 
Ubuntu1804 and I've hit an issue with Java:
checking /usr/lib/jvm/java-11-openjdk-arm64 for Java 8 compiler... found 
version 11
...
/vpp/build-data/../extras/japi/java/jvpp-core/io/fd/vpp/jvpp/core/examples/L2AclExample.java:37:
 error: package javax.xml.bind does not exist
import javax.xml.bind.DatatypeConverter;
 ^
/vpp/build-data/../extras/japi/java/jvpp-core/io/fd/vpp/jvpp/core/examples/L2AclExample.java:123:
 error: cannot find symbol
System.out.println("Mask hex: " + 
DatatypeConverter.printHexBinary(reply.mask));
  ^
  symbol:   variable DatatypeConverter
  location: class L2AclExample

The package in question was deprecated in Java 9 and removed in Java 11, so 
that's the reason for the failure.

Java 8 is present on the system, and Java 11 gets installed as part of the dependency 
installation of make verify (default-jdk-headless), so a workaround along the 
lines of removing Java 11 or reconfiguring Java is not really on the table.

If we want to install default-jdk-headless as part of make install-dep, we 
should be prepared for Java 11 being installed. What's the right way to fix 
this? Use a package other than javax.xml.bind or modify how the build searches 
for Java compilers?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11143): https://lists.fd.io/g/vpp-dev/message/11143
Mute This Topic: https://lists.fd.io/mt/27860462/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Unit test failures in master

2018-11-14 Thread Juraj Linkeš
Hi vpp-devs,

There are a couple of make test failures in master:

*IPv4 Reassembly - https://jira.fd.io/browse/VPP-1475

o   Just one test fails - random order reassembly, there are missing packets

*L2 FIB - https://jira.fd.io/browse/VPP-1476

o   Multiple failures - the first test fails because of missing packets and the 
subsequent tests fail because all packets are missing

I've tested this on multiple environments:

*Ubuntu1604 VM on x86 windows host

*Ubuntu1804 VM on x86 windows host

*Ubuntu1804 on ARM host

And both fail on all of these. They pass in CI, but that's Ubuntu1804 container 
on a Linux host (I don't know what OS the host is).

Could someone look into this? It should be fairly easy to reproduce locally in a 
VM.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11244): https://lists.fd.io/g/vpp-dev/message/11244
Mute This Topic: https://lists.fd.io/mt/28134293/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Unit test failures in master

2018-11-15 Thread Juraj Linkeš
I've looked into these failures a bit more. Both of these failures produce a 
core with debug build, so I was able to do a bit of gdb debugging.

I attached gdb logs to both tickets - please have a look.

Juraj

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Wednesday, November 14, 2018 11:59 AM
To: vpp-dev 
Cc: Neale Ranns (nranns) 
Subject: [vpp-dev] Unit test failures in master

Hi vpp-devs,

There are a couple of make test failures in master:

*IPv4 Reassembly - https://jira.fd.io/browse/VPP-1475

o   Just one test fails - random order reassembly, there are missing packets

*L2 FIB - https://jira.fd.io/browse/VPP-1476

o   Multiple failures - the first test fails because of missing packets and the 
subsequent tests fail because all packets are missing

I've tested this on multiple environments:

*Ubuntu1604 VM on x86 windows host

*Ubuntu1804 VM on x86 windows host

*Ubuntu1804 on ARM host

And both fail on all of these. They pass in CI, but that's Ubuntu1804 container 
on a Linux host (I don't know what OS the host is).

Could someone look into this? It should be fairly easy to reproduce locally in a 
VM.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11259): https://lists.fd.io/g/vpp-dev/message/11259
Mute This Topic: https://lists.fd.io/mt/28134293/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] test-ext failures seen on master

2018-11-15 Thread Juraj Linkeš
This makes sense from my point of view. I've seen some tests fail with one of the 
builds but not with the other. This would obviously increase the demand for 
compute resources, but we could mitigate that by running tests in parallel 
(the TEST_JOBS variable).

Juraj

-Original Message-
From: Klement Sekera via Lists.Fd.Io [mailto:ksekera=cisco@lists.fd.io] 
Sent: Thursday, November 15, 2018 11:47 AM
To: vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] test-ext failures seen on master

Hi all,

I'm seeing failures on master branch on ubuntu 18.04 when invoking `make 
test-ext`

FAILURES AND ERRORS IN TESTS:
  Testcase name: VCL Cut Thru Tests 
FAILURE: run VCL cut thru uni-directional (multiple sockets) test
  Testcase name: L2BD Test Case 
  ERROR: L2BD MAC learning dual-loop test
  ERROR: L2BD MAC learning dual-loop test
  ERROR: L2BD MAC learning single-loop test
  Testcase name: Classifier PBR Test Case 
  ERROR: IP PBR test

Digging a bit further, the L2BD failure also occurs in `make test-debug`, while it 
doesn't appear in `make test`. This is a core due to an assert.

I think we should run both `make test` (release build) and `make test-debug` 
(debug build) as part of verify process. If it was up to me, I would run all 
the tests which we have in the verify job.

Thoughts?

Regards,
Klement

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11260): https://lists.fd.io/g/vpp-dev/message/11260
Mute This Topic: https://lists.fd.io/mt/28144643/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Enabling unit tests in VPP job for ARM and make test failures

2018-11-16 Thread Juraj Linkeš
Hi vpp-dev,

As many of you already know, we tried enabling unit tests in ARM VPP jobs the 
last release cycle, but we only managed to fix all make test failures during 
release procedures and we agreed that enabling it would be better after 1810 is 
released.

Enabling the unit testing (i.e. running full make verify) when there are 
failures is, in my opinion, a fool's errand. If people see consistent failures 
that are not related to their patches, they're much less likely to investigate 
whether this time there's really a legitimate failure and more likely to just 
ignore the job since it's just not working. That means we'll really need to 
iron out all of the failures before doing anything else.

So let's talk about the failures. There were new failures in master almost 
immediately after the release and these are seemingly also reproducible on x86 
(although I have no idea why they don't show up in CI):

*VPP-1475 - IP4 random reassembly 
failure

*VPP-1476 - L2fib missing packets

Not long after these issues, new issues cropped up. At one point even the 
sanity test didn't pass (that was addressed by 
https://gerrit.fd.io/r/#/c/15841/, thanks, Neale!) and there was an issue with 
sessions tests (fixed by https://gerrit.fd.io/r/#/c/15947/, thanks, Florin!). 
But there are still more issues that need our attention:

*VPP-1490 - Looks like traffic 
isn't working on ARM on Ubuntu1604

*VPP-1491 - GBP L2 endpoint 
learning. The tests actually pass with the debug build

*VPP-1497 - Parallel test execution 
on ARM produced many more failures. I haven't investigated this much yet

*And there is a new failure in a CDP test, this is not in Jira yet 
(there are some problems with accessing stuff in lab, curses!)

This very much seems like a game of whack-a-mole - we fix a few issues and new 
ones appear right away. This might suggest that the current approach of me finding 
issues on an ARM server and then notifying vpp-dev might not be ideal if we 
want to enable unit testing in 1901 (and we really do! :)). Or maybe this is 
not the right time to enable testing and we should focus on it more a few weeks 
before release? What's the best way to ensure that we'll get testing in as soon 
as possible?

In any case, we'll need a lot of help from you. I urge everyone (or at least a 
few key people) to get access to the FD.io lab (you'll need a GPG key that Ed 
Warnicke or some other trusted anchor will sign and then request access using 
the fd.io helpdesk) so that you can use the hardware we've reserved for this 
purpose. We could also always debug via a call, but that's just not efficient 
and you'll need some ARM hardware for development anyway (or to just fix issues 
that show up in verify jobs).

When it comes to the individual issues, any feedback is appreciated, like just 
the author acknowledging the issue and maybe adding whether they have time to 
look at it or what more information they need.

Let's make VPP development much smoother for ARM ASAP, guys. :)

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11289): https://lists.fd.io/g/vpp-dev/message/11289
Mute This Topic: https://lists.fd.io/mt/28167603/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Enabling unit tests in VPP job for ARM and make test failures

2018-11-22 Thread Juraj Linkeš
Hi Folks,

There's another approach we could try: exclude the currently failing tests and 
run just what passes, so that we can prevent other failures from creeping up on 
us. Of course, we would remove the exclusions when the failures are fixed.

I've put together a patch that introduces test blacklisting: 
https://gerrit.fd.io/r/#/c/16118/

I've also updated the corresponding CI patch 
(https://gerrit.fd.io/r/#/c/15251/) with the blacklist.

What do you think about this approach?

Thanks,
Juraj

From: Juraj Linkeš [mailto:juraj.lin...@pantheon.tech]
Sent: Friday, November 16, 2018 4:58 PM
To: vpp-dev@lists.fd.io
Cc: csit-dev 
Subject: [vpp-dev] Enabling unit tests in VPP job for ARM and make test failures

Hi vpp-dev,

As many of you already know, we tried enabling unit tests in ARM VPP jobs the 
last release cycle, but we only managed to fix all make test failures during 
release procedures and we agreed that enabling it would be better after 1810 is 
released.

Enabling the unit testing (i.e. running full make verify) when there are 
failures is, in my opinion, a fool's errand. If people see consistent failures 
that are not related to their patches, they're much less likely to investigate 
whether this time there's really a legitimate failure and more likely to just 
ignore the job since it's just not working. That means we'll really need to 
iron out all of the failures before doing anything else.

So let's talk about the failures. There were new failures in master almost 
immediately after the release and these are seemingly also reproducible on x86 
(although I have no idea why they don't show up in CI):

*VPP-1475<https://jira.fd.io/browse/VPP-1475> - IP4 random reassembly 
failure

*VPP-1476<https://jira.fd.io/browse/VPP-1476> - L2fib missing packets

Not long after these issues, new issues cropped up. At one point even the 
sanity test didn't pass (that was addressed by 
https://gerrit.fd.io/r/#/c/15841/, thanks, Neale!) and there was an issue with 
sessions tests (fixed by https://gerrit.fd.io/r/#/c/15947/, thanks, Florin!). 
But there are still more issues that need our attention:

*VPP-1490<https://jira.fd.io/browse/VPP-1490> - Looks like traffic 
isn't working on ARM on Ubuntu1604

*VPP-1491<https://jira.fd.io/browse/VPP-1491> - GBP L2 endpoint 
learning. The tests actually pass with the debug build

*VPP-1497<https://jira.fd.io/browse/VPP-1497> - Parallel test execution 
on ARM produced many more failures. I haven't investigated this much yet

*And there is a new failure in a CDP test, this is not in Jira yet 
(there are some problems with accessing stuff in lab, curses!)

This very much seems like a game of whack-a-mole - we fix a few issues and new 
ones appear right away. This might suggest that the current approach of me finding 
issues on an ARM server and then notifying vpp-dev might not be ideal if we 
want to enable unit testing in 1901 (and we really do! :)). Or maybe this is 
not the right time to enable testing and we should focus on it more a few weeks 
before release? What's the best way to ensure that we'll get testing in as soon 
as possible?

In any case, we'll need a lot of help from you. I urge everyone (or at least a 
few key people) to get access to the FD.io lab (you'll need a GPG key that Ed 
Warnicke or some other trusted anchor will sign and then request access using 
the fd.io helpdesk) so that you can use the hardware we've reserved for this 
purpose. We could also always debug via a call, but that's just not efficient 
and you'll need some ARM hardware for development anyway (or to just fix issues 
that show up in verify jobs).

When it comes to the individual issues, any feedback is appreciated, like just 
the author acknowledging the issue and maybe adding whether they have time to 
look at it or what more information they need.

Let's make VPP development much smoother for ARM ASAP, guys. :)

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11371): https://lists.fd.io/g/vpp-dev/message/11371
Mute This Topic: https://lists.fd.io/mt/28167603/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Build failing on AArch64

2018-11-27 Thread Juraj Linkeš
Hi Sirshak and Ole,

I'm hitting the same issue. The build fails on a clean repository, but the 
subsequent build works fine, which is tolerable for local builds, but still needs 
to be fixed.

Running the build with V=2 doesn't actually produce more output. There's one more 
bit of information I can provide - this behavior is present on Ubuntu1804 
(4.15.0-38-generic), but builds on Ubuntu1604 (4.4.0-138-generic) work right 
away, which explains why CI didn't catch it.

This is the patch that introduced the issue: https://gerrit.fd.io/r/#/c/16109/

Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Monday, November 26, 2018 9:26 AM
To: Sirshak Das 
Cc: vpp-dev@lists.fd.io; Honnappa Nagarahalli ; 
Juraj Linkeš ; Lijian Zhang (Arm Technology China) 

Subject: Re: [vpp-dev] Build failing on AArch64

Sirshak,

Can you touch one of the .api files and rebuild with V=2 and show the output of 
that?
It might be that vppapigen fails for some reason (or try to run it manually and 
see).

Ole

> On 26 Nov 2018, at 06:48, Sirshak Das 
> mailto:sirshak@arm.com>> wrote:
>
> Hi all,
>
> I am currently facing these build failures in master on AArch64.
>
> [38/1160] Building C object vat/CMakeFiles/vpp_api_test.dir/types.c.o
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commita/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commita/vpp/src/vat/types.c
> In file included from 
> /home/sirdas/code/commita/vpp/src/vpp/api/vpe_all_api_h.h:25,
> from /home/sirdas/code/commita/vpp/src/vpp/api/types.h:20,
> from /home/sirdas/code/commita/vpp/src/vat/types.c:19:
> /home/sirdas/code/commita/vpp/src/vnet/vnet_all_api_h.h:33:10: fatal error: 
> vnet/devices/af_packet/af_packet.api.h: No such file or directory
> #include 
>  ^~~~
> compilation terminated.
> [85/1160] Building C object 
> vnet/CMakeFiles/vnet_cortexa72.dir/ethernet/node.c.o
> ninja: build stopped: subcommand failed.
> Makefile:691: recipe for target 'vpp-build' failed
> make[1]: *** [vpp-build] Error 1
> make[1]: Leaving directory '/home/sirdas/code/commita/vpp/build-root'
> Makefile:366: recipe for target 'build-release' failed
> make: *** [build-release] Error 2
>
> [114/1310] Building C object vat/CMakeFiles/vpp_api_test.dir/types.c.o
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commitb/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commitb/vpp/src/vat/types.c
> In file included from 
> /home/sirdas/code/commitb/vpp/src/vpp/api/vpe_all_api_h.h:25,
> from /home/sirdas/code/commitb/vpp/src/vpp/api/types.h:20,
> from /home/sirdas/code/commitb/vpp/src/vat/types.c:19:
> /home/sirdas/code/commitb/vpp/src/vnet/vnet_all_api_h.h:32:10: fatal error: 
> vnet/bonding/bond.api.h: No such file or directory
> #include 
>  ^
> compilation terminated.
> [161/1310] Building C object 
> vnet/CMakeFiles/vnet_thunderx2t99.dir/ethernet/node.c.o
> ninja: build stopped: subcommand failed.
> Makefile:691: recipe for target 'vpp-build' failed
> make[1]: *** [vpp-build] Error 1
> make[1]: Leaving directory '/home/sirdas/code/commitb/vpp/build-root'
> Makefile:366: recipe for target 'build-release' failed
> make: *** [build-release] Error 2
>
>
> Its all someway or the other related to *.api files and genereated
> header files.
>
> I am not able to isolate any particular commit that did this.
>
> Does anybody know if anything changed off the top of their head ?
>
> Thank you
> Sirshak Das
> -=-=-=-=-=-=-=-=-=-=-=-
> Links: You receive all messages sent to this group.
>
> View/Reply Online (#11400): https://lists.fd.io/g/vpp-dev/message/11400
> Mute This Topic: https://lists.fd.io/mt/28318534/675193
> Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
> Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [otr...@employees.org]
> -=-=-=-=-=-=-=-=-=-=-=-
-=-=-=-=

Re: [vpp-dev] Build failing on AArch64

2018-11-27 Thread Juraj Linkeš
Hi Ole,

   I'm hitting the same issue.

   Running the build with V=2 doesn't actually produce more output.

Which means my logs are the same as Sirshak's. But in any case I attached the 
output from a run with V=2.

I can provide other info if there's more you need - or you can try accessing 
one of our ThunderX's in the FD.io lab if you have access.

Thanks,
Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Tuesday, November 27, 2018 5:43 PM
To: Juraj Linkeš 
Cc: Sirshak Das ; vpp-dev@lists.fd.io; Honnappa 
Nagarahalli ; Lijian Zhang (Arm Technology China) 

Subject: Re: [vpp-dev] Build failing on AArch64

Juraj,

Without a make log this is just a guessing game.

Cheers
Ole

On 27 Nov 2018, at 17:34, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:
Hi Sirshak and Ole,

I'm hitting the same issue. The build fails on a clean repository, but the 
subsequent build works fine, which is tolerable for local builds, but still needs 
to be fixed.

Running the build with V=2 doesn't actually produce more output. There's one more 
bit of information I can provide - this behavior is present on Ubuntu1804 
(4.15.0-38-generic), but builds on Ubuntu1604 (4.4.0-138-generic) work right 
away, which explains why CI didn't catch it.

This is the patch that introduced the issue: https://gerrit.fd.io/r/#/c/16109/

Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Monday, November 26, 2018 9:26 AM
To: Sirshak Das mailto:sirshak@arm.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>; Honnappa Nagarahalli 
mailto:honnappa.nagaraha...@arm.com>>; Juraj 
Linkeš mailto:juraj.lin...@pantheon.tech>>; Lijian 
Zhang (Arm Technology China) mailto:lijian.zh...@arm.com>>
Subject: Re: [vpp-dev] Build failing on AArch64

Sirshak,

Can you touch one of the .api files and rebuild with V=2 and show the output of 
that?
It might be that vppapigen fails for some reason (or try to run it manually and 
see).

Ole

> On 26 Nov 2018, at 06:48, Sirshak Das 
> mailto:sirshak@arm.com>> wrote:
>
> Hi all,
>
> I am currently facing these build failures in master on AArch64.
>
> [38/1160] Building C object vat/CMakeFiles/vpp_api_test.dir/types.c.o
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commita/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commita/vpp/src/vat/types.c
> In file included from 
> /home/sirdas/code/commita/vpp/src/vpp/api/vpe_all_api_h.h:25,
> from /home/sirdas/code/commita/vpp/src/vpp/api/types.h:20,
> from /home/sirdas/code/commita/vpp/src/vat/types.c:19:
> /home/sirdas/code/commita/vpp/src/vnet/vnet_all_api_h.h:33:10: fatal error: 
> vnet/devices/af_packet/af_packet.api.h: No such file or directory
> #include 
>  ^~~~
> compilation terminated.
> [85/1160] Building C object 
> vnet/CMakeFiles/vnet_cortexa72.dir/ethernet/node.c.o
> ninja: build stopped: subcommand failed.
> Makefile:691: recipe for target 'vpp-build' failed
> make[1]: *** [vpp-build] Error 1
> make[1]: Leaving directory '/home/sirdas/code/commita/vpp/build-root'
> Makefile:366: recipe for target 'build-release' failed
> make: *** [build-release] Error 2
>
> [114/1310] Building C object vat/CMakeFiles/vpp_api_test.dir/types.c.o
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commitb/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commitb/vpp/src/vat/types.c
> In file included from 
> /home/sirdas/code/commitb/vpp/src/vpp/api/vpe_all_api_h.h:25,
> from /home/sirdas/code/commitb/vpp/src/vpp/api/types.h:20,
> from /home/sirdas/code/commitb/vpp/src/vat/types.c:19:
> /home/sirdas/code/commitb/vpp/src/vnet/vnet_all_api_h.h:32:10: fatal error: 
> vnet/bonding/bond.api.h: No such file or directory
> #include 
>  ^
> compilation terminated.
> [161/1310] Building C object 
> vnet/CMakeFiles/vnet_thunderx2t99.dir/ethernet/node.c.o
> ninja: build stopped: subcommand fa

Re: [vpp-dev] Verify issues (GRE)

2018-11-29 Thread Juraj Linkeš
Hi Ole,

I've noticed a few things about the VCL testcases:

-The VCL testcases are all using the same ports, which makes them 
unsuitable for parallel test runs

-Another thing about these testcases is that when they don't finish 
properly, the sock_test_server and client stay running as zombie processes (and 
thus use up ports). It's easily reproducible locally by interrupting the tests, 
but I'm not sure whether this could actually arise in CI

-Which means that if one testcase finishes improperly (e.g. is killed 
because of a timeout) all of the other VCL testcases will likely also fail
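On the first point, a common way to avoid fixed-port collisions is to let the kernel assign an ephemeral port per test instance (whether this maps cleanly onto VCL, which exercises VPP's own stack rather than the Linux one, is a separate question). A minimal sketch:

```python
import socket

def free_port():
    """Bind to port 0 and let the kernel pick an unused TCP port.

    Hard-coded ports collide when several test instances run in
    parallel; a distinct ephemeral port per instance avoids that.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        return s.getsockname()[1]
```

One caveat: the port is only reserved while the probe socket is bound, so there is a small race between closing it and the test server binding the same port.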

Hope this helps if there's anyone looking into those tests,
Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Wednesday, November 28, 2018 7:56 PM
To: vpp-dev 
Subject: [vpp-dev] Verify issues (GRE)

Guys,

The verify job have been unstable over the last few days.
We see some instability in the Jenkins build system, in the test harness 
itself, and in the tests.
On my 18.04 machine I’m seeing intermittent failures in GRE, GBP, DHCP, VCL.

It looks like Jenkins is functioning correctly now.
Ed and I are also testing a revert of all the changes made to the test 
framework itself over the last couple of days. A bit harsh, but we think this 
might be the quickest way back to some level of stability.

Then we need to fix the tests that are in themselves unstable.

Any volunteers to see if they can figure out why GRE fails?

Cheers,
Ole


GRE Test Case
==
GRE IPv4 tunnel TestsOK
GRE IPv6 tunnel TestsOK
GRE tunnel L2 Tests  OK
19:37:47,505 Unexpected packets captured:
Packet #0:
  0201FF0202FE70A06AD308004500 p.j...E.
0010  002A00013F11219FAC100101AC10 .*?.!...
0020  010204D204D2001672A9343336392033 r.4369 3
0030  2033202D31202D31  3 -1 -1

###[ Ethernet ]###
  dst   = 02:01:00:00:ff:02
  src   = 02:fe:70:a0:6a:d3
  type  = IPv4
###[ IP ]###
 version   = 4
 ihl   = 5
 tos   = 0x0
 len   = 42
 id= 1
 flags =
 frag  = 0
 ttl   = 63
 proto = udp
 chksum= 0x219f
 src   = 172.16.1.1
 dst   = 172.16.1.2
 \options   \
###[ UDP ]###
sport = 1234
dport = 1234
len   = 22
chksum= 0x72a9
###[ Raw ]###
   load  = '4369 3 3 -1 -1'

Ten more packets


###[ UDP ]###
sport = 1234
dport = 1234
len   = 22
chksum= 0x72a9
###[ Raw ]###
   load  = '4369 3 3 -1 -1'

** Ten more packets

Print limit reached, 10 out of 257 packets printed
19:37:47,770 REG: Couldn't remove configuration for object(s):
19:37:47,770 
GRE tunnel VRF Tests ERROR 
[ temp dir used by test case: /tmp/vpp-unittest-TestGRE-hthaHC ]

==
ERROR: GRE tunnel VRF Tests
--
Traceback (most recent call last):
  File "/vpp/16257/test/test_gre.py", line 61, in tearDown
super(TestGRE, self).tearDown()
  File "/vpp/16257/test/framework.py", line 546, in tearDown
self.registry.remove_vpp_config(self.logger)
  File "/vpp/16257/test/vpp_object.py", line 86, in remove_vpp_config
(", ".join(str(x) for x in failed)))
Exception: Couldn't remove configuration for object(s): 1:2.2.2.2/32

==
FAIL: GRE tunnel VRF Tests
--
Traceback (most recent call last):
  File "/vpp/16257/test/test_gre.py", line 787, in test_gre_vrf
remark="GRE decap packets in wrong VRF")
  File "/vpp/16257/test/vpp_pg_interface.py", line 264, in 
assert_nothing_captured
(self.name, remark))
AssertionError: Non-empty capture file present for interface pg0 (GRE decap 
packets in wrong VRF)
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11453): https://lists.fd.io/g/vpp-dev/message/11453
Mute This Topic: https://lists.fd.io/mt/28473762/899915
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [juraj.lin...@pantheon.tech]
-=-=-=-=-=-=-=-=-=-=-=-
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11459): https://lists.fd.io/g/vpp-dev/message/11459
Mute This Topic: https://lists.fd.io/mt/28473762/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Build failing on AArch64

2018-11-29 Thread Juraj Linkeš
Hi Ole,

I tried it and it does solve the issue, thanks!

Sirshak, does it work for you too?

Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Wednesday, November 28, 2018 9:25 AM
To: Sirshak Das 
Cc: Juraj Linkeš ; vpp-dev@lists.fd.io; Honnappa 
Nagarahalli ; Lijian Zhang (Arm Technology China) 

Subject: Re: [vpp-dev] Build failing on AArch64

Sirshak,

Can you try adding:

DEPENDS api_headers

Inside
add_vpp_executable(vpp_api_test ENABLE_EXPORTS

in src/vat/CMakeLists.txt

Cheers,
Ole

> On 28 Nov 2018, at 06:32, Sirshak Das 
> mailto:sirshak@arm.com>> wrote:
>
> It takes 3 iterations to get to a proper build:
>
> First Iteration:
>
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commitc/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commitc/vpp/src/vat/types.c
> In file included from 
> /home/sirdas/code/commitc/vpp/src/vpp/api/vpe_all_api_h.h:25,
>from /home/sirdas/code/commitc/vpp/src/vpp/api/types.h:20,
>from /home/sirdas/code/commitc/vpp/src/vat/types.c:19:
> /home/sirdas/code/commitc/vpp/src/vnet/vnet_all_api_h.h:32:10: fatal error: 
> vnet/bonding/bond.api.h: No such file or directory
> #include 
> ^
>
> Second Iteration:
>
> FAILED: vat/CMakeFiles/vpp_api_test.dir/types.c.o
> ccache /usr/lib/ccache/cc -DHAVE_MEMFD_CREATE -Dvpp_api_test_EXPORTS 
> -I/home/sirdas/code/commitc/vpp/src -I. -Iinclude -march=armv8-a+crc -g -O2 
> -DFORTIFY_SOURCE=2 -fstack-protector -fPIC -Werror   
> -Wno-address-of-packed-member -pthread -MD -MT 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o -MF 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o.d -o 
> vat/CMakeFiles/vpp_api_test.dir/types.c.o   -c 
> /home/sirdas/code/commitc/vpp/src/vat/types.c
> In file included from /home/sirdas/code/commitc/vpp/src/vpp/api/types.h:20,
>from /home/sirdas/code/commitc/vpp/src/vat/types.c:19:
> /home/sirdas/code/commitc/vpp/src/vpp/api/vpe_all_api_h.h:32:10: fatal error: 
> vpp/stats/stats.api.h: No such file or directory
> #include 
> ^~~
> compilation terminated.
> [142/1163] Building C object vat/CMakeFiles/vpp_api_test.dir/api_format.c.o^C
> ninja: build stopped: interrupted by user.
> Makefile:691: recipe for target 'vpp-build' failed
> make[1]: *** [vpp-build] Interrupt
> Makefile:366: recipe for target 'build-release' failed
> make: *** [build-release] Interrupt
>
> Had to kill as it was stuck.
>
> Third Iteration:
>
> Finally it got built properly.
>
> This is a manageable error for dev purposes but will give a lot of false
> negatives for CI.
> Anyone familiar with VAT please help.
>
>
> Thank you
> Sirshak Das
>
> Ole Troan writes:
>
>> Juraj,
>>
>> Seems like a dependency problem. VAT depends on a generated file that hasn’t 
>> been generated yet.
>>
>> Ole
>>
>>> On 27 Nov 2018, at 18:04, Juraj Linkeš 
>>> mailto:juraj.lin...@pantheon.tech>> wrote:
>>>
>>> Hi Ole,
>>>
>>>  I'm hitting the same issue.
>>>
>>>  Running the build with V=2 doesn't actually produce more 
>>> output.
>>>
>>> Which means my logs are the same as Sirshak's. But in any case I attached 
>>> the output from a run with V=2.
>>>
>>> I can provide other info if there's more you need - or you can try 
>>> accessing one of our ThunderX's in the FD.io lab if you have access.
>>>
>>> Thanks,
>>> Juraj
>>>
>>> From: Ole Troan [mailto:otr...@employees.org]
>>> Sent: Tuesday, November 27, 2018 5:43 PM
>>> To: Juraj Linkeš 
>>> mailto:juraj.lin...@pantheon.tech>>
>>> Cc: Sirshak Das mailto:sirshak@arm.com>>; 
>>> vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>; Honnappa Nagarahalli 
>>> mailto:honnappa.nagaraha...@arm.com>>; Lijian 
>>> Zhang (Arm Technology China) 
>>> mailto:lijian.zh...@arm.com>>
>>> Subject: Re: [vpp-dev] Build failing on AArch64
>>>
>>> Juraj,
>>>
>>> Without a make log this is just a guessing game.
>>>
>>> Cheers
>>> Ole
>>>
>>> On 27 Nov 2018, at 17:34, Juraj Linkeš 
>&g

Re: [vpp-dev] Verify issues (GRE)

2018-12-03 Thread Juraj Linkeš
Hi Florin,

So the tests should work fine in parallel, thanks for the clarification.

I tried running the tests again and I could reproduce it with a keyboard 
interrupt or when the test produced a core (and was then killed by the parent 
run_tests process), but the logs don't say anything - just that the server and 
client were started, and that's where the logs stop. I guess the child vcl 
worker process is not handled in this case, though I wonder why 
run_in_venv_with_cleanup.sh doesn't clean it up.
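For what it's worth, one way a cleanup wrapper can miss grandchildren is by signalling only its direct child; killing the whole process group also reaps workers the child forked. A small sketch of that pattern (illustrative only - not the actual run_in_venv_with_cleanup.sh logic):

```python
import os
import signal
import subprocess
import sys

# Start a long-running child (standing in for a test helper process)
# in its own session, i.e. its own process group.
proc = subprocess.Popen(
    [sys.executable, "-c", "import time; time.sleep(30)"],
    preexec_fn=os.setsid,
)
try:
    proc.wait(timeout=0.2)  # simulate the test timing out
except subprocess.TimeoutExpired:
    # Kill the group, not just the direct child - this is what also
    # takes down any workers the child may have forked.
    os.killpg(os.getpgid(proc.pid), signal.SIGKILL)
proc.wait()  # collect the exit status so no zombie is left behind
```

A wrapper that instead sends SIGKILL to only `proc.pid` leaves any forked worker orphaned, which matches the zombie sock_test_server/client symptom described above.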

Juraj

From: Florin Coras [mailto:fcoras.li...@gmail.com]
Sent: Thursday, November 29, 2018 5:04 PM
To: Juraj Linkeš 
Cc: Ole Troan ; vpp-dev 
Subject: Re: [vpp-dev] Verify issues (GRE)

Hi Juraj,

Those tests exercise the stack in vpp, so they don’t use up linux stack ports. 
Moreover, both cut-through and through-the-stack tests use self.shm_prefix when 
connecting to vpp’s binary api. So, as long as that variable is properly 
updated, VCL and implicitly LDP will attach and use ports on the right vpp 
instance.

As for sock_test_client/server not being properly killed, did you find 
something in the logs that would indicate why it happened?

Florin


On Nov 29, 2018, at 3:18 AM, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi Ole,

I've noticed a few things about the VCL testcases:
-The VCL testcases are all using the same ports, which makes them 
unsuitable for parallel test runs
-Another thing about these testcases is that when they don't finish 
properly, the sock_test_server and client stay running as zombie processes (and 
thus use up ports). It's easily reproducible locally by interrupting the tests, 
but I'm not sure whether this could actually arise in CI
-Which means that if one testcase finishes improperly (e.g. is killed 
because of a timeout) all of the other VCL testcases will likely also fail

Hope this helps if there's anyone looking into those tests,
Juraj

From: Ole Troan [mailto:otr...@employees.org]
Sent: Wednesday, November 28, 2018 7:56 PM
To: vpp-dev mailto:vpp-dev@lists.fd.io>>
Subject: [vpp-dev] Verify issues (GRE)

Guys,

The verify job have been unstable over the last few days.
We see some instability in the Jenkins build system, in the test harness 
itself, and in the tests.
On my 18.04 machine I’m seeing intermittent failures in GRE, GBP, DHCP, VCL.

It looks like Jenkins is functioning correctly now.
Ed and I are also testing a revert of all the changes made to the test 
framework itself over the last couple of days. A bit harsh, but we think this 
might be the quickest way back to some level of stability.

Then we need to fix the tests that are in themselves unstable.

Any volunteers to see if they can figure out why GRE fails?

Cheers,
Ole


GRE Test Case
==
GRE IPv4 tunnel Tests                           OK
GRE IPv6 tunnel Tests                           OK
GRE tunnel L2 Tests                             OK
19:37:47,505 Unexpected packets captured:
Packet #0:
  0201FF0202FE70A06AD308004500 p.j...E.
0010  002A00013F11219FAC100101AC10 .*?.!...
0020  010204D204D2001672A9343336392033 r.4369 3
0030  2033202D31202D31  3 -1 -1

###[ Ethernet ]###
  dst   = 02:01:00:00:ff:02
  src   = 02:fe:70:a0:6a:d3
  type  = IPv4
###[ IP ]###
 version   = 4
 ihl   = 5
 tos   = 0x0
 len   = 42
 id= 1
 flags =
 frag  = 0
 ttl   = 63
 proto = udp
 chksum= 0x219f
 src   = 172.16.1.1
 dst   = 172.16.1.2
 \options   \
###[ UDP ]###
sport = 1234
dport = 1234
len   = 22
chksum= 0x72a9
###[ Raw ]###
   load  = '4369 3 3 -1 -1'

** Ten more packets

Print limit reached, 10 out of 257 packets printed
19:37:47,770 REG: Couldn't remove configuration for object(s):
19:37:47,770 
GRE tunnel VRF Tests ERROR 
[ temp dir used by test case: /tmp/vpp-unittest-TestGRE-hthaHC ]

==
ERROR: GRE tunnel VRF Tests
--
Traceback (most recent call last):
  File "/vpp/16257/test/test_gre.py", line 61, in tearDown
super(TestGRE, self).tearDown()
  File "/vpp/16257/test/framework.py", line 546, in tearDown
self.registry.remove_vpp_config(self.logger)
  File "/vpp/16257/test/vpp_object.py", line 86, in remove_vpp_conf

Re: [vpp-dev] String tests failures

2018-12-06 Thread Juraj Linkeš
Dave, I was able to verify that the fix works and thus removed the test from 
https://gerrit.fd.io/r/#/c/16282/.

Thanks

From: Dave Barach (dbarach) [mailto:dbar...@cisco.com]
Sent: Wednesday, December 5, 2018 2:52 PM
To: Lijian Zhang (Arm Technology China) 
Cc: Juraj Linkeš ; Damjan Marion ; 
Steven Luong (sluong) ; vpp-dev@lists.fd.io
Subject: RE: String tests failures

See https://gerrit.fd.io/r/#/c/16352, which computes strlen if necessary for a 
precise copy-overlap check. Strncpy_s was always slightly wrong in this regard.

The "make test" failure showed up [only] on aarch64 due to test code stack 
variable placement / alignment differences between aarch64 and x86_64.

HTH... Dave

From: Lijian Zhang (Arm Technology China) 
mailto:lijian.zh...@arm.com>>
Sent: Wednesday, December 5, 2018 4:39 AM
To: Dave Barach (dbarach) mailto:dbar...@cisco.com>>
Cc: Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>>; Dave Barach 
(dbarach) mailto:dbar...@cisco.com>>; Damjan Marion 
mailto:dmar...@me.com>>
Subject: String tests failures

Hi Dave,
StringTest is failing on ARM machines.
I narrowed down the problem and commit an internal code review as below.
Once the internal code review process is passed, I'll upstream the patch for 
community review.

#define clib_strncpy(d,s,n) strncpy_s_inline(d,CLIB_STRING_MACRO_MAX,s,n)

errno_t
strncpy_s (char *__restrict__ dest, rsize_t dmax,
           const char *__restrict__ src, rsize_t n);

always_inline errno_t
strncpy_s_inline (char *__restrict__ dest, rsize_t dmax,
                  const char *__restrict__ src, rsize_t n)
{
  u8 bad;
  uword low, hi;
  rsize_t m;
  errno_t status = EOK;

  bad = (dest == 0) + (dmax == 0) + (src == 0) + (n == 0);
  if (PREDICT_FALSE (bad != 0))
    {
      /* Not actually trying to copy anything is OK */
      if (n == 0)
        return EOK;
      if (dest == 0)
        clib_c11_violation ("dest NULL");
      if (src == 0)
        clib_c11_violation ("src NULL");
      if (dmax == 0)
        clib_c11_violation ("dmax 0");
      return EINVAL;
    }

  if (PREDICT_FALSE (n >= dmax))
    {
      /* Relax and use strnlen of src */
      clib_c11_violation ("n >= dmax");
      m = clib_strnlen (src, dmax);
      if (m >= dmax)
        {
          /* Truncate, adjust copy length to fit dest */
          m = dmax - 1;
          status = EOVERFLOW;
        }
    }
  else
-   m = n;
+   m = clib_strnlen (src, n);

  /* Check for src/dst overlap, which is not allowed */
  low = (uword) (src < dest ? src : dest);
  hi = (uword) (src < dest ? dest : src);

  if (PREDICT_FALSE (low + (m - 1) >= hi))
    {
      clib_c11_violation ("src/dest overlap");
      return EINVAL;
    }

  clib_memcpy_fast (dest, src, m);
  dest[m] = '\0';
  return status;
}
IMPORTANT NOTICE: The contents of this email and any attachments are 
confidential and may also be privileged. If you are not the intended recipient, 
please notify the sender immediately and do not disclose the contents to any 
other person, use it for any purpose, or store or copy the information in any 
medium. Thank you.
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11507): https://lists.fd.io/g/vpp-dev/message/11507
Mute This Topic: https://lists.fd.io/mt/28611039/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

2018-12-11 Thread Juraj Linkeš
Hi folks,

I've run into an issue with hugepages on a Cavium ThunderX SoC. I was trying to 
bind a physical interface to VPP. With 1GB hugepages the interface seems to 
work fine (at least I saw it in VPP and was able to configure it and ping 
through it), but with 2MB hugepages the interface appeared in an error state. 
The output from show hardware told me this:
VirtualFunctionEthernet1/0/1   1down  VirtualFunctionEthernet1/0/1
  Ethernet address 40:8d:5c:e7:b1:12
  Cavium ThunderX
carrier down
flags: pmd pmd-init-fail maybe-multiseg
rx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
tx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
pci: device 177d:a034 subsystem 177d:a134 address 0002:01:00.01 numa 0
module: unknown
max rx packet len: 9204
promiscuous: unicast off all-multicast off
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum jumbo-frame
   crc-strip scatter
rx offload active: jumbo-frame crc-strip scatter
tx offload avail:  ipv4-cksum udp-cksum tcp-cksum outer-ipv4-cksum
tx offload active:
rss avail: ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp port
   vxlan geneve nvgre
rss active:ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp
tx burst function: (nil)
rx burst function: (nil)
  Errors:
rte_eth_rx_queue_setup[port:0, errno:-22]: Unknown error -22

I dug around a bit and this seems to be what -22 means:

#define EINVAL  22  /* Invalid argument */
-EINVAL: The size of network buffers which can be allocated from the memory 
pool does not fit the various buffer sizes allowed by the device controller.

Is this something you've seen before? Is this a bug? Do I need to do something 
extra if I want to use 2MB hugepages?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11564): https://lists.fd.io/g/vpp-dev/message/11564
Mute This Topic: https://lists.fd.io/mt/28720621/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

2018-12-12 Thread Juraj Linkeš
Thanks Damjan.

Nitin, Gorka, do you have any input on this?

Juraj

From: Damjan Marion via Lists.Fd.Io [mailto:dmarion=me@lists.fd.io]
Sent: Tuesday, December 11, 2018 5:21 PM
To: Juraj Linkeš 
Cc: vpp-dev@lists.fd.io
Subject: Re: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

Dear Juraj,

I don't think anybody here has experience with ThunderX to help you.
The fact that other NICs work OK indicates that this particular driver 
requires something special.
What it is, you will probably need to ask Cavium/Marvell guys...

--
Damjan


On 11 Dec 2018, at 07:56, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi folks,

I've run into an issue with hugepages on a Cavium ThunderX SoC. I was trying to 
bind a physical interface to VPP. With 1GB hugepages the interface seems to 
work fine (at least I saw it in VPP and was able to configure it and ping 
through it), but with 2MB hugepages the interface appeared in an error state. 
The output from show hardware told me this:
VirtualFunctionEthernet1/0/1   1down  VirtualFunctionEthernet1/0/1
  Ethernet address 40:8d:5c:e7:b1:12
  Cavium ThunderX
carrier down
flags: pmd pmd-init-fail maybe-multiseg
rx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
tx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
pci: device 177d:a034 subsystem 177d:a134 address 0002:01:00.01 numa 0
module: unknown
max rx packet len: 9204
promiscuous: unicast off all-multicast off
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum jumbo-frame
   crc-strip scatter
rx offload active: jumbo-frame crc-strip scatter
tx offload avail:  ipv4-cksum udp-cksum tcp-cksum outer-ipv4-cksum
tx offload active:
rss avail: ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp port
   vxlan geneve nvgre
rss active:ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp
tx burst function: (nil)
rx burst function: (nil)
  Errors:
rte_eth_rx_queue_setup[port:0, errno:-22]: Unknown error -22

I dug around a bit and this seems to be what -22 means:

#define EINVAL  22  /* Invalid argument */
-EINVAL: The size of network buffers which can be allocated from the memory 
pool does not fit the various buffer sizes allowed by the device controller.

Is this something you've seen before? Is this a bug? Do I need to do something 
extra if I want to use 2MB hugepages?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11564): https://lists.fd.io/g/vpp-dev/message/11564
Mute This Topic: https://lists.fd.io/mt/28720621/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11574): https://lists.fd.io/g/vpp-dev/message/11574
Mute This Topic: https://lists.fd.io/mt/28720621/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Enable arm vpp master testing in ci

2018-12-12 Thread Juraj Linkeš
Hello,

We're trying to enable testing for vpp master on ARM in CI. Here's the patch: 
https://gerrit.fd.io/r/#/c/15251/

All of the affected jobs have been tested in sandbox and are working well. 
Please review the patch and give it a +1 if you think it's okay so that Ed can 
finally merge it.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11575): https://lists.fd.io/g/vpp-dev/message/11575
Mute This Topic: https://lists.fd.io/mt/28731088/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

2018-12-12 Thread Juraj Linkeš
It would be great if we could figure out the reason.

The contiv-vpp documentation needed to spell out that you need 1GB hugepages 
precisely because 2MB pages don't work.

And there's also the DPDK documentation [0], which doesn't mention the 
hugepages problem, making it seem like it shouldn't be an issue - but maybe 
that's a documentation oversight.

Juraj

[0] https://doc.dpdk.org/guides-18.11/nics/thunderx.html

From: Gorka Garcia [mailto:ggar...@marvell.com]
Sent: Wednesday, December 12, 2018 11:03 AM
To: Juraj Linkeš ; dmar...@me.com; Nitin Saxena 

Cc: vpp-dev@lists.fd.io; Sirshak Das 
Subject: RE: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

I am not sure on the reason for this, but it is documented here:

https://github.com/contiv/vpp/blob/master/docs/arm64/MANUAL_INSTALL_CAVIUM.md

“To mention the most important thing from DPDK setup instructions you need to 
setup 1GB hugepages. The allocation of hugepages should be done at boot time or 
as soon as possible after system boot to prevent memory from being fragmented 
in physical memory. Add parameters hugepagesz=1GB hugepages=16 
default_hugepagesz=1GB to the file /etc/default/grub”

Gorka

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Juraj Linkeš
Sent: Wednesday, December 12, 2018 9:07 AM
To: dmar...@me.com<mailto:dmar...@me.com>; 
gorka.gar...@cavium.com<mailto:gorka.gar...@cavium.com>; Nitin Saxena 
mailto:nitin.sax...@cavium.com>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>; Sirshak Das 
mailto:sirshak@arm.com>>
Subject: [EXT] Re: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

External Email


External Email
Thanks Damjan.

Nitin, Gorka, do you have any input on this?

Juraj

From: Damjan Marion via Lists.Fd.Io [mailto:dmarion=me....@lists.fd.io]
Sent: Tuesday, December 11, 2018 5:21 PM
To: Juraj Linkeš mailto:juraj.lin...@pantheon.tech>>
Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] 2MB vs 1GB hugepages on ARM ThunderX

Dear Juraj,

I don't think anybody here has experience with ThunderX to help you.
The fact that other NICs work OK indicates that this particular driver 
requires something special.
What it is, you will probably need to ask Cavium/Marvell guys...

--
Damjan

On 11 Dec 2018, at 07:56, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi folks,

I've run into an issue with hugepages on a Cavium ThunderX SoC. I was trying to 
bind a physical interface to VPP. With 1GB hugepages the interface seems to 
work fine (at least I saw it in VPP and was able to configure it and ping 
through it), but with 2MB hugepages the interface appeared in an error state. 
The output from show hardware told me this:
VirtualFunctionEthernet1/0/1   1down  VirtualFunctionEthernet1/0/1
  Ethernet address 40:8d:5c:e7:b1:12
  Cavium ThunderX
carrier down
flags: pmd pmd-init-fail maybe-multiseg
rx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
tx: queues 1 (max 96), desc 1024 (min 0 max 65535 align 1)
pci: device 177d:a034 subsystem 177d:a134 address 0002:01:00.01 numa 0
module: unknown
max rx packet len: 9204
promiscuous: unicast off all-multicast off
vlan offload: strip off filter off qinq off
rx offload avail:  vlan-strip ipv4-cksum udp-cksum tcp-cksum jumbo-frame
   crc-strip scatter
rx offload active: jumbo-frame crc-strip scatter
tx offload avail:  ipv4-cksum udp-cksum tcp-cksum outer-ipv4-cksum
tx offload active:
rss avail: ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp port
   vxlan geneve nvgre
rss active:ipv4 ipv4-tcp ipv4-udp ipv6 ipv6-tcp ipv6-udp
tx burst function: (nil)
rx burst function: (nil)
  Errors:
rte_eth_rx_queue_setup[port:0, errno:-22]: Unknown error -22

I dug around a bit and this seems to be what -22 means:

#define EINVAL  22  /* Invalid argument */
-EINVAL: The size of network buffers which can be allocated from the memory 
pool does not fit the various buffer sizes allowed by the device controller.

Is this something you've seen before? Is this a bug? Do I need to do something 
extra if I want to use 2MB hugepages?

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11564): https://lists.fd.io/g/vpp-dev/message/11564
Mute This Topic: https://lists.fd.io/mt/28720621/675642
Group Owner: vpp-dev+ow...@lists.fd.io<mailto:vpp-dev+ow...@lists.fd.io>
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  
[dmar...@me.com<mailto:dmar...@me.com>]
-=-=-=-=-=-=-=-=-=-=-=-

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11577): https://lists.fd.io/g/vpp-dev/message/11577
Mute This Topic: https://lists

[vpp-dev] Ipv4 random reassembly failure on x86 and ARM

2018-12-20 Thread Juraj Linkeš
Hi Klement and vpp-dev,

https://jira.fd.io/browse/VPP-1522 fixed the issue with an assert we've been 
seeing with random reassembly, however, there's still some other failure in 
that test: https://jira.fd.io/browse/VPP-1475

It seems that not all fragments are sent properly. The run documented in Jira 
shows only 3089 fragments out of 5953 being sent and the test only sees 39 out 
of 257 packets received.

Could you or anyone from vpp-dev who's more familiar with the feature/code 
advise on how to debug this further?

I was able to reproduce this on my local x86 Bionic and Xenial VMs as well as 
our Cavium ThunderX machines (the ones we also use in CI). I'd love to see 
whether anyone else can also reproduce it on an x86 machine outside of CI 
(where the failure doesn't happen).

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11730): https://lists.fd.io/g/vpp-dev/message/11730
Mute This Topic: https://lists.fd.io/mt/28810299/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Ipv4 random reassembly failure on x86 and ARM

2018-12-20 Thread Juraj Linkeš
Thanks for bringing that patch to my attention, I didn't use it (I believe it 
hadn't been merged yet). A quick re-test shows that the failure is gone - 
thanks!

Juraj

-Original Message-
From: Klement Sekera [mailto:ksek...@cisco.com] 
Sent: Thursday, December 20, 2018 1:26 PM
To: Juraj Linkeš 
Cc: vpp-dev@lists.fd.io; Sirshak Das ; Lijian Zhang (Arm 
Technology China) ; Stanislav Chlebec 

Subject: Re: Ipv4 random reassembly failure on x86 and ARM

Is this with https://gerrit.fd.io/r/#/c/16548/ merged?

Quoting Juraj Linkeš (2018-12-20 12:09:12)
>Hi Klement and vpp-dev,
> 
> 
> 
>[1]https://jira.fd.io/browse/VPP-1522 fixed the issue with an assert we've
>been seeing with random reassembly, however, there's still some other
>failure in that test: [2]https://jira.fd.io/browse/VPP-1475
> 
> 
> 
>It seems that not all fragments are sent properly. The run documented in
>Jira shows only 3089 fragments out of 5953 being sent and the test only
>sees 39 out of 257 packets received.
> 
> 
> 
>Could you or anyone from vpp-dev who's more familiar with the feature/code
>advise on how to debug this further?
> 
> 
> 
>I was able to reproduce this on my local x86 Bionic and Xenial VMs as well
>as our Cavium ThunderX machines (the ones we also use in CI). I'd love to
>see whether anyone else can also reproduce it on an x86 machine outside of
>CI (where the failure doesn't happen).
> 
> 
> 
>Thanks,
> 
>Juraj
> 
> References
> 
>Visible links
>1. https://jira.fd.io/browse/VPP-1522
>2. https://jira.fd.io/browse/VPP-1475
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11735): https://lists.fd.io/g/vpp-dev/message/11735
Mute This Topic: https://lists.fd.io/mt/28810299/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] Enable all ARM tests

2019-01-10 Thread Juraj Linkeš
Hi folks,

All of the remaining ARM failures have been fixed and now we need to enable the 
disabled tests for ARM CI:
https://gerrit.fd.io/r/#/c/16581/
https://gerrit.fd.io/r/#/c/16569/

Could someone please merge these? They're simple changes and verify shows that 
the errors were, indeed, fixed.

Thanks,
Juraj
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#11896): https://lists.fd.io/g/vpp-dev/message/11896
Mute This Topic: https://lists.fd.io/mt/28994360/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


Re: [vpp-dev] Jenkins ARM node broken?

2019-02-07 Thread Juraj Linkeš
Hi Ben,

This is an intermittent failure not unique to any job - it happens pretty much 
randomly as far as I can tell. Try running the verify again (add a "reverify" comment).

Maybe Ed knows more details about this issue.

Juraj

-Original Message-
From: Benoit Ganne (bganne) via Lists.Fd.Io 
[mailto:bganne=cisco@lists.fd.io] 
Sent: Thursday, February 7, 2019 2:20 PM
To: vpp-dev 
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] Jenkins ARM node broken?

Hi,

Does anyone knows if something is going on with 
vpp-arm-verify-master-ubuntu1804?
I am getting build failures due to agent connection error:
13:40:00 Triggered by Gerrit: https://gerrit.fd.io/r/17379
13:40:01 ERROR: Issue with creating launcher for agent jenkins-2a3794a5b01dd. 
The agent is being disconnected
13:40:01 [EnvInject] - Loading node environment variables.
13:40:01 ERROR: SEVERE ERROR occurs

https://jenkins.fd.io/job/vpp-arm-verify-master-ubuntu1804/907/console

ben
-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.

View/Reply Online (#12207): https://lists.fd.io/g/vpp-dev/message/12207
Mute This Topic: https://lists.fd.io/mt/29689190/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub  [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-


[vpp-dev] VPP Device jobs randomly failing

2020-11-25 Thread Juraj Linkeš
Hi Damjan, Benoit,

In the CSIT call I've learned that you were looking into why VPP Device jobs 
sometimes fail. I'm working on adding VPP Device jobs for arm and we're seeing 
the same issue that's behind these random failures, possibly even more 
frequently.

There are two high level observations I'll start with:

*The surface-level issue is that VPP doesn't return interface dump info 
from all interfaces (we're using 2 VFs in the tests and sometimes we get 
info from only 1, and sometimes not even that)

*The issue happens only when there are multiple jobs running on the 
same server

Looking at logs, I was able to figure out that the failure is tied to VPP 
startup, particularly this in DPDK plugin:
2020/11/09 16:05:39:251 notice dpdk   EAL: Probe PCI driver: 
net_i40e_vf (8086:154c) device: :91:02.5 (socket 1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_check_api_version(): 
PF/VF API version mismatch:(0.0)-(1.1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_init_vf(): check_api 
version failed
2020/11/09 16:05:39:251 notice dpdk   i40evf_dev_init(): Init vf 
failed
2020/11/09 16:05:39:251 notice dpdk   EAL: Releasing pci mapped 
resource for :91:02.5
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling 
pci_unmap_resource for :91:02.5 at 0x2101014000
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling 
pci_unmap_resource for :91:02.5 at 0x2101024000
2020/11/09 16:05:39:251 notice dpdk   EAL: Requested device 
:91:02.5 cannot be used

There are multiple variations of the same failure (DPDK failing to talk to the 
PF). I've documented them here: https://jira.fd.io/browse/VPP-1943

This leads me to think we're dealing with a race condition when multiple VPPs 
are trying to access the same PF (they're using different VFs that belong to 
the same PF) during VPP startup.

In case this is a DPDK bug, I've created a bug in their bugzilla: 
https://bugs.dpdk.org/show_bug.cgi?id=578

How do we debug this further? Putting together a script that loops over 
multiple VPPs starting at the same time should reproduce this issue, but I 
don't know what to look for. We could also try updating firmware/kernel (for a 
newer vfio-pci version). I've documented the versions we use on aarch64/x86_64 
in the Jira ticket.

What do you think?
Juraj

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18137): https://lists.fd.io/g/vpp-dev/message/18137
Mute This Topic: https://lists.fd.io/mt/78502299/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] VPP Device jobs randomly failing

2020-12-02 Thread Juraj Linkeš
I've looked into this a bit more and I'm seeing an error with avf in logs, but 
that actually doesn't impact VPP negatively:
2020/12/02 09:36:48:219 error  avf:05:10.0: send_to_pf 
failed (timeout 1.269s)
And a different log:
2020/12/02 09:36:47:176 error  avf:05:10.2: aq_desc_enq 
failed (timeout .266s)

When this error appears in the logs, the interfaces take a bit longer to show up in 
show int, whereas with dpdk they never show up. This hints at an issue with the 
PF driver, since avf seems to be able to handle the error.

The PF driver is not the latest, so I'll test with the latest version. We'll 
then need to document that users should update their Ubuntu 18.04 drivers, 
or document a known issue with old drivers (if the newer version fixes it). I'll 
update this thread when I've tested the latest version.

Juraj

From: Damjan Marion (damarion) 
Sent: Wednesday, November 25, 2020 5:06 PM
To: Juraj Linkeš 
Cc: Benoit Ganne (bganne) ; vpp-dev ; 
csit-...@lists.fd.io; Andrew Yourtchenko (ayourtch) 
Subject: Re: VPP Device jobs randomly failing


On 25.11.2020., at 16:55, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi Damjan, Benoit,

In the CSIT call I've learned that you were looking into why VPP Device jobs 
sometimes fail. I'm working on adding VPP Device jobs for arm and we're seeing 
the same issue that's behind these random failures, possibly even more 
frequently.

There are two high level observations I'll start with:
•The surface level issue is that VPP doesn't return interface dump info 
from all interfaces (we're using 2 VFs in the tests and sometimes we're getting 
info from only 1 and sometimes not even that)
•The issue happens only when there are multiple jobs running on the 
same server

Looking at logs, I was able to figure out that the failure is tied to VPP 
startup, particularly this in DPDK plugin:
2020/11/09 16:05:39:251 notice dpdk   EAL: Probe PCI driver: 
net_i40e_vf (8086:154c) device: :91:02.5 (socket 1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_check_api_version(): 
PF/VF API version mismatch:(0.0)-(1.1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_init_vf(): check_api 
version failed
2020/11/09 16:05:39:251 notice dpdk   i40evf_dev_init(): Init vf 
failed
2020/11/09 16:05:39:251 notice dpdk   EAL: Releasing pci mapped 
resource for :91:02.5
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling 
pci_unmap_resource for :91:02.5 at 0x2101014000
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling 
pci_unmap_resource for :91:02.5 at 0x2101024000
2020/11/09 16:05:39:251 notice dpdk   EAL: Requested device 
:91:02.5 cannot be used

There are multiple variations of the same failure (DPDK failing to talk to the 
PF). I've documented them here: https://jira.fd.io/browse/VPP-1943

This leads me to think we're dealing with a race condition when multiple VPPs 
are trying to access the same PF (they're using different VFs that belong to 
the same PF) during VPP startup.

In case this is a DPDK bug, I've created a bug in their bugzilla: 
https://bugs.dpdk.org/show_bug.cgi?id=578

How do we debug this further? Putting together a script that loops over 
multiple VPPs starting at the same time should reproduce this issue, but I 
don't know what to look for. We could also try updating firmware/kernel (for a 
newer vfio-pci version). I've documented the versions we use on aarch64/x86_64 
in the Jira ticket.

What do you think?


This looks to me like a linux PF driver issue.
Are you using latest intel provided PF driver[1]?

(in case you find it useful I’m maintaining my own DKMS debian packaging [2] 
for intel driver)

Do you see the same issue with avf plugin?

[1] https://sourceforge.net/projects/e1000/files/i40e%20stable/
[2] https://github.com/dmarion/deb-i40e

—
Damjan


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18215): https://lists.fd.io/g/vpp-dev/message/18215
Mute This Topic: https://lists.fd.io/mt/78502299/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] VPP Device jobs randomly failing

2020-12-02 Thread Juraj Linkeš
Updating to the latest PF version (2.13.10) did not help. I'm seeing the same 
failures. We'll talk about other options in the CSIT calls, such as whether 
it makes sense to try newer firmware or vfio-pci versions.

Juraj

From: vpp-dev@lists.fd.io  On Behalf Of Juraj Linkeš
Sent: Wednesday, December 2, 2020 11:05 AM
To: Damjan Marion (damarion) 
Cc: Benoit Ganne (bganne) ; vpp-dev ; 
csit-...@lists.fd.io; Andrew Yourtchenko (ayourtch) 
Subject: Re: [vpp-dev] VPP Device jobs randomly failing

I've looked into this a bit more and I'm seeing an error with avf in logs, but 
that actually doesn't impact VPP negatively:
2020/12/02 09:36:48:219 error  avf:05:10.0: send_to_pf 
failed (timeout 1.269s)
And a different log:
2020/12/02 09:36:47:176 error  avf:05:10.2: aq_desc_enq 
failed (timeout .266s)

When this error appears in the logs, the interfaces take a bit longer to show up in 
show int, whereas with dpdk they never show up. This hints at an issue with the 
PF driver, since avf seems to be able to handle the error.

The PF driver is not the latest, so I'll test with the latest version. We'll 
then need to document that users should update their Ubuntu 18.04 drivers, 
or document a known issue with old drivers (if the newer version fixes it). I'll 
update this thread when I've tested the latest version.

Juraj

From: Damjan Marion (damarion) mailto:damar...@cisco.com>>
Sent: Wednesday, November 25, 2020 5:06 PM
To: Juraj Linkeš mailto:juraj.lin...@pantheon.tech>>
Cc: Benoit Ganne (bganne) mailto:bga...@cisco.com>>; vpp-dev 
mailto:vpp-dev@lists.fd.io>>; 
csit-...@lists.fd.io<mailto:csit-...@lists.fd.io>; Andrew Yourtchenko 
(ayourtch) mailto:ayour...@cisco.com>>
Subject: Re: VPP Device jobs randomly failing


On 25.11.2020., at 16:55, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi Damjan, Benoit,

In the CSIT call I've learned that you were looking into why VPP Device jobs 
sometimes fail. I'm working on adding VPP Device jobs for arm and we're seeing 
the same issue that's behind these random failures, possibly even more 
frequently.

There are two high level observations I'll start with:
•The surface level issue is that VPP doesn't return interface dump info 
from all interfaces (we're using 2 VFs in the tests and sometimes we're getting 
info from only 1 and sometimes not even that)
•The issue happens only when there are multiple jobs running on the 
same server

Looking at logs, I was able to figure out that the failure is tied to VPP 
startup, particularly this in DPDK plugin:
2020/11/09 16:05:39:251 notice dpdk   EAL: Probe PCI driver: net_i40e_vf (8086:154c) device: 0000:91:02.5 (socket 1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_check_api_version(): PF/VF API version mismatch:(0.0)-(1.1)
2020/11/09 16:05:39:251 notice dpdk   i40evf_init_vf(): check_api version failed
2020/11/09 16:05:39:251 notice dpdk   i40evf_dev_init(): Init vf failed
2020/11/09 16:05:39:251 notice dpdk   EAL: Releasing pci mapped resource for 0000:91:02.5
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling pci_unmap_resource for 0000:91:02.5 at 0x2101014000
2020/11/09 16:05:39:251 notice dpdk   EAL: Calling pci_unmap_resource for 0000:91:02.5 at 0x2101024000
2020/11/09 16:05:39:251 notice dpdk   EAL: Requested device 0000:91:02.5 cannot be used

There are multiple variations of the same failure (DPDK failing to talk to the 
PF). I've documented them here: https://jira.fd.io/browse/VPP-1943

This leads me to think we're dealing with a race condition when multiple VPPs 
are trying to access the same PF (they're using different VFs that belong to 
the same PF) during VPP startup.

In case this is a DPDK bug, I've created a bug in their bugzilla: 
https://bugs.dpdk.org/show_bug.cgi?id=578

How do we debug this further? Putting together a script that loops over 
multiple VPPs starting at the same time should reproduce this issue, but I 
don't know what to look for. We could also try updating firmware/kernel (for a 
newer vfio-pci version). I've documented the versions we use on aarch64/x86_64 
in the Jira ticket.
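The looping reproducer mentioned above could be sketched roughly as follows (illustrative Python, not an existing CSIT script; the `vpp` command line and the VF addresses are placeholders to be replaced with the testbed's own):

```python
# Sketch of a reproducer for the suspected startup race: start several VPP
# instances at (nearly) the same time, each bound to a different VF of the
# same PF, and collect their exit codes. Command lines below are placeholders.
import subprocess
from concurrent.futures import ThreadPoolExecutor

def launch_concurrently(cmds, timeout=60):
    """Start all commands at once and return their exit codes in order."""
    def run(cmd):
        return subprocess.run(cmd, timeout=timeout,
                              stdout=subprocess.DEVNULL,
                              stderr=subprocess.DEVNULL).returncode
    with ThreadPoolExecutor(max_workers=len(cmds)) as pool:
        return list(pool.map(run, cmds))

# Example (hypothetical command lines; each instance gets its own VF):
# cmds = [["vpp", "unix", "{ nodaemon }", "dpdk", "{ dev 0000:91:02.%d }" % i]
#         for i in range(4)]
# print(launch_concurrently(cmds))
```

Repeating the loop and checking how many instances fail to probe their device would at least tell us how often the race fires.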

What do you think?


This looks to me like a linux PF driver issue.
Are you using latest intel provided PF driver[1]?

(in case you find it useful I’m maintaining my own DKMS debian packaging [2] 
for intel driver)

Do you see the same issue with avf plugin?

[1] https://sourceforge.net/projects/e1000/files/i40e%20stable/
[2] https://github.com/dmarion/deb-i40e

—
Damjan


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#18218): https://lists.fd.io/g/vpp-dev/message/18218
Mute This Topic: https://lists.fd.io/mt/78502299/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] VPP Device jobs randomly failing

2020-12-02 Thread Juraj Linkeš
It happens in CI on x86 as well (with the older PF driver), but I didn't 
reproduce it manually, since I didn't want to touch x86 hardware in production 
(x86 is running voting jobs, aarch64 is running non-voting, so aarch64 is a bit 
safer to tinker with).

Juraj

From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Wednesday, December 2, 2020 4:48 PM
To: Juraj Linkeš 
Cc: Benoit Ganne (bganne) ; vpp-dev ; 
csit-...@lists.fd.io; Andrew Yourtchenko (ayourtch) 
Subject: Re: [vpp-dev] VPP Device jobs randomly failing


I doubt changing anything around vfio-pci will help. That module doesn’t 
participate in communication between PF and VF.

Indeed, this looks like a PF driver bug. This is on AArch64, right? Are you 
able to repro on x86?

—
Damjan


On 02.12.2020., at 14:44, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Updating to the latest PF version (2.13.10) did not help. I'm seeing the same 
failures. We'll talk about other options in the CSIT calls, things like whether 
it makes sense to try newer firmware or vfio-pci versions.

Juraj

From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> 
mailto:vpp-dev@lists.fd.io>> On Behalf Of Juraj Linkeš
Sent: Wednesday, December 2, 2020 11:05 AM
To: Damjan Marion (damarion) mailto:damar...@cisco.com>>
Cc: Benoit Ganne (bganne) mailto:bga...@cisco.com>>; vpp-dev 
mailto:vpp-dev@lists.fd.io>>; 
csit-...@lists.fd.io<mailto:csit-...@lists.fd.io>; Andrew Yourtchenko 
(ayourtch) mailto:ayour...@cisco.com>>
Subject: Re: [vpp-dev] VPP Device jobs randomly failing

I've looked into this a bit more and I'm seeing an error with avf in logs, but 
that actually doesn't impact VPP negatively:
2020/12/02 09:36:48:219 error  avf 0000:05:10.0: send_to_pf failed (timeout 1.269s)
And a different log:
2020/12/02 09:36:47:176 error  avf 0000:05:10.2: aq_desc_enq failed (timeout .266s)

When this error appears in logs, the interfaces take a bit longer to show up in 
show int, whereas with dpdk they never show up. This hints at an issue with the 
PF driver, since avf seems to be able to handle the error.

The PF driver is not the latest and I'll try to test with the latest. We'll 
then need to document that users will need to update their Ubuntu 18.04 drivers 
or document a known issue with old drivers (if the newer version fixes it). I'll 
update this thread when I test the latest version.

Juraj


[vpp-dev] AVF interface creation fails on VFs with configured VLAN with newer i40e drivers

2021-09-09 Thread Juraj Linkeš
Hi Damjan, vpp devs,

Upgrading to 2.15.9 i40e driver in CI (from Ubuntu's 2.8.20-k) makes AVF 
interface creation on VFs with configured VLANs fail:
2021/08/30 09:15:27:343 debug avf 0000:91:04.1: request_queues: num_queue_pairs 1
2021/08/30 09:15:27:434 debug avf 0000:91:04.1: version: major 1 minor 1
2021/08/30 09:15:27:444 debug avf 0000:91:04.1: get_vf_resources: bitmap 
0x180b80a1 (l2 wb-on-itr adv-link-speed vlan-v2 vlan rx-polling rss-pf 
offload-adv-rss-pf offload-fdir-pf)
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: get_vf_resources: num_vsis 1 
num_queue_pairs 1 max_vectors 5 max_mtu 0 vf_cap_flags 0xb0081 (l2 
adv-link-speed vlan rx-polling rss-pf) rss_key_size 52 rss_lut_size 64
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: get_vf_resources_vsi[0]: vsi_id 
27 num_queue_pairs 1 vsi_type 6 qset_handle 21 default_mac_addr 
ba:dc:0f:fe:02:11
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: disable_vlan_stripping
2021/08/30 09:15:27:559 error avf 0000:00:00.0: error: avf_send_to_pf: error 
[v_opcode = 28, v_retval -5]
from avf_create_if: pci-addr 0000:91:04.1

Syslog reveals a bit more:
Aug 30 09:15:27 s55-t13-sut1 kernel: [352169.781206] vfio-pci 0000:91:04.1: 
enabling device (0000 -> 0002)
Aug 30 09:15:27 s55-t13-sut1 kernel: [352170.140729] i40e 0000:91:00.0: Cannot 
disable vlan stripping when port VLAN is set
Aug 30 09:15:27 s55-t13-sut1 kernel: [352170.140737] i40e 0000:91:00.0: VF 17 
failed opcode 28, retval: -5

It looks like this feature (vlan stripping on VFs with VLANs) was removed in 
later versions of the driver. I don't know what the proper solution here is, 
but adding a configuration option to not disable vlan stripping when creating 
an AVF interface sounds good to me.

I've documented this in https://jira.fd.io/browse/VPP-1995.

Thanks,
Juraj

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20101): https://lists.fd.io/g/vpp-dev/message/20101
Mute This Topic: https://lists.fd.io/mt/85479187/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] IPSec input/output: default action for non-matching traffic

2021-09-09 Thread Juraj Linkeš
Hi Neale,

Did you have a chance to look at this? For my part, I'm trying to figure out 
how to configure VPP with two DPDK interfaces where I would send bidirectional 
traffic (unencrypted, since the traffic generator in question (T-rex) can't 
send encrypted traffic yet) and I'd match an input rule in each direction - is 
this even possible?

Thanks,
Juraj

From: vpp-dev@lists.fd.io  On Behalf Of Zachary Leaf
Sent: Tuesday, August 17, 2021 10:30 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] IPSec input/output: default action for non-matching traffic

Hi Neale/all,

I've noticed an inconsistency between the default behaviour for non-matching 
packets in the ipsec-input and ipsec-output nodes. I'm not sure if this is 
intended or not.

The summary is:
- For ipsec-output, any non-matching packets are dropped by default with the 
same mechanism as per a matching DISCARD rule
- For ipsec-input, any non-matching packets are passed to the next node as if 
they matched a BYPASS rule

Below are some packet traces that show this behaviour. The setup is 2x 
interfaces configured as ip neighbors, with an SPD bound to each. Traffic 
entering an interface is routed through the other interface and vice-versa (see 
attached ipsec-default-drop.txt for full script).

When SPD contains only matching INBOUND BYPASS rules:
00:00:07:340457: dpdk-input
00:00:07:340523: ethernet-input
00:00:07:340566: ip4-input-no-checksum
00:00:07:340601: ipsec4-input-feature
  IPSEC_ESP: sa_id 0 spd 2 policy 1 spi 1000 (0x03e8) seq 3 <- MATCHED 
INBOUND RULE (policy 1)
00:00:07:340642: ip4-lookup
00:00:07:340667: ip4-rewrite
00:00:07:340680: ipsec4-output-feature
  spd 1 policy -1 <- DID NOT MATCH ANY RULES (policy -1)
00:00:07:340693: error-drop <- PACKET DROPPED
00:00:07:340707: drop

When SPD contains only matching OUTBOUND BYPASS rules:
00:00:11:759484: dpdk-input
00:00:11:759570: ethernet-input
00:00:11:759624: ip4-input-no-checksum
00:00:11:759654: ipsec4-input-feature
  UDP: sa_id 4294967295 spd 2 policy -1 spi 612811835 (0x2486c43b) seq 
748568697 < DID NOT MATCH (policy -1)
00:00:11:759689: ip4-lookup <- PACKET *NOT* DROPPED, PASSED ON AS NORMAL
00:00:11:759721: ip4-rewrite
00:00:11:759733: ipsec4-output-feature
  spd 1 policy 1 < MATCHED OUTBOUND RULE
00:00:11:759774: TenGigabitEthernet7/0/0-output
00:00:11:759801: TenGigabitEthernet7/0/0-tx

Looking at the code in ipsec_output.c, we can see that for non-matching 
packets, we call next_node_index = im->error_drop_node_index to drop the 
packet. In ipsec_input.c, we only increment the counter ipsec_unprocessed += 1 
and we move to the next packet as per a matching BYPASS rule. From what I can 
tell, this is the same for both ipv4/ipv6 traffic.

Looking at the IPSec RFC4301 [1], it seems to suggest that the default action 
for both non-matching inbound/output packets should be DISCARD.
e.g.
“Since the SPD-I is just a part of the SPD, if a packet that is looked up in 
the SPD-I cannot be matched to an entry there, then the packet MUST
be discarded” [2]

" ...  the SPD (or associated caches) MUST be consulted during the processing 
of all traffic that crosses the IPsec protection boundary, including IPsec 
management traffic.  If no policy is found in the SPD that matches a packet 
(for either inbound or outbound traffic), the packet MUST be discarded." [3]

"Every SPD SHOULD have a nominal, final entry that catches anything that is 
otherwise unmatched, and discards it.  This ensures that non-IPsec-protected 
traffic that arrives and does not match any SPD-I entry will be discarded." [4]

In the section 5.2.  Processing Inbound IP Traffic (unprotected-to-protected) 
[4], there is also a diagram that seems to support this.

Is there a reason that the input side is setup like this? Unless there is a 
good reason for allowing inbound traffic by default, I would propose to patch 
the ipsec-input node to align with ipsec-output and drop traffic by default.

Best,

Zach

[1]: https://datatracker.ietf.org/doc/html/rfc4301
[2]: https://datatracker.ietf.org/doc/html/rfc4301#section-4.4.1
[3]: https://datatracker.ietf.org/doc/html/rfc4301#section-5
[4]: https://datatracker.ietf.org/doc/html/rfc4301#section-5.2
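The asymmetry described above can be stated as a tiny model of the SPD lookup (illustrative Python, not the VPP C code; the policy representation is simplified):

```python
# Minimal model of SPD lookup default actions, illustrating the observed
# asymmetry between ipsec-output and ipsec-input (not the VPP implementation).
# Each policy is a (match_predicate, action) pair.

BYPASS, DISCARD = "bypass", "discard"

def spd_lookup(policies, pkt, default_action):
    """Return the action for pkt: first matching policy, else the default."""
    for match, action in policies:
        if match(pkt):
            return action
    return default_action

# Current VPP behaviour as seen in the packet traces:
def ipsec_output_action(policies, pkt):
    return spd_lookup(policies, pkt, default_action=DISCARD)  # drops unmatched

def ipsec_input_action(policies, pkt):
    return spd_lookup(policies, pkt, default_action=BYPASS)   # passes unmatched

# RFC 4301 requires the default for both directions to be discard, so the
# proposed fix amounts to making ipsec_input_action use DISCARD as well.
```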


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20103): https://lists.fd.io/g/vpp-dev/message/20103
Mute This Topic: https://lists.fd.io/mt/84943480/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] IPSec input/output: default action for non-matching traffic

2021-09-09 Thread Juraj Linkeš
A correction, I meant inbound rule, not input rule.

Juraj

From: Juraj Linkeš
Sent: Thursday, September 9, 2021 10:59 AM
To: 'Zachary Leaf' ; 'ne...@graphiant.com' 

Cc: vpp-dev 
Subject: RE: [vpp-dev] IPSec input/output: default action for non-matching 
traffic

Hi Neale,

Did you have a chance to look at this? For my part, I'm trying to figure out 
how to configure VPP with two DPDK interfaces where I would send bidirectional 
traffic (unencrypted, since the traffic generator in question (T-rex) can't 
send encrypted traffic yet) and I'd match an input rule in each direction - is 
this even possible?

Thanks,
Juraj



-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20104): https://lists.fd.io/g/vpp-dev/message/20104
Mute This Topic: https://lists.fd.io/mt/84943480/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] AVF interface creation fails on VFs with configured VLAN with newer i40e drivers

2021-09-09 Thread Juraj Linkeš


From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion via 
lists.fd.io
Sent: Thursday, September 9, 2021 12:01 PM
To: Juraj Linkeš 
Cc: vpp-dev ; Lijian Zhang 
Subject: Re: [vpp-dev] AVF interface creation fails on VFs with configured VLAN 
with newer i40e drivers


On 09.09.2021., at 09:14, Juraj Linkeš 
mailto:juraj.lin...@pantheon.tech>> wrote:

Hi Damjan, vpp devs,

Upgrading to 2.15.9 i40e driver in CI (from Ubuntu's 2.8.20-k) makes AVF 
interface creation on VFs with configured VLANs fail:
2021/08/30 09:15:27:343 debug avf 0000:91:04.1: request_queues: num_queue_pairs 1
2021/08/30 09:15:27:434 debug avf 0000:91:04.1: version: major 1 minor 1
2021/08/30 09:15:27:444 debug avf 0000:91:04.1: get_vf_resources: bitmap 
0x180b80a1 (l2 wb-on-itr adv-link-speed vlan-v2 vlan rx-polling rss-pf 
offload-adv-rss-pf offload-fdir-pf)
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: get_vf_resources: num_vsis 1 
num_queue_pairs 1 max_vectors 5 max_mtu 0 vf_cap_flags 0xb0081 (l2 
adv-link-speed vlan rx-polling rss-pf) rss_key_size 52 rss_lut_size 64
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: get_vf_resources_vsi[0]: vsi_id 
27 num_queue_pairs 1 vsi_type 6 qset_handle 21 default_mac_addr 
ba:dc:0f:fe:02:11
2021/08/30 09:15:27:445 debug avf 0000:91:04.1: disable_vlan_stripping
2021/08/30 09:15:27:559 error avf 0000:00:00.0: error: avf_send_to_pf: error 
[v_opcode = 28, v_retval -5]
from avf_create_if: pci-addr 0000:91:04.1

Syslog reveals a bit more:
Aug 30 09:15:27 s55-t13-sut1 kernel: [352169.781206] vfio-pci 0000:91:04.1: 
enabling device (0000 -> 0002)
Aug 30 09:15:27 s55-t13-sut1 kernel: [352170.140729] i40e 0000:91:00.0: Cannot 
disable vlan stripping when port VLAN is set
Aug 30 09:15:27 s55-t13-sut1 kernel: [352170.140737] i40e 0000:91:00.0: VF 17 
failed opcode 28, retval: -5

It looks like this feature (vlan stripping on VFs with VLANs) was removed in 
later versions of the driver. I don't know what the proper solution here is, 
but adding a configuration option to not disable vlan stripping when creating 
AVF interface sound good to me.

I've documented this in https://jira.fd.io/browse/VPP-1995.

Can you try with 2.16.11 and report back same outputs?

I've updated https://jira.fd.io/browse/VPP-1995 with 2.16.11 outputs and 
they're pretty much the same, except the last syslog line is missing.

I just updated https://github.com/dmarion/deb-i40e in case you are using it…

Thanks, we're currently using the intel download pages, but it may be easier to 
use your repo, so we'll keep it in mind for the future.

Juraj

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20112): https://lists.fd.io/g/vpp-dev/message/20112
Mute This Topic: https://lists.fd.io/mt/85479187/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] AVF interface creation fails on VFs with configured VLAN with newer i40e drivers

2021-09-28 Thread Juraj Linkeš


> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Damjan Marion
> via lists.fd.io
> Sent: Wednesday, September 15, 2021 5:54 PM
> To: Juraj Linkeš 
> Cc: vpp-dev ; Lijian Zhang 
> Subject: Re: [vpp-dev] AVF interface creation fails on VFs with configured 
> VLAN
> with newer i40e drivers
> 
> 
> 
> > On 10.09.2021., at 08:53, Juraj Linkeš  wrote:
> >
> >
> >
> > From: vpp-dev@lists.fd.io  On Behalf Of Damjan
> > Marion via lists.fd.io
> > Sent: Thursday, September 9, 2021 12:01 PM
> > To: Juraj Linkeš 
> > Cc: vpp-dev ; Lijian Zhang 
> > Subject: Re: [vpp-dev] AVF interface creation fails on VFs with
> > configured VLAN with newer i40e drivers
> >
> >
> > On 09.09.2021., at 09:14, Juraj Linkeš  wrote:
> >
> > Hi Damjan, vpp devs,
> >
> > Upgrading to 2.15.9 i40e driver in CI (from Ubuntu's 2.8.20-k) makes AVF
> interface creation on VFs with configured VLANs fail:
> > 2021/08/30 09:15:27:343 debug avf :91:04.1: request_queues:
> > num_queue_pairs 1
> > 2021/08/30 09:15:27:434 debug avf :91:04.1: version: major 1 minor
> > 1
> > 2021/08/30 09:15:27:444 debug avf :91:04.1: get_vf_resources:
> > bitmap 0x180b80a1 (l2 wb-on-itr adv-link-speed vlan-v2 vlan rx-polling
> > rss-pf offload-adv-rss-pf offload-fdir-pf)
> > 2021/08/30 09:15:27:445 debug avf :91:04.1: get_vf_resources:
> > num_vsis 1 num_queue_pairs 1 max_vectors 5 max_mtu 0 vf_cap_flags
> > 0xb0081 (l2 adv-link-speed vlan rx-polling rss-pf) rss_key_size 52
> > rss_lut_size 64
> > 2021/08/30 09:15:27:445 debug avf :91:04.1:
> > get_vf_resources_vsi[0]: vsi_id 27 num_queue_pairs 1 vsi_type 6
> > qset_handle 21 default_mac_addr ba:dc:0f:fe:02:11
> > 2021/08/30 09:15:27:445 debug avf :91:04.1: disable_vlan_stripping
> > 2021/08/30 09:15:27:559 error avf :00:00.0: error: avf_send_to_pf:
> > error [v_opcode = 28, v_retval -5] from avf_create_if: pci-addr
> > :91:04.1
> >
> > Syslog reveals a bit more:
> > Aug 30 09:15:27 s55-t13-sut1 kernel: [352169.781206] vfio-pci
> > :91:04.1: enabling device ( -> 0002) Aug 30 09:15:27
> > s55-t13-sut1 kernel: [352170.140729] i40e :91:00.0: Cannot disable
> > vlan stripping when port VLAN is set Aug 30 09:15:27 s55-t13-sut1
> > kernel: [352170.140737] i40e :91:00.0: VF 17 failed opcode 28,
> > retval: -5
> >
> > It looks like this feature (vlan stripping on VFs with VLANs) was removed in
> later versions of the driver. I don't know what the proper solution here is, 
> but
> adding a configuration option to not disable vlan stripping when creating AVF
> interface sound good to me.
> >
> > I've documented this in https://jira.fd.io/browse/VPP-1995.
> >
> > Can you try with 2.16.11 and report back same outputs?
> >
> > I've updated https://jira.fd.io/browse/VPP-1995 with 2.16.11 outputs and
> they're pretty much the same, except the last syslog line is missing.
> 
> OK, I was hoping new version of driver supports VLAN v2 offload APIs which
> allows us to know if stripping is supported or not on the specific interface. 
> V2
> API is already supported on ice driver (E810 NICs) and we have code to deal 
> with
> that.
> 
> So not sure what we can do here. I don’t see a way to know if stripping is
> supported or not.

If there isn't an API for this, then we'll have to get this information from 
the user, right?

Or we could still try disabling stripping, but not fail the interface 
initialization if it's not successful.

Thoughts?
Juraj

> 
> —
> Damjan
> 
> 
> Not sure
> 



-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20213): https://lists.fd.io/g/vpp-dev/message/20213
Mute This Topic: https://lists.fd.io/mt/85479187/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] AVF interface creation fails on VFs with configured VLAN with newer i40e drivers

2021-10-07 Thread Juraj Linkeš


> -Original Message-
> From: vpp-dev@lists.fd.io  On Behalf Of Juraj Linkeš
> Sent: Tuesday, September 28, 2021 11:43 AM
> To: damar...@cisco.com
> Cc: vpp-dev ; Lijian Zhang 
> Subject: Re: [vpp-dev] AVF interface creation fails on VFs with configured 
> VLAN
> with newer i40e drivers
> 
> 
> 
> > -Original Message-
> > From: vpp-dev@lists.fd.io  On Behalf Of Damjan
> > Marion via lists.fd.io
> > Sent: Wednesday, September 15, 2021 5:54 PM
> > To: Juraj Linkeš 
> > Cc: vpp-dev ; Lijian Zhang 
> > Subject: Re: [vpp-dev] AVF interface creation fails on VFs with
> > configured VLAN with newer i40e drivers
> >
> >
> >
> > > On 10.09.2021., at 08:53, Juraj Linkeš  wrote:
> > >
> > >
> > >
> > > From: vpp-dev@lists.fd.io  On Behalf Of Damjan
> > > Marion via lists.fd.io
> > > Sent: Thursday, September 9, 2021 12:01 PM
> > > To: Juraj Linkeš 
> > > Cc: vpp-dev ; Lijian Zhang
> > > 
> > > Subject: Re: [vpp-dev] AVF interface creation fails on VFs with
> > > configured VLAN with newer i40e drivers
> > >
> > >
> > > On 09.09.2021., at 09:14, Juraj Linkeš  wrote:
> > >
> > > Hi Damjan, vpp devs,
> > >
> > > Upgrading to 2.15.9 i40e driver in CI (from Ubuntu's 2.8.20-k) makes
> > > AVF
> > interface creation on VFs with configured VLANs fail:
> > > 2021/08/30 09:15:27:343 debug avf :91:04.1: request_queues:
> > > num_queue_pairs 1
> > > 2021/08/30 09:15:27:434 debug avf :91:04.1: version: major 1
> > > minor
> > > 1
> > > 2021/08/30 09:15:27:444 debug avf :91:04.1: get_vf_resources:
> > > bitmap 0x180b80a1 (l2 wb-on-itr adv-link-speed vlan-v2 vlan
> > > rx-polling rss-pf offload-adv-rss-pf offload-fdir-pf)
> > > 2021/08/30 09:15:27:445 debug avf :91:04.1: get_vf_resources:
> > > num_vsis 1 num_queue_pairs 1 max_vectors 5 max_mtu 0 vf_cap_flags
> > > 0xb0081 (l2 adv-link-speed vlan rx-polling rss-pf) rss_key_size 52
> > > rss_lut_size 64
> > > 2021/08/30 09:15:27:445 debug avf :91:04.1:
> > > get_vf_resources_vsi[0]: vsi_id 27 num_queue_pairs 1 vsi_type 6
> > > qset_handle 21 default_mac_addr ba:dc:0f:fe:02:11
> > > 2021/08/30 09:15:27:445 debug avf :91:04.1:
> > > disable_vlan_stripping
> > > 2021/08/30 09:15:27:559 error avf :00:00.0: error: avf_send_to_pf:
> > > error [v_opcode = 28, v_retval -5] from avf_create_if: pci-addr
> > > :91:04.1
> > >
> > > Syslog reveals a bit more:
> > > Aug 30 09:15:27 s55-t13-sut1 kernel: [352169.781206] vfio-pci
> > > :91:04.1: enabling device ( -> 0002) Aug 30 09:15:27
> > > s55-t13-sut1 kernel: [352170.140729] i40e :91:00.0: Cannot
> > > disable vlan stripping when port VLAN is set Aug 30 09:15:27
> > > s55-t13-sut1
> > > kernel: [352170.140737] i40e :91:00.0: VF 17 failed opcode 28,
> > > retval: -5
> > >
> > > It looks like this feature (vlan stripping on VFs with VLANs) was
> > > removed in
> > later versions of the driver. I don't know what the proper solution
> > here is, but adding a configuration option to not disable vlan
> > stripping when creating AVF interface sound good to me.
> > >
> > > I've documented this in https://jira.fd.io/browse/VPP-1995.
> > >
> > > Can you try with 2.16.11 and report back same outputs?
> > >
> > > I've updated https://jira.fd.io/browse/VPP-1995 with 2.16.11 outputs
> > > and
> > they're pretty much the same, except the last syslog line is missing.
> >
> > OK, I was hoping new version of driver supports VLAN v2 offload APIs
> > which allows us to know if stripping is supported or not on the
> > specific interface. V2 API is already supported on ice driver (E810
> > NICs) and we have code to deal with that.
> >
> > So not sure what we can do here. I don’t see a way to know if
> > stripping is supported or not.
> 
> If there isn't an API for this, then we'll have to get this information from 
> the
> user, right?
> 
> Or we could try enabling stripping but not fail the interface initialization 
> if it's not
> successful.
> 
> Thoughts?
> Juraj
> 

Hi Damjan,

Just pinging to get your thoughts. It really seems like we should introduce some 
sort of switch in the absence of an API.

Juraj

> >
> > —
> > Damjan
> >
> >
> > Not sure
> >
> 


-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20294): https://lists.fd.io/g/vpp-dev/message/20294
Mute This Topic: https://lists.fd.io/mt/85479187/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-



Re: [vpp-dev] VPP IPSec Related Doubts

2022-01-17 Thread Juraj Linkeš
Hi Hrishikesh,

The API has changed, try without the hyphen:
ipsec sa add 10 spi 1000 esp tunnel src 192.168.1.1 tunnel dst 192.168.1.2 
crypto-key 4339314b55523947594d6d3547666b45 crypto-alg aes-cbc-128 integ-key 
4339314b55523947594d6d3547666b45 integ-alg sha1-96

Regards,
Juraj

From: vpp-dev@lists.fd.io  On Behalf Of Hrishikesh 
Karanjikar
Sent: Thursday, December 16, 2021 10:45 AM
To: vpp-dev@lists.fd.io
Subject: [vpp-dev] VPP IPSec Related Doubts

Hi All,

I am trying to get following setup working,

https://www.intel.com/content/www/us/en/developer/articles/guide/get-started-with-ipsec-acceleration-in-the-fdio-vpp-project.html

The commands in above setup are as follows,
=
set int ip address TenGigabitEthernet6/0/0 
192.168.30.30/24
set int promiscuous on TenGigabitEthernet6/0/0
set int ip address TenGigabitEthernet6/0/1 
192.168.30.31/24
set int promiscuous on TenGigabitEthernet6/0/1

ipsec spd add 1
set interface ipsec spd TenGigabitEthernet6/0/1 1
ipsec sa add 10 spi 1000 esp tunnel-src 192.168.1.1 tunnel-dst 192.168.1.2 
crypto-key 4339314b55523947594d6d3547666b45 crypto-alg aes-cbc-128 integ-key 
4339314b55523947594d6d3547666b45 integ-alg sha1-96
ipsec policy add spd 1 outbound priority 100 action protect sa 10 
local-ip-range 192.168.20.0-192.168.20.255 remote-ip-range 
192.168.40.0-192.168.40.255
ipsec policy add spd 1 outbound priority 90 protocol 50 action bypass

ip route add 192.168.40.40/32 via 192.168.1.2 
TenGigabitEthernet6/0/1
set ip arp TenGigabitEthernet6/0/1 192.168.1.2 90:e2:ba:50:8f:19

set int state TenGigabitEthernet6/0/0 up
set int state TenGigabitEthernet6/0/1 up
=

However, a few commands in the setup are failing.
e.g.
DBGvpp# ipsec sa add 10 spi 1000 esp tunnel-src 192.168.1.1 tunnel-dst 
192.168.1.2 crypto-key 4339314b55523947594d6d3547666b45 crypto-alg aes-cbc-128 
integ-key 4339314b55523947594d6d3547666b45 integ-alg sha1-96
ipsec sa: parse error: '-src 192.168.1.1 tunnel-dst 19...'

Can anybody guide me on how to go about this?

--

Thanks and Regards,
Hrishikesh Karanjikar

-=-=-=-=-=-=-=-=-=-=-=-
Links: You receive all messages sent to this group.
View/Reply Online (#20734): https://lists.fd.io/g/vpp-dev/message/20734
Mute This Topic: https://lists.fd.io/mt/87764016/21656
Group Owner: vpp-dev+ow...@lists.fd.io
Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com]
-=-=-=-=-=-=-=-=-=-=-=-