OK. As requested, the pcap trace and test script are attached. I simplified the setup to isolate the problem: it uses native IPSEC instead of DPDK.
You can see in the buffer trace that ip-lookup is first reached from ip-input and later from esp-encrypt. This means the ownership of ip-lookup changes back and forth, costing a 16x3=48 byte memcpy per frame. In some cases the trace flag in next_frame is lost, which breaks the buffer trace.

I made a patch for further discussion: https://gerrit.fd.io/r/17037

Test log shown below:

DBGvpp# show version
vpp v19.04-rc0~24-g0702554 built by root on ubuntu89 at Sat Jan 19 22:13:50 EST 2019
DBGvpp#
DBGvpp# exec ipsec
loop0
DBGvpp#
DBGvpp# pcap dispatch trace on max 1000 file vpp.pcap buffer-trace pg-input 10
Buffer tracing of 10 pkts from pg-input enabled...
pcap dispatch capture on...
DBGvpp#
DBGvpp#
DBGvpp# packet-generator enable-stream ipsec0
DBGvpp#
DBGvpp# pcap dispatch trace off
captured 14 pkts...
saved to /tmp/vpp.pcap...
DBGvpp#
DBGvpp# show trace
------------------- Start of thread 0 kw_main -------------------
Packet 1

00:00:53:959410: pg-input
  stream ipsec0, 100 bytes, 0 sw_if_index
  current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959426: ip4-input
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959519: ip4-lookup
  fib 0 dpo-idx 2 flow hash: 0x00000000
  UDP: 192.168.2.255 -> 1.2.3.4
    tos 0x00, ttl 64, length 28, checksum 0xb324
    fragment id 0x0000
  UDP: 4321 -> 1234
    length 80, checksum 0x30d9
00:00:53:959598: ip4-rewrite
  tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 flow hash: 0x00000000
  00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
  00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f
00:00:53:959687: ipsec0-output
  ipsec0
  00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203
  00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223
  00000040: 2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243
  00000060: 44454647000000000000000000000000000000000000000000000000
00:00:53:959802: ipsec0-tx
  IPSec: spi 1 seq 1
00:00:53:959934: esp4-encrypt
  esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96
00:00:53:960084: ip4-lookup
  fib 0 dpo-idx 0 flow hash: 0x00000000
  IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
    tos 0x00, ttl 254, length 168, checksum 0x96ea
    fragment id 0x0000
00:00:53:960209: ip4-glean
  IPSEC_ESP: 18.1.0.71 -> 18.1.0.241
    tos 0x00, ttl 254, length 168, checksum 0x96ea
    fragment id 0x0000
00:00:53:960336: loop0-output
  loop0
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:960491: error-drop
  ip4-glean: ARP requests sent
00:00:53:960780: ethernet-input
  ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff
00:00:53:960927: arp-input
  request, type ethernet/IP4, address size 6/4
  de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241
00:00:53:961126: error-drop
  arp-input: IP4 source address matches local interface

From: Dave Barach (dbarach) <dbar...@cisco.com>
Sent: Wednesday, January 23, 2019 11:33 PM
To: Kingwel Xie <kingwel....@ericsson.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership

Please write up the issue and share the config and pg input script as I asked. You might find that the issue disappears pretty rapidly, with no further action on your part... (😉)...

The basic graph engine is not a place to start hacking based on “I think I get it...”

Thanks...
Dave

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Kingwel Xie
Sent: Wednesday, January 23, 2019 10:18 AM
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership

Thanks. I think I get it. By maintaining the ownership, VPP is able to enqueue all buffers destined for the same target node into the owner's next frame at one time. This avoids dispatching the node function multiple times.

The bug is still there. I will create a patch later for further discussion.

Also, there may be room for improvement: if two input nodes both add elements to a third node, the ownership of that node will be switched on a per-frame basis.

- Kingwel

-------- Original Message --------
Subject: RE: Question about vlib_next_frame_change_ownership
From: "Dave Barach (dbarach)" <dbar...@cisco.com>
Sent: January 23, 2019, 8:49 PM
Cc: Kingwel Xie <kingwel....@ericsson.com>, vpp-dev <vpp-dev@lists.fd.io>

As you've probably noticed, the buffer manager has been under active development. That may or may not have anything to do with the issue.

Please follow the bug reporting process: https://wiki.fd.io/view/VPP/BugReports. In this case, using master/latest, please create a Jira ticket including the exact configuration, packet generator input script, and a dispatch pcap trace:

* "pcap dispatch trace on file dtrace max 10000 buffer-trace pg-input 1000",
* start the pg stream
* "pcap dispatch trace off".
* Results in /tmp/dtrace.

I'm not going to speculate on what's going on at this point. Please write up the issue so we can look at it.

For a decent explanation of the frame ownership scheme, take a look at https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/developers/vlib.html under "Complications".

HTH... Dave

-----Original Message-----
From: Kingwel Xie <kingwel....@ericsson.com>
Sent: Wednesday, January 23, 2019 2:16 AM
To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io>
Subject: Question about vlib_next_frame_change_ownership

Hi Dave and all,

I'm looking at a buffer trace issue with DPDK IPSEC. It turns out the flag VLIB_FRAME_TRACE is broken in vlib_next_frame_change_ownership().

The node path in my setup is:

pg-input -> ip-input -> ip-lookup -> ... -> dpdk-esp-encrypt -> cryptodev -> crypto-input -> ip-lookup -> ...

As you can see, the ip-lookup node is owned by ip-input in the beginning, and shortly afterwards the owner changes to crypto-input. This change makes us swap the current next_frame with the owner's in vlib_next_frame_change_ownership(). As a result, the VLIB_FRAME_TRACE in next_frame->flag is overwritten.

The fix could be very simple, but I'm wondering why we have to change the ownership of the next_frame at all. I can observe the ownership changing back and forth between ip-input and crypto-input for every frame, which degrades performance. However, it looks to me as if things would still work even if we did not care about the ownership. In that case, ip-lookup would be dispatched by either ip-input or crypto-input, each with a different next_frame.

I guess I must have missed something; I'd appreciate it if you could elaborate.

Regards,
Kingwel
Attachments: vpp.pcap, ipsec