Thanks, Dave. Crystally clear! It is a deliberate design, indeed. Regards, Kingwel
-------- 原始邮件 -------- 主题: RE: [vpp-dev] Question about vlib_next_frame_change_ownership 来自: "Dave Barach (dbarach)" <dbar...@cisco.com> 发至: 2019年1月25日 下午9:03 抄送: Kingwel Xie <kingwel....@ericsson.com>,vpp-dev <vpp-dev@lists.fd.io> Dear Kingwei, On a per-thread basis, only one input node is active at a time. In the 2x active input node case you sketched, the second input node will take frame ownership of the ethernet input frame - which should be pending but not yet dispatched - and add more buffer indices to it. It’s possible that the first frame will fill and that a second frame will be allocated and [partially] filled; that’s not the typical case. One lap around the track later, the first input node will take back ownership. Let’s say that input node 1 adds 50 packets to the ethernet-input frame, and input node 2 adds another 50 packets. The frame ownership change dance yields a vector of size 100 in ethernet-input. My guess is that the increase in efficiency from ethernet-input forward in the graph more than compensates for the fixed cost of the frame change ownership dance. This is to confirm what you wrote. I usually call this effect “input aggregation.” Also holds true for the handoff node, especially when handing off frames from multiple threads to a single thread. The alternative: dispatch two smaller frames instead of one big one. Doing that might not be awful if all input nodes produced a fair number of packets. The situation becomes awful when the 2..N input nodes produce very different numbers of packets, e.g. 99 and 1. Anyhow, the code doesn’t work that way and isn’t going to work that way, so I digress... Thanks... Dave From: Kingwel Xie <kingwel....@ericsson.com> Sent: Friday, January 25, 2019 5:13 AM To: Dave Barach (dbarach) <dbar...@cisco.com>; vpp-dev <vpp-dev@lists.fd.io> Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership Hi Dave, After checking the code and some debug sessions, I realized where the bug is – crypto-input, which is calling vlib_trace_buffer() earlier than get_next_frame/put_next_frame. Therefore, the next_frame->flag is overwritten by get_next_frame/change_owenrship. I’ve made a patch for it: https://gerrit.fd.io/r/17079 appreciate your comments. Again, about vlib_next_frame_change_ownership, as I understand, this mechanism will always try to enqueue buffers to the owner’s frame_index, so please consider this scenario: We have 2 input nodes running at the same time – dpdk-input and avf-input, they will try to compete with each other to enqueue to ethernet-input. Consequently, bad side, memcpy per frame as we discussed, and, good side, assume they both have 10 buffers per frame, then ethernet-input will have chance to get 20 buffers for batch processing, good for performance. Please confirm if this is expected behavior of frame ownership. Regards, Kingwel From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> On Behalf Of Dave Barach via Lists.Fd.Io Sent: Friday, January 25, 2019 4:52 AM To: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>; vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> Cc: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership The vpp packet trace which I extracted from your dispatch trace seems exactly as I would have expected. See below. In a pg test like this one using a loopback interface, anything past loopN-tx is irrelevant. The ipsec packet turns into an ARP request for 18.1.0.241. In non-cyclic graph cases, we don’t end up changing frame ownership at all. In this case, you’re doing a double lookup. One small memcpy per frame is a trivial cost, especially when one remembers that the cost is amortized over all the packets in the frame. Until you produce a repeatable demonstration of the claimed issue, there’s nothing that we can do. Thanks... Dave VPP Buffer Trace Trace: Trace: 00:00:53:959410: pg-input Trace: stream ipsec0, 100 bytes, 0 sw_if_index Trace: current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0 Trace: UDP: 192.168.2.255 -> 1.2.3.4 Trace: tos 0x00, ttl 64, length 28, checksum 0xb324 Trace: fragment id 0x0000 Trace: UDP: 4321 -> 1234 Trace: length 80, checksum 0x30d9 Trace: 00:00:53:959426: ip4-input Trace: UDP: 192.168.2.255 -> 1.2.3.4 Trace: tos 0x00, ttl 64, length 28, checksum 0xb324 Trace: fragment id 0x0000 Trace: UDP: 4321 -> 1234 Trace: length 80, checksum 0x30d9 Trace: 00:00:53:959519: ip4-lookup Trace: fib 0 dpo-idx 2 flow hash: 0x00000000 Trace: UDP: 192.168.2.255 -> 1.2.3.4 Trace: tos 0x00, ttl 64, length 28, checksum 0xb324 Trace: fragment id 0x0000 Trace: UDP: 4321 -> 1234 Trace: length 80, checksum 0x30d9 Trace: 00:00:53:959598: ip4-rewrite Trace: tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 flow hash: 0x00000000 Trace: 00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203 Trace: 00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f Trace: 00:00:53:959687: ipsec0-output Trace: ipsec0 Trace: 00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203 Trace: 00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223 Trace: 00000040: 2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243 Trace: 00000060: 44454647000000000000000000000000000000000000000000000000 Trace: 00:00:53:959802: ipsec0-tx Trace: IPSec: spi 1 seq 1 Trace: 00:00:53:959934: esp4-encrypt Trace: esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96 Trace: 00:00:53:960084: ip4-lookup Trace: fib 0 dpo-idx 0 flow hash: 0x00000000 Trace: IPSEC_ESP: 18.1.0.71 -> 18.1.0.241 Trace: tos 0x00, ttl 254, length 168, checksum 0x96ea Trace: fragment id 0x0000 Trace: 00:00:53:960209: ip4-glean Trace: IPSEC_ESP: 18.1.0.71 -> 18.1.0.241 Trace: tos 0x00, ttl 254, length 168, checksum 0x96ea Trace: fragment id 0x0000 Trace: 00:00:53:960336: loop0-output Trace: loop0 Trace: ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff Trace: request, type ethernet/IP4, address size 6/4 Trace: de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241 Trace: 00:00:53:960491: error-drop Trace: ip4-glean: ARP requests sent From: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>> Sent: Thursday, January 24, 2019 2:43 AM To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership Ok. As requested, pcap trace & test script attached. Actually I made some simplification to indicate the problem – using native IPSEC instead of DPDK. You can see in the buffer trace that ip-lookup is referred by ip-input in the beginning then by esp-encrypt later. It means the ownership of ip-lookup will be changed back and forth, 16x3=48 bytes memcpy, per frame basis. Under some case, the trace flag in next_frame will be lost, then it leads to buffer trace broken. I made a patch for further discussion about it: https://gerrit.fd.io/r/17037 Test log shown below: DBGvpp# show version vpp v19.04-rc0~24-g0702554 built by root on ubuntu89 at Sat Jan 19 22:13:50 EST 2019 DBGvpp# DBGvpp# exec ipsec loop0 DBGvpp# DBGvpp# pcap dispatch trace on max 1000 file vpp.pcap buffer-trace pg-input 10 Buffer tracing of 10 pkts from pg-input enabled... pcap dispatch capture on... DBGvpp# DBGvpp# DBGvpp# packet-generator enable-stream ipsec0 DBGvpp# DBGvpp# pcap dispatch trace off captured 14 pkts... saved to /tmp/vpp.pcap... DBGvpp# DBGvpp# show trace ------------------- Start of thread 0 kw_main ------------------- Packet 1 00:00:53:959410: pg-input stream ipsec0, 100 bytes, 0 sw_if_index current data 0, length 100, buffer-pool 0, clone-count 0, trace 0x0 UDP: 192.168.2.255 -> 1.2.3.4 tos 0x00, ttl 64, length 28, checksum 0xb324 fragment id 0x0000 UDP: 4321 -> 1234 length 80, checksum 0x30d9 00:00:53:959426: ip4-input UDP: 192.168.2.255 -> 1.2.3.4 tos 0x00, ttl 64, length 28, checksum 0xb324 fragment id 0x0000 UDP: 4321 -> 1234 length 80, checksum 0x30d9 00:00:53:959519: ip4-lookup fib 0 dpo-idx 2 flow hash: 0x00000000 UDP: 192.168.2.255 -> 1.2.3.4 tos 0x00, ttl 64, length 28, checksum 0xb324 fragment id 0x0000 UDP: 4321 -> 1234 length 80, checksum 0x30d9 00:00:53:959598: ip4-rewrite tx_sw_if_index 2 dpo-idx 2 : ipv4 via 0.0.0.0 ipsec0: mtu:9000 flow hash: 0x00000000 00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203 00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f 00:00:53:959687: ipsec0-output ipsec0 00000000: 4500001c000000003f11b424c0a802ff0102030410e104d2005030d900010203 00000020: 0405060708090a0b0c0d0e0f101112131415161718191a1b1c1d1e1f20212223 00000040: 2425262728292a2b2c2d2e2f303132333435363738393a3b3c3d3e3f40414243 00000060: 44454647000000000000000000000000000000000000000000000000 00:00:53:959802: ipsec0-tx IPSec: spi 1 seq 1 00:00:53:959934: esp4-encrypt esp: spi 1 seq 1 crypto aes-cbc-128 integrity sha1-96 00:00:53:960084: ip4-lookup fib 0 dpo-idx 0 flow hash: 0x00000000 IPSEC_ESP: 18.1.0.71 -> 18.1.0.241 tos 0x00, ttl 254, length 168, checksum 0x96ea fragment id 0x0000 00:00:53:960209: ip4-glean IPSEC_ESP: 18.1.0.71 -> 18.1.0.241 tos 0x00, ttl 254, length 168, checksum 0x96ea fragment id 0x0000 00:00:53:960336: loop0-output loop0 ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff request, type ethernet/IP4, address size 6/4 de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241 00:00:53:960491: error-drop ip4-glean: ARP requests sent 00:00:53:960780: ethernet-input ARP: de:ad:00:00:00:00 -> ff:ff:ff:ff:ff:ff 00:00:53:960927: arp-input request, type ethernet/IP4, address size 6/4 de:ad:00:00:00:00/18.1.0.71 -> 00:00:00:00:00:00/18.1.0.241 00:00:53:961126: error-drop arp-input: IP4 source address matches local interface From: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>> Sent: Wednesday, January 23, 2019 11:33 PM To: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>; vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> Subject: RE: [vpp-dev] Question about vlib_next_frame_change_ownership Please write up the issue and share the config and pg input script as I asked. You might find that the issue disappears pretty rapidly, with no further action on your part... (😉)... The basic graph engine is not a place to start hacking based on “I think I get it...” Thanks... Dave From: vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io> <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> On Behalf Of Kingwel Xie Sent: Wednesday, January 23, 2019 10:18 AM To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> Subject: Re: [vpp-dev] Question about vlib_next_frame_change_ownership thanks. I think I get it. By maintaining the ownership, vPP is able to enqueue all buffers destinated to the same target node into the owner's next frame at one time. This avoids dispatching the node function multiple times. The bug is still there. I will create a patch later for further discussion. And, maybe there has some space to improve: considering we have two input nodes which will both add elements to the third node, we will see the ownership of this node being switched per frame basis. - Kingwel -------- 原始邮件 -------- 主题: RE: Question about vlib_next_frame_change_ownership 来自: "Dave Barach (dbarach)" <dbar...@cisco.com<mailto:dbar...@cisco.com>> 发至: 2019年1月23日 下午8:49 抄送: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>>,vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> As you've probably noticed, the buffer manager has been under active development. That may or may not have anything to do with the issue. Please follow the bug reporting process: https://wiki.fd.io/view/VPP/BugReports. In this case, using master/latest, please create a Jira ticket including the exact configuration, packet generator input script, and a dispatch pcap trace: * "pcap dispatch trace on file dtrace max 10000 buffer-trace pg-input 1000", * start the pg stream * "pcap dispatch trace off". * Results in /tmp/dtrace. I'm not going to speculate on what's going on at this point. Please write up the issue so we can look at it. For a decent explanation of the frame ownership scheme, take a look at https://fdio-vpp.readthedocs.io/en/latest/gettingstarted/developers/vlib.html under "Complications". HTH... Dave -----Original Message----- From: Kingwel Xie <kingwel....@ericsson.com<mailto:kingwel....@ericsson.com>> Sent: Wednesday, January 23, 2019 2:16 AM To: Dave Barach (dbarach) <dbar...@cisco.com<mailto:dbar...@cisco.com>>; vpp-dev <vpp-dev@lists.fd.io<mailto:vpp-dev@lists.fd.io>> Subject: Question about vlib_next_frame_change_ownership Hi Dave and all, I'm looking at a buffer trace issue with DPDK IPSEC. It turns out the flag VLIB_FRAME_TRACE is broken in vlib_next_frame_change_ownership(). The node path in my setup is: pg-input -> ip-input -> ip-lookup -> ... -> dkdp-esp-encrypt -> cryptodev -> crypto-input -> ip-lookup -> ... As you can see, the ip-lookup node has the owner node ip-input in the beginning, then owner will be changed to crypto-input shortly. This change causes that we swap the current next_frame with the owner's in vlib_next_frame_change_ownership(). As a result, the VLIB_FRAME_TRACE in next_frame->flag will be overwritten. The fix could be very simple, but I'm wondering why we have to change the ownership of the next_frame? Actually I can observe the ownership is changed back and forth between ip-input and crypto-input for every frame, which leads to performance degradation. However, it looks good to me even that we don’t care the ownership. In this case, ip-lookup will be dispatched by either ip-input or crypto-input, with different next_frame. I guess I must have missed something, appreciate if you can elaborate. Regards, Kingwel
-=-=-=-=-=-=-=-=-=-=-=- Links: You receive all messages sent to this group. View/Reply Online (#12011): https://lists.fd.io/g/vpp-dev/message/12011 Mute This Topic: https://lists.fd.io/mt/29430823/21656 Group Owner: vpp-dev+ow...@lists.fd.io Unsubscribe: https://lists.fd.io/g/vpp-dev/unsub [arch...@mail-archive.com] -=-=-=-=-=-=-=-=-=-=-=-