After some examination, we believe the (apparently) random part is the assignment of VirtualEthernet rx (not tx) queues to VPP workers.

It is basically this sequence of VAT commands:

    create_vhost_user_if socket /tmp/sock-1-1
    sw_interface_dump
    sw_interface_dump
    create_vhost_user_if socket /tmp/sock-1-2
    sw_interface_dump
    sw_interface_dump
    sw_interface_set_flags sw_if_index 4 admin-up link-up
    sw_interface_set_flags sw_if_index 5 admin-up link-up

If confirmed, we can make sure that CSIT code will move the rx queues to where we want them.

Vratko.

From: vpp-dev@lists.fd.io <vpp-dev@lists.fd.io> On Behalf Of Vratko Polak -X (vrpolak - PANTHEON TECHNOLOGIES at Cisco) via Lists.Fd.Io
Sent: Thursday, 2019-January-03 18:13
To: vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: [vpp-dev] graph node placement on workers

Hi developers.

When examining some performance test results, I started seeing patterns, so I now have multiple questions.

I have noticed that (with hyper-threading on) the l2-input-vtr graph node is present only on half of the VPP workers. (At least for the dot1q test below.) In CSIT, the node is always seen on low-numbered logical cores (lcore 2, as opposed to lcore 58). Is that expected?

Also, some graph nodes ({if}-output and {if}-tx) tend to be only on some workers (which makes sense), determined apparently by random chance (which makes less sense). I believe this (apparent) randomness affects the results of trending.

I have two examples; both are scatter plots on this [0] chart.

The first example is the dot1q-l2bdbasemaclrn-eth-2vhostvr1024-1vm test. This test uses asymmetric load (encap one way, decap the other). There is a 100% match between performance (whether the dot lands in the 6 Mpps band or the 5 Mpps band) and which TenGigabitEthernet tx node ends up on vpp_wk_0 (lcore 2). Due to the asymmetric load, I believe one subgraph has more work to do, and it is random whether the busier subgraph ends up on a worker also burdened with l2-input-vtr.

The second example is the eth-l2xcbase-eth-4vhostvr1024-2vm test. Here the load is symmetric, but I see randomness in the placement of the four VirtualEthernet tx nodes. I have not examined many runs, but I suspect the performance depends on which pair of the four ends up on the same node.

Finally, a few less coherent questions. Is "worker handoff" involved? Should the l2-input-vtr node stick to the subgraph which needs it? Can we (CSIT) give VPP a hint on which VirtualEthernet tx node should be handled by which worker?

Vratko.

[0] https://docs.fd.io/csit/master/trending/trending/vm_vhost_l2-2n-skx-x710-64b-base.html#t1c
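
As an illustration of what "moving the rx queues to where we want them" could look like on the test side, here is a minimal Python sketch. It only builds debug CLI lines of the form "set interface rx-placement <interface> queue <n> worker <n>", and it assumes that command is available in the VPP build under test. The function name, the round-robin policy and the interface names are hypothetical and chosen for illustration; this is not the actual CSIT code.

```python
from typing import Dict, List


def rx_placement_commands(
    queues_per_interface: Dict[str, int], worker_count: int
) -> List[str]:
    """Build CLI lines pinning each rx queue to a worker, round-robin.

    Hypothetical helper: the policy (sorted names, round-robin) is an
    assumption made for this sketch, not something VPP or CSIT mandates.
    """
    if worker_count < 1:
        raise ValueError("need at least one worker")
    commands = []
    next_worker = 0
    # Sort names so the result does not depend on interface creation order.
    for name in sorted(queues_per_interface):
        for queue in range(queues_per_interface[name]):
            commands.append(
                f"set interface rx-placement {name} "
                f"queue {queue} worker {next_worker}"
            )
            next_worker = (next_worker + 1) % worker_count
    return commands


if __name__ == "__main__":
    # Interface names and queue counts below are illustrative only.
    example = {"VirtualEthernet0/0/0": 1, "VirtualEthernet0/0/1": 1}
    for line in rx_placement_commands(example, worker_count=2):
        print(line)
```

The point of the design is only that the mapping is a pure function of the interface names and the worker count, so repeated test runs would ask for the same placement regardless of what VPP picked by default.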