[dpdk-dev] LLC miss in librte_distributor

2014-11-13 Thread jigsaw
Hi, Well, I have given up on the idea of optimizing away the QPI-caused LLC misses. The queue-based messaging has even worse performance than polling the same buf from both cores. It is the nature of the busy-polling model. I guess we have to accept it as a fact, unless the programming model can be changed to a biased lock
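For illustration, a minimal self-contained sketch (all names are made up, not from the original test) of the busy-polling pattern under discussion: two threads spin on one shared variable, so when they run on different sockets every update bounces the cache line across QPI:

#include <pthread.h>
#include <stdint.h>

static volatile uint64_t shared_flag; /* lives in a single cache line */

static void *poller(void *arg)
{
    uint64_t last = 0;
    (void)arg;
    /* Busy-poll: each read after the peer's store misses in the local
     * cache when the peer sits on the other NUMA node. */
    while (last < 1000000) {
        while (shared_flag == last)
            ; /* spin until the peer updates the flag */
        last = shared_flag;
    }
    return NULL;
}

static void *writer(void *arg)
{
    uint64_t i;
    (void)arg;
    for (i = 1; i <= 1000000; i++)
        shared_flag = i; /* each store steals the line back */
    return NULL;
}

int main(void)
{
    pthread_t t1, t2;
    /* Pinning t1 and t2 to cores on different sockets (e.g. with
     * pthread_setaffinity_np) reproduces the cross-node traffic. */
    pthread_create(&t1, NULL, poller, NULL);
    pthread_create(&t2, NULL, writer, NULL);
    pthread_join(t1, NULL);
    pthread_join(t2, NULL);
    return 0;
}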

[dpdk-dev] LLC miss in librte_distributor

2014-11-12 Thread jigsaw
Hi Bruce, Thanks for your reply. I agree that logically dividing the distributor functionality is the best solution. Meanwhile, I tried some tricks and the results look good: for the same amount of pkts (1M), the LLC stores and loads decrease by 90%, and the miss rates for both decrease to 25%.
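The message does not say which tricks were applied. One common way to cut LLC stores and loads in a distributor-style structure is to give each worker's in-flight slot its own cache line; the sketch below shows that technique purely as an assumption, not as what was actually done here:

#include <stdint.h>

#define CACHE_LINE_SIZE 64

/* One slot per worker, each on its own cache line: a store into
 * worker N's slot no longer invalidates the line that worker N+1 is
 * polling (no false sharing between neighbouring slots). */
struct worker_slot {
    volatile uintptr_t bufptr; /* in-flight packet pointer */
} __attribute__((aligned(CACHE_LINE_SIZE)));

static struct worker_slot slots[16];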

[dpdk-dev] LLC miss in librte_distributor

2014-11-12 Thread Bruce Richardson
On Wed, Nov 12, 2014 at 10:37:33AM +0200, jigsaw wrote: > Hi, > > OK, it is now very clear that it is due to memory transactions between different > nodes. > > The test program is here: > https://gist.github.com/jigsawecho/6a2e78d65f0fe67adf1b > > The test machine topology is: > > NUMA node0 CPU(s):

[dpdk-dev] LLC miss in librte_distributor

2014-11-12 Thread jigsaw
Hi, OK, it is now very clear that it is due to memory transactions between different nodes. The test program is here: https://gist.github.com/jigsawecho/6a2e78d65f0fe67adf1b The test machine topology is: NUMA node0 CPU(s): 0-7,16-23 NUMA node1 CPU(s): 8-15,24-31 Change the 3rd param from 0 t
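A minimal sketch, assuming a pthread-based test like the gist above, of how thread placement selects the NUMA node (the core numbers come from the topology listed in the message; the function name pin_to_core is hypothetical and may differ from the actual test program):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <stdio.h>

/* Pin a thread to one core; which core you pick decides which NUMA
 * node's caches the thread hits. */
static int pin_to_core(pthread_t t, int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    return pthread_setaffinity_np(t, sizeof(set), &set);
}

int main(void)
{
    /* Same socket: e.g. cores 2 and 4 (both in node0), so traffic
     * stays inside one LLC. Cross socket: e.g. cores 2 and 10, so
     * every transfer of the shared line crosses QPI. */
    pin_to_core(pthread_self(), 2);
    printf("pinned to core 2 (NUMA node0)\n");
    return 0;
}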

[dpdk-dev] LLC miss in librte_distributor

2014-11-11 Thread jigsaw
Hi Bruce, I noticed that librte_distributor has a quite severe LLC miss problem when running on 16 cores, while on 8 cores there is no such problem. The test runs on an Intel(R) Xeon(R) CPU E5-2670, a Sandy Bridge with 32 cores on 2 sockets. The test case is distributor_perf_autotest, i.e. in app/
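For context, a sketch of the hand-off pattern that distributor_perf_autotest exercises. The function names (rte_distributor_process, rte_distributor_get_pkt) are from the librte_distributor API of that era; the surrounding loop structure is illustrative only:

#include <rte_distributor.h>
#include <rte_mbuf.h>

/* Distribution core: hands mbufs to workers. Each hand-off is a
 * store to a per-worker slot that the worker is busy-polling, so
 * with workers on the remote socket every hand-off crosses QPI. */
static void distribute(struct rte_distributor *d,
                       struct rte_mbuf **bufs, unsigned num)
{
    rte_distributor_process(d, bufs, num);
}

/* Worker core: busy-polls its slot for the next packet, handing the
 * previous one back in the same call. */
static void worker_loop(struct rte_distributor *d, unsigned worker_id)
{
    struct rte_mbuf *pkt = NULL;

    for (;;) {
        pkt = rte_distributor_get_pkt(d, worker_id, pkt);
        /* ... process pkt ... */
    }
}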