Hi,
Well, I've given up on the idea of optimizing the QPI-caused LLC misses.
The queue-based messaging has even worse performance than both cores
polling the same buffer.
It is the nature of the busy-polling model.
I guess we have to accept it as a fact, unless the programming model can be
changed to a biased lock.
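For reference, a minimal sketch of the busy-poll pattern in question (the names and structure are mine, not taken from the distributor code): two threads on different sockets spin on the same cache line, so every store by the producer invalidates the consumer's cached copy and forces a cross-QPI line transfer.

    #include <stdint.h>
    #include <pthread.h>

    /* One cache line shared by both cores; each handoff costs a
     * cross-socket (QPI) cache-line transfer because the producer's
     * store invalidates the consumer's copy. */
    struct handoff {
        volatile uint64_t seq;   /* bumped by the producer */
        volatile uint64_t data;
    } __attribute__((aligned(64)));

    static struct handoff slot;

    static void *producer(void *arg)
    {
        uint64_t i;
        for (i = 1; i <= 1000000; i++) {
            slot.data = i;
            __sync_synchronize();  /* order the data store before seq */
            slot.seq = i;
        }
        return NULL;
    }

    static void *consumer(void *arg)
    {
        uint64_t seen = 0, sink = 0;
        while (seen < 1000000) {
            while (slot.seq == seen)
                ;                  /* busy poll: the line ping-pongs here */
            seen = slot.seq;
            sink += slot.data;
        }
        return (void *)(uintptr_t)sink;
    }

    int main(void)
    {
        pthread_t p, c;
        pthread_create(&c, NULL, consumer, NULL);
        pthread_create(&p, NULL, producer, NULL);
        pthread_join(p, NULL);
        pthread_join(c, NULL);
        return 0;
    }

Build with gcc -O2 -pthread; pin the two threads to cores on different sockets to see the cross-node traffic.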
Hi Bruce,
Thanks for your reply.
I agree that logically dividing the distributor functionality is the best
solution.
Meanwhile I tried some tricks and the results look good: for the same amount
of pkts (1M), the LLC stores and loads decrease by 90%, and the miss rates
for both decrease to 25%.
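A sketch of one trick that commonly produces this kind of drop (an assumption on my part; the mail doesn't spell out which tricks were used): back off with _mm_pause() between polls, so the spinning core stops re-requesting the remote cache line on every iteration.

    #include <stdint.h>
    #include <emmintrin.h>   /* _mm_pause() */

    /* Hypothetical polling helper: pausing between reads keeps the
     * spinning core from hammering the remote line, which cuts LLC
     * load/store traffic without changing the protocol. */
    static inline uint64_t poll_seq(volatile const uint64_t *seq, uint64_t last)
    {
        while (*seq == last)
            _mm_pause();     /* cheap pipeline yield inside the spin */
        return *seq;
    }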
Hi,
OK, it is now very clear that the problem is due to memory transactions
between different NUMA nodes.
The test program is here:
https://gist.github.com/jigsawecho/6a2e78d65f0fe67adf1b
The test machine topology is:
NUMA node0 CPU(s): 0-7,16-23
NUMA node1 CPU(s): 8-15,24-31
Change the 3rd param from 0 t
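For anyone reproducing this without the gist, the heart of the experiment is just pinning each thread to a chosen CPU; a minimal sketch assuming Linux/glibc (the gist's actual parameter handling may differ):

    #define _GNU_SOURCE
    #include <pthread.h>
    #include <sched.h>

    /* Pin the calling thread to one CPU. With the topology above,
     * CPUs 0 and 1 share node0, while CPUs 0 and 8 sit on different
     * nodes, so the same polling loop then has to cross QPI. */
    static int pin_to_cpu(int cpu)
    {
        cpu_set_t set;
        CPU_ZERO(&set);
        CPU_SET(cpu, &set);
        return pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
    }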
Hi Bruce,
I noticed that librte_distributor has quite a severe LLC miss problem when
running on 16 cores.
On 8 cores, there's no such problem.
The test runs on an Intel(R) Xeon(R) CPU E5-2670, a Sandy Bridge with 32
logical cores on 2 sockets.
The test case is the distributor_perf_autotest, i.e.
in app/
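For reference, the measurement can be reproduced along these lines (the coremask is my assumption; any mask spanning both sockets on this topology gives the 16-core case, and the test binary path depends on the build target):

    # run the DPDK test app on 16 cores spanning both sockets and
    # count LLC traffic with perf
    perf stat -e LLC-loads,LLC-load-misses,LLC-stores,LLC-store-misses \
        ./app/test -c 0xffff -n 4
    RTE>> distributor_perf_autotest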