On 2018-11-14 20:16, Venky Venkatesh wrote:
Hi,

https://mails.dpdk.org/archives/dev/2018-September/111344.html mentions that 
there is a sample application where “worker cores can sustain 300-400 million 
event/s. With a pipeline
with 1000 clock cycles of work per stage, the average event device
overhead is somewhere 50-150 clock cycles/event”. Is this sample application 
code available?

It's proprietary code, although it's also been tested by some of our partners.

The primary reason it hasn't been contributed to DPDK is that doing so would be a fair amount of work. I would refer to it as an eventdev pipeline simulator, rather than a sample app.

We have written a similar simple sample application where 1 core keeps 
enqueuing (as NEW/ATOMIC) and n cores dequeue (and RELEASE) and do no other 
work. But we are not seeing anything close in terms of performance. We are also 
seeing some counterintuitive behavior, such as a burst of 32 performing worse 
than a burst of 1. We surely have something wrong and would thus like to 
compare against a good application that you have written. Could you please 
share it?


Is that the enqueue or the dequeue burst size? How large is n? Is the release explicit?

What do you set nb_events_limit to? Good DSW performance depends heavily on the average burst size on the event rings, which in turn depends on the number of in-flight events. On really high core-count systems you might also want to increase DSW_MAX_PORT_OPS_PER_BG_TASK, since it effectively caps the number of events that can be held in the output buffers.

In the pipeline simulator, all cores initially produce events, and then recycle events once the number of in-flight events reaches a certain threshold (50% of nb_events_limit). A single lcore won't be able to fill the pipeline if you have zero-work stages.
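
As a rough illustration of the produce-then-recycle policy described above, here is a minimal sketch in plain C. It is not the simulator's code and it does not call the eventdev API; pick_op(), ramp_up() and the port_op enum are hypothetical names made up for this example, and the model assumes that no event ever leaves the pipeline.

```c
#include <assert.h>

/* Hypothetical stand-ins for "enqueue a NEW event" and "recycle an
 * event already in the pipeline"; not DPDK types. */
enum port_op { PORT_OP_NEW, PORT_OP_RECYCLE };

/* Inject new events while the in-flight count is below 50% of
 * nb_events_limit; from then on, only recycle existing events. */
static enum port_op pick_op(unsigned in_flight, unsigned nb_events_limit)
{
    assert(nb_events_limit > 0);
    return in_flight < nb_events_limit / 2 ? PORT_OP_NEW : PORT_OP_RECYCLE;
}

/* Run the policy for a number of iterations and return the resulting
 * in-flight count. Assumption: recycled events stay in the pipeline,
 * so only PORT_OP_NEW changes the count. */
static unsigned ramp_up(unsigned iterations, unsigned nb_events_limit)
{
    unsigned in_flight = 0;

    for (unsigned i = 0; i < iterations; i++) {
        if (pick_op(in_flight, nb_events_limit) == PORT_OP_NEW)
            in_flight++;
    }
    return in_flight;
}
```

With an assumed nb_events_limit of 4096, ramp_up() stops injecting at 2048 in-flight events no matter how many further iterations run. In a real application each lcore would apply this kind of check against a shared in-flight count, so that all cores help fill the pipeline.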

Even though I can't send you the simulator code at this point, I'm happy to assist you in any DSW-related endeavors.
