On 16-04-10 02:38 PM, Brenden Blanco wrote:
I always go for the lowest-hanging fruit,
which to me is the 60% of the time spent above the driver level, as shown above.
[..]
It seemed to be the driver path in your case. When we removed
the driver overhead (as demoed at the tc workshop in netdev11) we saw
__netif_receive_skb_core() at the top of the profile.
So in this case it seems it was mlx4_en_process_rx_cq() - that's why I
was saying the bottleneck is the driver.
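As a reference point, a profile like the ones being compared here can be
collected with perf; a minimal sketch, assuming the RX traffic is pinned
to CPU 0 (the CPU number and 10s window are placeholders):

    # sample the busy RX CPU with call graphs for 10 seconds
    perf record -C 0 -g -- sleep 10
    # then see which functions dominate, e.g. mlx4_en_process_rx_cq()
    perf report --stdio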
I wouldn't call it a bottleneck when the time spent is additive,
aka run-to-completion.
The driver is a bottleneck regardless. It is probably the DMA interfaces
and lots of cacheline misses. So the first thing to
fix is what's at the top of the profile if you want further gains.
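To check the cacheline-miss theory, a perf stat run over the RX CPU
would show whether misses dominate; a sketch, again assuming CPU 0:

    # count cache references vs. misses on the RX CPU for 10 seconds
    perf stat -C 0 -e cache-references,cache-misses -- sleep 10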
The fact that you are dropping earlier is in itself an improvement,
as long as you don't try to be too fancy.
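For context, the early drop being discussed amounts to a trivial BPF
program that returns XDP_DROP before an skb is ever allocated; a minimal
sketch (the file name, function name and build line are assumptions, not
taken from this thread):

    /* xdp_drop.c - drop everything at the earliest driver hook.
     * Build: clang -O2 -target bpf -c xdp_drop.c -o xdp_drop.o
     */
    #include <linux/bpf.h>

    __attribute__((section("xdp"), used))
    int xdp_drop_all(struct xdp_md *ctx)
    {
            /* No parsing, no skb: drop straight from the driver hook. */
            return XDP_DROP;
    }

    char _license[] __attribute__((section("license"), used)) = "GPL";

It can then be attached with iproute2, e.g.
ip link set dev eth0 xdp obj xdp_drop.o sec xdp (eth0 is a placeholder).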
Of course the second perf report is on the same machine as the commit
message. That was generated fresh for this email thread. All of the
numbers I've quoted come from the same single-sender/single-receiver
setup. I did also revert the change in the mlx4 driver and there was no
change in the tc numbers.
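For anyone reproducing the tc numbers, one common setup is an ingress
qdisc with a u32 match-all filter that drops everything; a sketch, with
eth0 as a placeholder device:

    # attach an ingress qdisc and drop every IP packet at the tc layer
    tc qdisc add dev eth0 handle ffff: ingress
    tc filter add dev eth0 parent ffff: protocol ip u32 \
            match u32 0 0 action drop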
Ok, I misunderstood then, because you hinted that Daniel had seen those
numbers. Please also add that detail to your commit message.
cheers,
jamal