It appears that miniflow_extract() in emc_processing() spends a lot of cycles waiting for the packet's data to be read.
Prefetching the next packet's data while parsing removes this delay. For a single flow pipeline the throughput improves by ~10%. With a more realistic pipeline the change has a much smaller effect (~0.5% improvement) Signed-off-by: Daniele Di Proietto <diproiet...@vmware.com> --- lib/dpif-netdev.c | 5 +++++ 1 file changed, 5 insertions(+) diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c index 5b82c8b..f13169c 100644 --- a/lib/dpif-netdev.c +++ b/lib/dpif-netdev.c @@ -3150,6 +3150,11 @@ emc_processing(struct dp_netdev_pmd_thread *pmd, struct dp_packet **packets, continue; } + if (i != cnt - 1) { + /* Prefetch next packet data */ + OVS_PREFETCH(dp_packet_data(packets[i+1])); + } + miniflow_extract(packets[i], &key.mf); key.len = 0; /* Not computed yet. */ key.hash = dpif_netdev_packet_get_rss_hash(packets[i], &key.mf); -- 2.1.4 _______________________________________________ dev mailing list dev@openvswitch.org http://openvswitch.org/mailman/listinfo/dev