On Wed, 2 May 2018 13:01:36 +0200 Björn Töpel <bjorn.to...@gmail.com> wrote:
> +static void rx_drop(struct xdpsock *xsk)
> +{
> +	struct xdp_desc descs[BATCH_SIZE];
> +	unsigned int rcvd, i;
> +
> +	rcvd = xq_deq(&xsk->rx, descs, BATCH_SIZE);
> +	if (!rcvd)
> +		return;
> +
> +	for (i = 0; i < rcvd; i++) {
> +		u32 idx = descs[i].idx;
> +
> +		lassert(idx < NUM_FRAMES);
> +#if DEBUG_HEXDUMP
> +		char *pkt;
> +		char buf[32];
> +
> +		pkt = xq_get_data(xsk, idx, descs[i].offset);
> +		sprintf(buf, "idx=%d", idx);
> +		hex_dump(pkt, descs[i].len, buf);
> +#endif
> +	}
> +
> +	xsk->rx_npkts += rcvd;
> +
> +	umem_fill_to_kernel_ex(&xsk->umem->fq, descs, rcvd);
> +}

I would really like to see an option that enables reading the data/memory in the packet. Otherwise the test is rather fake...

I hacked it myself manually to read the first u32:
 - Before: 10,771,083 pps
 - After:   9,430,741 pps

The slowdown is not as big as I expected, which is good :-)

With perf stat I can see more LLC-loads (3,192,312 -> 12,791,264), but the LLC-load-misses stay negligible. The fact that I read the data on the remote CPU is not getting registered as a cache-miss.

p.s. these tests are with mlx5 (which only has XDP_REDIRECT on the RX side).
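A minimal sketch of what such an option could look like, assuming a hypothetical `opt_touch_data` flag that makes the drop loop read the first u32 of each packet. It is written as a self-contained program rather than a patch to xdpsock, so `struct sketch_desc`, `umem_area`, `frame_data()` (standing in for `xq_get_data()`), the `idx * FRAME_SIZE + offset` layout and the sizes are all illustrative assumptions, not the sample's real code.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sizes for this sketch, not the sample's real values. */
#define NUM_FRAMES 16
#define FRAME_SIZE 2048

/* Simplified stand-in for struct xdp_desc: only the fields rx_drop() uses. */
struct sketch_desc {
	uint32_t idx;
	uint32_t len;
	uint16_t offset;
};

/* Flat buffer standing in for the UMEM area (assumption). */
static uint8_t umem_area[NUM_FRAMES * FRAME_SIZE];

/* Hypothetical option: when non-zero, read the first u32 of each packet. */
static int opt_touch_data;

/* volatile sink so the compiler cannot drop the packet-data loads */
static volatile uint32_t touch_sink;

/* Plays the role of xq_get_data(); the frame layout is assumed. */
static const uint8_t *frame_data(uint32_t idx, uint16_t offset)
{
	return &umem_area[(size_t)idx * FRAME_SIZE + offset];
}

/* Per-batch body of a drop loop; returns how many packets were read. */
static unsigned int rx_drop_batch(const struct sketch_desc *descs,
				  unsigned int rcvd)
{
	unsigned int i, touched = 0;

	for (i = 0; i < rcvd; i++) {
		uint32_t first;

		/* skip when the option is off or the descriptor is out of range */
		if (!opt_touch_data || descs[i].idx >= NUM_FRAMES ||
		    descs[i].len < sizeof(first) ||
		    (size_t)descs[i].offset + sizeof(first) > FRAME_SIZE)
			continue;

		/* memcpy avoids unaligned-access and strict-aliasing issues */
		memcpy(&first, frame_data(descs[i].idx, descs[i].offset),
		       sizeof(first));
		touch_sink += first;
		touched++;
	}
	return touched;
}
```

In xdpsock the read would presumably go inside the existing `for` loop of `rx_drop()`, before the descriptors are handed back with `umem_fill_to_kernel_ex()`. Accumulating into a volatile variable is one way to keep the compiler from optimizing the load away.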
-- 
Best regards,
  Jesper Dangaard Brouer
  MSc.CS, Principal Kernel Engineer at Red Hat
  LinkedIn: http://www.linkedin.com/in/brouer

Before:
$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e instructions -e cache-misses -e cache-references -e LLC-store-misses -e LLC-store -e LLC-load-misses -e LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           200,020      L1-icache-load-misses                                     ( +-  0.76% )  (33.31%)
     3,920,754,587      cycles                                                    ( +-  0.14% )  (44.50%)
     3,062,308,209      instructions            #    0.78  insn per cycle         ( +-  0.28% )  (55.65%)
               823      cache-misses            #    0.011 % of all cache refs    ( +- 70.81% )  (66.74%)
         7,587,132      cache-references                                          ( +-  0.48% )  (77.83%)
                 0      LLC-store-misses                                                         (77.83%)
           384,401      LLC-store                                                 ( +-  2.97% )  (77.83%)
                15      LLC-load-misses         #    0.00% of all LL-cache hits   ( +-100.00% )  (22.17%)
         3,192,312      LLC-load                                                  ( +-  0.35% )  (22.17%)

       1.001199221 seconds time elapsed                                           ( +-  0.00% )

After:
$ sudo ~/perf stat -C3 -e L1-icache-load-misses -e cycles -e instructions -e cache-misses -e cache-references -e LLC-store-misses -e LLC-store -e LLC-load-misses -e LLC-load -r 3 sleep 1

 Performance counter stats for 'CPU(s) 3' (3 runs):

           154,921      L1-icache-load-misses                                     ( +-  3.88% )  (33.31%)
     3,924,791,213      cycles                                                    ( +-  0.10% )  (44.50%)
     2,930,116,185      instructions            #    0.75  insn per cycle         ( +-  0.33% )  (55.65%)
               342      cache-misses            #    0.002 % of all cache refs    ( +- 65.52% )  (66.74%)
        15,810,892      cache-references                                          ( +-  0.13% )  (77.83%)
                 0      LLC-store-misses                                                         (77.83%)
           925,544      LLC-store                                                 ( +-  2.33% )  (77.83%)
               155      LLC-load-misses         #    0.00% of all LL-cache hits   ( +- 67.22% )  (22.17%)
        12,791,264      LLC-load                                                  ( +-  0.04% )  (22.17%)

       1.001206058 seconds time elapsed                                           ( +-  0.00% )
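As a rough cross-check of the numbers above, here is a small per-packet calculation. It assumes the pps figures and the ~1 second perf window on CPU 3 describe the same traffic, and it attributes the whole LLC-load increase to the added data read; both are assumptions, not something perf reports directly.

```c
/* Figures copied from the measurements in this mail. */
#define PPS_BEFORE        10771083.0
#define PPS_AFTER          9430741.0
#define CYCLES_BEFORE   3920754587.0
#define CYCLES_AFTER    3924791213.0
#define LLC_LOAD_BEFORE    3192312.0
#define LLC_LOAD_AFTER    12791264.0

/* Time budget per packet in nanoseconds at a given packet rate. */
static double ns_per_pkt(double pps)
{
	return 1e9 / pps;
}

/* Cycles per packet, assuming the cycle count covers ~1 s on the RX CPU. */
static double cycles_per_pkt(double cycles_per_sec, double pps)
{
	return cycles_per_sec / pps;
}

/* Relative packet-rate drop in percent. */
static double slowdown_pct(double before, double after)
{
	return (before - after) / before * 100.0;
}

/* Extra LLC loads per packet, attributed entirely to the data read (assumption). */
static double extra_llc_loads_per_pkt(void)
{
	return (LLC_LOAD_AFTER - LLC_LOAD_BEFORE) / PPS_AFTER;
}
```

With these inputs the read costs roughly 13.2 ns (about 52 cycles) extra per packet (about 92.8 ns vs 106.0 ns, or 364 vs 416 cycles), a packet rate about 12.4% lower, and about one extra LLC-load per packet. Because LLC-load-misses stay negligible, this reading is consistent with each packet read hitting in the LLC rather than missing.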