Hi all,

I've been trying to measure the possible performance penalty of performing an LPM table lookup in the l3fwd code (as opposed to simple forwarding without a lookup, i.e., forwarding back to the ingress port).
I ran two sets of experiments: (1) generating a fixed dst IP address from DPDK pktgen, and (2) generating random dst IP addresses from DPDK pktgen. My hypothesis is that in case (1), since many packets arrive with the same dst IP, l3fwd should only need to fetch the LPM table entry from cache. Case (2), however, should generate more cache misses and therefore require fetches from memory, which should increase the lookup latency. (My current machine has 20 MB of L3 cache.)

However, when I measure the average number of cycles taken by a lookup indexed by the received dst IP address, the two cases yield nearly identical results of around 34 cycles. I am using rdtsc to measure the cycles in the rte_lpm_lookup() function in rte_lpm.h (under lib/librte_lpm). I am not sure whether this is due to a problem with rte_rdtsc() or whether I am misunderstanding something.

    tsc1 = rte_rdtsc();
    tbl_entry = *(const uint16_t *)&lpm->tbl24[tbl24_index];
    tscdif = rte_rdtsc() - tsc1;
    aggreg_dif += tscdif;

I would appreciate it if someone could provide their opinion on this phenomenon.

Thanks in advance!
Jun
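
P.S. For context, here is a minimal standalone sketch of how I understand the first-level index to be derived, and how large the first-level table is. It assumes 2^24 entries of 16 bits each (matching the uint16_t cast in the snippet above) and a host-byte-order address. The sketch_* names are illustrative only and are not DPDK identifiers.

```c
#include <stdint.h>

/* Assumption: the first-level table has 2^24 entries of 16 bits each,
 * matching the uint16_t cast in the measured load. Illustrative names. */
#define SKETCH_TBL24_ENTRIES (1u << 24)

/* The first-level index is the top 24 bits of the host-order IPv4
 * address, so addresses in the same /24 share one table entry. */
static inline uint32_t sketch_tbl24_index(uint32_t ip)
{
    return ip >> 8;
}

/* Total size of the first-level table in bytes under the assumption above. */
static inline uint64_t sketch_tbl24_bytes(void)
{
    return (uint64_t)SKETCH_TBL24_ENTRIES * sizeof(uint16_t);
}

/* Helper to build a host-order IPv4 address from dotted-quad octets. */
static inline uint32_t sketch_ipv4(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return ((uint32_t)a << 24) | ((uint32_t)b << 16) |
           ((uint32_t)c << 8) | (uint32_t)d;
}
```

Under these assumptions the first-level table occupies 32 MiB, which is larger than my 20 MB L3 cache, so uniformly random dst IPs should not all be served from cache. That is why I expected case (2) to be measurably slower.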