On Thu, Mar 18, 2021 at 3:56 PM Ruifeng Wang <ruifeng.w...@arm.com> wrote:
>
> Both L2 and L3 headers will be used in forward processing. And these
> two headers are in the same cache line. It has the same effect for
> prefetching with L2 header address and prefetching with L3 header
> address.
>
> Changed to use L2 header address for prefetching. The change showed
> no measurable performance improvement, but it definitely removed

Same here.

> unnecessary instructions for address calculation.
>
> Signed-off-by: Ruifeng Wang <ruifeng.w...@arm.com>

Acked-by: Jerin Jacob <jer...@marvell.com>

> ---
>  examples/l3fwd/l3fwd_lpm_neon.h | 10 +++++-----
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
> index ae8840694..1650ae444 100644
> --- a/examples/l3fwd/l3fwd_lpm_neon.h
> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
> @@ -98,14 +98,14 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf
> **pkts_burst,
>         if (k) {
>                 for (i = 0; i < FWDSTEP; i++) {
>                         rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[i],
> -                                               struct rte_ether_hdr *) + 1);
> +                                               void *));
>                 }
>
>                 for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>                         for (i = 0; i < FWDSTEP; i++) {
>                                 rte_prefetch0_write(rte_pktmbuf_mtod(
>                                                 pkts_burst[j + i + FWDSTEP],
> -                                               struct rte_ether_hdr *) + 1);
> +                                               void *));
>                         }
>
>                         processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
> @@ -125,17 +125,17 @@ l3fwd_lpm_send_packets(int nb_rx, struct rte_mbuf
> **pkts_burst,
>         switch (m) {
>         case 3:
>                 rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> -                                               struct rte_ether_hdr *) + 1);
> +                                               void *));
>                 j++;
>                 /* fallthrough */
>         case 2:
>                 rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> -                                               struct rte_ether_hdr *) + 1);
> +                                               void *));
>                 j++;
>                 /* fallthrough */
>         case 1:
>                 rte_prefetch0_write(rte_pktmbuf_mtod(pkts_burst[j],
> -                                               struct rte_ether_hdr *) + 1);
> +                                               void *));
>                 j++;
>         }
>
> --
> 2.25.1
>
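
For illustration only (not part of the patch): a minimal sketch of the
same-cache-line argument in the commit message. It assumes 64-byte cache
lines, an untagged 14-byte Ethernet header, and a 20-byte IPv4 header
without options. The helper names cache_line_of() and
l2_prefetch_covers_l3() are hypothetical and do not exist in DPDK. The
old code prefetched the address 14 bytes past the L2 header (the "+ 1" on
a struct rte_ether_hdr pointer); the check below reports whether that
address, and the rest of the L3 header, falls in the line already fetched
by prefetching the L2 address.

```c
#include <stdint.h>

#define CACHE_LINE_SIZE 64u /* assumption: 64-byte cache lines */
#define ETHER_HDR_LEN   14u /* untagged Ethernet header */
#define IPV4_HDR_LEN    20u /* IPv4 header without options */

/* Index of the cache line that contains addr. */
static uintptr_t cache_line_of(uintptr_t addr)
{
	return addr / CACHE_LINE_SIZE;
}

/*
 * Returns 1 when a prefetch of the L2 header address also brings in the
 * whole L3 header, i.e. both headers sit in one cache line.
 */
static int l2_prefetch_covers_l3(uintptr_t l2_addr)
{
	uintptr_t l3_first = l2_addr + ETHER_HDR_LEN;
	uintptr_t l3_last = l3_first + IPV4_HDR_LEN - 1;

	return cache_line_of(l2_addr) == cache_line_of(l3_last);
}
```

With packet data starting on a cache-line boundary, the check holds and
the two prefetch addresses are equivalent. If the data start were offset
far enough into a line (for example by 40 bytes), the L3 header would
spill into the next line and the check would fail, so the equivalence
depends on how the mbuf data is aligned.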