Hi all,

I have a question about the usage of the rte_prefetch0() function. In one of the sample files, we have an implementation like:

    /* Prefetch first packets */
    for (j = 0; j < PREFETCH_OFFSET && j < nb_rx; j++) {
        rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], void *));
    }

    /* Prefetch and forward already prefetched packets */
    for (j = 0; j < (nb_rx - PREFETCH_OFFSET); j++) {
        rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j + PREFETCH_OFFSET], void *));
        l3fwd_simple_forward(pkts_burst[j], portid, qconf);
    }

    /* Forward remaining prefetched packets */
    for (; j < nb_rx; j++) {
        l3fwd_simple_forward(pkts_burst[j], portid, qconf);
    }

Here the rte_prefetch0() calls are split across several loops. I would like some insight into whether this improves performance compared to something like:

    for (j = 0; j < nb_rx; j++) {
        rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j], void *));
    }

Also, how often does rte_prefetch0() need to be called for the same packet, and is there any mechanism to prefetch in bulk, e.g. for 64 packets at once?

Thanks,
Parikshith