Re: [PATCH] examples/l3fwd: optimize packet prefetch

huangdengdui Thu, 09 Jan 2025 03:31:19 -0800


On 2025/1/8 21:42, Konstantin Ananyev wrote:
> 
> 
>>
>> The prefetch window depends on the hardware platform. The current prefetch
>> policy may not be applicable to all platforms. In most cases, the number of
>> packets received by Rx burst is small (64 is used in most performance
>> reports).
>> In L3fwd, the maximum value cannot exceed 512. Therefore, prefetching all
>> packets before processing can achieve better performance.
> 
> As you mentioned, 'prefetch' behavior differs a lot from one HW platform to
> another.
> So it could easily be that the changes you are suggesting will cause a
> performance boost on one platform and a degradation on another.
> In fact, right now l3fwd 'prefetch' usage is a bit of a mess:
> - l3fwd_lpm_neon.h uses FWDSTEP as a prefetch window.
> - l3fwd_fib.c uses FIB_PREFETCH_OFFSET for that purpose.
> - the rest of the code uses either PREFETCH_OFFSET or doesn't use 'prefetch'
>   at all.
>
> Probably what we need here is some unified approach:
> a prefetch_window_size, configurable at run time, that all code paths obey.


Agreed, I'll add a parameter to configure the prefetch window.
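
For illustration only, a minimal sketch of what a run-time prefetch window
might look like. This is not the planned patch. The names prefetch_window,
struct pkt, prefetch0() and process_one() are hypothetical stand-ins so the
snippet builds without DPDK, and __builtin_prefetch (GCC/Clang) is used in
place of rte_prefetch0(). A window of at least nb_rx gives the "prefetch
everything first" behavior of the patch above, and 0 disables prefetching.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical packet stand-in; real l3fwd code works on struct rte_mbuf. */
struct pkt {
	uint8_t data[64];
};

/* Hypothetical run-time knob, e.g. filled in from a command-line option. */
static uint16_t prefetch_window = 4;

/* Stand-in for rte_prefetch0(); assumes a GCC/Clang-compatible compiler. */
static inline void
prefetch0(const void *p)
{
	__builtin_prefetch(p, 0, 3);
}

/* Stand-in for the per-packet work (lookup, header update, ...). */
static inline uint32_t
process_one(const struct pkt *p)
{
	return p->data[0];
}

/*
 * Process a burst while keeping up to prefetch_window packets prefetched
 * ahead of the one being processed. Returns the sum of process_one()
 * results so the behavior can be checked.
 */
static uint32_t
process_burst(struct pkt **pkts, uint16_t nb_rx)
{
	uint16_t win = prefetch_window < nb_rx ? prefetch_window : nb_rx;
	uint32_t sum = 0;
	uint16_t i;

	assert(pkts != NULL || nb_rx == 0);

	/* Warm-up: prefetch the first win packets (none if win == 0). */
	for (i = 0; i < win; i++)
		prefetch0(pkts[i]->data);

	for (i = 0; i < nb_rx; i++) {
		/* Keep the window full: prefetch packet i + win, if any. */
		if (win != 0 && (uint32_t)i + win < nb_rx)
			prefetch0(pkts[i + win]->data);
		sum += process_one(pkts[i]);
	}

	return sum;
}
```

The window only changes when data is prefetched, not what is processed, so
every code path could share one such knob and be tuned per platform.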

> 
>> Signed-off-by: Dengdui Huang <[email protected]>
>> ---
>>  examples/l3fwd/l3fwd_lpm_neon.h | 42 ++++-----------------------------
>>  1 file changed, 5 insertions(+), 37 deletions(-)
>>
>> diff --git a/examples/l3fwd/l3fwd_lpm_neon.h b/examples/l3fwd/l3fwd_lpm_neon.h
>> index 3c1f827424..0b51782b8c 100644
>> --- a/examples/l3fwd/l3fwd_lpm_neon.h
>> +++ b/examples/l3fwd/l3fwd_lpm_neon.h
>> @@ -91,53 +91,21 @@ l3fwd_lpm_process_packets(int nb_rx, struct rte_mbuf **pkts_burst,
>>      const int32_t k = RTE_ALIGN_FLOOR(nb_rx, FWDSTEP);
>>      const int32_t m = nb_rx % FWDSTEP;
>>
>> -    if (k) {
>> -            for (i = 0; i < FWDSTEP; i++) {
>> -                    rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i],
>> -                                                    void *));
>> -            }
>> -            for (j = 0; j != k - FWDSTEP; j += FWDSTEP) {
>> -                    for (i = 0; i < FWDSTEP; i++) {
>> -                            rte_prefetch0(rte_pktmbuf_mtod(
>> -                                            pkts_burst[j + i + FWDSTEP],
>> -                                            void *));
>> -                    }
>> +    /* The number of packets is small. Prefetch all packets. */
>> +    for (i = 0; i < nb_rx; i++)
>> +            rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[i], void *));
>>
>> +    if (k) {
>> +            for (j = 0; j != k; j += FWDSTEP) {
>>                      processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>>                      processx4_step2(qconf, dip, ipv4_flag, portid,
>>                                      &pkts_burst[j], &dst_port[j]);
>>                      if (do_step3)
>>                              processx4_step3(&pkts_burst[j], &dst_port[j]);
>>              }
>> -
>> -            processx4_step1(&pkts_burst[j], &dip, &ipv4_flag);
>> -            processx4_step2(qconf, dip, ipv4_flag, portid, &pkts_burst[j],
>> -                            &dst_port[j]);
>> -            if (do_step3)
>> -                    processx4_step3(&pkts_burst[j], &dst_port[j]);
>> -
>> -            j += FWDSTEP;
>>      }
>>
>>      if (m) {
>> -            /* Prefetch last up to 3 packets one by one */
>> -            switch (m) {
>> -            case 3:
>> -                    rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -                                                    void *));
>> -                    j++;
>> -                    /* fallthrough */
>> -            case 2:
>> -                    rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -                                                    void *));
>> -                    j++;
>> -                    /* fallthrough */
>> -            case 1:
>> -                    rte_prefetch0(rte_pktmbuf_mtod(pkts_burst[j],
>> -                                                    void *));
>> -                    j++;
>> -            }
>> -            j -= m;
>>              /* Classify last up to 3 packets one by one */
>>              switch (m) {
>>              case 3:
>> --
>> 2.33.0
>
