Thanks for the review Ferruh :)

On Mon, Aug 9, 2021 at 11:18 PM Ferruh Yigit <ferruh.yi...@intel.com> wrote:
>
> On 8/9/2021 7:52 AM, Zhihong Wang wrote:
> > This patch aims to:
> > 1. Add flexibility by supporting IP & UDP src/dst fields
>
> What is the reason/"use case" of this flexibility?

The purpose is to emulate pkt generator behaviors.

> > 2. Improve multi-core performance by using per-core vars
>
> On multi core this also has synchronization problem, OK to make it per-core. Do
> you have any observed performance difference, if so how much is it?

Huge difference, one example: 8 core flowgen -> rxonly results: 43
Mpps (per-core) vs. 9.3 Mpps (shared), of course the numbers "varies
depending on system configuration".

>
> And can you please separate this to its own patch? This can be before ip/udp
> update.

Will do.

>
> > v2: fix assigning ip header cksum
> >
>
> +1 to update, can you please make it a separate patch?

Sure.

>
> So overall this can be a patchset with 4 patches:
> 1- Fix retry logic (nb_rx -> nb_pkt)
> 2- Use 'rte_ipv4_cksum()' API (instead of static 'ip_sum()')
> 3- Use per-core variable (for 'next_flow')
> 4- Support ip/udp src/dst variety of packets
>

Great summary. Thanks a lot.

> > Signed-off-by: Zhihong Wang <wangzhihong....@bytedance.com>
> > ---
> >  app/test-pmd/flowgen.c | 137 +++++++++++++++++++++++++++++++------------------
> >  1 file changed, 86 insertions(+), 51 deletions(-)
> >
>
> <...>
>
> > @@ -185,30 +193,57 @@ pkt_burst_flow_gen(struct fwd_stream *fs)
> >                 }
> >                 pkts_burst[nb_pkt] = pkt;
> >
> > -               next_flow = (next_flow + 1) % cfg_n_flows;
> > +               if (++next_udp_dst < cfg_n_udp_dst)
> > +                       continue;
> > +               next_udp_dst = 0;
> > +               if (++next_udp_src < cfg_n_udp_src)
> > +                       continue;
> > +               next_udp_src = 0;
> > +               if (++next_ip_dst < cfg_n_ip_dst)
> > +                       continue;
> > +               next_ip_dst = 0;
> > +               if (++next_ip_src < cfg_n_ip_src)
> > +                       continue;
> > +               next_ip_src = 0;
>
> What is the logic here, can you please clarify the packet generation logic both
> in a comment here and in the commit log?

It's round-robin field by field. Will add the comments.
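
To make "round-robin field by field" concrete, here is a minimal standalone sketch of the stepping order. It is not the patch code: struct flow_idx, struct flow_cfg and flow_idx_next() are hypothetical names used only for illustration, and the patch keeps the indexes as separate variables. The UDP dst index changes fastest and the IP src index slowest, like the digits of an odometer.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the four flow indexes. */
struct flow_idx {
	uint32_t udp_dst, udp_src, ip_dst, ip_src;
};

/* Hypothetical container for the four per-field limits (cfg_n_* in the patch). */
struct flow_cfg {
	uint32_t n_udp_dst, n_udp_src, n_ip_dst, n_ip_src;
};

/*
 * Step to the next flow. A field is only incremented when every
 * faster field has just wrapped back to 0.
 */
static void
flow_idx_next(struct flow_idx *f, const struct flow_cfg *c)
{
	assert(c->n_udp_dst > 0 && c->n_udp_src > 0 &&
	       c->n_ip_dst > 0 && c->n_ip_src > 0);

	if (++f->udp_dst < c->n_udp_dst)
		return;
	f->udp_dst = 0;
	if (++f->udp_src < c->n_udp_src)
		return;
	f->udp_src = 0;
	if (++f->ip_dst < c->n_ip_dst)
		return;
	f->ip_dst = 0;
	if (++f->ip_src < c->n_ip_src)
		return;
	f->ip_src = 0;
}
```

With limits of 2, 2, 1 and 3, the sequence visits 2 * 2 * 1 * 3 = 12 distinct flows before it repeats.
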

>
> >         }
> >
> >         nb_tx = rte_eth_tx_burst(fs->tx_port, fs->tx_queue, pkts_burst, nb_pkt);
> >         /*
> >          * Retry if necessary
> >          */
> > -       if (unlikely(nb_tx < nb_rx) && fs->retry_enabled) {
> > +       if (unlikely(nb_tx < nb_pkt) && fs->retry_enabled) {
> >                 retry = 0;
> > -               while (nb_tx < nb_rx && retry++ < burst_tx_retry_num) {
> > +               while (nb_tx < nb_pkt && retry++ < burst_tx_retry_num) {
> >                         rte_delay_us(burst_tx_delay_time);
> >                         nb_tx += rte_eth_tx_burst(fs->tx_port, fs->tx_queue,
> > -                                       &pkts_burst[nb_tx], nb_rx - nb_tx);
> > +                                       &pkts_burst[nb_tx], nb_pkt - nb_tx);
> >                 }
>
> +1 to this fix, thanks for it. But can you please make a separate patch for
> this, with proper 'Fixes:' tag etc..

Ok.

>
> >         }
> > -       fs->tx_packets += nb_tx;
> >
> >         inc_tx_burst_stats(fs, nb_tx);
> > -       if (unlikely(nb_tx < nb_pkt)) {
> > -               /* Back out the flow counter. */
> > -               next_flow -= (nb_pkt - nb_tx);
> > -               while (next_flow < 0)
> > -                       next_flow += cfg_n_flows;
> > +       fs->tx_packets += nb_tx;
> > +       /* Catch up flow idx by actual sent. */
> > +       for (i = 0; i < nb_tx; ++i) {
> > +               RTE_PER_LCORE(_next_udp_dst) = RTE_PER_LCORE(_next_udp_dst) + 1;
> > +               if (RTE_PER_LCORE(_next_udp_dst) < cfg_n_udp_dst)
> > +                       continue;
> > +               RTE_PER_LCORE(_next_udp_dst) = 0;
> > +               RTE_PER_LCORE(_next_udp_src) = RTE_PER_LCORE(_next_udp_src) + 1;
> > +               if (RTE_PER_LCORE(_next_udp_src) < cfg_n_udp_src)
> > +                       continue;
> > +               RTE_PER_LCORE(_next_udp_src) = 0;
> > +               RTE_PER_LCORE(_next_ip_dst) = RTE_PER_LCORE(_next_ip_dst) + 1;
> > +               if (RTE_PER_LCORE(_next_ip_dst) < cfg_n_ip_dst)
> > +                       continue;
> > +               RTE_PER_LCORE(_next_ip_dst) = 0;
> > +               RTE_PER_LCORE(_next_ip_src) = RTE_PER_LCORE(_next_ip_src) + 1;
> > +               if (RTE_PER_LCORE(_next_ip_src) < cfg_n_ip_src)
> > +                       continue;
> > +               RTE_PER_LCORE(_next_ip_src) = 0;
> > +       }
>
> Why per-core variables are not used in forward function, but local variables
> (like 'next_ip_src' etc..) used? Is it for the performance, if so what is the
> impact?
>
> And why not directly assign from local variables to per-core variables, but have
> above catch up loop?
>

Local vars are for generating pkts, the global (per-core) ones catch up
at the end, once nb_tx is known. So flow indexes only increase by the
number of pkts actually sent. It serves the same purpose as the original
"/* backout the flow counter */". My math isn't good enough to make it
look more intelligent though.
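
One possible way to avoid the per-packet catch-up loop is to treat the four indexes as digits of a mixed-radix counter and add nb_tx with carry. The sketch below is only an illustration under assumptions: flow_idx_advance() and the array layout (element 0 = UDP dst, the fastest field, element 3 = IP src, the slowest) are hypothetical and not part of the patch, and it has not been measured inside testpmd.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance a 4-digit mixed-radix counter by n steps.
 * idx[i] is the current value of digit i, base[i] its limit
 * (the cfg_n_* value for that field). Digit 0 changes fastest.
 * A carry out of the last digit is dropped, i.e. the counter wraps.
 */
static void
flow_idx_advance(uint32_t idx[4], const uint32_t base[4], uint32_t n)
{
	uint64_t carry = n;
	int i;

	for (i = 0; i < 4 && carry != 0; i++) {
		uint64_t sum;

		assert(base[i] > 0 && idx[i] < base[i]);
		sum = idx[i] + carry;           /* 64-bit, cannot overflow */
		idx[i] = (uint32_t)(sum % base[i]);
		carry = sum / base[i];          /* passed on to the next digit */
	}
}
```

Calling flow_idx_advance(idx, base, nb_tx) once should leave the indexes in the same state as stepping one flow at a time nb_tx times, at the cost of up to four divisions instead of nb_tx loop iterations.
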