On Sun, 7 Apr 2024 11:36:59 +0200 Mattias Rönnblom <hof...@lysator.liu.se> wrote:
> On 2024-04-06 17:28, Morten Brørup wrote:
> >> From: Tyler Retzlaff [mailto:roret...@linux.microsoft.com]
> >> Sent: Thursday, 4 April 2024 19.15
> >>
> >> RFC sample illustrating simple conversion of VLA to alloca().
> >>
> >> Signed-off-by: Tyler Retzlaff <roret...@linux.microsoft.com>
> >> ---
> >
> > [...]
> >
> >> --- a/lib/latencystats/rte_latencystats.c
> >> +++ b/lib/latencystats/rte_latencystats.c
> >> @@ -159,7 +159,7 @@ struct latency_stats_nameoff {
> >>  {
> >>  	unsigned int i, cnt = 0;
> >>  	uint64_t now;
> >> -	float latency[nb_pkts];
> >> +	float *latency = alloca(sizeof(float) * nb_pkts);
> >
> > In cases where we are processing packet bursts, I would prefer introducing
> > a global #define RTE_MAX_PKT_BURST_SIZE, indicating the max packet burst
> > size supported by libraries and drivers.
>
> First question: what is meant by a "packet" here? An mbuf? A
> network-layer PDU? Something that in some way relates to zero or more
> packets, like an rte_event? Or just any object that is sent to or
> received from some DPDK API in batches or bursts?
>
> Second question: is RTE_MAX_PKT_BURST_SIZE meant as an upper bound, so
> no API can consume or produce a burst larger than this, or do all
> APIs literally have to support that burst size?
>
> Third question: why not just keep VLAs?
>
> > For reference, rte_config.h already has #define RTE_GRAPH_BURST_SIZE 256.
> >
> > Such a common define should also be used by functions such as
> > rte_pktmbuf_free_bulk(); although it also supports segmented packets, so it
> > must still be able to handle more mbufs.
> > https://elixir.bootlin.com/dpdk/v24.03/source/lib/mbuf/rte_mbuf.c#L486
> >

Looking at the maths here, calc_latency can be seriously improved:
- the latency calculation is in the fast path for transmit.
- it is doing floating point math; floating point is much slower than
  fixed point.
- the latency[] array is a temporary; it should be possible to compute
  total latency without it.
- it acquires a lock; to achieve DPDK-level performance of 40 Mpps, it is
  necessary to do only the absolute minimum of locking.
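
To make the floating point and temporary-array points concrete, here is a minimal sketch, not the actual rte_latencystats code. It folds per-packet latencies into min/max/sum using only integer cycle counts, so no latency[] array (VLA or alloca()) is needed. The names struct burst_latency, summarize_burst() and burst_latency_avg() are hypothetical, and treating a zero timestamp as "packet not timestamped" is an assumption made for the sketch.

```c
#include <stdint.h>

/* Hypothetical per-burst summary; not a DPDK structure. */
struct burst_latency {
	uint64_t min;     /* smallest latency seen, in timer cycles */
	uint64_t max;     /* largest latency seen, in timer cycles */
	uint64_t sum;     /* total latency, in timer cycles */
	unsigned int cnt; /* number of timestamped packets */
};

/*
 * Single pass over the burst: each latency is consumed as soon as it is
 * computed, so no temporary array is required.
 * Assumption of this sketch: ts[i] == 0 means "no timestamp".
 */
static struct burst_latency
summarize_burst(const uint64_t *ts, unsigned int nb_pkts, uint64_t now)
{
	struct burst_latency s = { .min = UINT64_MAX, .max = 0, .sum = 0, .cnt = 0 };

	for (unsigned int i = 0; i < nb_pkts; i++) {
		if (ts[i] == 0)
			continue;

		uint64_t lat = now - ts[i];

		if (lat < s.min)
			s.min = lat;
		if (lat > s.max)
			s.max = lat;
		s.sum += lat;
		s.cnt++;
	}

	/* Report 0 rather than UINT64_MAX when nothing was timestamped. */
	if (s.cnt == 0)
		s.min = 0;

	return s;
}

/* Integer average in cycles; returns 0 for an empty burst. */
static uint64_t
burst_latency_avg(const struct burst_latency *s)
{
	return s->cnt ? s->sum / s->cnt : 0;
}
```

With now = 200 and timestamps {100, 0, 150, 190}, this gives cnt = 3, min = 10, max = 100, sum = 160 and an integer average of 53 cycles. Conversion to nanoseconds could then be done once per burst, or only when the stats are read, rather than per packet; any shared state would only need to be updated with the four summary values.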