Re: [PATCH v1 1/2] eal: add lcore busyness telemetry
16/07/2022 00:13, Morten Brørup:
> > From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> > Sent: Friday, 15 July 2022 15.13
> >
> > Currently, there is no way to measure lcore busyness in a passive way,
> > without any modifications to the application. This patch adds a new EAL
> > API that will be able to passively track core busyness.
> >
> > The busyness is calculated by relying on the fact that most DPDK APIs
> > will poll for packets.
>
> This is an "alternative fact"! Only run-to-completion applications poll for
> RX. Pipelined applications do not poll for packets in every pipeline stage.
>
> > Empty polls can be counted as "idle", while
> > non-empty polls can be counted as busy. To measure lcore busyness, we
> > simply call the telemetry timestamping function with the number of polls
> > a particular code section has processed, and count the number of cycles
> > we've spent processing empty bursts. The more empty bursts we encounter,
> > the fewer cycles we spend in the "busy" state, and the less core busyness
> > will be reported.
> >
> > In order for all of the above to work without modifications to the
> > application, the library code needs to be instrumented with calls to
> > the lcore telemetry busyness timestamping function. The following parts
> > of DPDK are instrumented with lcore telemetry calls:
> >
> > - All major driver APIs:
> >   - ethdev
> >   - cryptodev
> >   - compressdev
> >   - regexdev
> >   - bbdev
> >   - rawdev
> >   - eventdev
> >   - dmadev
> > - Some additional libraries:
> >   - ring
> >   - distributor
> >
> > To avoid performance impact from having lcore telemetry support, a
> > global variable is exported by EAL, and the call to the timestamping
> > function is wrapped into a macro, so that whenever telemetry is disabled,
> > it only takes one additional branch and no function calls are performed.
> > It is also possible to disable it at compile time by commenting out
> > RTE_LCORE_BUSYNESS from the build config.
>
> Since all of this can be completely disabled at build time, and thus has
> exactly zero performance impact, I will not object to this patch.
>
> > This patch also adds a telemetry endpoint to report lcore busyness, as
> > well as telemetry endpoints to enable/disable lcore telemetry.
> >
> > Signed-off-by: Kevin Laatz
> > Signed-off-by: Conor Walsh
> > Signed-off-by: David Hunt
> > Signed-off-by: Anatoly Burakov
> > ---
> >
> > Notes:
> >     We did a couple of quick smoke tests to see if this patch causes
> >     any performance degradation, and it seemed to have none that we
> >     could measure. Telemetry can be disabled at compile time via a
> >     config option, while at runtime it can be disabled, seemingly at
> >     the cost of one additional branch.
> >
> >     That said, our benchmarking efforts were admittedly not very
> >     rigorous, so comments welcome!
>
> This patch does not reflect lcore busyness, it reflects some sort of ingress
> activity level.
>
> All the considerations regarding non-intrusiveness and low overhead are good,
> but everything in this patch needs to be renamed to reflect what it truly
> does, so it is clear that pipelined applications cannot use this telemetry
> for measuring lcore busyness (except on the ingress pipeline stage).

+1
Anatoly, please reflect polling activity in naming.

> It's a shame that so much effort clearly has gone into this patch, and no one
> stopped to consider pipelined applications. :-(

That's because no RFC was sent, I think.
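To make the mechanism under discussion concrete, here is a minimal C sketch of
the pattern the commit message describes: a global enable flag exported by EAL
plus a macro wrapping the timestamping call. The identifiers below
(rte_lcore_busyness_enabled, lcore_busyness_timestamp(),
RTE_LCORE_BUSYNESS_TIMESTAMP()) are hypothetical illustrations, not the names
used in the actual patch.

#include <stdint.h>

/* Hypothetical global flag exported by EAL; when telemetry is disabled at
 * runtime, the only cost at each instrumented poll site is loading this
 * flag and taking one branch. */
extern volatile int rte_lcore_busyness_enabled;

/* Hypothetical timestamping function: records whether the last poll on this
 * lcore returned work (busy) or came back empty (idle). */
void lcore_busyness_timestamp(uint16_t nb_items);

#ifdef RTE_LCORE_BUSYNESS
/* Wrapped in a macro so that disabled telemetry costs one branch and no
 * function call at the instrumented poll sites. */
#define RTE_LCORE_BUSYNESS_TIMESTAMP(nb_items) do {         \
		if (rte_lcore_busyness_enabled)             \
			lcore_busyness_timestamp(nb_items); \
	} while (0)
#else
/* Compile-time opt-out: the macro expands to nothing at all. */
#define RTE_LCORE_BUSYNESS_TIMESTAMP(nb_items) do { } while (0)
#endif

An instrumented poll site such as an ethdev Rx burst would then end with
RTE_LCORE_BUSYNESS_TIMESTAMP(nb_rx), so an empty burst (nb_rx == 0) is
accounted as idle and a non-empty one as busy, which is exactly the property
Morten points out only holds for the ingress stage of a pipelined application.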
Re: [PATCH] doc: notice to deprecate DPAA2 cmdif raw driver
29/06/2022 06:38, Gagandeep Singh:
> dpaa2_cmdif raw driver is no longer in use, so it
> will be removed in v22.11
>
> Signed-off-by: Gagandeep Singh

Acked-by: Hemant Agrawal
Acked-by: Thomas Monjalon
Acked-by: Bruce Richardson

Applied, thanks.
Re: [PATCH] doc: announce some raw/ifpga API removal
30/06/2022 11:41, David Marchand:
> rte_pmd_ifpga_get_pci_bus() documentation is vague and it is unclear
> what could be done with it.
> On the other hand, EAL provides a standard API to retrieve a bus object
> by name.
>
> Announce removal of this driver specific API for v22.11.
>
> Signed-off-by: David Marchand

Acked-by: Wei Huang
Acked-by: Thomas Monjalon
Acked-by: Maxime Coquelin

Applied, thanks.
Re: [PATCH v2] doc: announce rename of octeontx_ep driver
13/07/2022 11:28, Jerin Jacob:
> On Wed, Jul 13, 2022 at 1:41 PM Veerasenareddy Burru wrote:
> >
> > To enable a single unified driver to support the current OcteonTx and
> > future Octeon PCI endpoint NICs, the octeontx_ep driver will be renamed
> > to octeon_ep to reflect a common driver for all Octeon based
> > PCI endpoint NICs.
> >
> > Signed-off-by: Veerasenareddy Burru
>
> Acked-by: Jerin Jacob

Acked-by: Thomas Monjalon

There are only 2 acks, but it is not controversial at all, so
Applied, thanks.
Re: [PATCH v3] doc: announce changes to rte_eth_set_queue_rate_limit API
15/07/2022 18:29, Ajit Khaparde:
> On Fri, Jul 15, 2022 at 7:23 AM Andrew Rybchenko wrote:
> >
> > On 7/15/22 16:25, skotesh...@marvell.com wrote:
> > > From: Satha Rao
> > >
> > > rte_eth_set_queue_rate_limit argument rate modified to uint32_t
> > > to support more than 64Gbps.
> > >
> > > Signed-off-by: Satha Rao
> > > Acked-by: Jerin Jacob
[...]
> > > +* ethdev: The function ``rte_eth_set_queue_rate_limit`` takes ``rate`` in Mbps.
> > > +  This parameter declared as uint16_t, queue rate limited to 64Gbps. ``rate``
> > > +  parameter will be modified to uint32_t in DPDK 22.11 so that it can work for
> > > +  more than 64Gbps.
> >
> > Acked-by: Andrew Rybchenko
>
> Acked-by: Ajit Khaparde

With a bit of English grammar rewording,
Acked-by: Thomas Monjalon

Applied, thanks.
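For context on the 64 Gbps ceiling: the rate argument is expressed in Mbps, so
a uint16_t tops out at UINT16_MAX = 65535 Mbps, i.e. roughly 64 Gbps, which is
why a 100 Gbps queue limit cannot be expressed today. A rough illustrative
sketch of the call (example values only, not from the patch):

#include <stdint.h>
#include <rte_ethdev.h>

/* Illustrative only: the rate argument is given in Mbps. */
static int
limit_tx_queue(uint16_t port_id, uint16_t queue_idx)
{
	/* 50000 Mbps (50 Gbps) fits in the current uint16_t parameter;
	 * 100000 Mbps (100 Gbps) would be silently truncated today and only
	 * becomes expressible once the parameter is widened to uint32_t
	 * in DPDK 22.11. */
	return rte_eth_set_queue_rate_limit(port_id, queue_idx, 50000);
}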
Re: [PATCH v2] doc: announce header split deprecation
15/07/2022 22:30, xuan.d...@intel.com:
> From: Xuan Ding
>
> RTE_ETH_RX_OFFLOAD_HEADER_SPLIT offload was introduced some time ago to
> substitute the bit-field header_split in struct rte_eth_rxmode. It allows
> enabling per-port header split offload with the header size controlled
> using split_hdr_size in the same structure.
>
> Right now, no single PMD actually supports RTE_ETH_RX_OFFLOAD_HEADER_SPLIT
> with the above definition. Many examples and test apps initialize the field
> to 0 explicitly. Most drivers simply ignore split_hdr_size since the offload
> is not advertised, but some double-check that its value is 0.
>
> So RTE_ETH_RX_OFFLOAD_HEADER_SPLIT and the split_hdr_size field will be
> removed in DPDK 22.11. After DPDK 22.11 LTS, RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT
> can still be used for per-queue Rx packet split offload, which is configured
> by rte_eth_rxseg_split.
>
> Signed-off-by: Xuan Ding
> Acked-by: Ray Kinsella
> Acked-by: Andrew Rybchenko
> Acked-by: Ferruh Yigit
> Acked-by: Viacheslav Ovsiienko

Acked-by: Thomas Monjalon

This v2 is a lot better for the users to understand what happens.
Fixed the indent and moved a few words:

* ethdev: Since no single PMD supports ``RTE_ETH_RX_OFFLOAD_HEADER_SPLIT``
  offload and the ``split_hdr_size`` field in structure ``rte_eth_rxmode``
  to enable per-port header split, they will be removed in DPDK 22.11.
  The per-queue Rx packet split offload ``RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT``
  can still be used, and it is configured by ``rte_eth_rxseg_split``.

Applied, thanks.
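For readers who relied on header split, here is a rough sketch of the per-queue
buffer split configuration that remains available, using the ethdev
rte_eth_rxseg_split / RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT API as I understand it.
Mempool creation, the port-level rte_eth_dev_configure() step and capability
checks are omitted, and the segment sizes are arbitrary example values; treat
this as an illustration, not a reference.

#include <rte_ethdev.h>
#include <rte_lcore.h>

/* Split each received packet into a 128-byte first segment from hdr_pool
 * and the remainder from pay_pool, on one Rx queue. */
static int
setup_buffer_split_queue(uint16_t port_id, uint16_t queue_id,
			 struct rte_mempool *hdr_pool,
			 struct rte_mempool *pay_pool)
{
	union rte_eth_rxseg rx_seg[2] = {
		{ .split = { .mp = hdr_pool, .length = 128, .offset = 0 } },
		/* length = 0: the rest of the packet goes into this segment. */
		{ .split = { .mp = pay_pool, .length = 0, .offset = 0 } },
	};
	struct rte_eth_rxconf rxconf = {
		.rx_seg = rx_seg,
		.rx_nseg = 2,
		.offloads = RTE_ETH_RX_OFFLOAD_BUFFER_SPLIT |
			    RTE_ETH_RX_OFFLOAD_SCATTER,
	};

	/* The mempool argument is NULL because each segment descriptor
	 * carries its own pool. */
	return rte_eth_rx_queue_setup(port_id, queue_id, 1024,
				      rte_socket_id(), &rxconf, NULL);
}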
RE: [PATCH v1 1/2] eal: add lcore busyness telemetry
> Subject: RE: [PATCH v1 1/2] eal: add lcore busyness telemetry
>
> > From: Anatoly Burakov [mailto:anatoly.bura...@intel.com]
> > Sent: Friday, 15 July 2022 15.13
> >
> > Currently, there is no way to measure lcore busyness in a passive way,
> > without any modifications to the application. This patch adds a new
> > EAL API that will be able to passively track core busyness.
> >
> > The busyness is calculated by relying on the fact that most DPDK APIs
> > will poll for packets.
>
> This is an "alternative fact"! Only run-to-completion applications poll for
> RX. Pipelined applications do not poll for packets in every pipeline stage.
I guess you meant, poll for packets from the NIC. They still need to receive
packets from queues. We could do a similar thing for rte_ring APIs.

>
> > Empty polls can be counted as "idle", while non-empty polls can be
> > counted as busy. To measure lcore busyness, we simply call the
> > telemetry timestamping function with the number of polls a particular
> > code section has processed, and count the number of cycles we've spent
> > processing empty bursts. The more empty bursts we encounter, the fewer
> > cycles we spend in the "busy" state, and the less core busyness will be
> > reported.
> >
> > In order for all of the above to work without modifications to the
> > application, the library code needs to be instrumented with calls to
> > the lcore telemetry busyness timestamping function. The following
> > parts of DPDK are instrumented with lcore telemetry calls:
> >
> > - All major driver APIs:
> >   - ethdev
> >   - cryptodev
> >   - compressdev
> >   - regexdev
> >   - bbdev
> >   - rawdev
> >   - eventdev
> >   - dmadev
> > - Some additional libraries:
> >   - ring
> >   - distributor
> >
> > To avoid performance impact from having lcore telemetry support, a
> > global variable is exported by EAL, and the call to the timestamping
> > function is wrapped into a macro, so that whenever telemetry is
> > disabled, it only takes one additional branch and no function calls
> > are performed. It is also possible to disable it at compile time by
> > commenting out RTE_LCORE_BUSYNESS from the build config.
>
> Since all of this can be completely disabled at build time, and thus has
> exactly zero performance impact, I will not object to this patch.
>
> > This patch also adds a telemetry endpoint to report lcore busyness, as
> > well as telemetry endpoints to enable/disable lcore telemetry.
> >
> > Signed-off-by: Kevin Laatz
> > Signed-off-by: Conor Walsh
> > Signed-off-by: David Hunt
> > Signed-off-by: Anatoly Burakov
> > ---
> >
> > Notes:
> >     We did a couple of quick smoke tests to see if this patch causes
> >     any performance degradation, and it seemed to have none that we
> >     could measure. Telemetry can be disabled at compile time via a
> >     config option, while at runtime it can be disabled, seemingly at
> >     the cost of one additional branch.
> >
> >     That said, our benchmarking efforts were admittedly not very
> >     rigorous, so comments welcome!
>
> This patch does not reflect lcore busyness, it reflects some sort of ingress
> activity level.
>
> All the considerations regarding non-intrusiveness and low overhead are
> good, but everything in this patch needs to be renamed to reflect what it
> truly does, so it is clear that pipelined applications cannot use this
> telemetry for measuring lcore busyness (except on the ingress pipeline
> stage).
>
> It's a shame that so much effort clearly has gone into this patch, and no one
> stopped to consider pipelined applications. :-(
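To illustrate the point about rte_ring-based pipelines raised above: a worker
stage typically polls its input ring rather than a NIC queue, and an empty
dequeue could be accounted as idle in exactly the same way as an empty Rx
burst. A rough sketch under that assumption; rte_ring_dequeue_burst() is the
real ring API, while lcore_busyness_timestamp() is a hypothetical accounting
hook, not something the patch defines under that name.

#include <rte_ring.h>
#include <rte_mbuf.h>

#define BURST_SIZE 32

/* Hypothetical accounting hook: 0 items counts as idle, >0 as busy. */
extern void lcore_busyness_timestamp(uint16_t nb_items);

/* One iteration of a pipeline worker stage that pulls packets from the
 * stage's input ring instead of a NIC Rx queue. */
static inline void
worker_stage_iteration(struct rte_ring *in_ring)
{
	struct rte_mbuf *bufs[BURST_SIZE];
	unsigned int nb, i;

	nb = rte_ring_dequeue_burst(in_ring, (void **)bufs, BURST_SIZE, NULL);

	/* Same accounting as the ethdev instrumentation, applied to a ring. */
	lcore_busyness_timestamp(nb);

	for (i = 0; i < nb; i++) {
		/* ... process bufs[i] and enqueue to the next stage ... */
	}
}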
RE: [Bug 1053] ConnectX6 / mlx5 DPDK - bad RSS/ rte_flow performance on mixed traffic ( rxq_cqe_comp_en=4 )
Hello,
Can you please share the output of xstats?

Regards,
Asaf Penso

>-----Original Message-----
>From: bugzi...@dpdk.org
>Sent: Friday, July 15, 2022 7:07 PM
>To: dev@dpdk.org
>Subject: [Bug 1053] ConnectX6 / mlx5 DPDK - bad RSS/ rte_flow performance
>on mixed traffic ( rxq_cqe_comp_en=4 )
>
>https://bugs.dpdk.org/show_bug.cgi?id=1053
>
>            Bug ID: 1053
>           Summary: ConnectX6 / mlx5 DPDK - bad RSS/ rte_flow performance
>                    on mixed traffic ( rxq_cqe_comp_en=4 )
>           Product: DPDK
>           Version: 21.11
>          Hardware: x86
>                OS: Linux
>            Status: UNCONFIRMED
>          Severity: normal
>          Priority: Normal
>         Component: ethdev
>          Assignee: dev@dpdk.org
>          Reporter: r...@gmx.net
>  Target Milestone: ---
>
>Our team has been chasing major performance issues with ConnectX6 cards.
>
>*Customer challenge:*
>Flow-stable (symmetric RSS) load-balancing of flows to 8 worker lcores.
>
>*Observation:*
>Performance is fine up to 100Gbps using either tcp *or* udp-only traffic
>profiles.
>Mixed traffic drops down to 50% loss with all packets showing up as xstats:
>rx_phy_discard_packets
>
>card infos at end of email.
>
>There appears to be a huge performance issue on mixed UDP/TCP using
>symmetric load-balancing across multiple workers.
>E.g. compiling a DPDK v20.11 or newer test-pmd app:
>
>> sudo ./dpdk-testpmd -n 8 -l 4,6,8,10,12,14,16,18,20 \
>>   -a :4b:00.0,rxq_cqe_comp_en=4 -a :4b:00.1,rxq_cqe_comp_en=4 -- \
>>   --forward-mode=mac --rxq=8 --txq=8 --nb-cores=8 --numa -i -a --disable-rss
>
>and configuring:
>
>> flow create 0 ingress pattern eth / ipv4 / tcp / end actions rss types ipv4-tcp end
>>   queues 0 1 2 3 4 5 6 7 end
>>   key 6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A / end
>> flow create 0 ingress pattern eth / ipv4 / udp / end actions rss types ipv4-udp end
>>   queues 0 1 2 3 4 5 6 7 end
>>   key 6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A / end
>> flow create 1 ingress pattern eth / ipv4 / tcp / end actions rss types ipv4-tcp end
>>   queues 0 1 2 3 4 5 6 7 end
>>   key 6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A / end
>> flow create 1 ingress pattern eth / ipv4 / udp / end actions rss types ipv4-udp end
>>   queues 0 1 2 3 4 5 6 7 end
>>   key 6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A6D5A / end
>
>will see *significant* packet drops at a load > 50 Gbps on any type of mixed
>UDP/TCP traffic. E.g.
>
>https://github.com/cisco-system-traffic-generator/trex-core/blob/master/scripts/cap2/sfr3.yaml
>
>Whenever those packet drops occur, I see those in the xstats as
>"rx_phy_discard_packets"
>
>On the other hand, using a TCP- or UDP-only traffic profile perfectly scales
>up to 100Gbps w/o drops.
>
>Thanks for your help!
>
>> {code}
>> ConnectX6DX
>> psid="DEL27" partNumber="0F6FXM_08P2T2_Ax"
>> {code}
>
>--
>You are receiving this mail because:
>You are the assignee for the bug.
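Regarding the xstats request above, the counters can be captured either with
"show port xstats all" at the testpmd prompt or programmatically. A minimal
sketch using the standard ethdev xstats calls (error handling trimmed):

#include <stdio.h>
#include <stdlib.h>
#include <inttypes.h>
#include <rte_ethdev.h>

/* Print every extended statistic of one port, e.g. to spot
 * rx_phy_discard_packets growing under mixed traffic. */
static void
dump_xstats(uint16_t port_id)
{
	int n = rte_eth_xstats_get(port_id, NULL, 0); /* query the count */
	if (n <= 0)
		return;

	struct rte_eth_xstat *vals = calloc(n, sizeof(*vals));
	struct rte_eth_xstat_name *names = calloc(n, sizeof(*names));

	if (vals != NULL && names != NULL) {
		rte_eth_xstats_get_names(port_id, names, n);
		rte_eth_xstats_get(port_id, vals, n);
		for (int i = 0; i < n; i++)
			printf("%s: %" PRIu64 "\n",
			       names[vals[i].id].name, vals[i].value);
	}
	free(vals);
	free(names);
}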