Re: [PATCH 1/2] net/virtio: propagate return value of called function
On 3/28/23 06:14, Xia, Chenbo wrote:
> > -----Original Message-----
> > From: Boleslav Stankevich
> > Sent: Wednesday, March 22, 2023 6:23 PM
> > To: dev@dpdk.org
> > Cc: Boleslav Stankevich; sta...@dpdk.org; Andrew Rybchenko; Maxime Coquelin; Xia, Chenbo; David Marchand; Hyong Youb Kim; Harman Kalra
> > Subject: [PATCH 1/2] net/virtio: propagate return value of called function
> >
> > rte_intr_vec_list_alloc() may fail because of different reasons which are indicated by different negative errno values.
> >
> > Fixes: d61138d4f0e2 ("drivers: remove direct access to interrupt handle")
> > Cc: sta...@dpdk.org
> >
> > Signed-off-by: Boleslav Stankevich
> > Signed-off-by: Andrew Rybchenko
>
> I see Boleslav's email is updated in mailmap file but patchwork is still complaining about it.
>
> @Andrew & Maxime, Do you know why?

My idea was that next-virtio was not updated yet at that moment. Don't know how to check it. Maybe just resend?

Andrew.
Re: [PATCH] common/sfc_efx/base: support link status change v2 events
On 3/28/23 19:51, Ivan Malov wrote: FW should send link status change events in either v1 or v2 format depending on the preference which the driver can express during CMD_DRV_ATTACH stage. At the moment, libefx does not request v2, so v1 events must arrive. However, FW does not honour this choice and always sends v2 events. So teach libefx to parse such and add v2 request to CMD_DRV_ATTACH, correspondingly. Signed-off-by: Ivan Malov Reviewed-by: Andy Moreton Acked-by: Andrew Rybchenko
RE: [PATCH v5 1/3] ethdev: add API for buffer recycle mode
> From: Feifei Wang [mailto:feifei.wa...@arm.com]
> Sent: Thursday, 30 March 2023 08.30
> [...]
> +/**
> + * @internal
> + * Rx routine for rte_eth_dev_buf_recycle().
> + * Refill Rx descriptors in buffer recycle mode.
> + *
> + * @note
> + * This API can only be called by rte_eth_dev_buf_recycle().
> + * Before calling this API, rte_eth_tx_buf_stash() should be
> + * called to stash Tx used buffers into Rx buffer ring.
> + *
> + * When this functionality is not implemented in the driver, the return
> + * buffer number is 0.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the receive queue.
> + *   The value must be in the range [0, nb_rx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + *@param nb
> + *   The number of Rx descriptors to be refilled.
> + * @return
> + *   The number Rx descriptors correct to be refilled.
> + *   - ENODEV: bad port or queue (only if compiled with debug).

If you want errors reported by the return value, the function return type cannot be uint16_t.

> + */
> +static inline uint16_t rte_eth_rx_descriptors_refill(uint16_t port_id,
> +		uint16_t queue_id, uint16_t nb)
> +{
> +	struct rte_eth_fp_ops *p;
> +	void *qd;
> +
> +#ifdef RTE_ETHDEV_DEBUG_RX
> +	if (port_id >= RTE_MAX_ETHPORTS ||
> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid port_id=%u or queue_id=%u\n",
> +			port_id, queue_id);
> +		rte_errno = ENODEV;
> +		return 0;

If p->rx_descriptors_refill() is likely to return 0, this function should not use 0 as return value to indicate errors.

> +	}
> +#endif
> +
> +	p = &rte_eth_fp_ops[port_id];
> +	qd = p->rxq.data[queue_id];
> +
> +#ifdef RTE_ETHDEV_DEBUG_RX
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx port_id=%u\n", port_id);
> +		rte_errno = ENODEV;
> +		return 0;
> +
> +	if (qd == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for port_id=%u\n",
> +			queue_id, port_id);
> +		rte_errno = ENODEV;
> +		return 0;
> +	}
> +#endif
> +
> +	if (p->rx_descriptors_refill == NULL)
> +		return 0;
> +
> +	return p->rx_descriptors_refill(qd, nb);
> +}
> +
>  /**@{@name Rx hardware descriptor states
>   * @see rte_eth_rx_descriptor_status
>   */
> @@ -6483,6 +6597,122 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t queue_id,
>  	return rte_eth_tx_buffer_flush(port_id, queue_id, buffer);
>  }
>
> +/**
> + * @internal
> + * Tx routine for rte_eth_dev_buf_recycle().
> + * Stash Tx used buffers into Rx buffer ring in buffer recycle mode.
> + *
> + * @note
> + * This API can only be called by rte_eth_dev_buf_recycle().
> + * After calling this API, rte_eth_rx_descriptors_refill() should be
> + * called to refill Rx ring descriptors.
> + *
> + * When this functionality is not implemented in the driver, the return
> + * buffer number is 0.
> + *
> + * @param port_id
> + *   The port identifier of the Ethernet device.
> + * @param queue_id
> + *   The index of the transmit queue.
> + *   The value must be in the range [0, nb_tx_queue - 1] previously supplied
> + *   to rte_eth_dev_configure().
> + * @param rxq_buf_recycle_info
> + *   A pointer to a structure of Rx queue buffer ring information in buffer
> + *   recycle mode.
> + *
> + * @return
> + *   The number buffers correct to be filled in the Rx buffer ring.
> + *   - ENODEV: bad port or queue (only if compiled with debug).

If you want errors reported by the return value, the function return type cannot be uint16_t.

> + */
> +static inline uint16_t rte_eth_tx_buf_stash(uint16_t port_id, uint16_t queue_id,
> +		struct rte_eth_rxq_buf_recycle_info *rxq_buf_recycle_info)
> +{
> +	struct rte_eth_fp_ops *p;
> +	void *qd;
> +
> +#ifdef RTE_ETHDEV_DEBUG_TX
> +	if (port_id >= RTE_MAX_ETHPORTS ||
> +			queue_id >= RTE_MAX_QUEUES_PER_PORT) {
> +		RTE_ETHDEV_LOG(ERR,
> +			"Invalid port_id=%u or queue_id=%u\n",
> +			port_id, queue_id);
> +		rte_errno = ENODEV;
> +		return 0;

If p->tx_buf_stash() is likely to return 0, this function should not use 0 as return value to indicate errors.

> +	}
> +#endif
> +
> +	p = &rte_eth_fp_ops[port_id];
> +	qd = p->txq.data[queue_id];
> +
> +#ifdef RTE_ETHDEV_DEBUG_TX
> +	if (!rte_eth_dev_is_valid_port(port_id)) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx port_id=%u\n", port_id);
> +		rte_errno = ENODEV;
> +		return 0;
> +
> +	if (qd == NULL) {
> +		RTE_ETHDEV_LOG(ERR, "Invalid Tx queue_id=%u for port_id=%u\n",
> +			queue_id, port_id);
> +		rte_errno = ENODEV;
> +		return 0;
RE: [PATCH 2/2] net/gve: update copyright holders
> -----Original Message-----
> From: Thomas Monjalon
> Sent: Wednesday, March 29, 2023 22:07
> To: Ferruh Yigit; Zhang, Qi Z; Wu, Jingjing; Xing, Beilei; Guo, Junfeng
> Cc: dev@dpdk.org; Rushil Gupta; Joshua Washington; Jeroen de Borst
> Subject: Re: [PATCH 2/2] net/gve: update copyright holders
>
> 28/03/2023 11:35, Guo, Junfeng:
> > The background is that, in the past (DPDK 22.11) we didn't get the approval
> > of license from Google, thus chose the MIT License for the base code, and
> > BSD-3 License for GVE common code (without the files in /base folder).
> > We also left the copyright holder of base code just to Google Inc, and made
> > Intel as the copyright holder of GVE common code (without /base folder).
> >
> > Today we are working together for GVE dev and maintaining. And we got
> > the approval of BSD-3 License from Google for the base code.
> > Thus we decided to 1) switch the License of GVE base code from MIT to BSD-3;
> > 2) add Google LLC as one of the copyright holders for GVE common code.
>
> Do you realize we had lengthy discussions in the Technical Board,
> the Governing Board, and with lawyers, just for this unneeded exception?
>
> Now looking at the patches, there seem to be some big mistakes like
> removing some copyright. I don't understand how it can be taken so
> lightly.
>
> I regret how fast we were, next time we will surely operate differently.
> If you want to improve the reputation of this driver,
> please ask other copyright holders to be more active and responsive.
>

Really sorry for causing such severe trouble.

Yes, the Technical Board and the Governing Board put a lot of effort into this MIT exception. We really appreciate that.

About this patch set, it was my severe mistake to switch the MIT License directly, in the wrong way, for code that is already upstreamed in the community. In the past we upstreamed this driver with the MIT License, following the kernel community's gve driver base code. And now we want to use the code with the BSD-3 License (approved by Google). So I suppose that the correct way may be to 1) first remove all the code under the MIT License and 2) then add the new files under the BSD-3 License. Please correct me if there is still any misunderstanding in my statement.

Thanks Thomas for pointing out my mistake. I'll be careful to fix this.

Copyright holder for the gve base code will stay unchanged. Google LLC will be added as one of the copyright holders for the gve common code.

@Rushil Gupta Please also be more active and responsive for the code review and contribution in the community.

Thanks!
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > Hi, > > While trying to port some code to VPP (which uses DPDK as the backend > driver), I am running into a problem that calls to API's like > rte_timer_subsystem_init, rte_hash_create are failing while allocation > of memory. > > This is presumably because VPP inits the EAL with the following arguments -- > > -in-memory --no-telemetry --file-prefix vpp > > Is there is something that can be done eg. passing some more parms in > the EAL initialization which hopefully wouldn't break VPP but will > also be friendly to the RTE timer and hash functions too, that would > be great, so requesting some advice here. > Hi, can you provide some more details on what the errors are that you are receiving? Have you been able to dig a little deeper into what might be causing the memory failures? The above flags alone are unlikely to cause issues with hash or timer libraries, for example. /Bruce
RE: release candidate 23.03-rc4
> -----Original Message-----
> From: Thomas Monjalon
> Sent: Wednesday, March 29, 2023 3:34 AM
> To: annou...@dpdk.org
> Subject: release candidate 23.03-rc4
>
> A new DPDK release candidate is ready for testing:
> https://git.dpdk.org/dpdk/tag/?id=v23.03-rc4
>
> There are 42 new patches in this snapshot.
>
> Release notes:
> https://doc.dpdk.org/guides/rel_notes/release_23_03.html
>
> This is the last release candidate.
> Only documentation should be updated before the release.
>
> Reviews of deprecation notices are required:
> https://patches.dpdk.org/bundle/dmarchand/deprecation_notices
>
> You may share some release validation results by replying to this message at
> dev@dpdk.org and by adding tested hardware in the release notes.
>
> Please think about sharing your roadmap now for DPDK 23.07.
>
> Thank you everyone
>

Update the test status for Intel part. Till now dpdk23.03-rc4 test execution rate is 90%, no new issue is found.

# Basic Intel(R) NIC testing
* Build or compile:
 *Build: cover the build test combination with latest GCC/Clang version and the popular OS revision such as Ubuntu20.04.5, Ubuntu22.04.1, Fedora37, RHEL8.6/9.1 etc.
  - All test passed.
 *Compile: cover the CFLAGS(O0/O1/O2/O3) with popular OS such as Ubuntu22.04.1 and RHEL8.6.
  - All test passed.
* Meson test & Asan test: known issues:
  - https://bugs.dpdk.org/show_bug.cgi?id=1024 [dpdk-22.07][meson test] driver-tests/link_bonding_mode4_autotest bond handshake failed - Not fixed yet.
  - https://bugs.dpdk.org/show_bug.cgi?id=1107 [22.11-rc1][meson test] seqlock_autotest test failed. - Special issue with gcc 4.8.5.
* PF/VF(i40e, ixgbe): test scenarios including PF/VF-RTE_FLOW/TSO/Jumboframe/checksum offload/VLAN/VXLAN, etc.
  - All test done. No new issue is found.
* PF/VF(ice): test scenarios including Switch features/Package Management/Flow Director/Advanced Tx/Advanced RSS/ACL/DCF/Flexible Descriptor, etc.
  - Execution rate is 95%. No new issue is found.
* Intel NIC single core/NIC performance: test scenarios including PF/VF single core performance test, RFC2544 Zero packet loss performance test, etc.
  - All test done. No new issue is found.
* Power and IPsec:
 * Power: test scenarios including bi-direction/Telemetry/Empty Poll Lib/Priority Base Frequency, etc.
  - All test done. No new issue is found.
 * IPsec: test scenarios including ipsec/ipsec-gw/ipsec library basic test - QAT&SW/FIB library, etc.
  - On going.

# Basic cryptodev and virtio testing
* Virtio: both function and performance test are covered. Such as PVP/Virtio_loopback/virtio-user loopback/virtio-net VM2VM perf testing/VMware ESXi 8.0, etc.
  - All test done. No new issue is found.
* Cryptodev:
 *Function test: test scenarios including Cryptodev API testing/CompressDev ISA-L/QAT/ZLIB PMD Testing/FIPS, etc.
  - On going.
 *Performance test: test scenarios including Throughput Performance /Cryptodev Latency, etc.
  - On going.

Regards,
Xu, Hailin
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
Hi, The hash creation API throws the following error -- RING: Cannot reserve memory for tailq HASH: memory allocation failed The timer subsystem init api throws this error -- EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested memzone segments exceeds RTE_MAX_MEMZONE I did check the code and apparently the memzone and rte zmalloc related api's are not being able to allocate memory. Regards -Prashant On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson wrote: > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > > Hi, > > > > While trying to port some code to VPP (which uses DPDK as the backend > > driver), I am running into a problem that calls to API's like > > rte_timer_subsystem_init, rte_hash_create are failing while allocation > > of memory. > > > > This is presumably because VPP inits the EAL with the following arguments -- > > > > -in-memory --no-telemetry --file-prefix vpp > > > > Is there is something that can be done eg. passing some more parms in > > the EAL initialization which hopefully wouldn't break VPP but will > > also be friendly to the RTE timer and hash functions too, that would > > be great, so requesting some advice here. > > > Hi, > > can you provide some more details on what the errors are that you are > receiving? Have you been able to dig a little deeper into what might be > causing the memory failures? The above flags alone are unlikely to cause > issues with hash or timer libraries, for example. > > /Bruce
[Bug 1203] ice: cannot create 2 rte_flows with 2 actions, only with 1 action
https://bugs.dpdk.org/show_bug.cgi?id=1203

            Bug ID: 1203
           Summary: ice: cannot create 2 rte_flows with 2 actions, only with 1 action
           Product: DPDK
           Version: 23.03
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: maxime.le...@6wind.com
  Target Milestone: ---

kernel driver: 1.10.1.2.2
firmware-version: 4.10 0x80015191 1.3310.0
COMMS DDP: 1.3.37
ICE OS Default Package version 1.3.30.0

testpmd cmdline:
./build/app/dpdk-testpmd --log-level=.*ice.*,debug --legacy-mem -c 7 -a 17:00.0 -a :17:00.1 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=2048

RTE_FLOWS rules can be created
------------------------------

With queue action:

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions queue index 0 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions queue index 0 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #1 created

With mark action:

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions mark id 1 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions mark id 1 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #1 created

RTE_FLOWS rules cannot be created
---------------------------------

With mark + queue action:

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions mark id 1 / queue index 0 / end
ice_fdir_rx_parsing_enable(): FDIR processing on RX set to 1
ice_flow_create(): Succeeded to create (1) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions mark id 1 / queue index 0 / end
ice_fdir_cross_prof_conflict(): Failed to create profile for flow type 1 due to conflict with existing rule of flow type 4.
ice_flow_create(): Failed to create flow
port_flow_complain(): Caught PMD error type 2 (flow rule (handle)): Profile configure failed.: Invalid argument

With mark + passthru action:

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions mark id 1 / passthru / end
ice_fdir_rx_parsing_enable(): FDIR processing on RX set to 1
ice_flow_create(): Succeeded to create (1) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions mark id 1 / passthru / end
ice_fdir_cross_prof_conflict(): Failed to create profile for flow type 1 due to conflict with existing rule of flow type 4.
ice_flow_create(): Failed to create flow
port_flow_complain(): Caught PMD error type 2 (flow rule (handle)): Profile configure failed.: Invalid argument

Questions
---------

1. Do ice NICs support several actions in one flow rule?
2. What is the difference between MARK and MARK+PASSTHRU? They seem to be the same:
   http://git.dpdk.org/dpdk/commit/?id=0f664f7d57268f9ab9bdef95f0d48b3ce5004a61
   In this case, there is no reason to support MARK and not MARK+PASSTHRU with 2 flows.

--
You are receiving this mail because:
You are the assignee for the bug.
[Bug 1204] ice: cannot create 2 rte_flows with MARK actions with dpdk 22.11.1, but can with dpdk 23.03.0-rc4
https://bugs.dpdk.org/show_bug.cgi?id=1204 Bug ID: 1204 Summary: ice: cannot create 2 rte_flows with MARK actions with dpdk 22.11.1, but can with dpdk 23.03.0-rc4 Product: DPDK Version: 22.11 Hardware: All OS: All Status: UNCONFIRMED Severity: normal Priority: Normal Component: ethdev Assignee: dev@dpdk.org Reporter: maxime.le...@6wind.com Target Milestone: --- kernel driver: 1.10.1.2.2 firmware-version: 4.10 0x80015191 1.3310.0 COMMS DDP: 1.3.37 ICE OS Default Package version 1.3.30.0 testpmd cmdline: ./build/app/dpdk-testpmd --log-level=.*ice.*,debug --legacy-mem -c 7 -a :17:00.0 -a :17:00.1 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=2048 With dpdk 23.03.0-rc4 - testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions mark id 1 / end ice_flow_create(): Succeeded to create (2) flow Flow rule #0 created testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions mark id 1 / end ice_flow_create(): Succeeded to create (2) flow Flow rule #1 created With dpdk 22.11.1 -- testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions mark id 1 / end ice_fdir_rx_parsing_enable(): FDIR processing on RX set to 1 ice_flow_create(): Succeeded to create (1) flow Flow rule #0 created testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions mark id 1 / end ice_fdir_cross_prof_conflict(): Failed to create profile for flow type 1 due to conflict with existing rule of flow type 4. ice_flow_create(): Failed to create flow port_flow_complain(): Caught PMD error type 2 (flow rule (handle)): Profile configure failed.: Invalid argument -- You are receiving this mail because: You are the assignee for the bug.
RE: [PATCH v1] raw/ifpga: check afu device before unplug
> -Original Message- > From: Huang, Wei > Sent: Monday, March 27, 2023 5:42 AM > To: dev@dpdk.org; tho...@monjalon.net; david.march...@redhat.com > Cc: sta...@dpdk.org; Xu, Rosen ; Zhang, Tianfei > ; Zhang, Qi Z ; Huang, Wei > > Subject: [PATCH v1] raw/ifpga: check afu device before unplug > > AFU device may be already unplugged in IFPGA bus cleanup process, unplug AFU > device only when it exists. > > Signed-off-by: Wei Huang > --- > drivers/raw/ifpga/ifpga_rawdev.c | 16 +++- > 1 file changed, 15 insertions(+), 1 deletion(-) > > diff --git a/drivers/raw/ifpga/ifpga_rawdev.c > b/drivers/raw/ifpga/ifpga_rawdev.c > index 1020adc..0d43c87 100644 > --- a/drivers/raw/ifpga/ifpga_rawdev.c > +++ b/drivers/raw/ifpga/ifpga_rawdev.c > @@ -29,6 +29,7 @@ > #include > #include > #include > +#include > > #include "base/opae_hw_api.h" > #include "base/opae_ifpga_hw_api.h" > @@ -1832,12 +1833,19 @@ static int ifpga_rawdev_get_string_arg(const char *key > __rte_unused, > return ret; > } > > +static int cmp_dev_name(const struct rte_device *dev, const void > +*_name) { > + const char *name = _name; > + return strcmp(dev->name, name); > +} > + > static int > ifpga_cfg_remove(struct rte_vdev_device *vdev) { > struct rte_rawdev *rawdev = NULL; > struct ifpga_rawdev *ifpga_dev; > struct ifpga_vdev_args args; > + struct rte_bus *bus; > char dev_name[RTE_RAWDEV_NAME_MAX_LEN]; > const char *vdev_name = NULL; > char *tmp_vdev = NULL; > @@ -1864,7 +1872,13 @@ static int ifpga_rawdev_get_string_arg(const char *key > __rte_unused, > > snprintf(dev_name, RTE_RAWDEV_NAME_MAX_LEN, "%d|%s", > args.port, args.bdf); > - ret = rte_eal_hotplug_remove(RTE_STR(IFPGA_BUS_NAME), dev_name); > + bus = rte_bus_find_by_name(RTE_STR(IFPGA_BUS_NAME)); > + if (bus) { > + if (bus->find_device(NULL, cmp_dev_name, dev_name)) { > + ret = > rte_eal_hotplug_remove(RTE_STR(IFPGA_BUS_NAME), > + dev_name); > + } > + } > It looks good for me. Acked-by: Tianfei Zhang
RE: [PATCH] maintainers: update for FIPS validation
Hi Gowrishankar, > -Original Message- > From: Gowrishankar Muthukrishnan > Sent: Wednesday 29 March 2023 12:01 > To: dev@dpdk.org > Cc: jer...@marvell.com; ano...@marvell.com; Akhil Goyal > ; Dooley, Brian ; > Gowrishankar Muthukrishnan > Subject: [PATCH] maintainers: update for FIPS validation > > Add co-maintainer for FIPS validation example. > > Signed-off-by: Gowrishankar Muthukrishnan > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index 280058adfc..8df23e5099 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -1809,6 +1809,7 @@ F: doc/guides/sample_app_ug/ethtool.rst > > FIPS validation example > M: Brian Dooley > +M: Gowrishankar Muthukrishnan > F: examples/fips_validation/ > F: doc/guides/sample_app_ug/fips_validation.rst > > -- > 2.25.1 Acked-by: Brian Dooley
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote:
> Hi,
>

FYI, when replying on list, it's best not to top-post, but put your replies below the email snippet you are replying to.

> The hash creation API throws the following error --
> RING: Cannot reserve memory for tailq
> HASH: memory allocation failed
>
> The timer subsystem init api throws this error --
> EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested
> memzone segments exceeds RTE_MAX_MEMZONE
>

Can you try increasing RTE_MAX_MEMZONE? It's defined in DPDK's rte_config.h file, so edit that and then rebuild DPDK. [If you are using the built-in DPDK from VPP, you may need to do a patch for this, add it into the VPP patches directory and then do a VPP rebuild.]

Let's see if we can get rid of at least one of the error messages. :-)

/Bruce

> I did check the code and apparently the memzone and rte zmalloc
> related api's are not being able to allocate memory.
>
> Regards
> -Prashant
>
> On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson wrote:
> >
> > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote:
> > > Hi,
> > >
> > > While trying to port some code to VPP (which uses DPDK as the backend
> > > driver), I am running into a problem that calls to API's like
> > > rte_timer_subsystem_init, rte_hash_create are failing while allocation
> > > of memory.
> > >
> > > This is presumably because VPP inits the EAL with the following arguments --
> > >
> > > -in-memory --no-telemetry --file-prefix vpp
> > >
> > > Is there is something that can be done eg. passing some more parms in
> > > the EAL initialization which hopefully wouldn't break VPP but will
> > > also be friendly to the RTE timer and hash functions too, that would
> > > be great, so requesting some advice here.
> > >
> > Hi,
> >
> > can you provide some more details on what the errors are that you are
> > receiving? Have you been able to dig a little deeper into what might be
> > causing the memory failures? The above flags alone are unlikely to cause
> > issues with hash or timer libraries, for example.
> >
> > /Bruce
RE: [PATCH v5 1/3] ethdev: add API for buffer recycle mode
> -Original Message- > From: Morten Brørup > Sent: Thursday, March 30, 2023 3:19 PM > To: Feifei Wang ; tho...@monjalon.net; Ferruh > Yigit ; Andrew Rybchenko > > Cc: dev@dpdk.org; konstantin.v.anan...@yandex.ru; nd ; > Honnappa Nagarahalli ; Ruifeng Wang > > Subject: RE: [PATCH v5 1/3] ethdev: add API for buffer recycle mode > > > From: Feifei Wang [mailto:feifei.wa...@arm.com] > > Sent: Thursday, 30 March 2023 08.30 > > > > [...] > > > +/** > > + * @internal > > + * Rx routine for rte_eth_dev_buf_recycle(). > > + * Refill Rx descriptors in buffer recycle mode. > > + * > > + * @note > > + * This API can only be called by rte_eth_dev_buf_recycle(). > > + * Before calling this API, rte_eth_tx_buf_stash() should be > > + * called to stash Tx used buffers into Rx buffer ring. > > + * > > + * When this functionality is not implemented in the driver, the > > +return > > + * buffer number is 0. > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param queue_id > > + * The index of the receive queue. > > + * The value must be in the range [0, nb_rx_queue - 1] previously > supplied > > + * to rte_eth_dev_configure(). > > + *@param nb > > + * The number of Rx descriptors to be refilled. > > + * @return > > + * The number Rx descriptors correct to be refilled. > > + * - ENODEV: bad port or queue (only if compiled with debug). > > If you want errors reported by the return value, the function return type > cannot be uint16_t. Agree. Actually, in the code path, if errors happen, the function will return 0. For this description line, I refer to 'rte_eth_tx_prepare' notes. Maybe we should delete this line. 
> > > + */ > > +static inline uint16_t rte_eth_rx_descriptors_refill(uint16_t port_id, > > + uint16_t queue_id, uint16_t nb) > > +{ > > + struct rte_eth_fp_ops *p; > > + void *qd; > > + > > +#ifdef RTE_ETHDEV_DEBUG_RX > > + if (port_id >= RTE_MAX_ETHPORTS || > > + queue_id >= RTE_MAX_QUEUES_PER_PORT) { > > + RTE_ETHDEV_LOG(ERR, > > + "Invalid port_id=%u or queue_id=%u\n", > > + port_id, queue_id); > > + rte_errno = ENODEV; > > + return 0; > > If p->rx_descriptors_refill() is likely to return 0, this function should not > use 0 > as return value to indicate errors. However, refer to dpdk code style in ethdev, most of API write like this. For example, 'rte_eth_rx/tx_burst', 'rte_eth_tx_prep'. I'm also confused what's return type for this due to I want to indicate errors and show the processed buffer number. > > > + } > > +#endif > > + > > + p = &rte_eth_fp_ops[port_id]; > > + qd = p->rxq.data[queue_id]; > > + > > +#ifdef RTE_ETHDEV_DEBUG_RX > > + if (!rte_eth_dev_is_valid_port(port_id)) { > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx port_id=%u\n", port_id); > > + rte_errno = ENODEV; > > + return 0; > > + > > + if (qd == NULL) { > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for > port_id=%u\n", > > + queue_id, port_id); > > + rte_errno = ENODEV; > > + return 0; > > + } > > +#endif > > + > > + if (p->rx_descriptors_refill == NULL) > > + return 0; > > + > > + return p->rx_descriptors_refill(qd, nb); } > > + > > /**@{@name Rx hardware descriptor states > > * @see rte_eth_rx_descriptor_status > > */ > > @@ -6483,6 +6597,122 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t > queue_id, > > return rte_eth_tx_buffer_flush(port_id, queue_id, buffer); } > > > > +/** > > + * @internal > > + * Tx routine for rte_eth_dev_buf_recycle(). > > + * Stash Tx used buffers into Rx buffer ring in buffer recycle mode. > > + * > > + * @note > > + * This API can only be called by rte_eth_dev_buf_recycle(). 
> > + * After calling this API, rte_eth_rx_descriptors_refill() should be > > + * called to refill Rx ring descriptors. > > + * > > + * When this functionality is not implemented in the driver, the > > +return > > + * buffer number is 0. > > + * > > + * @param port_id > > + * The port identifier of the Ethernet device. > > + * @param queue_id > > + * The index of the transmit queue. > > + * The value must be in the range [0, nb_tx_queue - 1] previously > supplied > > + * to rte_eth_dev_configure(). > > + * @param rxq_buf_recycle_info > > + * A pointer to a structure of Rx queue buffer ring information in buffer > > + * recycle mode. > > + * > > + * @return > > + * The number buffers correct to be filled in the Rx buffer ring. > > + * - ENODEV: bad port or queue (only if compiled with debug). > > If you want errors reported by the return value, the function return type > cannot be uint16_t. > > > + */ > > +static inline uint16_t rte_eth_tx_buf_stash(uint16_t port_id, > > +uint16_t > > queue_id, > > + struct rte_eth_rxq_buf_recycle_info *rxq_buf_recycle_info) > { > > + struct rte_eth_fp_ops *p; > > + void *qd; > > + > > +#ifdef R
[PATCH] examples/ipsec-secgw: fix zero address in ethernet header
During port init, the src address stored in ethaddr_tbl is typecast,
which violates the strict-aliasing rule, and the updated source
address is also not reflected in processed packets.

Fixes: 6eb3ba0399 ("examples/ipsec-secgw: support poll mode NEON LPM lookup")

Signed-off-by: Rahul Bhansali
---
 examples/ipsec-secgw/ipsec-secgw.c | 20 ++--
 examples/ipsec-secgw/ipsec-secgw.h |  2 +-
 2 files changed, 11 insertions(+), 11 deletions(-)

diff --git a/examples/ipsec-secgw/ipsec-secgw.c b/examples/ipsec-secgw/ipsec-secgw.c
index d2d9d85b4a..029749e522 100644
--- a/examples/ipsec-secgw/ipsec-secgw.c
+++ b/examples/ipsec-secgw/ipsec-secgw.c
@@ -99,10 +99,10 @@ uint32_t qp_desc_nb = 2048;
 #define MTU_TO_FRAMELEN(x)	((x) + RTE_ETHER_HDR_LEN + RTE_ETHER_CRC_LEN)
 
 struct ethaddr_info ethaddr_tbl[RTE_MAX_ETHPORTS] = {
-	{ 0, ETHADDR(0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a) },
-	{ 0, ETHADDR(0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9) },
-	{ 0, ETHADDR(0x00, 0x16, 0x3e, 0x08, 0x69, 0x26) },
-	{ 0, ETHADDR(0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd) }
+	{ {{0}}, {{0x00, 0x16, 0x3e, 0x7e, 0x94, 0x9a}} },
+	{ {{0}}, {{0x00, 0x16, 0x3e, 0x22, 0xa1, 0xd9}} },
+	{ {{0}}, {{0x00, 0x16, 0x3e, 0x08, 0x69, 0x26}} },
+	{ {{0}}, {{0x00, 0x16, 0x3e, 0x49, 0x9e, 0xdd}} }
 };
 
 struct offloads tx_offloads;
@@ -1427,9 +1427,8 @@ add_dst_ethaddr(uint16_t port, const struct rte_ether_addr *addr)
 	if (port >= RTE_DIM(ethaddr_tbl))
 		return -EINVAL;
 
-	ethaddr_tbl[port].dst = ETHADDR_TO_UINT64(addr);
-	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[port].dst,
-			(struct rte_ether_addr *)(val_eth + port));
+	rte_ether_addr_copy(addr, &ethaddr_tbl[port].dst);
+	rte_ether_addr_copy(addr, (struct rte_ether_addr *)(val_eth + port));
 	return 0;
 }
 
@@ -1907,11 +1906,12 @@ port_init(uint16_t portid, uint64_t req_rx_offloads, uint64_t req_tx_offloads,
 			"Error getting MAC address (port %u): %s\n",
 			portid, rte_strerror(-ret));
 
-	ethaddr_tbl[portid].src = ETHADDR_TO_UINT64(&ethaddr);
+	rte_ether_addr_copy(&ethaddr, &ethaddr_tbl[portid].src);
 
-	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[portid].dst,
+	rte_ether_addr_copy(&ethaddr_tbl[portid].dst,
 			(struct rte_ether_addr *)(val_eth + portid));
-	rte_ether_addr_copy((struct rte_ether_addr *)&ethaddr_tbl[portid].src,
+
+	rte_ether_addr_copy(&ethaddr_tbl[portid].src,
 			(struct rte_ether_addr *)(val_eth + portid) + 1);
 
 	print_ethaddr("Address: ", &ethaddr);
diff --git a/examples/ipsec-secgw/ipsec-secgw.h b/examples/ipsec-secgw/ipsec-secgw.h
index 0e0012d058..53665adf03 100644
--- a/examples/ipsec-secgw/ipsec-secgw.h
+++ b/examples/ipsec-secgw/ipsec-secgw.h
@@ -84,7 +84,7 @@ struct ipsec_traffic_nb {
 
 /* port/source ethernet addr and destination ethernet addr */
 struct ethaddr_info {
-	uint64_t src, dst;
+	struct rte_ether_addr src, dst;
 };
 
 struct ipsec_spd_stats {
-- 
2.25.1
[Bug 1205] iavf: cannot create 2 rte_flows with E810 VF, but can with E810 PF
https://bugs.dpdk.org/show_bug.cgi?id=1205

            Bug ID: 1205
           Summary: iavf: cannot create 2 rte_flows with E810 VF, but can with E810 PF
           Product: DPDK
           Version: 23.03
          Hardware: All
                OS: All
            Status: UNCONFIRMED
          Severity: normal
          Priority: Normal
         Component: ethdev
          Assignee: dev@dpdk.org
          Reporter: maxime.le...@6wind.com
  Target Milestone: ---

Environment
-----------

distribution for host/vm: Ubuntu 22.04.2 LTS, kernel 5.15.0-67-generic
kernel driver: 1.10.1.2.2
firmware-version: 4.10 0x80015191 1.3310.0
COMMS DDP: 1.3.37
ICE OS Default Package version 1.3.30.0
testpmd cmdline: ./build/app/dpdk-testpmd --log-level=.*ice.*,debug --legacy-mem -c 7 -a 17:00.0 -a :17:00.1 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=2048
dpdk version: 23.03.0-rc4
NIC: Intel Corporation Ethernet Controller E810-C for QSFP

With PF (ice pmd)
-----------------

Working case, no SR-IOV, no VM. The ICE PMD is able to create the following flows:

./build/app/dpdk-testpmd --log-level=.*ice.*,debug --legacy-mem -c 7 -a 17:00.0 -a :17:00.1 -- -i --nb-cores=2 --nb-ports=2 --total-num-mbufs=2048

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions queue index 0 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions queue index 0 / end
ice_flow_create(): Succeeded to create (2) flow
Flow rule #1 created

With VF (iavf pmd)
------------------

Non-working case: SR-IOV with a VM on the same device/hardware.

sriov devices:
On PF : echo 1 > "/sys/bus/pci/devices/:17:00.0/sriov_numvfs" -> for VF 17.01.0
On PF : echo 1 > "/sys/bus/pci/devices/:17:00.1/sriov_numvfs" -> for VF 17.11.0

QEMU ARGS:
-device vfio-pci,host=:17:01.0,addr=04 -device vfio-pci,host=:17:11.0,addr=05

./build/app/dpdk-testpmd --log-level=.*iavf.*,debug -c 0x6 -a :00:04.0 -a :00:05.0 -- -i --total-num-mbufs=2048

testpmd> flow create 0 ingress pattern eth / ipv4 proto is 1 / end actions queue index 0 / end
iavf_handle_virtchnl_msg(): adminq response is received, opcode = 47
iavf_fdir_add(): Succeed in adding rule request by PF
iavf_flow_create(): Succeeded to create (2) flow
Flow rule #0 created
testpmd> flow create 0 ingress pattern eth / ipv4 / udp src is 22 / end actions queue index 0 / end
iavf_handle_virtchnl_msg(): adminq response is received, opcode = 47
iavf_fdir_add(): Failed to add rule request due to the rule is conflict with existing rule
iavf_flow_create(): Failed to create flow
port_flow_complain(): Caught PMD error type 2 (flow rule (handle)): Failed to create parser engine.: Invalid argument

Conclusion
----------

IAVF is not able to create the second flow, because the kernel driver 1.10.1.2.2 rejects the creation of the second flow. There is no such issue with the ICE PMD of dpdk 23.03.0-rc4.

--
You are receiving this mail because:
You are the assignee for the bug.
malloc_heap: Possible Control Block Overwrite When Insufficient Space in Elem
Hello,

I seem to have discovered a problem in the heap memory allocation and deallocation operations.

[layout diagram, garbled in the archive: elem | padsize | new_elem]

In the malloc_elem_alloc function, when padsize > cache line (such as 64 bytes) and padsize < sizeof(struct malloc_elem), initializing new_elem overwrites and corrupts the struct malloc_elem header of elem while setting the state of new_elem to ELEM_PAD. When new_elem is later released in malloc_elem_free, it is converted back to elem using RTE_PTR_SUB(new_elem, new_elem->pad), but by then the struct malloc_elem header of elem is already corrupted.
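
The overlap described in this report can be sketched with plain offset arithmetic. This is only an illustration under stated assumptions, not the DPDK code: struct elem_hdr is a hypothetical stand-in for struct malloc_elem, its 128-byte size is an assumption, and both helper names are invented for the example. It assumes the padding element's header is written pad bytes after the start of elem.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for struct malloc_elem; only its size matters here.
 * The 128-byte size is an assumption made for this illustration. */
struct elem_hdr {
	uint8_t bytes[128];
};

/*
 * elem's header occupies [0, sizeof(struct elem_hdr)).
 * The padding element's header is assumed to start at offset pad.
 * The two headers share bytes whenever pad is smaller than the header size.
 */
static bool pad_hdr_overlaps_elem_hdr(size_t pad)
{
	return pad < sizeof(struct elem_hdr);
}

/* Number of bytes of elem's header that the padding header would clobber. */
static size_t clobbered_bytes(size_t pad)
{
	return pad_hdr_overlaps_elem_hdr(pad) ? sizeof(struct elem_hdr) - pad : 0;
}
```

With these assumptions a 64-byte pad leaves the second half of elem's header overwritten, while a pad of at least the header size leaves it intact.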
Re: [dpdk-dev] [PATCH] doc: deprecation notice to remove LiquidIO ethdev driver
On Thu, Mar 9, 2023 at 5:16 PM Ferruh Yigit wrote: > > On 3/9/2023 7:07 AM, jer...@marvell.com wrote: > > From: Jerin Jacob > > > > The LiquidIO product line(drivers/net/liquidio) has been substituted with > > CN9K/CN10K OCTEON product line smart NICs located in drivers/net/octeon_ep/. > > DPDK v20.08 has categorized the LiquidIO driver as UNMAINTAINED > > because of the absence of updates in the driver. > > Due to the above reasons, the driver will be unavailable from DPDK 23.07. > > > > Signed-off-by: Jerin Jacob > > --- > > doc/guides/rel_notes/deprecation.rst | 6 ++ > > 1 file changed, 6 insertions(+) > > > > diff --git a/doc/guides/rel_notes/deprecation.rst > > b/doc/guides/rel_notes/deprecation.rst > > index 872847e938..eb6c3aedd8 100644 > > --- a/doc/guides/rel_notes/deprecation.rst > > +++ b/doc/guides/rel_notes/deprecation.rst > > @@ -135,3 +135,9 @@ Deprecation Notices > >Its removal has been postponed to let potential users report interest > >in maintaining it. > >In the absence of such interest, this library will be removed in DPDK > > 23.11. > > + > > +* net/liquidio: remove LiquidIO ethdev driver. The LiquidIO product line > > has been substituted > > + with CN9K/CN10K OCTEON product line smart NICs located in > > ``drivers/net/octeon_ep/``. > > + DPDK v20.08 has categorized the LiquidIO driver as UNMAINTAINED because > > of the absence of > > + updates in the driver. Due to the above reasons, the driver will be > > unavailable from DPDK 23.07. > > + > > Acked-by: Ferruh Yigit Ping for merge.
Re: [dpdk-dev] [PATCH] doc: deprecation notice to remove net/bnx2x driver
On Tue, Mar 21, 2023 at 4:11 PM Ferruh Yigit wrote: > > On 3/17/2023 6:02 PM, Alok Prasad wrote: > >> -Original Message- > >> From: jer...@marvell.com > >> Sent: 17 March 2023 18:00 > >> To: dev@dpdk.org > >> Cc: tho...@monjalon.net; david.march...@redhat.com; ferruh.yi...@amd.com; > >> andrew.rybche...@oktetlabs.ru; Alok Prasad > >> ; Devendra Singh Rawat ; Jerin > >> Jacob Kollanukkaran > >> Subject: [dpdk-dev] [PATCH] doc: deprecation notice to remove net/bnx2x > >> driver > >> > >> From: Jerin Jacob > >> > >> Starting from DPDK 23.07, the Marvell QLogic bnx2x driver > >> will be removed. This decision has been made to alleviate the burden of > >> maintaining a discontinued product. > >> > >> Signed-off-by: Jerin Jacob > >> --- > >> doc/guides/rel_notes/deprecation.rst | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/doc/guides/rel_notes/deprecation.rst > >> b/doc/guides/rel_notes/deprecation.rst > >> index 872847e938..d3d8d0011c 100644 > >> --- a/doc/guides/rel_notes/deprecation.rst > >> +++ b/doc/guides/rel_notes/deprecation.rst > >> @@ -135,3 +135,6 @@ Deprecation Notices > >>Its removal has been postponed to let potential users report interest > >>in maintaining it. > >>In the absence of such interest, this library will be removed in DPDK > >> 23.11. > >> + > >> +* net/bnx2x: Starting from DPDK 23.07, the Marvell QLogic bnx2x driver > >> will be removed. > >> + This decision has been made to alleviate the burden of maintaining a > >> discontinued product. > >> -- > >> 2.40.0 > > > > Thanks Jerin! > > > > Acked-by: Alok Prasad > > > Acked-by: Ferruh Yigit Ping for merge.
Re: [dpdk-web] [RFC PATCH] process: new library approval in principle
On Wed, Mar 15, 2023 at 7:17 PM Jerin Jacob wrote: > > On Fri, Mar 3, 2023 at 11:55 PM Thomas Monjalon wrote: > > > > Thanks for formalizing our process. > > Thanks for the review. Ping > > > > > 13/02/2023 10:26, jer...@marvell.com: > > > --- /dev/null > > > +++ b/content/process/_index.md > > > > First question: is the website the best place for this process? > > > > Inside the code guides, we have a contributing section, > > but I'm not sure it is a good fit for the decision process. > > > > In the website, you are creating a new page "process". > > Is it what we want? > > What about making it a sub-page of "Technical Board"? > > Since it is a process, I thought of keeping "process" page. > No specific opinion on where to add it. > If not other objections, Then I can add at > doc/guides/contributing/new_library_policy.rst in DPDK repo. > Let me know if you think better name or better place to keep the file > > > > > > @@ -0,0 +1,33 @@ > > > > > > +title = "Process" > > > +weight = "9" > > > > > > + > > > +## Process for new library approval in principle > > > + > > > +### Rational > > > > s/Rational/Rationale/ > > Ack > > > > > > + > > > +Adding a new library to DPDK codebase with proper RFC and then full > > > patch-sets is > > > +significant work and getting early approval-in-principle that a library > > > help DPDK contributors > > > +avoid wasted effort if it is not suitable for various reasons. > > > > That's a long sentence we could split. > > OK Changing as: > > Adding a new library to DPDK codebase with proper RFC and full > patch-sets is significant work. > > Getting early approval-in-principle that a library can help DPDK > contributors avoid wasted effort > if it is not suitable for various reasons > > > > > > > + > > > +### Process > > > + > > > +1. When a contributor would like to add a new library to DPDK code base, > > > the contributor must send > > > +the following items to DPDK mailing list for TB approval-in-principle. 
> > > > I think we can remove "code base". > > Ack > > > > > TB should be explained: Technical Board. > > Ack > > > > > > + > > > + - Purpose of the library. > > > + - Scope of the library. > > > > Not sure I understand the difference between Purpose and Scope. > > Purpose → The need for the library > Scope → I meant the work scope associated with it. > > I will change "Scope of the library" to, > > - Scope of work: Outline the various additional tasks planned for this > library, such as developing new test applications, adding new drivers, > and updating existing applications. > > > > > > + - Any licensing constraints. > > > + - Justification for adding to DPDK. > > > + - Any other implementations of the same functionality in other > > > libs/products and how this version differs. > > > > libs/products -> libraries/projects > > Ack > > > > > > + - Public API specification header file as RFC > > > + - Optional and good to have. > > > > You mean providing API is optional at this stage? > > Yes. I think, TB can request if more clarity is needed as mentioned below. > "TB may additionally request this collateral if needed to get more > clarity on scope and purpose" > > > > > > + - TB may additionally request this collateral if needed to get > > > more clarity on scope and purpose. > > > + > > > +2. TB to schedule discussion on this in upcoming TB meeting along with > > > author. Based on the TB > > > +schedule and/or author availability, TB may need maximum three TB > > > meeting slots. > > > > Better to translate the delay into weeks: 5 weeks? > > Ack > > > > > > + > > > +3. Based on mailing list and TB meeting discussions, TB to vote for > > > approval-in-principle and share > > > +the decision in the mailing list. > > > > I think we should say here that it is safe to start working > > on the implementation after this step, > > but the patches will need to match usual quality criterias > > to be effectively accepted. > > OK. > > I will add the following, > > 4. 
Once TB approves the library in principle, it is safe to start > working on its implementation. > However, the patches will need to meet the usual quality criteria in > order to be effectively accepted. > > > > > >
Should we try to be more graceful in library init on old Hardware?
Hi,

I've recently gotten a kind of bug I had been expecting for many years. In fact I wondered if it would still come up, as each year made it less likely. But it happened, and I got a crash report from someone using DPDK on rather old pre-SSE4.2 hardware.
=> https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9

The reporter was nice and tried the newer 22.11, but that is just as affected.

I understand that DPDK, as a project, has set this as the minimal accepted hardware capability. But because some programs (in this case UHD) can do many other things, it can happen that UHD or something else merely links to DPDK (as it could be used with it) and therefore crashes when loading. In theory other tools, like collectd, which has DPDK support, would be affected in the same way.

Example:
root@1bee22d20ca0:/# uhd_usrp_probe
Illegal instruction (core dumped)

(gdb) bt
#0 0x7f4b2d3a3374 in rte_srand () from /lib/x86_64-linux-gnu/librte_eal.so.23
#1 0x7f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23
#2 0x7f4b2e5d1fbe in call_init (l=, argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, env=env@entry=0x7ffeabf5b498) at ./elf/dl-init.c:70
#3 0x7f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, argv=0x7ffeabf5b488, argc=1, l=) at ./elf/dl-init.c:33
#4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, env=0x7ffeabf5b498) at ./elf/dl-init.c:117
#5 0x7f4b2e5ea8b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2
#6 0x0001 in ?? ()
#7 0x7ffeabf5c844 in ?? ()
#8 0x in ?? ()

Right now all we could do is:
a) say bad luck, old hardware (not nice)
b) make super complex alternative builds with and without DPDK support
c) ask the DPDK project to work on non-SSE4.2 hardware (unlikely and too late in 2023, I guess)
d) somehow make the initialization graceful (that is what this RFC is about)

If we could get DPDK to ensure the library loading paths are SSE4.2-free, then we could check the capabilities at the actual initialization and return a proper error instead of crashing. That way only real users of DPDK would be required to have sufficiently new hardware, and users of software that links to DPDK but does not use it in the current configuration would suffer less.

WDYT? Maybe it has been discussed already and I neither remembered nor found it?

--
Christian Ehrhardt
Senior Staff Engineer, Ubuntu Server
Canonical Ltd
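
Option d) could look roughly like the sketch below: check the CPU at initialization time and return an error instead of executing an unsupported instruction. This is a minimal illustration, not DPDK code. cpu_has_sse42() and lib_init_checked() are hypothetical names, and the check relies on the GCC/Clang x86 builtins __builtin_cpu_init() and __builtin_cpu_supports(); on other targets it simply reports the requirement as not applicable.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the running CPU offers SSE4.2. */
static bool cpu_has_sse42(void)
{
#if (defined(__x86_64__) || defined(__i386__)) && \
	(defined(__GNUC__) || defined(__clang__))
	/* Populate the compiler's CPU feature data before querying it. */
	__builtin_cpu_init();
	return __builtin_cpu_supports("sse4.2") != 0;
#else
	/* Not an x86 GCC/Clang build: the SSE4.2 requirement does not apply. */
	return true;
#endif
}

/*
 * Hypothetical init entry point: returns 0 on success and -1 when the CPU
 * lacks the required feature, instead of crashing later.
 */
static int lib_init_checked(void)
{
	if (!cpu_has_sse42())
		return -1;
	/* ... the real initialization would continue here ... */
	return 0;
}
```

Such a check only helps if everything that runs before it, including library constructors, is itself built without SSE4.2 instructions.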
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson wrote: > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote: > > Hi, > > > > FYI, when replying on list, it's best not to top-post, but put your replies > below the email snippet you are replying to. > > > The hash creation API throws the following error -- > > RING: Cannot reserve memory for tailq > > HASH: memory allocation failed > > > > The timer subsystem init api throws this error -- > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested > > memzone segments exceeds RTE_MAX_MEMZONE > > > > Can you try increasing RTE_MAX_MEMZONE. It' defined in DPDK's rte_config.h > file, so edit that and then rebuild DPDK. [If you are using the built-in > DPDK from VPP, you may need to do a patch for this, add it into the VPP > patches direction and then do a VPP rebuild.] > > Let's see if we can get rid of at least one of the error messages. :-) > > /Bruce > > > I did check the code and apparently the memzone and rte zmalloc > > related api's are not being able to allocate memory. > > > > Regards > > -Prashant > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson > > wrote: > > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > > > > Hi, > > > > > > > > While trying to port some code to VPP (which uses DPDK as the backend > > > > driver), I am running into a problem that calls to API's like > > > > rte_timer_subsystem_init, rte_hash_create are failing while allocation > > > > of memory. > > > > > > > > This is presumably because VPP inits the EAL with the following > > > > arguments -- > > > > > > > > -in-memory --no-telemetry --file-prefix vpp > > > > > > > > Is there is something that can be done eg. passing some more parms in > > > > the EAL initialization which hopefully wouldn't break VPP but will > > > > also be friendly to the RTE timer and hash functions too, that would > > > > be great, so requesting some advice here. 
> > > > > > > Hi, > > > > > > can you provide some more details on what the errors are that you are > > > receiving? Have you been able to dig a little deeper into what might be > > > causing the memory failures? The above flags alone are unlikely to cause > > > issues with hash or timer libraries, for example. > > > > > > /Bruce

Thanks Bruce, the error comes from the following function in lib/eal/common/eal_common_memzone.c:
memzone_reserve_aligned_thread_unsafe

The condition that triggers the error is:
if (arr->count >= arr->len)

I printed both values inside this function and got the following output:

vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry --file-prefix vpp
[New Thread 0x7fffa67b6700 (LWP 14732)]
count: 0 len: 2560
count: 1 len: 2560
count: 2 len: 2560
[New Thread 0x7fffa5fb5700 (LWP 14733)]
[New Thread 0x7fffa5db4700 (LWP 14734)]
count: 3 len: 2560
count: 4 len: 2560
### This is where I call rte_timer_subsystem_init from my code. The output above must come from other VPP/EAL init code; the line below is surely due to my call to rte_timer_subsystem_init.
count: 0 len: 0

As you can see, both values are zero in that last call. Is this expected? I thought arr->len should have been non-zero. I must add that the thread calling rte_timer_subsystem_init is possibly different from the one that did the EAL init. Do you think that might be a problem?

I have yet to increase the value of RTE_MAX_MEMZONE, but wanted to share the above first for any suggestions.

Regards
-Prashant
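
To see why the last printed pair trips the check, the quoted condition can be tried in isolation. The sketch below is an illustration only: struct zone_array and zone_array_full() are hypothetical stand-ins, and only the comparison itself is taken from the message above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the array bookkeeping read by the check. */
struct zone_array {
	unsigned int count; /* entries currently in use */
	unsigned int len;   /* total capacity */
};

/* Mirrors the quoted condition: no free slot is left when count >= len. */
static bool zone_array_full(const struct zone_array *arr)
{
	return arr->count >= arr->len;
}
```

With count 4 and len 2560 the check passes, while count 0 and len 0 is reported as full, which would match an array that looks uninitialized to the caller rather than a limit that is too small.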
Re: [PATCH 2/2] net/gve: update copyright holders
30/03/2023 09:20, Guo, Junfeng: > From: Thomas Monjalon > > 28/03/2023 11:35, Guo, Junfeng: > > > The background is that, in the past (DPDK 22.11) we didn't get the > > approval > > > of license from Google, thus chose the MIT License for the base code, > > and > > > BSD-3 License for GVE common code (without the files in /base folder). > > > We also left the copyright holder of base code just to Google Inc, and > > made > > > Intel as the copyright holder of GVE common code (without /base > > folder). > > > > > > Today we are working together for GVE dev and maintaining. And we > > got > > > the approval of BSD-3 License from Google for the base code. > > > Thus we dicided to 1) switch the License of GVE base code from MIT to > > BSD-3; > > > 2) add Google LLC as one of the copyright holders for GVE common > > code. > > > > Do you realize we had lenghty discussions in the Technical Board, > > the Governing Board, and with lawyers, just for this unneeded exception? > > > > Now looking at the patches, there seem to be some big mistakes like > > removing some copyright. I don't understand how it can be taken so > > lightly. > > > > I regret how fast we were, next time we will surely operate differently. > > If you want to improve the reputation of this driver, > > please ask other copyright holders to be more active and responsive. > > > > Really sorry for causing such severe trouble. > > Yes, we did take lots of efforts in the Technical Board and the Governing > Board about this MIT exception. We really appreciate that. > > About this patch set, it is my severe mistake to switch the MIT License > directly for the upstream-ed code in community, in the wrong way. > In the past we upstream-ed this driver with MIT License followed from > the kernel community's gve driver base code. And now we want to > use the code with BSD-3 License (approved by Google). 
> So I suppose that the correct way may be 1) first remove all these code > under MIT License and 2) then add the new files under BSD-3 License.

Is the code under BSD different from the MIT code? If it is the same code with a newly approved license, you can just change the license.

> Please correct me if there are still misunderstanding in my statement. > Thanks Thomas for pointing out my mistake. I'll be careful to fix this. > > Copyright holder for the gve base code will stay unchanged. Google LLC > will be added as one of the copyright holders for the gve common code. > @Rushil Gupta Please also be more active and responsive for the code > review and contribution in the community. Thanks!
Re: Should we try to be more graceful in library init on old Hardware?
On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote: > Hi, > I've recently gotten a kind of bug I was waiting for many years. > In fact I wondered if it would still come up as each year made it less > likely. > But it happened and I got a crash report of someone using dpdk a > rather old pre sse4.2 hardware. > => https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9 > > The reporter was nice and tried the newer 22.11, but that is just as affected. > > I understand that DPDK, as a project, has set this as the minimal > accepted hardware capability. > But due to some programs - in this case UHD - being able to do many > other things it might happen that UHD or any else just links to DPDK > (as it could be used with it) and due to that runs into a crash when > loading. In theory other tools like collectd which has dpdk support > would be affected by the same. > > Example: > root@1bee22d20ca0:/# uhd_usrp_probe > Illegal instruction (core dumped) > > (gdb) bt > #0 0x7f4b2d3a3374 in rte_srand () from > /lib/x86_64-linux-gnu/librte_eal.so.23 > #1 0x7f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23 > #2 0x7f4b2e5d1fbe in call_init (l=, > argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, > env=env@entry=0x7ffeabf5b498) > at ./elf/dl-init.c:70 > #3 0x7f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, > argv=0x7ffeabf5b488, argc=1, l=) at ./elf/dl-init.c:33 > #4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, > env=0x7ffeabf5b498) at ./elf/dl-init.c:117 > #5 0x7f4b2e5ea8b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 > #6 0x0001 in ?? () > #7 0x7ffeabf5c844 in ?? () > #8 0x in ?? 
() > > Right now all we could do is: > a) say bad luck old hardware (not nice) > b) make super complex alternative builds with and without dpdk support > c) ask the DPDK project to work on non sse4.2 (unlikely and too late > in 2023 I guess) > d) Somehow make the initialization graceful (that is what I'm RFC here) > > If we could manage to get that DPDK to ensure the lib loading paths > are SSE4.2 free. > Then we could check the capabilities on the actual initialization and > return a proper bad result instead of a crash. > Due to that only real-users of DPDK would be required to have > sufficiently new hardware. > And OTOH users of software that links, but in the current config would > not use DPDK would suffer less. > > WDYT? > Maybe it has been already discussed and I did neither remember nor find it? > It certainly hasn't been discussed previously, but there is meant to be support for this in EAL init itself. Almost the first function called from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time CPU flags against those of the current system. Unfortunately, from the error message you are getting, that doesn't seem to be working ok in the case of SSE4.2. It seems the compiler is inserting SSE4 instructions before we even get to that point. :-( Perhaps we need to move eal init to a new file, and compile it (and the cpuflag checks) with very minimal CPU flags. /Bruce [1] http://git.dpdk.org/dpdk/tree/lib/eal/common/eal_common_cpuflags.c
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 06:42:58PM +0530, Prashant Upadhyaya wrote: > On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson > wrote: > > > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote: > > > Hi, > > > > > > > FYI, when replying on list, it's best not to top-post, but put your replies > > below the email snippet you are replying to. > > > > > The hash creation API throws the following error -- > > > RING: Cannot reserve memory for tailq > > > HASH: memory allocation failed > > > > > > The timer subsystem init api throws this error -- > > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested > > > memzone segments exceeds RTE_MAX_MEMZONE > > > > > > > Can you try increasing RTE_MAX_MEMZONE. It' defined in DPDK's rte_config.h > > file, so edit that and then rebuild DPDK. [If you are using the built-in > > DPDK from VPP, you may need to do a patch for this, add it into the VPP > > patches direction and then do a VPP rebuild.] > > > > Let's see if we can get rid of at least one of the error messages. :-) > > > > /Bruce > > > > > I did check the code and apparently the memzone and rte zmalloc > > > related api's are not being able to allocate memory. > > > > > > Regards > > > -Prashant > > > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson > > > wrote: > > > > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > > > > > Hi, > > > > > > > > > > While trying to port some code to VPP (which uses DPDK as the backend > > > > > driver), I am running into a problem that calls to API's like > > > > > rte_timer_subsystem_init, rte_hash_create are failing while allocation > > > > > of memory. > > > > > > > > > > This is presumably because VPP inits the EAL with the following > > > > > arguments -- > > > > > > > > > > -in-memory --no-telemetry --file-prefix vpp > > > > > > > > > > Is there is something that can be done eg. 
passing some more parms in > > > > > the EAL initialization which hopefully wouldn't break VPP but will > > > > > also be friendly to the RTE timer and hash functions too, that would > > > > > be great, so requesting some advice here. > > > > > > > > > Hi, > > > > > > > > can you provide some more details on what the errors are that you are > > > > receiving? Have you been able to dig a little deeper into what might be > > > > causing the memory failures? The above flags alone are unlikely to cause > > > > issues with hash or timer libraries, for example. > > > > > > > > /Bruce > > Thanks Bruce, the error comes from the following function in > lib/eal/common/eal_common_memzone.c > memzone_reserve_aligned_thread_unsafe > > The condition which spits out the error is the following > if (arr->count >= arr->len) > So I printed both of the above values inside this function, and the > following output came > > vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry --file-prefix vpp > [New Thread 0x7fffa67b6700 (LWP 14732)] > count: 0 len: 2560 > count: 1 len: 2560 > count: 2 len: 2560 > [New Thread 0x7fffa5fb5700 (LWP 14733)] > [New Thread 0x7fffa5db4700 (LWP 14734)] > count: 3 len: 2560 > count: 4 len: 2560 > ### this is the place where I call rte_timer_subsystem_init from my > code, the above must be coming from any other code from VPP/EAL init, > the line below is surely because of my call to > rte_timer_subsystem_init > count: 0 len: 0 > > So as you can see that both values are coming to be zero -- is this > expected ? I thought the arr->len should have been non zero. > I must add that the thread which is calling the > rte_timer_subsystem_init is possibly different than the one which did > the eal init, do you think that might be a problem... > I am yet to increase the value of RTE_MAX_MEMZONE, but wanted to share > the above first for any suggestions. > Given the lengths you printed above, increasing the MAX_MEMZONE will not help things. 
Is the init call which is failing coming from a non-DPDK thread?
Re: Should we try to be more graceful in library init on old Hardware?
On Thu, Mar 30, 2023 at 02:15:42PM +0100, Bruce Richardson wrote: > On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote: > > Hi, > > I've recently gotten a kind of bug I was waiting for many years. > > In fact I wondered if it would still come up as each year made it less > > likely. > > But it happened and I got a crash report of someone using dpdk a > > rather old pre sse4.2 hardware. > > => https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9 > > > > The reporter was nice and tried the newer 22.11, but that is just as > > affected. > > > > I understand that DPDK, as a project, has set this as the minimal > > accepted hardware capability. > > But due to some programs - in this case UHD - being able to do many > > other things it might happen that UHD or any else just links to DPDK > > (as it could be used with it) and due to that runs into a crash when > > loading. In theory other tools like collectd which has dpdk support > > would be affected by the same. > > > > Example: > > root@1bee22d20ca0:/# uhd_usrp_probe > > Illegal instruction (core dumped) > > > > (gdb) bt > > #0 0x7f4b2d3a3374 in rte_srand () from > > /lib/x86_64-linux-gnu/librte_eal.so.23 > > #1 0x7f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23 > > #2 0x7f4b2e5d1fbe in call_init (l=, > > argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, > > env=env@entry=0x7ffeabf5b498) > > at ./elf/dl-init.c:70 > > #3 0x7f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, > > argv=0x7ffeabf5b488, argc=1, l=) at ./elf/dl-init.c:33 > > #4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, > > env=0x7ffeabf5b498) at ./elf/dl-init.c:117 > > #5 0x7f4b2e5ea8b0 in _dl_start_user () from /lib64/ld-linux-x86-64.so.2 > > #6 0x0001 in ?? () > > #7 0x7ffeabf5c844 in ?? () > > #8 0x in ?? 
() > > > > Right now all we could do is: > > a) say bad luck old hardware (not nice) > > b) make super complex alternative builds with and without dpdk support > > c) ask the DPDK project to work on non sse4.2 (unlikely and too late > > in 2023 I guess) > > d) Somehow make the initialization graceful (that is what I'm RFC here) > > > > If we could manage to get that DPDK to ensure the lib loading paths > > are SSE4.2 free. > > Then we could check the capabilities on the actual initialization and > > return a proper bad result instead of a crash. > > Due to that only real-users of DPDK would be required to have > > sufficiently new hardware. > > And OTOH users of software that links, but in the current config would > > not use DPDK would suffer less. > > > > WDYT? > > Maybe it has been already discussed and I did neither remember nor find it? > > > It certainly hasn't been discussed previously, but there is meant to be > support for this in EAL init itself. Almost the first function called > from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time > CPU flags against those of the current system. > Unfortunately, from the error message you are getting, that doesn't seem to > be working ok in the case of SSE4.2. It seems the compiler is inserting > SSE4 instructions before we even get to that point. :-( > > Perhaps we need to move eal init to a new file, and compile it (and the > cpuflag checks) with very minimal CPU flags. >

Following up on my own mail... I believe we may be able to solve this more easily by using the "target" attribute for those functions. For x86 builds I don't see why EAL init cannot be compiled for an earlier SSE version (march=core2, perhaps). It's not a performance-sensitive function. Thoughts?

/Bruce
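
The "target" attribute idea could be sketched as follows. This is a hedged illustration, not a tested DPDK change: INIT_BASELINE and early_init() are hypothetical names, and whether arch=core2 really lowers an instruction set that was already enabled on the command line depends on the compiler and build flags, so that would need to be verified on a real build.

```c
/*
 * On x86 with GCC or Clang, ask the compiler to generate this one function
 * for an older baseline. Elsewhere the macro expands to nothing.
 */
#if (defined(__x86_64__) || defined(__i386__)) && \
	(defined(__GNUC__) || defined(__clang__))
#define INIT_BASELINE __attribute__((target("arch=core2")))
#else
#define INIT_BASELINE
#endif

/*
 * Hypothetical early-init function kept on the older baseline so that it can
 * run, and fail cleanly, on CPUs without SSE4.2. cpu_ok stands for the
 * result of a CPU feature check performed by the caller.
 */
INIT_BASELINE static int early_init(int cpu_ok)
{
	return cpu_ok ? 0 : -1;
}
```

Anything inlined into such a function, and any constructor that runs before it, would need the same treatment for the approach to hold.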
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 6:47 PM Bruce Richardson wrote: > > On Thu, Mar 30, 2023 at 06:42:58PM +0530, Prashant Upadhyaya wrote: > > On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson > > wrote: > > > > > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote: > > > > Hi, > > > > > > > > > > FYI, when replying on list, it's best not to top-post, but put your > > > replies > > > below the email snippet you are replying to. > > > > > > > The hash creation API throws the following error -- > > > > RING: Cannot reserve memory for tailq > > > > HASH: memory allocation failed > > > > > > > > The timer subsystem init api throws this error -- > > > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested > > > > memzone segments exceeds RTE_MAX_MEMZONE > > > > > > > > > > Can you try increasing RTE_MAX_MEMZONE. It' defined in DPDK's rte_config.h > > > file, so edit that and then rebuild DPDK. [If you are using the built-in > > > DPDK from VPP, you may need to do a patch for this, add it into the VPP > > > patches direction and then do a VPP rebuild.] > > > > > > Let's see if we can get rid of at least one of the error messages. :-) > > > > > > /Bruce > > > > > > > I did check the code and apparently the memzone and rte zmalloc > > > > related api's are not being able to allocate memory. > > > > > > > > Regards > > > > -Prashant > > > > > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson > > > > wrote: > > > > > > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > > > > > > Hi, > > > > > > > > > > > > While trying to port some code to VPP (which uses DPDK as the > > > > > > backend > > > > > > driver), I am running into a problem that calls to API's like > > > > > > rte_timer_subsystem_init, rte_hash_create are failing while > > > > > > allocation > > > > > > of memory. 
> > > > > > > > > > > > This is presumably because VPP inits the EAL with the following > > > > > > arguments -- > > > > > > > > > > > > -in-memory --no-telemetry --file-prefix vpp > > > > > > > > > > > > Is there is something that can be done eg. passing some more parms > > > > > > in > > > > > > the EAL initialization which hopefully wouldn't break VPP but will > > > > > > also be friendly to the RTE timer and hash functions too, that would > > > > > > be great, so requesting some advice here. > > > > > > > > > > > Hi, > > > > > > > > > > can you provide some more details on what the errors are that you are > > > > > receiving? Have you been able to dig a little deeper into what might > > > > > be > > > > > causing the memory failures? The above flags alone are unlikely to > > > > > cause > > > > > issues with hash or timer libraries, for example. > > > > > > > > > > /Bruce > > > > Thanks Bruce, the error comes from the following function in > > lib/eal/common/eal_common_memzone.c > > memzone_reserve_aligned_thread_unsafe > > > > The condition which spits out the error is the following > > if (arr->count >= arr->len) > > So I printed both of the above values inside this function, and the > > following output came > > > > vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry --file-prefix > > vpp > > [New Thread 0x7fffa67b6700 (LWP 14732)] > > count: 0 len: 2560 > > count: 1 len: 2560 > > count: 2 len: 2560 > > [New Thread 0x7fffa5fb5700 (LWP 14733)] > > [New Thread 0x7fffa5db4700 (LWP 14734)] > > count: 3 len: 2560 > > count: 4 len: 2560 > > ### this is the place where I call rte_timer_subsystem_init from my > > code, the above must be coming from any other code from VPP/EAL init, > > the line below is surely because of my call to > > rte_timer_subsystem_init > > count: 0 len: 0 > > > > So as you can see that both values are coming to be zero -- is this > > expected ? I thought the arr->len should have been non zero. 
> > I must add that the thread which is calling the > > rte_timer_subsystem_init is possibly different than the one which did > > the eal init, do you think that might be a problem... > > I am yet to increase the value of RTE_MAX_MEMZONE, but wanted to share > > the above first for any suggestions. > > > Given the lengths you printed above, increasing the MAX_MEMZONE will not > help things. Is the init call which is failing coming from a non-DPDK > thread? Likely yes, at the moment I am calling it from a CLI which I have added in VPP. Assuming this is the case, do you foresee a problem ?
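The printed values can be checked against the condition quoted above. What follows is a minimal sketch, not DPDK code: `struct zone_array` and `zone_array_is_full()` are hypothetical stand-ins for the real bookkeeping, and only the comparison itself is taken from the thread. It shows why `count: 0 len: 0` trips the check even though nothing has been reserved, which is consistent with the remark that raising the memzone limit would not help.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the array bookkeeping; not DPDK's real type. */
struct zone_array {
	unsigned int count; /* entries currently in use */
	unsigned int len;   /* total capacity */
};

/* Same comparison as the check quoted from
 * memzone_reserve_aligned_thread_unsafe(). */
bool zone_array_is_full(const struct zone_array *arr)
{
	return arr->count >= arr->len;
}

/* Print the verdict for the two states seen in the log above. */
void show_states(void)
{
	struct zone_array healthy = { .count = 4, .len = 2560 };
	struct zone_array zeroed = { .count = 0, .len = 0 };

	printf("healthy full? %d\n", zone_array_is_full(&healthy));
	printf("zeroed full? %d\n", zone_array_is_full(&zeroed));
}
```

With a capacity of zero, any count satisfies `count >= len`, so the "exceeds RTE_MAX_MEMZONE" message can appear without a single zone being in use.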
Re: Regarding DPDK API's like rte_timer_subsystem_init/rte_hash_create etc. in VPP
On Thu, Mar 30, 2023 at 07:07:23PM +0530, Prashant Upadhyaya wrote: > On Thu, Mar 30, 2023 at 6:47 PM Bruce Richardson > wrote: > > > > On Thu, Mar 30, 2023 at 06:42:58PM +0530, Prashant Upadhyaya wrote: > > > On Thu, Mar 30, 2023 at 2:50 PM Bruce Richardson > > > wrote: > > > > > > > > On Thu, Mar 30, 2023 at 01:57:52PM +0530, Prashant Upadhyaya wrote: > > > > > Hi, > > > > > > > > > > > > > FYI, when replying on list, it's best not to top-post, but put your > > > > replies > > > > below the email snippet you are replying to. > > > > > > > > > The hash creation API throws the following error -- > > > > > RING: Cannot reserve memory for tailq > > > > > HASH: memory allocation failed > > > > > > > > > > The timer subsystem init api throws this error -- > > > > > EAL: memzone_reserve_aligned_thread_unsafe(): Number of requested > > > > > memzone segments exceeds RTE_MAX_MEMZONE > > > > > > > > > > > > > Can you try increasing RTE_MAX_MEMZONE. It' defined in DPDK's > > > > rte_config.h > > > > file, so edit that and then rebuild DPDK. [If you are using the built-in > > > > DPDK from VPP, you may need to do a patch for this, add it into the VPP > > > > patches direction and then do a VPP rebuild.] > > > > > > > > Let's see if we can get rid of at least one of the error messages. :-) > > > > > > > > /Bruce > > > > > > > > > I did check the code and apparently the memzone and rte zmalloc > > > > > related api's are not being able to allocate memory. 
> > > > > > > > > > Regards > > > > > -Prashant > > > > > > > > > > On Thu, Mar 30, 2023 at 1:30 PM Bruce Richardson > > > > > wrote: > > > > > > > > > > > > On Thu, Mar 30, 2023 at 10:30:24AM +0530, Prashant Upadhyaya wrote: > > > > > > > Hi, > > > > > > > > > > > > > > While trying to port some code to VPP (which uses DPDK as the > > > > > > > backend > > > > > > > driver), I am running into a problem that calls to API's like > > > > > > > rte_timer_subsystem_init, rte_hash_create are failing while > > > > > > > allocation > > > > > > > of memory. > > > > > > > > > > > > > > This is presumably because VPP inits the EAL with the following > > > > > > > arguments -- > > > > > > > > > > > > > > -in-memory --no-telemetry --file-prefix vpp > > > > > > > > > > > > > > Is there is something that can be done eg. passing some more > > > > > > > parms in > > > > > > > the EAL initialization which hopefully wouldn't break VPP but will > > > > > > > also be friendly to the RTE timer and hash functions too, that > > > > > > > would > > > > > > > be great, so requesting some advice here. > > > > > > > > > > > > > Hi, > > > > > > > > > > > > can you provide some more details on what the errors are that you > > > > > > are > > > > > > receiving? Have you been able to dig a little deeper into what > > > > > > might be > > > > > > causing the memory failures? The above flags alone are unlikely to > > > > > > cause > > > > > > issues with hash or timer libraries, for example. 
> > > > > > > > > > > > /Bruce > > > > > > Thanks Bruce, the error comes from the following function in > > > lib/eal/common/eal_common_memzone.c > > > memzone_reserve_aligned_thread_unsafe > > > > > > The condition which spits out the error is the following > > > if (arr->count >= arr->len) > > > So I printed both of the above values inside this function, and the > > > following output came > > > > > > vpp[14728]: dpdk: EAL init args: --in-memory --no-telemetry --file-prefix > > > vpp > > > [New Thread 0x7fffa67b6700 (LWP 14732)] > > > count: 0 len: 2560 > > > count: 1 len: 2560 > > > count: 2 len: 2560 > > > [New Thread 0x7fffa5fb5700 (LWP 14733)] > > > [New Thread 0x7fffa5db4700 (LWP 14734)] > > > count: 3 len: 2560 > > > count: 4 len: 2560 > > > ### this is the place where I call rte_timer_subsystem_init from my > > > code, the above must be coming from any other code from VPP/EAL init, > > > the line below is surely because of my call to > > > rte_timer_subsystem_init > > > count: 0 len: 0 > > > > > > So as you can see that both values are coming to be zero -- is this > > > expected ? I thought the arr->len should have been non zero. > > > I must add that the thread which is calling the > > > rte_timer_subsystem_init is possibly different than the one which did > > > the eal init, do you think that might be a problem... > > > I am yet to increase the value of RTE_MAX_MEMZONE, but wanted to share > > > the above first for any suggestions. > > > > > Given the lengths you printed above, increasing the MAX_MEMZONE will not > > help things. Is the init call which is failing coming from a non-DPDK > > thread? > > Likely yes, at the moment I am calling it from a CLI which I have added in > VPP. > Assuming this is the case, do you foresee a problem ? Could well be a possible cause, yes. With non-DPDK threads, the memory NUMA node/socket-id entries could be invalid, and cause the DPDK memory allocation to look for memory heaps on non-existent NUMA nodes. 
Can you try using rte_thread_register API in
Re: Should we try to be more graceful in library init on old Hardware?
2023-03-30 14:28 (UTC+0100), Bruce Richardson: > On Thu, Mar 30, 2023 at 02:15:42PM +0100, Bruce Richardson wrote: > > On Thu, Mar 30, 2023 at 02:53:41PM +0200, Christian Ehrhardt wrote: > > > Hi, > > > I've recently gotten a kind of bug I was waiting for many years. > > > In fact I wondered if it would still come up as each year made it less > > > likely. > > > But it happened and I got a crash report of someone using dpdk a > > > rather old pre sse4.2 hardware. > > > => https://bugs.launchpad.net/ubuntu/+source/dpdk/+bug/2009635/comments/9 > > > > > > > > > The reporter was nice and tried the newer 22.11, but that is just as > > > affected. > > > > > > I understand that DPDK, as a project, has set this as the minimal > > > accepted hardware capability. > > > But due to some programs - in this case UHD - being able to do many > > > other things it might happen that UHD or any else just links to DPDK > > > (as it could be used with it) and due to that runs into a crash when > > > loading. In theory other tools like collectd which has dpdk support > > > would be affected by the same. > > > > > > Example: > > > root@1bee22d20ca0:/# uhd_usrp_probe > > > Illegal instruction (core dumped) > > > > > > (gdb) bt > > > #0 0x7f4b2d3a3374 in rte_srand () from > > > /lib/x86_64-linux-gnu/librte_eal.so.23 > > > #1 0x7f4b2d3967ec in ?? () from /lib/x86_64-linux-gnu/librte_eal.so.23 > > > #2 0x7f4b2e5d1fbe in call_init (l=, > > > argc=argc@entry=1, argv=argv@entry=0x7ffeabf5b488, > > > env=env@entry=0x7ffeabf5b498) > > > at ./elf/dl-init.c:70 > > > #3 0x7f4b2e5d20a8 in call_init (env=0x7ffeabf5b498, > > > argv=0x7ffeabf5b488, argc=1, l=) at ./elf/dl-init.c:33 > > > #4 _dl_init (main_map=0x7f4b2e6042e0, argc=1, argv=0x7ffeabf5b488, > > > env=0x7ffeabf5b498) at ./elf/dl-init.c:117 > > > #5 0x7f4b2e5ea8b0 in _dl_start_user () from > > > /lib64/ld-linux-x86-64.so.2 > > > #6 0x0001 in ?? () > > > #7 0x7ffeabf5c844 in ?? () > > > #8 0x in ?? 
() > > > > > > Right now all we could do is: > > > a) say bad luck old hardware (not nice) > > > b) make super complex alternative builds with and without dpdk support > > > c) ask the DPDK project to work on non sse4.2 (unlikely and too late > > > in 2023 I guess) > > > d) Somehow make the initialization graceful (that is what I'm RFC here) > > > > > > If we could manage to get that DPDK to ensure the lib loading paths > > > are SSE4.2 free. > > > Then we could check the capabilities on the actual initialization and > > > return a proper bad result instead of a crash. > > > Due to that only real-users of DPDK would be required to have > > > sufficiently new hardware. > > > And OTOH users of software that links, but in the current config would > > > not use DPDK would suffer less. > > > > > > WDYT? > > > Maybe it has been already discussed and I did neither remember nor find > > > it? > > > > > It certainly hasn't been discussed previously, but there is meant to be > > support for this in EAL init itself. Almost the first function called > > from eal_init() is "rte_cpu_is_supported()" [1] which checks the build-time > > CPU flags against those of the current system. > > Unfortunately, from the error message you are getting, that doesn't seem to > > be working ok in the case of SSE4.2. It seems the compiler is inserting > > SSE4 instructions before we even get to that point. :-( > > > > Perhaps we need to move eal init to a new file, and compile it (and the > > cpuflag checks) with very minimal CPU flags. > > > > Following up to my own mail... > > I believe we may be able to solve this easier by maybe using the "target" > attribute for those functions. For x86 builds I don't see why eal init > cannot be compiled for an earlier SSE version, (march=core2, perhaps). It's > not a performance-sensitive function. > > Thoughts? > /Bruce The error originates from some RTE_INIT() routine called on library load. 
They can also be augmented with the "target" attribute and a check before calling the actual code supplied by the DPDK developer. The latter is needed because we can't systematically ensure that this code doesn't call some external function that uses SSE4.2. As for rte_eal_init(), I think the check there is enough, with one big "if": main() must also be compiled for the generic CPU to get there. So app developers can't be completely freed from thinking about this. BTW, rte_cpu_is_supported() itself is not protected against being compiled into unsupported instructions :)
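The idea of guarding load-time constructors can be sketched as follows. This is only an illustration under assumptions: `lib_ctor()`, `lib_init()`, `cpu_ok` and the `NO_SSE42` macro are hypothetical names, not DPDK symbols, and the `target` attribute, the `constructor` attribute and `__builtin_cpu_supports()` are GCC/Clang features rather than anything DPDK provides. As noted above, external functions called from the guarded code remain outside the guard's control.

```c
#include <errno.h>

#if defined(__x86_64__) || defined(__i386__)
/* Ask the compiler not to emit SSE4.2 in the guard itself, even if the
 * rest of the file is built with -msse4.2. */
#define NO_SSE42 __attribute__((target("no-sse4.2")))
#else
#define NO_SSE42
#endif

static int cpu_ok; /* set once at load time */

/* Runs on library load, in the way an RTE_INIT() routine would. */
static __attribute__((constructor)) NO_SSE42 void lib_ctor(void)
{
#if defined(__x86_64__) || defined(__i386__)
	__builtin_cpu_init(); /* needed before the query when run from a constructor */
	cpu_ok = __builtin_cpu_supports("sse4.2") ? 1 : 0;
#else
	cpu_ok = 1; /* assumption: no such requirement on other targets */
#endif
	if (!cpu_ok)
		return; /* skip the real setup instead of faulting */
	/* ... real load-time setup would go here ... */
}

/* Explicit init: report an error on an unsupported CPU instead of crashing. */
NO_SSE42 int lib_init(void)
{
	return cpu_ok ? 0 : -ENOTSUP;
}
```

An application that merely links against the library would then load cleanly on old hardware, and only a caller of `lib_init()` would see the `-ENOTSUP` result.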
RE: [PATCH v5 1/3] ethdev: add API for buffer recycle mode
> From: Feifei Wang [mailto:feifei.wa...@arm.com] > Sent: Thursday, 30 March 2023 11.31 > > > From: Morten Brørup > > Sent: Thursday, March 30, 2023 3:19 PM > > > > > From: Feifei Wang [mailto:feifei.wa...@arm.com] > > > Sent: Thursday, 30 March 2023 08.30 > > > > > > > [...] > > > > > +/** > > > + * @internal > > > + * Rx routine for rte_eth_dev_buf_recycle(). > > > + * Refill Rx descriptors in buffer recycle mode. > > > + * > > > + * @note > > > + * This API can only be called by rte_eth_dev_buf_recycle(). > > > + * Before calling this API, rte_eth_tx_buf_stash() should be > > > + * called to stash Tx used buffers into Rx buffer ring. > > > + * > > > + * When this functionality is not implemented in the driver, the > > > +return > > > + * buffer number is 0. > > > + * > > > + * @param port_id > > > + * The port identifier of the Ethernet device. > > > + * @param queue_id > > > + * The index of the receive queue. > > > + * The value must be in the range [0, nb_rx_queue - 1] previously > > supplied > > > + * to rte_eth_dev_configure(). > > > + *@param nb > > > + * The number of Rx descriptors to be refilled. > > > + * @return > > > + * The number Rx descriptors correct to be refilled. > > > + * - ENODEV: bad port or queue (only if compiled with debug). > > > > If you want errors reported by the return value, the function return type > > cannot be uint16_t. > Agree. Actually, in the code path, if errors happen, the function will return > 0. > For this description line, I refer to 'rte_eth_tx_prepare' notes. Maybe we > should delete > this line. 
> > > > > > + */ > > > +static inline uint16_t rte_eth_rx_descriptors_refill(uint16_t port_id, > > > + uint16_t queue_id, uint16_t nb) > > > +{ > > > + struct rte_eth_fp_ops *p; > > > + void *qd; > > > + > > > +#ifdef RTE_ETHDEV_DEBUG_RX > > > + if (port_id >= RTE_MAX_ETHPORTS || > > > + queue_id >= RTE_MAX_QUEUES_PER_PORT) { > > > + RTE_ETHDEV_LOG(ERR, > > > + "Invalid port_id=%u or queue_id=%u\n", > > > + port_id, queue_id); > > > + rte_errno = ENODEV; > > > + return 0; > > > > If p->rx_descriptors_refill() is likely to return 0, this function should > not use 0 > > as return value to indicate errors. > However, refer to dpdk code style in ethdev, most of API write like this. > For example, 'rte_eth_rx/tx_burst', 'rte_eth_tx_prep'. > > I'm also confused what's return type for this due to I want > to indicate errors and show the processed buffer number. OK. Thanks for the references. Looking at rte_eth_rx/tx_burst(), you could follow the same conventions here, i.e.: - Use uint16_t as return type. - Return 0 on error. - Do not set rte_errno. - Remove the "ENODEV" line from the @return description. - Use RTE_ETHDEV_LOG(ERR,...) as the only method to indicate errors. I now see that you follow the convention of rte_eth_tx_prepare(). This is also perfectly fine; then you just need to update the description of @return to mention that the error value is set in rte_errno if a value less than 'nb' is returned. 
> > > > > > + } > > > +#endif > > > + > > > + p = &rte_eth_fp_ops[port_id]; > > > + qd = p->rxq.data[queue_id]; > > > + > > > +#ifdef RTE_ETHDEV_DEBUG_RX > > > + if (!rte_eth_dev_is_valid_port(port_id)) { > > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx port_id=%u\n", port_id); > > > + rte_errno = ENODEV; > > > + return 0; > > > + > > > + if (qd == NULL) { > > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for > > port_id=%u\n", > > > + queue_id, port_id); > > > + rte_errno = ENODEV; > > > + return 0; > > > + } > > > +#endif > > > + > > > + if (p->rx_descriptors_refill == NULL) > > > + return 0; > > > + > > > + return p->rx_descriptors_refill(qd, nb); } When does p->rx_descriptors_refill() return anything else than 'nb'? If p->rx_descriptors_refill() always succeeds (and thus always returns 'nb'), you could make its return type void. And thus, you could also make the return type of rte_eth_rx_descriptors_refill() void. > > > + > > > /**@{@name Rx hardware descriptor states > > > * @see rte_eth_rx_descriptor_status > > > */ > > > @@ -6483,6 +6597,122 @@ rte_eth_tx_buffer(uint16_t port_id, uint16_t > > queue_id, > > > return rte_eth_tx_buffer_flush(port_id, queue_id, buffer); } > > > > > > +/** > > > + * @internal > > > + * Tx routine for rte_eth_dev_buf_recycle(). > > > + * Stash Tx used buffers into Rx buffer ring in buffer recycle mode. > > > + * > > > + * @note > > > + * This API can only be called by rte_eth_dev_buf_recycle(). > > > + * After calling this API, rte_eth_rx_descriptors_refill() should be > > > + * called to refill Rx ring descriptors. > > > + * > > > + * When this functionality is not implemented in the driver, the > > > +return > > > + * buffer number is 0. > > > + * > > > + * @param port_id > > > + * The port identifier of the Ethernet device. > > > + * @param queue_i
Re: [PATCH 2/2] net/gve: update copyright holders
We were just trying to comply with the BSD license to get rid of the exception. You have the MIT license for the control path/admin-queue code. Since the admin-queue path is similar across Linux, FreeBSD and DPDK, the code is similar but not exactly the same. We are about to upstream the driver code to FreeBSD under the BSD license as well, so you will see this code under the BSD license soon. I will consult the lawyers on my end as well. On Thu, Mar 30, 2023 at 6:14 AM Thomas Monjalon wrote: > 30/03/2023 09:20, Guo, Junfeng: > > From: Thomas Monjalon > > > 28/03/2023 11:35, Guo, Junfeng: > > > > The background is that, in the past (DPDK 22.11) we didn't get the > > > approval > > > > of license from Google, thus chose the MIT License for the base code, > > > and > > > > BSD-3 License for GVE common code (without the files in /base > folder). > > > > We also left the copyright holder of base code just to Google Inc, > and > > > made > > > > Intel as the copyright holder of GVE common code (without /base > > > folder). > > > > > > > > Today we are working together for GVE dev and maintaining. And we > > > got > > > > the approval of BSD-3 License from Google for the base code. > > > > Thus we decided to 1) switch the License of GVE base code from MIT to > > > BSD-3; > > > > 2) add Google LLC as one of the copyright holders for GVE common > > > code. > > > > > > Do you realize we had lengthy discussions in the Technical Board, > > > the Governing Board, and with lawyers, just for this unneeded > exception? > > > > > > Now looking at the patches, there seem to be some big mistakes like > > > removing some copyright. I don't understand how it can be taken so > > > lightly. > > > > > > I regret how fast we were, next time we will surely operate > differently. > > > If you want to improve the reputation of this driver, > > > please ask other copyright holders to be more active and responsive. > > > > > > > Really sorry for causing such severe trouble.
> > > > Yes, we did take lots of efforts in the Technical Board and the Governing > > Board about this MIT exception. We really appreciate that. > > > > About this patch set, it is my severe mistake to switch the MIT License > > directly for the upstream-ed code in community, in the wrong way. > > In the past we upstream-ed this driver with MIT License followed from > > the kernel community's gve driver base code. And now we want to > > use the code with BSD-3 License (approved by Google). > > So I suppose that the correct way may be 1) first remove all these code > > under MIT License and 2) then add the new files under BSD-3 License. > > The code under BSD is different of the MIT code? > If it is the same with a new approved license, you can just change the > license. > > > Please correct me if there are still misunderstanding in my statement. > > Thanks Thomas for pointing out my mistake. I'll be careful to fix this. > > > > Copyright holder for the gve base code will stay unchanged. Google LLC > > will be added as one of the copyright holders for the gve common code. > > @Rushil Gupta Please also be more active and responsive for the code > > review and contribution in the community. Thanks! > > > >
RE: [PATCH v5 1/3] ethdev: add API for buffer recycle mode
> From: Morten Brørup > Sent: Thursday, 30 March 2023 17.15 > > > From: Feifei Wang [mailto:feifei.wa...@arm.com] > > Sent: Thursday, 30 March 2023 11.31 > > > > > From: Morten Brørup > > > Sent: Thursday, March 30, 2023 3:19 PM > > > > > > > From: Feifei Wang [mailto:feifei.wa...@arm.com] > > > > Sent: Thursday, 30 March 2023 08.30 > > > > > > > > > > [...] > > > > > > > +/** > > > > + * @internal > > > > + * Rx routine for rte_eth_dev_buf_recycle(). > > > > + * Refill Rx descriptors in buffer recycle mode. > > > > + * > > > > + * @note > > > > + * This API can only be called by rte_eth_dev_buf_recycle(). > > > > + * Before calling this API, rte_eth_tx_buf_stash() should be > > > > + * called to stash Tx used buffers into Rx buffer ring. > > > > + * > > > > + * When this functionality is not implemented in the driver, the > > > > +return > > > > + * buffer number is 0. > > > > + * > > > > + * @param port_id > > > > + * The port identifier of the Ethernet device. > > > > + * @param queue_id > > > > + * The index of the receive queue. > > > > + * The value must be in the range [0, nb_rx_queue - 1] previously > > > supplied > > > > + * to rte_eth_dev_configure(). > > > > + *@param nb > > > > + * The number of Rx descriptors to be refilled. > > > > + * @return > > > > + * The number Rx descriptors correct to be refilled. > > > > + * - ENODEV: bad port or queue (only if compiled with debug). > > > > > > If you want errors reported by the return value, the function return type > > > cannot be uint16_t. > > Agree. Actually, in the code path, if errors happen, the function will > return > > 0. > > For this description line, I refer to 'rte_eth_tx_prepare' notes. Maybe we > > should delete > > this line. 
> > > > > > > > > + */ > > > > +static inline uint16_t rte_eth_rx_descriptors_refill(uint16_t port_id, > > > > + uint16_t queue_id, uint16_t nb) > > > > +{ > > > > + struct rte_eth_fp_ops *p; > > > > + void *qd; > > > > + > > > > +#ifdef RTE_ETHDEV_DEBUG_RX > > > > + if (port_id >= RTE_MAX_ETHPORTS || > > > > + queue_id >= RTE_MAX_QUEUES_PER_PORT) { > > > > + RTE_ETHDEV_LOG(ERR, > > > > + "Invalid port_id=%u or queue_id=%u\n", > > > > + port_id, queue_id); > > > > + rte_errno = ENODEV; > > > > + return 0; > > > > > > If p->rx_descriptors_refill() is likely to return 0, this function should > > not use 0 > > > as return value to indicate errors. > > However, refer to dpdk code style in ethdev, most of API write like this. > > For example, 'rte_eth_rx/tx_burst', 'rte_eth_tx_prep'. > > > > I'm also confused what's return type for this due to I want > > to indicate errors and show the processed buffer number. > > OK. Thanks for the references. > > Looking at rte_eth_rx/tx_burst(), you could follow the same conventions here, > i.e.: > - Use uint16_t as return type. > - Return 0 on error. > - Do not set rte_errno. > - Remove the "ENODEV" line from the @return description. > - Use RTE_ETHDEV_LOG(ERR,...) as the only method to indicate errors. > > I now see that you follow the convention of rte_eth_tx_prepare(). This is also > perfectly fine; then you just need to update the description of @return to > mention that the error value is set in rte_errno if a value less than 'nb' is > returned. After further consideration, I have changed my mind: The primary purpose of rte_eth_tx_prepare() is to test if a packet burst is valid, so the ability to return an error value is a natural requirement. This is not the purpose of your functions. The purpose of your functions resembles that of rte_eth_rx/tx_burst(), where there is no requirement to return an error value. So you should follow the convention of rte_eth_rx/tx_burst(), as I just suggested.
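The rte_eth_rx/tx_burst() convention listed above can be sketched like this. It is a self-contained illustration only: `struct fp_ops`, `fp_ops_table`, `MAX_PORTS`, `MAX_QUEUES`, `LOG_ERR`, `descriptors_refill()` and `fake_refill()` are hypothetical stand-ins for the ethdev internals, not the real DPDK definitions, and the checks run unconditionally here whereas the patch places them under RTE_ETHDEV_DEBUG_RX.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_PORTS 4  /* stand-in for RTE_MAX_ETHPORTS */
#define MAX_QUEUES 8 /* stand-in for RTE_MAX_QUEUES_PER_PORT */
/* stand-in for RTE_ETHDEV_LOG(ERR, ...) */
#define LOG_ERR(...) fprintf(stderr, __VA_ARGS__)

/* Hypothetical, reduced version of the per-port fast-path ops table. */
struct fp_ops {
	uint16_t (*rx_descriptors_refill)(void *qd, uint16_t nb);
	void *rxq_data[MAX_QUEUES];
};

static struct fp_ops fp_ops_table[MAX_PORTS];

/* Fake driver callback for demonstration: pretends every refill succeeds. */
static uint16_t fake_refill(void *qd, uint16_t nb)
{
	(void)qd;
	return nb;
}

/* Returns the number of refilled descriptors, or 0 on any error.
 * Errors are only logged: no errno-style variable, no negative return. */
static inline uint16_t
descriptors_refill(uint16_t port_id, uint16_t queue_id, uint16_t nb)
{
	struct fp_ops *p;
	void *qd;

	if (port_id >= MAX_PORTS || queue_id >= MAX_QUEUES) {
		LOG_ERR("Invalid port_id=%u or queue_id=%u\n",
			(unsigned int)port_id, (unsigned int)queue_id);
		return 0;
	}

	p = &fp_ops_table[port_id];
	qd = p->rxq_data[queue_id];
	if (qd == NULL) {
		LOG_ERR("Invalid Rx queue_id=%u for port_id=%u\n",
			(unsigned int)queue_id, (unsigned int)port_id);
		return 0;
	}

	/* Driver without support for the feature: nothing is refilled. */
	if (p->rx_descriptors_refill == NULL)
		return 0;

	return p->rx_descriptors_refill(qd, nb);
}
```

With this shape the caller cannot distinguish "error" from "nothing refilled" by the return value alone, which is the same trade-off the burst functions make; the log is the only error channel.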
> > > > > > > > > > + } > > > > +#endif > > > > + > > > > + p = &rte_eth_fp_ops[port_id]; > > > > + qd = p->rxq.data[queue_id]; > > > > + > > > > +#ifdef RTE_ETHDEV_DEBUG_RX > > > > + if (!rte_eth_dev_is_valid_port(port_id)) { > > > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx port_id=%u\n", port_id); > > > > + rte_errno = ENODEV; > > > > + return 0; > > > > + > > > > + if (qd == NULL) { > > > > + RTE_ETHDEV_LOG(ERR, "Invalid Rx queue_id=%u for > > > port_id=%u\n", > > > > + queue_id, port_id); > > > > + rte_errno = ENODEV; > > > > + return 0; > > > > + } > > > > +#endif > > > > + > > > > + if (p->rx_descriptors_refill == NULL) > > > > + return 0; > > > > + > > > > + return p->rx_descriptors_refill(qd, nb); } > > When does p->rx_descriptors_refill() return anything else than 'nb'? > > If p->rx_descriptors_refill() always succeeds (and thus always returns 'nb'), > you could make its return type void. And thus, you could also make the return > type of rte_eth_rx_descriptors_refill
[PATCH v1] doc: update release notes for 23.03
Fix grammar, spelling and formatting of DPDK 23.03 release notes. Signed-off-by: John McNamara --- * Minor fixes/changes only. doc/guides/rel_notes/release_23_03.rst | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst index b93903447d..a31d34f5f5 100644 --- a/doc/guides/rel_notes/release_23_03.rst +++ b/doc/guides/rel_notes/release_23_03.rst @@ -71,7 +71,7 @@ New Features * **Added platform bus support.** A platform bus provides a way to use Linux platform devices which - are compatible with vfio-platform kernel driver. + are compatible with the do vfio-platform kernel driver. * **Added ARM support for power monitor in the power management library.** @@ -80,6 +80,9 @@ New Features * **Added Ethernet link speed for 400 Gb/s.** +Added Ethernet link speed for 400 Gb/s since there are some devices already +supporting that speed and it is well standardized in IEEE. + * **Added support for mapping a queue with an aggregated port.** * Introduced new function ``rte_eth_dev_count_aggr_ports()`` @@ -88,6 +91,7 @@ New Features to map a Tx queue with an aggregated port of the DPDK port. * Added Rx affinity flow matching of an aggregated port. + * **Added flow matching of IPv6 routing extension.** Added ``RTE_FLOW_ITEM_TYPE_IPV6_ROUTING_EXT`` @@ -113,7 +117,7 @@ New Features * **Added cross-port indirect action in asynchronous flow API.** - * Allowed to share indirect actions between ports by passing + * Enabled the ability to share indirect actions between ports by passing the flag ``RTE_FLOW_PORT_FLAG_SHARE_INDIRECT`` to ``rte_flow_configure()``. * Added ``host_port_id`` in ``rte_flow_port_attr`` structure to reference the port hosting the shared objects. 
@@ -215,14 +219,14 @@ New Features * **Updated the eventdev reconfigure logic for service based adapters.** - * eventdev reconfig logic is enhanced to increment the + * The eventdev reconfigure logic was enhanced to increment the ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter if event port config is of type ``RTE_EVENT_PORT_CFG_SINGLE_LINK``. * With this change, the application no longer needs to account for the ``rte_event_dev_config::nb_single_link_event_port_queues`` parameter required for eth_rx, eth_tx, crypto and timer eventdev adapters. -* **Added pcap trace support in graph library.** +* **Added PCAP trace support in graph library.** * Added support to capture packets at each graph node with packet metadata and node name. @@ -263,8 +267,8 @@ API Changes * The telemetry command ``/eal/heap_info`` is fixed to print ``Heap_id``. -* The experimental function ``rte_pcapng_copy`` was updated to support comment - section in enhanced packet block in the pcapng library. +* The experimental function ``rte_pcapng_copy`` was updated to support a comment + section in enhanced packet block in the PcapNG library. * The experimental structures ``struct rte_graph_param``, ``struct rte_graph`` and ``struct graph`` were updated to support pcap trace in the graph library. -- 2.31.1
Re: [PATCH v1] doc: update release notes for 23.03
On 3/30/2023 5:09 PM, John McNamara wrote: Fix grammar, spelling and formatting of DPDK 23.03 release notes. Signed-off-by: John McNamara --- * Minor fixes/changes only. doc/guides/rel_notes/release_23_03.rst | 16 ++-- 1 file changed, 10 insertions(+), 6 deletions(-) diff --git a/doc/guides/rel_notes/release_23_03.rst b/doc/guides/rel_notes/release_23_03.rst index b93903447d..a31d34f5f5 100644 --- a/doc/guides/rel_notes/release_23_03.rst +++ b/doc/guides/rel_notes/release_23_03.rst @@ -71,7 +71,7 @@ New Features * **Added platform bus support.** A platform bus provides a way to use Linux platform devices which - are compatible with vfio-platform kernel driver. + are compatible with the do vfio-platform kernel driver. Hi John, Looks like there is a double spacing problem between "do" and "vfio-platform"; also, I suppose "the vfio-platform" is sufficient. Other than that, Acked-by: Fan Zhang
[Bug 1206] Multiple large memory block allocations using rte_malloc can lead to memory out-of-bounds issues.
https://bugs.dpdk.org/show_bug.cgi?id=1206

Bug ID: 1206
Summary: Multiple large memory block allocations using rte_malloc can lead to memory out-of-bounds issues.
Product: DPDK
Version: 21.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: core
Assignee: dev@dpdk.org
Reporter: killerst...@gmail.com
Target Milestone: ---

[root@localhost bin]# lscpu
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Byte Order: Little Endian
CPU(s): 8
On-line CPU(s) list: 0-7
Thread(s) per core: 2
Core(s) per socket: 4
Socket(s): 1
NUMA node(s): 1
Vendor ID: GenuineIntel
CPU family: 6
Model: 58
Model name: Intel(R) Core(TM) i7-3770 CPU @ 3.40GHz
Stepping: 9
CPU MHz: 3700.073
CPU max MHz: 3900.
CPU min MHz: 1600.
BogoMIPS: 6784.24
Virtualization: VT-x
L1d cache: 32K
L1i cache: 32K
L2 cache: 256K
L3 cache: 8192K
NUMA node0 CPU(s): 0-7
Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm epb ssbd ibrs ibpb tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts
[root@localhost bin]#

Not supported pdpe1gb

There are many free 2M HugePages.
HugePages_Total: 6656
HugePages_Free: 5682
HugePages_Rsvd: 0
HugePages_Surp: 0
Hugepagesize: 2048 kB
DirectMap4k: 236476 kB
DirectMap2M: 33228800 kB

test code

char * t_mem1;
char * t_mem2;
int t_size = 1024*1024*1024;
t_mem1 = rte_malloc(NULL,t_size,RTE_CACHE_LINE_SIZE);
t_mem2 = rte_malloc(NULL,t_size,RTE_CACHE_LINE_SIZE);
printf("rte_malloc1 t_mem1=%p \n",t_mem1);
printf("rte_malloc1 t_mem2=%p \n",t_mem2);
memset(t_mem1,0,t_size);
memset(t_mem2,1,t_size);
int t_i;
for(t_i=0;t_i
[RFC 0/4] add frequency adjustment support for PTP
[RFC 1/4] ethdev: add frequency adjustment API.
[RFC 2/4] net/ice: add frequency adjustment support for PTP.
[RFC 3/4] examples/ptpclient: refine application.
[RFC 4/4] examples/ptpclient: add frequency adjustment support.

Simei Su (4):
  ethdev: add frequency adjustment API
  net/ice: add frequency adjustment support for PTP
  examples/ptpclient: refine application
  examples/ptpclient: add frequency adjustment support

 drivers/net/ice/ice_ethdev.c | 111 +---
 examples/ptpclient/ptpclient.c | 222 +--
 lib/ethdev/ethdev_driver.h | 5 +
 lib/ethdev/ethdev_trace.h | 9 ++
 lib/ethdev/ethdev_trace_points.c | 3 +
 lib/ethdev/rte_ethdev.c | 18
 lib/ethdev/rte_ethdev.h | 19
 7 files changed, 317 insertions(+), 70 deletions(-)

--
2.9.5
[RFC 1/4] ethdev: add frequency adjustment API
This patch adds freq adjustment API for PTP high accuracy. Signed-off-by: Simei Su --- lib/ethdev/ethdev_driver.h | 5 + lib/ethdev/ethdev_trace.h| 9 + lib/ethdev/ethdev_trace_points.c | 3 +++ lib/ethdev/rte_ethdev.c | 18 ++ lib/ethdev/rte_ethdev.h | 19 +++ 5 files changed, 54 insertions(+) diff --git a/lib/ethdev/ethdev_driver.h b/lib/ethdev/ethdev_driver.h index 2c9d615..b1451d2 100644 --- a/lib/ethdev/ethdev_driver.h +++ b/lib/ethdev/ethdev_driver.h @@ -633,6 +633,9 @@ typedef int (*eth_timesync_read_tx_timestamp_t)(struct rte_eth_dev *dev, /** @internal Function used to adjust the device clock. */ typedef int (*eth_timesync_adjust_time)(struct rte_eth_dev *dev, int64_t); +/** @internal Function used to adjust the clock frequency. */ +typedef int (*eth_timesync_adjust_freq)(struct rte_eth_dev *dev, int64_t); + /** @internal Function used to get time from the device clock. */ typedef int (*eth_timesync_read_time)(struct rte_eth_dev *dev, struct timespec *timestamp); @@ -1344,6 +1347,8 @@ struct eth_dev_ops { eth_timesync_read_tx_timestamp_t timesync_read_tx_timestamp; /** Adjust the device clock */ eth_timesync_adjust_time timesync_adjust_time; + /** Adjust the clock frequency */ + eth_timesync_adjust_freq timesync_adjust_freq; /** Get the device clock time */ eth_timesync_read_time timesync_read_time; /** Set the device clock time */ diff --git a/lib/ethdev/ethdev_trace.h b/lib/ethdev/ethdev_trace.h index 3dc7d02..d92554b 100644 --- a/lib/ethdev/ethdev_trace.h +++ b/lib/ethdev/ethdev_trace.h @@ -2196,6 +2196,15 @@ RTE_TRACE_POINT_FP( rte_trace_point_emit_int(ret); ) +/* Called in loop in examples/ptpclient */ +RTE_TRACE_POINT_FP( + rte_eth_trace_timesync_adjust_freq, + RTE_TRACE_POINT_ARGS(uint16_t port_id, int64_t ppm, int ret), + rte_trace_point_emit_u16(port_id); + rte_trace_point_emit_i64(ppm); + rte_trace_point_emit_int(ret); +) + /* Called in loop in app/test-flow-perf */ RTE_TRACE_POINT_FP( rte_flow_trace_create, diff --git 
a/lib/ethdev/ethdev_trace_points.c b/lib/ethdev/ethdev_trace_points.c index 61010ca..c01b5d3 100644 --- a/lib/ethdev/ethdev_trace_points.c +++ b/lib/ethdev/ethdev_trace_points.c @@ -406,6 +406,9 @@ RTE_TRACE_POINT_REGISTER(rte_eth_trace_timesync_read_tx_timestamp, RTE_TRACE_POINT_REGISTER(rte_eth_trace_timesync_adjust_time, lib.ethdev.timesync_adjust_time) +RTE_TRACE_POINT_REGISTER(rte_eth_trace_timesync_adjust_freq, + lib.ethdev.timesync_adjust_freq) + RTE_TRACE_POINT_REGISTER(rte_eth_trace_timesync_read_time, lib.ethdev.timesync_read_time) diff --git a/lib/ethdev/rte_ethdev.c b/lib/ethdev/rte_ethdev.c index 4d03255..f5934bb 100644 --- a/lib/ethdev/rte_ethdev.c +++ b/lib/ethdev/rte_ethdev.c @@ -6017,6 +6017,24 @@ rte_eth_timesync_adjust_time(uint16_t port_id, int64_t delta) } int +rte_eth_timesync_adjust_freq(uint16_t port_id, int64_t ppm) +{ + struct rte_eth_dev *dev; + int ret; + + RTE_ETH_VALID_PORTID_OR_ERR_RET(port_id, -ENODEV); + dev = &rte_eth_devices[port_id]; + + if (*dev->dev_ops->timesync_adjust_freq == NULL) + return -ENOTSUP; + ret = eth_err(port_id, (*dev->dev_ops->timesync_adjust_freq)(dev, ppm)); + + rte_eth_trace_timesync_adjust_freq(port_id, ppm, ret); + + return ret; +} + +int rte_eth_timesync_read_time(uint16_t port_id, struct timespec *timestamp) { struct rte_eth_dev *dev; diff --git a/lib/ethdev/rte_ethdev.h b/lib/ethdev/rte_ethdev.h index 99fe9e2..9737461 100644 --- a/lib/ethdev/rte_ethdev.h +++ b/lib/ethdev/rte_ethdev.h @@ -5102,6 +5102,25 @@ int rte_eth_timesync_read_tx_timestamp(uint16_t port_id, int rte_eth_timesync_adjust_time(uint16_t port_id, int64_t delta); /** + * Adjust the clock increment rate on an Ethernet device. + * + * This is usually used in conjunction with other Ethdev timesync functions to + * synchronize the device time using the IEEE1588/802.1AS protocol. + * + * @param port_id + * The port identifier of the Ethernet device. 
+ * @param ppm + * Parts per million with 16-bit fractional field + * + * @return + * - 0: Success. + * - -ENODEV: The port ID is invalid. + * - -EIO: if device is removed. + * - -ENOTSUP: The function is not supported by the Ethernet driver. + */ +int rte_eth_timesync_adjust_freq(uint16_t port_id, int64_t ppm); + +/** * Read the time from the timesync clock on an Ethernet device. * * This is usually used in conjunction with other Ethdev timesync functions to -- 2.9.5
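For illustration only, here is a minimal sketch of how a caller might build the `ppm` argument. It assumes that "parts per million with 16-bit fractional field" means the value is ppm multiplied by 2^16, which is the scaled-ppm convention Linux uses for PTP clocks; the RFC text does not spell this out. The helper names are hypothetical and are not part of the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Number of fractional bits in the ppm argument (assumed from the API comment). */
#define PPM_FRAC_BITS 16

/* Hypothetical helper: parts-per-billion -> ppm with a 16-bit fraction. */
static int64_t
ppb_to_scaled_ppm(int64_t ppb)
{
	/* Keep the multiplication below well inside the int64_t range. */
	assert(ppb > -(INT64_C(1) << 46) && ppb < (INT64_C(1) << 46));
	/* 1 ppm = 1000 ppb; multiply instead of shifting so negatives are well defined. */
	return ppb * (INT64_C(1) << PPM_FRAC_BITS) / 1000;
}

/* Hypothetical helper: the inverse conversion, truncating toward zero. */
static int64_t
scaled_ppm_to_ppb(int64_t scaled_ppm)
{
	assert(scaled_ppm > -(INT64_C(1) << 52) && scaled_ppm < (INT64_C(1) << 52));
	return scaled_ppm * 1000 / (INT64_C(1) << PPM_FRAC_BITS);
}
```

Under that assumption, a servo that computes a correction in ppb would pass `ppb_to_scaled_ppm(ppb)` as the second argument of `rte_eth_timesync_adjust_freq()`.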
[RFC 2/4] net/ice: add frequency adjustment support for PTP
Add ice support for new ethdev API to adjust frequency for IEEE1588 PTP. Also, this patch reworks code for converting software update to hardware update. Signed-off-by: Simei Su --- drivers/net/ice/ice_ethdev.c | 111 --- 1 file changed, 72 insertions(+), 39 deletions(-) diff --git a/drivers/net/ice/ice_ethdev.c b/drivers/net/ice/ice_ethdev.c index 9a88cf9..fa4d840 100644 --- a/drivers/net/ice/ice_ethdev.c +++ b/drivers/net/ice/ice_ethdev.c @@ -158,6 +158,7 @@ static int ice_timesync_read_rx_timestamp(struct rte_eth_dev *dev, static int ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct timespec *timestamp); static int ice_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta); +static int ice_timesync_adjust_freq(struct rte_eth_dev *dev, int64_t ppm); static int ice_timesync_read_time(struct rte_eth_dev *dev, struct timespec *timestamp); static int ice_timesync_write_time(struct rte_eth_dev *dev, @@ -274,6 +275,7 @@ static const struct eth_dev_ops ice_eth_dev_ops = { .timesync_read_rx_timestamp = ice_timesync_read_rx_timestamp, .timesync_read_tx_timestamp = ice_timesync_read_tx_timestamp, .timesync_adjust_time = ice_timesync_adjust_time, + .timesync_adjust_freq = ice_timesync_adjust_freq, .timesync_read_time = ice_timesync_read_time, .timesync_write_time = ice_timesync_write_time, .timesync_disable = ice_timesync_disable, @@ -5840,23 +5842,6 @@ ice_timesync_enable(struct rte_eth_dev *dev) } } - /* Initialize cycle counters for system time/RX/TX timestamp */ - memset(&ad->systime_tc, 0, sizeof(struct rte_timecounter)); - memset(&ad->rx_tstamp_tc, 0, sizeof(struct rte_timecounter)); - memset(&ad->tx_tstamp_tc, 0, sizeof(struct rte_timecounter)); - - ad->systime_tc.cc_mask = ICE_CYCLECOUNTER_MASK; - ad->systime_tc.cc_shift = 0; - ad->systime_tc.nsec_mask = 0; - - ad->rx_tstamp_tc.cc_mask = ICE_CYCLECOUNTER_MASK; - ad->rx_tstamp_tc.cc_shift = 0; - ad->rx_tstamp_tc.nsec_mask = 0; - - ad->tx_tstamp_tc.cc_mask = ICE_CYCLECOUNTER_MASK; - 
ad->tx_tstamp_tc.cc_shift = 0; - ad->tx_tstamp_tc.nsec_mask = 0; - ad->ptp_ena = 1; return 0; @@ -5871,14 +5856,13 @@ ice_timesync_read_rx_timestamp(struct rte_eth_dev *dev, ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); struct ice_rx_queue *rxq; uint32_t ts_high; - uint64_t ts_ns, ns; + uint64_t ts_ns; rxq = dev->data->rx_queues[flags]; ts_high = rxq->time_high; ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, ts_high); - ns = rte_timecounter_update(&ad->rx_tstamp_tc, ts_ns); - *timestamp = rte_ns_to_timespec(ns); + *timestamp = rte_ns_to_timespec(ts_ns); return 0; } @@ -5891,7 +5875,7 @@ ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev, struct ice_adapter *ad = ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); uint8_t lport; - uint64_t ts_ns, ns, tstamp; + uint64_t ts_ns, tstamp; const uint64_t mask = 0x; int ret; @@ -5904,8 +5888,7 @@ ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev, } ts_ns = ice_tstamp_convert_32b_64b(hw, ad, 1, (tstamp >> 8) & mask); - ns = rte_timecounter_update(&ad->tx_tstamp_tc, ts_ns); - *timestamp = rte_ns_to_timespec(ns); + *timestamp = rte_ns_to_timespec(ts_ns); return 0; } @@ -5913,12 +5896,66 @@ ice_timesync_read_tx_timestamp(struct rte_eth_dev *dev, static int ice_timesync_adjust_time(struct rte_eth_dev *dev, int64_t delta) { - struct ice_adapter *ad = - ICE_DEV_PRIVATE_TO_ADAPTER(dev->data->dev_private); + struct ice_hw *hw = ICE_DEV_PRIVATE_TO_HW(dev->data->dev_private); + uint8_t tmr_idx = hw->func_caps.ts_func_info.tmr_index_assoc; + uint32_t lo, lo2, hi; + uint64_t time, ns; + int ret; + + if (delta > INT32_MAX || delta < INT32_MIN) { + lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx)); + hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx)); + lo2 = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx)); + + if (lo2 < lo) { + lo = ICE_READ_REG(hw, GLTSYN_TIME_L(tmr_idx)); + hi = ICE_READ_REG(hw, GLTSYN_TIME_H(tmr_idx)); + } + + time = ((uint64_t)hi << 32) | lo; + ns = time + delta; + + return ice_ptp_init_time(hw, ns); + } + + ret = 
ice_ptp_adj_clock(hw, delta, true); + if (ret) + return -1; + + return 0; +} - ad->systime_tc.nsec += delta; - ad->rx_tstamp_tc.nsec += delta; - ad->tx_t
[RFC 3/4] examples/ptpclient: refine application
This patch reworks code to split delay request message parsing from follow up message parsing. Signed-off-by: Simei Su Signed-off-by: Wenjun Wu --- examples/ptpclient/ptpclient.c | 48 -- 1 file changed, 32 insertions(+), 16 deletions(-) diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c index cdf2da6..74a1bf5 100644 --- a/examples/ptpclient/ptpclient.c +++ b/examples/ptpclient/ptpclient.c @@ -382,21 +382,11 @@ parse_sync(struct ptpv2_data_slave_ordinary *ptp_data, uint16_t rx_tstamp_idx) static void parse_fup(struct ptpv2_data_slave_ordinary *ptp_data) { - struct rte_ether_hdr *eth_hdr; - struct rte_ether_addr eth_addr; struct ptp_header *ptp_hdr; - struct clock_id *client_clkid; struct ptp_message *ptp_msg; - struct delay_req_msg *req_msg; - struct rte_mbuf *created_pkt; struct tstamp *origin_tstamp; - struct rte_ether_addr eth_multicast = ether_multicast; - size_t pkt_size; - int wait_us; struct rte_mbuf *m = ptp_data->m; - int ret; - eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); ptp_hdr = (struct ptp_header *)(rte_pktmbuf_mtod(m, char *) + sizeof(struct rte_ether_hdr)); if (memcmp(&ptp_data->master_clock_id, @@ -413,6 +403,26 @@ parse_fup(struct ptpv2_data_slave_ordinary *ptp_data) ptp_data->tstamp1.tv_sec = ((uint64_t)ntohl(origin_tstamp->sec_lsb)) | (((uint64_t)ntohs(origin_tstamp->sec_msb)) << 32); +} + +static void +send_delay_request(struct ptpv2_data_slave_ordinary *ptp_data) +{ + struct rte_ether_hdr *eth_hdr; + struct rte_ether_addr eth_addr; + struct ptp_header *ptp_hdr; + struct clock_id *client_clkid; + struct delay_req_msg *req_msg; + struct rte_mbuf *created_pkt; + struct rte_ether_addr eth_multicast = ether_multicast; + size_t pkt_size; + int wait_us; + struct rte_mbuf *m = ptp_data->m; + int ret; + + eth_hdr = rte_pktmbuf_mtod(m, struct rte_ether_hdr *); + ptp_hdr = (struct ptp_header *)(rte_pktmbuf_mtod(m, char *) + + sizeof(struct rte_ether_hdr)); if (ptp_data->seqID_FOLLOWUP == ptp_data->seqID_SYNC) { ret = 
rte_eth_macaddr_get(ptp_data->portid, &eth_addr);
@@ -550,12 +560,6 @@ parse_drsp(struct ptpv2_data_slave_ordinary *ptp_data)
 			((uint64_t)ntohl(rx_tstamp->sec_lsb)) |
 			(((uint64_t)ntohs(rx_tstamp->sec_msb)) << 32);
 
-			/* Evaluate the delta for adjustment. */
-			ptp_data->delta = delta_eval(ptp_data);
-
-			rte_eth_timesync_adjust_time(ptp_data->portid,
-						     ptp_data->delta);
-
 			ptp_data->current_ptp_port = ptp_data->portid;
 
 			/* Update kernel time if enabled in app parameters. */
@@ -568,6 +572,16 @@ parse_drsp(struct ptpv2_data_slave_ordinary *ptp_data)
 	}
 }
 
+static void
+ptp_adjust_time(struct ptpv2_data_slave_ordinary *ptp_data)
+{
+	/* Evaluate the delta for adjustment. */
+	ptp_data->delta = delta_eval(ptp_data);
+
+	rte_eth_timesync_adjust_time(ptp_data->portid,
+				     ptp_data->delta);
+}
+
 /* This function processes PTP packets, implementing slave PTP IEEE1588 L2
  * functionality.
  */
@@ -594,9 +608,11 @@ parse_ptp_frames(uint16_t portid, struct rte_mbuf *m) {
 		break;
 	case FOLLOW_UP:
 		parse_fup(&ptp_data);
+		send_delay_request(&ptp_data);
 		break;
 	case DELAY_RESP:
 		parse_drsp(&ptp_data);
+		ptp_adjust_time(&ptp_data);
 		print_clock_info(&ptp_data);
 		break;
 	default:
--
2.9.5
[RFC 4/4] examples/ptpclient: add frequency adjustment support
This patch adds PI servo algorithm to support frequency adjustment API for IEEE1588 PTP. For example, the command for starting ptpclient with PI algorithm is: ./build/examples/dpdk-ptpclient -a :81:00.0 -c 1 -n 3 -- -T 0 -p 0x1 --controller=pi Signed-off-by: Simei Su Signed-off-by: Wenjun Wu --- examples/ptpclient/ptpclient.c | 178 + 1 file changed, 161 insertions(+), 17 deletions(-) diff --git a/examples/ptpclient/ptpclient.c b/examples/ptpclient/ptpclient.c index 74a1bf5..3d074af 100644 --- a/examples/ptpclient/ptpclient.c +++ b/examples/ptpclient/ptpclient.c @@ -43,6 +43,28 @@ #define KERNEL_TIME_ADJUST_LIMIT 2 #define PTP_PROTOCOL 0x88F7 +#define KP 0.7 +#define KI 0.3 + +enum servo_state { + SERVO_UNLOCKED, + SERVO_JUMP, + SERVO_LOCKED, +}; + +struct pi_servo { + double offset[2]; + double local[2]; + double drift; + int count; +}; + +enum controller_mode { + MODE_NONE, + MODE_PI, + MAX_ALL +} mode; + struct rte_mempool *mbuf_pool; uint32_t ptp_enabled_port_mask; uint8_t ptp_enabled_port_nb; @@ -132,6 +154,9 @@ struct ptpv2_data_slave_ordinary { uint8_t ptpset; uint8_t kernel_time_set; uint16_t current_ptp_port; + int64_t master_offset; + int64_t path_delay; + struct pi_servo *servo; }; static struct ptpv2_data_slave_ordinary ptp_data; @@ -290,36 +315,44 @@ print_clock_info(struct ptpv2_data_slave_ordinary *ptp_data) ptp_data->tstamp3.tv_sec, (ptp_data->tstamp3.tv_nsec)); - printf("\nT4 - Master Clock. %lds %ldns ", + printf("\nT4 - Master Clock. 
%lds %ldns\n", ptp_data->tstamp4.tv_sec, (ptp_data->tstamp4.tv_nsec)); - printf("\nDelta between master and slave clocks:%"PRId64"ns\n", + if (mode == MODE_NONE) { + printf("\nDelta between master and slave clocks:%"PRId64"ns\n", ptp_data->delta); - clock_gettime(CLOCK_REALTIME, &sys_time); - rte_eth_timesync_read_time(ptp_data->current_ptp_port, &net_time); + clock_gettime(CLOCK_REALTIME, &sys_time); + rte_eth_timesync_read_time(ptp_data->current_ptp_port, + &net_time); - time_t ts = net_time.tv_sec; + time_t ts = net_time.tv_sec; - printf("\n\nComparison between Linux kernel Time and PTP:"); + printf("\n\nComparison between Linux kernel Time and PTP:"); - printf("\nCurrent PTP Time: %.24s %.9ld ns", + printf("\nCurrent PTP Time: %.24s %.9ld ns", ctime(&ts), net_time.tv_nsec); - nsec = (int64_t)timespec64_to_ns(&net_time) - + nsec = (int64_t)timespec64_to_ns(&net_time) - (int64_t)timespec64_to_ns(&sys_time); - ptp_data->new_adj = ns_to_timeval(nsec); + ptp_data->new_adj = ns_to_timeval(nsec); + + gettimeofday(&ptp_data->new_adj, NULL); - gettimeofday(&ptp_data->new_adj, NULL); + time_t tp = ptp_data->new_adj.tv_sec; - time_t tp = ptp_data->new_adj.tv_sec; + printf("\nCurrent SYS Time: %.24s %.6ld ns", + ctime(&tp), ptp_data->new_adj.tv_usec); - printf("\nCurrent SYS Time: %.24s %.6ld ns", - ctime(&tp), ptp_data->new_adj.tv_usec); + printf("\nDelta between PTP and Linux Kernel time:%"PRId64"ns\n", + nsec); + } - printf("\nDelta between PTP and Linux Kernel time:%"PRId64"ns\n", - nsec); + if (mode == MODE_PI) { + printf("path delay: %"PRId64"ns\n", ptp_data->path_delay); + printf("master offset: %"PRId64"ns\n", ptp_data->master_offset); + } printf("[Ctrl+C to quit]\n"); @@ -405,6 +438,76 @@ parse_fup(struct ptpv2_data_slave_ordinary *ptp_data) (((uint64_t)ntohs(origin_tstamp->sec_msb)) << 32); } +static double +pi_sample(struct pi_servo *s, double offset, double local_ts, + enum servo_state *state) +{ + double ppb = 0.0; + + switch (s->count) { + case 0: + 
s->offset[0] = offset; + s->local[0] = local_ts; + *state = SERVO_UNLOCKED; + s->count = 1; + break; + case 1: + s->offset[1] = offset; + s->local[1] = local_ts; + *state = SERVO_UNLOCKED; + s->count = 2; + break; + case 2: + s->drift += (s->offset[1] - s->offset[0]) / + (s->local[1] - s->local[0]); + *state = SERVO_UNLOCKED; + s->count = 3; + break; + case 3: + *state = SERVO_JUMP; +
RE: [PATCH 2/2] net/gve: update copyright holders
> -----Original Message-----
> From: Thomas Monjalon
> Sent: Thursday, March 30, 2023 21:14
> To: Ferruh Yigit; Zhang, Qi Z; Wu, Jingjing; Xing, Beilei; Rushil Gupta;
> Guo, Junfeng
> Cc: dev@dpdk.org; Joshua Washington; Jeroen de Borst
> Subject: Re: [PATCH 2/2] net/gve: update copyright holders
>
> 30/03/2023 09:20, Guo, Junfeng:
> > From: Thomas Monjalon
> > > 28/03/2023 11:35, Guo, Junfeng:
> > > > The background is that, in the past (DPDK 22.11) we didn't get the
> > > > approval of license from Google, thus chose the MIT License for the
> > > > base code, and BSD-3 License for GVE common code (without the files
> > > > in /base folder).
> > > > We also left the copyright holder of base code just to Google Inc,
> > > > and made Intel as the copyright holder of GVE common code (without
> > > > /base folder).
> > > >
> > > > Today we are working together for GVE dev and maintaining. And we
> > > > got the approval of BSD-3 License from Google for the base code.
> > > > Thus we decided to 1) switch the License of GVE base code from MIT
> > > > to BSD-3; 2) add Google LLC as one of the copyright holders for GVE
> > > > common code.
> > >
> > > Do you realize we had lengthy discussions in the Technical Board,
> > > the Governing Board, and with lawyers, just for this unneeded
> > > exception?
> > >
> > > Now looking at the patches, there seem to be some big mistakes like
> > > removing some copyright. I don't understand how it can be taken so
> > > lightly.
> > >
> > > I regret how fast we were, next time we will surely operate differently.
> > > If you want to improve the reputation of this driver,
> > > please ask other copyright holders to be more active and responsive.
> > >
> >
> > Really sorry for causing such severe trouble.
> >
> > Yes, we did take lots of efforts in the Technical Board and the Governing
> > Board about this MIT exception. We really appreciate that.
> >
> > About this patch set, it is my severe mistake to switch the MIT License
> > directly for the upstream-ed code in community, in the wrong way.
> > In the past we upstream-ed this driver with MIT License followed from
> > the kernel community's gve driver base code. And now we want to
> > use the code with BSD-3 License (approved by Google).
> > So I suppose that the correct way may be 1) first remove all these code
> > under MIT License and 2) then add the new files under BSD-3 License.
>
> The code under BSD is different of the MIT code?
> If it is the same with a new approved license, you can just change the
> license.

For this patch set, the code lines remain unchanged. We want to replace the
MIT-licensed source files with BSD-licensed ones, so the patch set exists
mainly for licensing purposes. You can check the latest v4 patch set:
https://patchwork.dpdk.org/project/dpdk/list/?series=27570&state=*

Sorry about the misleading titles and statements in this patch set, which
gave the impression that the license and copyright were being changed
without due consideration.

As Rushil replied, Google is about to upstream the driver code to FreeBSD
under the BSD license as well, so we will see this code under the BSD
license soon. He will also consult the lawyers on his end.

Thanks

>
> > Please correct me if there are still misunderstanding in my statement.
> > Thanks Thomas for pointing out my mistake. I'll be careful to fix this.
> >
> > Copyright holder for the gve base code will stay unchanged. Google LLC
> > will be added as one of the copyright holders for the gve common code.
> > @Rushil Gupta Please also be more active and responsive for the code
> > review and contribution in the community. Thanks!
> >
[PATCH v5 00/15] graph enhancement for multi-core dispatch
V5:
Fix CI build issues caused by dynamically updating the doc.

V4:
Fix CI build issues about undefined references to the sched APIs.
Remove inline for model setting.

V3:
Fix CI build issues about TLS and a typo.

V2:
Use git mv to keep git history.
Use TLS for per-thread local storage.
Change model name to mcore dispatch.
Change API with specific mode name.
Split big patch.
Fix CI issues.
Rebase l3fwd-graph example.
Update doc and maintainers files.

Currently, rte_graph supports only the RTC (Run-To-Completion) model, in
which a graph runs entirely on a single core. RTC is one of the typical
models of packet processing; others, such as Pipeline or Hybrid, are not
supported.

The patch set introduces a 'multicore dispatch' model selection, a
self-reacting scheme driven by core affinity. The new model enables a
cross-core dispatching mechanism that uses a scheduling work-queue to
dispatch streams to the worker cores associated with the destination node.
When the core flavor of the destination node is the default 'current', the
stream continues to execute as normal.

Example:
3-node graph targets 3-core budget

RTC:
Graph: node-0 -> node-1 -> node-2 @Core0.

+ - - - - - - - - - - - - - - - - - - - - - +
'                Core #0/1/2                '
'                                           '
' +--------+     +---------+     +--------+ '
' | Node-0 | --> | Node-1  | --> | Node-2 | '
' +--------+     +---------+     +--------+ '
'                                           '
+ - - - - - - - - - - - - - - - - - - - - - +

Dispatch:
Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.

.. code-block:: diff

+ - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
'  Core #0   '     '          Core #1         '     '  Core #2   '
'            '     '                          '     '            '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
'            '     '     |                    '     '      ^     '
+ - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
                         |                                 |
                         + - - - - - - - - - - - - - - - - +

The patch set is broken down as follows:

1. Split graph worker into common and default model parts.
2. Inline graph node processing to make it reusable.
3. Add set/get APIs to choose the worker model.
4. Introduce a core affinity API to make a node run on a specific worker
   core (only used in the new model).
5. Introduce a graph affinity API to bind one graph to a specific worker
   core.
6. Introduce a graph clone API.
7. Introduce stream moving with a scheduler work-queue in patch 8~12.
8. Add stats for the new models.
9. Abstract the default graph config process and integrate the new model
   into example/l3fwd-graph. Add new parameters for choosing the model.

The new worker model can be run like this:

./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P --model="dispatch"

References:
https://static.sched.com/hosted_files/dpdkuserspace22/a6/graph%20introduce%20remote%20dispatch%20for%20mult-core%20scaling.pdf

Zhirun Yan (15):
  graph: rename rte_graph_work as common
  graph: split graph worker into common and default model
  graph: move node process into inline function
  graph: add get/set graph worker model APIs
  graph: introduce graph node core affinity API
  graph: introduce graph bind unbind API
  graph: introduce graph clone API for other worker core
  graph: add struct for stream moving between cores
  graph: introduce stream moving cross cores
  graph: enable create and destroy graph scheduling workqueue
  graph: introduce graph walk by cross-core dispatch
  graph: enable graph multicore dispatch scheduler model
  graph: add stats for cross-core dispatching
  examples/l3fwd-graph: introduce multicore dispatch worker model
  doc: update multicore dispatch model in graph guides

 MAINTAINERS | 1 +
 doc/guides/prog_guide/graph_lib.rst | 59 ++-
 examples/l3fwd-graph/main.c | 236 +---
 lib/graph/graph.c | 179 +
 lib/graph/graph_debug.c | 6 +
 lib/graph/graph_populate.c | 1 +
 lib/graph/graph_private.h | 44 +++
 lib/graph/graph_stats.c | 74 +++-
 lib/graph/meson.build | 4 +-
 lib/graph/node.c | 1 +
 lib/graph/rte_graph.h | 44 +++
 lib/graph/rte_graph_model_dispatch.c | 179 +
 lib/graph/rte_graph_model_dispatch.h | 122 ++
 lib/graph/rte_graph_model_rtc.h | 45 +++
 lib/graph/rte_graph_worker.c | 54 +++
 lib/graph/rte_graph_worker.h | 498 +
 lib/graph/rte_graph_worker_common.h | 539 +++
 lib/graph/version.map
[PATCH v5 01/15] graph: rename rte_graph_work as common
Rename rte_graph_worker.h to rte_graph_worker_common.h to support
multiple graph worker models.

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 MAINTAINERS | 1 +
 lib/graph/graph_pcap.c | 2 +-
 lib/graph/graph_private.h | 2 +-
 lib/graph/meson.build | 2 +-
 lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} | 6 +++---
 5 files changed, 7 insertions(+), 6 deletions(-)
 rename lib/graph/{rte_graph_worker.h => rte_graph_worker_common.h} (99%)

diff --git a/MAINTAINERS b/MAINTAINERS
index 280058adfc..9d9467dd00 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1714,6 +1714,7 @@ F: doc/guides/prog_guide/bpf_lib.rst
 Graph - EXPERIMENTAL
 M: Jerin Jacob
 M: Kiran Kumar K
+M: Zhirun Yan
 F: lib/graph/
 F: doc/guides/prog_guide/graph_lib.rst
 F: app/test/test_graph*
diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c
index 6c43330029..8a220370fa 100644
--- a/lib/graph/graph_pcap.c
+++ b/lib/graph/graph_pcap.c
@@ -10,7 +10,7 @@
 #include
 #include
 
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 #include "graph_pcap_private.h"
 
diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h
index 7d1b30b8ac..f08dbc7e9d 100644
--- a/lib/graph/graph_private.h
+++ b/lib/graph/graph_private.h
@@ -12,7 +12,7 @@
 #include
 
 #include "rte_graph.h"
-#include "rte_graph_worker.h"
+#include "rte_graph_worker_common.h"
 
 extern int rte_graph_logtype;
 
diff --git a/lib/graph/meson.build b/lib/graph/meson.build
index 3526d1b5d4..4e2b612ad3 100644
--- a/lib/graph/meson.build
+++ b/lib/graph/meson.build
@@ -16,6 +16,6 @@ sources = files(
         'graph_populate.c',
         'graph_pcap.c',
 )
-headers = files('rte_graph.h', 'rte_graph_worker.h')
+headers = files('rte_graph.h', 'rte_graph_worker_common.h')
 
 deps += ['eal', 'pcapng']
diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker_common.h
similarity index 99%
rename from lib/graph/rte_graph_worker.h
rename to lib/graph/rte_graph_worker_common.h
index 438595b15c..0bad2938f3 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker_common.h
@@ -2,8 +2,8 @@
  * Copyright(C) 2020 Marvell International Ltd.
  */
 
-#ifndef _RTE_GRAPH_WORKER_H_
-#define _RTE_GRAPH_WORKER_H_
+#ifndef _RTE_GRAPH_WORKER_COMMON_H_
+#define _RTE_GRAPH_WORKER_COMMON_H_
 
 /**
  * @file rte_graph_worker.h
@@ -518,4 +518,4 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src,
 }
 #endif
 
-#endif /* _RTE_GRAPH_WORKER_H_ */
+#endif /* _RTE_GRAPH_WORKER_COMMON_H_ */
--
2.37.2
[PATCH v5 03/15] graph: move node process into inline function
Node process is a single and reusable block, move the code into an inline function. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/rte_graph_model_rtc.h | 20 ++--- lib/graph/rte_graph_worker_common.h | 33 + 2 files changed, 35 insertions(+), 18 deletions(-) diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h index 665560f831..0dcb7151e9 100644 --- a/lib/graph/rte_graph_model_rtc.h +++ b/lib/graph/rte_graph_model_rtc.h @@ -20,9 +20,6 @@ rte_graph_walk_rtc(struct rte_graph *graph) const rte_node_t mask = graph->cir_mask; uint32_t head = graph->head; struct rte_node *node; - uint64_t start; - uint16_t rc; - void **objs; /* * Walk on the source node(s) ((cir_start - head) -> cir_start) and then @@ -41,21 +38,8 @@ rte_graph_walk_rtc(struct rte_graph *graph) */ while (likely(head != graph->tail)) { node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]); - RTE_ASSERT(node->fence == RTE_GRAPH_FENCE); - objs = node->objs; - rte_prefetch0(objs); - - if (rte_graph_has_stats_feature()) { - start = rte_rdtsc(); - rc = node->process(graph, node, objs, node->idx); - node->total_cycles += rte_rdtsc() - start; - node->total_calls++; - node->total_objs += rc; - } else { - node->process(graph, node, objs, node->idx); - } - node->idx = 0; - head = likely((int32_t)head > 0) ? head & mask : head; + __rte_node_process(graph, node); + head = likely((int32_t)head > 0) ? head & mask : head; } graph->tail = 0; } diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h index b58f8f6947..41428974db 100644 --- a/lib/graph/rte_graph_worker_common.h +++ b/lib/graph/rte_graph_worker_common.h @@ -130,6 +130,39 @@ void __rte_node_stream_alloc_size(struct rte_graph *graph, /* Fast path helper functions */ +/** + * @internal + * + * Enqueue a given node to the tail of the graph reel. + * + * @param graph + * Pointer Graph object. 
+ * @param node + * Pointer to node object to be enqueued. + */ +static __rte_always_inline void +__rte_node_process(struct rte_graph *graph, struct rte_node *node) +{ + uint64_t start; + uint16_t rc; + void **objs; + + RTE_ASSERT(node->fence == RTE_GRAPH_FENCE); + objs = node->objs; + rte_prefetch0(objs); + + if (rte_graph_has_stats_feature()) { + start = rte_rdtsc(); + rc = node->process(graph, node, objs, node->idx); + node->total_cycles += rte_rdtsc() - start; + node->total_calls++; + node->total_objs += rc; + } else { + node->process(graph, node, objs, node->idx); + } + node->idx = 0; +} + /** * @internal * -- 2.37.2
[PATCH v5 02/15] graph: split graph worker into common and default model
To support multiple graph worker model, split graph into common and default. Naming the current walk function as rte_graph_model_rtc cause the default model is RTC(Run-to-completion). Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph_pcap.c | 2 +- lib/graph/graph_private.h | 2 +- lib/graph/meson.build | 2 +- lib/graph/rte_graph_model_rtc.h | 61 + lib/graph/rte_graph_worker.h| 34 lib/graph/rte_graph_worker_common.h | 57 --- 6 files changed, 98 insertions(+), 60 deletions(-) create mode 100644 lib/graph/rte_graph_model_rtc.h create mode 100644 lib/graph/rte_graph_worker.h diff --git a/lib/graph/graph_pcap.c b/lib/graph/graph_pcap.c index 8a220370fa..6c43330029 100644 --- a/lib/graph/graph_pcap.c +++ b/lib/graph/graph_pcap.c @@ -10,7 +10,7 @@ #include #include -#include "rte_graph_worker_common.h" +#include "rte_graph_worker.h" #include "graph_pcap_private.h" diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index f08dbc7e9d..7d1b30b8ac 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -12,7 +12,7 @@ #include #include "rte_graph.h" -#include "rte_graph_worker_common.h" +#include "rte_graph_worker.h" extern int rte_graph_logtype; diff --git a/lib/graph/meson.build b/lib/graph/meson.build index 4e2b612ad3..3526d1b5d4 100644 --- a/lib/graph/meson.build +++ b/lib/graph/meson.build @@ -16,6 +16,6 @@ sources = files( 'graph_populate.c', 'graph_pcap.c', ) -headers = files('rte_graph.h', 'rte_graph_worker_common.h') +headers = files('rte_graph.h', 'rte_graph_worker.h') deps += ['eal', 'pcapng'] diff --git a/lib/graph/rte_graph_model_rtc.h b/lib/graph/rte_graph_model_rtc.h new file mode 100644 index 00..665560f831 --- /dev/null +++ b/lib/graph/rte_graph_model_rtc.h @@ -0,0 +1,61 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Intel Corporation + */ + +#include "rte_graph_worker_common.h" + +/** + * Perform graph walk on the circular buffer and invoke the process 
function + * of the nodes and collect the stats. + * + * @param graph + * Graph pointer returned from rte_graph_lookup function. + * + * @see rte_graph_lookup() + */ +static inline void +rte_graph_walk_rtc(struct rte_graph *graph) +{ + const rte_graph_off_t *cir_start = graph->cir_start; + const rte_node_t mask = graph->cir_mask; + uint32_t head = graph->head; + struct rte_node *node; + uint64_t start; + uint16_t rc; + void **objs; + + /* +* Walk on the source node(s) ((cir_start - head) -> cir_start) and then +* on the pending streams (cir_start -> (cir_start + mask) -> cir_start) +* in a circular buffer fashion. +* +* +-+ <= cir_start - head [number of source nodes] +* | | +* | ... | <= source nodes +* | | +* +-+ <= cir_start [head = 0] [tail = 0] +* | | +* | ... | <= pending streams +* | | +* +-+ <= cir_start + mask +*/ + while (likely(head != graph->tail)) { + node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]); + RTE_ASSERT(node->fence == RTE_GRAPH_FENCE); + objs = node->objs; + rte_prefetch0(objs); + + if (rte_graph_has_stats_feature()) { + start = rte_rdtsc(); + rc = node->process(graph, node, objs, node->idx); + node->total_cycles += rte_rdtsc() - start; + node->total_calls++; + node->total_objs += rc; + } else { + node->process(graph, node, objs, node->idx); + } + node->idx = 0; + head = likely((int32_t)head > 0) ? head & mask : head; + } + graph->tail = 0; +} diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h new file mode 100644 index 00..7ea18ba80a --- /dev/null +++ b/lib/graph/rte_graph_worker.h @@ -0,0 +1,34 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Intel Corporation + */ + +#ifndef _RTE_GRAPH_WORKER_H_ +#define _RTE_GRAPH_WORKER_H_ + +#ifdef __cplusplus +extern "C" { +#endif + +#include "rte_graph_model_rtc.h" + +/** + * Perform graph walk on the circular buffer and invoke the process function + * of the nodes and collect the stats. 
+ * + * @param graph + * Graph pointer returned from rte_graph_lookup function. + * + * @see rte_graph_lookup() + */ +__rte_experimental +static inline void +rte_graph_walk(struct rte_graph *graph) +{ + rte_graph_walk_rtc(graph); +} + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_GRAPH_WORKER_H_ */ diff --git a/lib/graph/
[PATCH v5 04/15] graph: add get/set graph worker model APIs
Add new get/set APIs to configure graph worker model which is used to determine which model will be chosen. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/meson.build | 1 + lib/graph/rte_graph_worker.c| 54 + lib/graph/rte_graph_worker_common.h | 19 ++ lib/graph/version.map | 3 ++ 4 files changed, 77 insertions(+) create mode 100644 lib/graph/rte_graph_worker.c diff --git a/lib/graph/meson.build b/lib/graph/meson.build index 3526d1b5d4..9fab8243da 100644 --- a/lib/graph/meson.build +++ b/lib/graph/meson.build @@ -15,6 +15,7 @@ sources = files( 'graph_stats.c', 'graph_populate.c', 'graph_pcap.c', +'rte_graph_worker.c', ) headers = files('rte_graph.h', 'rte_graph_worker.h') diff --git a/lib/graph/rte_graph_worker.c b/lib/graph/rte_graph_worker.c new file mode 100644 index 00..cabc101262 --- /dev/null +++ b/lib/graph/rte_graph_worker.c @@ -0,0 +1,54 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Intel Corporation + */ + +#include "rte_graph_worker_common.h" + +RTE_DEFINE_PER_LCORE(enum rte_graph_worker_model, worker_model) = RTE_GRAPH_MODEL_DEFAULT; + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * Set the graph worker model + * + * @note This function does not perform any locking, and is only safe to call + *before graph running. + * + * @param name + * Name of the graph worker model. + * + * @return + * 0 on success, -1 otherwise. + */ +int +rte_graph_worker_model_set(enum rte_graph_worker_model model) +{ + if (model >= RTE_GRAPH_MODEL_LIST_END) + goto fail; + + RTE_PER_LCORE(worker_model) = model; + return 0; + +fail: + RTE_PER_LCORE(worker_model) = RTE_GRAPH_MODEL_DEFAULT; + return -1; +} + +/** + * @warning + * @b EXPERIMENTAL: this API may change, or be removed, without prior notice + * + * Get the graph worker model + * + * @param name + * Name of the graph worker model. + * + * @return + * Graph worker model on success. 
+ */ +inline +enum rte_graph_worker_model +rte_graph_worker_model_get(void) +{ + return RTE_PER_LCORE(worker_model); +} diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h index 41428974db..1526da6e2c 100644 --- a/lib/graph/rte_graph_worker_common.h +++ b/lib/graph/rte_graph_worker_common.h @@ -19,6 +19,7 @@ #include #include #include +#include #include #include #include @@ -95,6 +96,16 @@ struct rte_node { struct rte_node *nodes[] __rte_cache_min_aligned; /**< Next nodes. */ } __rte_cache_aligned; +/** Graph worker models */ +enum rte_graph_worker_model { + RTE_GRAPH_MODEL_DEFAULT, + RTE_GRAPH_MODEL_RTC = RTE_GRAPH_MODEL_DEFAULT, + RTE_GRAPH_MODEL_MCORE_DISPATCH, + RTE_GRAPH_MODEL_LIST_END +}; + +RTE_DECLARE_PER_LCORE(enum rte_graph_worker_model, worker_model); + /** * @internal * @@ -490,6 +501,14 @@ rte_node_next_stream_move(struct rte_graph *graph, struct rte_node *src, } } +__rte_experimental +enum rte_graph_worker_model +rte_graph_worker_model_get(void); + +__rte_experimental +int +rte_graph_worker_model_set(enum rte_graph_worker_model model); + #ifdef __cplusplus } #endif diff --git a/lib/graph/version.map b/lib/graph/version.map index 13b838752d..eea73ec9ca 100644 --- a/lib/graph/version.map +++ b/lib/graph/version.map @@ -43,5 +43,8 @@ EXPERIMENTAL { rte_node_next_stream_put; rte_node_next_stream_move; + rte_graph_worker_model_set; + rte_graph_worker_model_get; + local: *; }; -- 2.37.2
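For readers following the series, the set/get pair above can be modelled outside DPDK in a few lines. This is only a sketch under assumptions: C11 `_Thread_local` stands in for `RTE_DEFINE_PER_LCORE`, and `worker_model`, `worker_model_set()` and `worker_model_get()` are hypothetical names, not the `rte_graph_worker_model_*` symbols from the patch. It mirrors the behaviour shown in the diff: an out-of-range model resets the per-thread value to the default and returns -1.

```c
#include <assert.h>

/* Hypothetical stand-in for enum rte_graph_worker_model from the patch. */
enum worker_model {
	MODEL_DEFAULT,
	MODEL_RTC = MODEL_DEFAULT,	/* RTC is an alias of the default */
	MODEL_MCORE_DISPATCH,
	MODEL_LIST_END
};

/* Assumption: C11 thread-local storage replaces RTE_DEFINE_PER_LCORE. */
static _Thread_local enum worker_model current_model = MODEL_DEFAULT;

/* Returns 0 on success; a bad value restores the default and returns -1. */
int
worker_model_set(enum worker_model model)
{
	if ((unsigned int)model >= (unsigned int)MODEL_LIST_END) {
		current_model = MODEL_DEFAULT;
		return -1;
	}

	current_model = model;
	return 0;
}

enum worker_model
worker_model_get(void)
{
	/* Invariant kept by the setter: the stored value is always valid. */
	assert(current_model < MODEL_LIST_END);
	return current_model;
}
```

Since the variable in the patch is per-lcore, a value set on one lcore is not visible on another; the sketch has the same property per thread.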
[PATCH v5 05/15] graph: introduce graph node core affinity API
Add lcore_id for node to hold affinity core id and impl rte_graph_model_dispatch_lcore_affinity_set to set node affinity with specific lcore. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph_private.h| 1 + lib/graph/meson.build| 1 + lib/graph/node.c | 1 + lib/graph/rte_graph_model_dispatch.c | 31 lib/graph/rte_graph_model_dispatch.h | 43 lib/graph/version.map| 2 ++ 6 files changed, 79 insertions(+) create mode 100644 lib/graph/rte_graph_model_dispatch.c create mode 100644 lib/graph/rte_graph_model_dispatch.h diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index 7d1b30b8ac..409eed3284 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -50,6 +50,7 @@ struct node { STAILQ_ENTRY(node) next; /**< Next node in the list. */ char name[RTE_NODE_NAMESIZE]; /**< Name of the node. */ uint64_t flags; /**< Node configuration flag. */ + unsigned int lcore_id;/**< Node runs on the Lcore ID */ rte_node_process_t process; /**< Node process function. */ rte_node_init_t init; /**< Node init function. */ rte_node_fini_t fini; /**< Node fini function. 
*/ diff --git a/lib/graph/meson.build b/lib/graph/meson.build index 9fab8243da..c729d984b6 100644 --- a/lib/graph/meson.build +++ b/lib/graph/meson.build @@ -16,6 +16,7 @@ sources = files( 'graph_populate.c', 'graph_pcap.c', 'rte_graph_worker.c', +'rte_graph_model_dispatch.c', ) headers = files('rte_graph.h', 'rte_graph_worker.h') diff --git a/lib/graph/node.c b/lib/graph/node.c index 149414dcd9..339b4a0da5 100644 --- a/lib/graph/node.c +++ b/lib/graph/node.c @@ -100,6 +100,7 @@ __rte_node_register(const struct rte_node_register *reg) goto free; } + node->lcore_id = RTE_MAX_LCORE; node->id = node_id++; /* Add the node at tail */ diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c new file mode 100644 index 00..4a2f99496d --- /dev/null +++ b/lib/graph/rte_graph_model_dispatch.c @@ -0,0 +1,31 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Intel Corporation + */ + +#include "graph_private.h" +#include "rte_graph_model_dispatch.h" + +int +rte_graph_model_dispatch_lcore_affinity_set(const char *name, unsigned int lcore_id) +{ + struct node *node; + int ret = -EINVAL; + + if (lcore_id >= RTE_MAX_LCORE) + return ret; + + graph_spinlock_lock(); + + STAILQ_FOREACH(node, node_list_head_get(), next) { + if (strncmp(node->name, name, RTE_NODE_NAMESIZE) == 0) { + node->lcore_id = lcore_id; + ret = 0; + break; + } + } + + graph_spinlock_unlock(); + + return ret; +} + diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h new file mode 100644 index 00..179624e972 --- /dev/null +++ b/lib/graph/rte_graph_model_dispatch.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(C) 2023 Intel Corporation + */ + +#ifndef _RTE_GRAPH_MODEL_DISPATCH_H_ +#define _RTE_GRAPH_MODEL_DISPATCH_H_ + +/** + * @file rte_graph_model_dispatch.h + * + * @warning + * @b EXPERIMENTAL: + * All functions in this file may be changed or removed without prior notice. 
+ * + * This API allows to set core affinity with the node. + */ +#include "rte_graph_worker_common.h" + +#ifdef __cplusplus +extern "C" { +#endif + +/** + * Set lcore affinity with the node. + * + * @param name + * Valid node name. In the case of the cloned node, the name will be + * "parent node name" + "-" + name. + * @param lcore_id + * The lcore ID value. + * + * @return + * 0 on success, error otherwise. + */ +__rte_experimental +int rte_graph_model_dispatch_lcore_affinity_set(const char *name, + unsigned int lcore_id); + +#ifdef __cplusplus +} +#endif + +#endif /* _RTE_GRAPH_MODEL_DISPATCH_H_ */ diff --git a/lib/graph/version.map b/lib/graph/version.map index eea73ec9ca..1f090be74e 100644 --- a/lib/graph/version.map +++ b/lib/graph/version.map @@ -46,5 +46,7 @@ EXPERIMENTAL { rte_graph_worker_model_set; rte_graph_worker_model_get; + rte_graph_model_dispatch_lcore_affinity_set; + local: *; }; -- 2.37.2
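The lookup-and-set logic of the affinity API can be sketched without DPDK as below. Everything here is an assumption made for illustration: `MAX_LCORE` and `NODE_NAMESIZE` stand in for `RTE_MAX_LCORE` and `RTE_NODE_NAMESIZE`, the node names are made up, and a plain array replaces the STAILQ that the patch walks under the graph spinlock. The return values follow the patch: -EINVAL for an out-of-range lcore or an unknown node name, 0 otherwise.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

#define MAX_LCORE     128	/* assumption: stands in for RTE_MAX_LCORE */
#define NODE_NAMESIZE 64	/* assumption: stands in for RTE_NODE_NAMESIZE */

struct node {
	char name[NODE_NAMESIZE];
	unsigned int lcore_id;	/* MAX_LCORE means "no affinity set" */
};

/* Toy registry with made-up node names; not the real node list. */
static struct node node_list[] = {
	{ "ethdev_rx", MAX_LCORE },
	{ "ip4_lookup", MAX_LCORE },
	{ "ethdev_tx", MAX_LCORE },
};

int
node_lcore_affinity_set(const char *name, unsigned int lcore_id)
{
	size_t i;

	assert(name != NULL);
	if (lcore_id >= MAX_LCORE)
		return -EINVAL;

	for (i = 0; i < sizeof(node_list) / sizeof(node_list[0]); i++) {
		if (strncmp(node_list[i].name, name, NODE_NAMESIZE) == 0) {
			node_list[i].lcore_id = lcore_id;
			return 0;
		}
	}

	return -EINVAL;	/* no node with that name */
}
```

A caller would set the affinity of each node once, before the graph is created, so that the value is copied into the fast-path node when the graph is populated.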
[PATCH v5 06/15] graph: introduce graph bind unbind API
Add lcore_id for graph to hold affinity core id where graph would run on. Add bind/unbind API to set/unset graph affinity attribute. lcore_id will be set as MAX by default, it means not enable this attribute. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph.c | 59 +++ lib/graph/graph_private.h | 2 ++ lib/graph/rte_graph.h | 22 +++ lib/graph/version.map | 2 ++ 4 files changed, 85 insertions(+) diff --git a/lib/graph/graph.c b/lib/graph/graph.c index a839a2803b..b39a99aac6 100644 --- a/lib/graph/graph.c +++ b/lib/graph/graph.c @@ -254,6 +254,64 @@ graph_mem_fixup_secondary(struct rte_graph *graph) return graph_mem_fixup_node_ctx(graph); } +static __rte_always_inline bool +graph_src_node_avail(struct graph *graph) +{ + struct graph_node *graph_node; + + STAILQ_FOREACH(graph_node, &graph->node_list, next) + if ((graph_node->node->flags & RTE_NODE_SOURCE_F) && + (graph_node->node->lcore_id == RTE_MAX_LCORE || +graph->lcore_id == graph_node->node->lcore_id)) + return true; + + return false; +} + +int +rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore) +{ + struct graph *graph; + + GRAPH_ID_CHECK(id); + if (!rte_lcore_is_enabled(lcore)) + SET_ERR_JMP(ENOLINK, fail, + "lcore %d not enabled\n", + lcore); + + STAILQ_FOREACH(graph, &graph_list, next) + if (graph->id == id) + break; + + graph->lcore_id = lcore; + graph->socket = rte_lcore_to_socket_id(lcore); + + /* check the availability of source node */ + if (!graph_src_node_avail(graph)) + graph->graph->head = 0; + + return 0; + +fail: + return -rte_errno; +} + +void +rte_graph_model_dispatch_core_unbind(rte_graph_t id) +{ + struct graph *graph; + + GRAPH_ID_CHECK(id); + STAILQ_FOREACH(graph, &graph_list, next) + if (graph->id == id) + break; + + graph->lcore_id = RTE_MAX_LCORE; + +fail: + return; +} + struct rte_graph * rte_graph_lookup(const char *name) { @@ -340,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm) 
graph->src_node_count = src_node_count; graph->node_count = graph_nodes_count(graph); graph->id = graph_id; + graph->lcore_id = RTE_MAX_LCORE; graph->num_pkt_to_capture = prm->num_pkt_to_capture; if (prm->pcap_filename) rte_strscpy(graph->pcap_filename, prm->pcap_filename, RTE_GRAPH_PCAP_FILE_SZ); diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index 409eed3284..ad1d058945 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -98,6 +98,8 @@ struct graph { /**< Circular buffer mask for wrap around. */ rte_graph_t id; /**< Graph identifier. */ + unsigned int lcore_id; + /**< Lcore identifier where the graph prefer to run on. */ size_t mem_sz; /**< Memory size of the graph. */ int socket; diff --git a/lib/graph/rte_graph.h b/lib/graph/rte_graph.h index c9a77297fc..c523809d1f 100644 --- a/lib/graph/rte_graph.h +++ b/lib/graph/rte_graph.h @@ -285,6 +285,28 @@ char *rte_graph_id_to_name(rte_graph_t id); __rte_experimental int rte_graph_export(const char *name, FILE *f); +/** + * Bind graph with specific lcore + * + * @param id + * Graph id to get the pointer of graph object + * @param lcore + * The lcore where the graph will run on + * @return + * 0 on success, error otherwise. + */ +__rte_experimental +int rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore); + +/** + * Unbind graph with lcore + * + * @param id + * Graph id to get the pointer of graph object + */ +__rte_experimental +void rte_graph_model_dispatch_core_unbind(rte_graph_t id); + /** * Get graph object from its name. * diff --git a/lib/graph/version.map b/lib/graph/version.map index 1f090be74e..7de6f08f59 100644 --- a/lib/graph/version.map +++ b/lib/graph/version.map @@ -18,6 +18,8 @@ EXPERIMENTAL { rte_graph_node_get_by_name; rte_graph_obj_dump; rte_graph_walk; + rte_graph_model_dispatch_core_bind; + rte_graph_model_dispatch_core_unbind; rte_graph_cluster_stats_create; rte_graph_cluster_stats_destroy; -- 2.37.2
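The source-node check used by the bind function can be illustrated in isolation. The sketch below is a simplified model, not the library code: `MAX_LCORE` and `NODE_SOURCE_F` are assumed stand-ins for `RTE_MAX_LCORE` and `RTE_NODE_SOURCE_F`, and an array replaces the graph's node list. The rule is the one in `graph_src_node_avail()`: a source node counts as available if it has no affinity or if its affinity matches the lcore the graph is bound to.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_LCORE     128u	/* assumption: stands in for RTE_MAX_LCORE */
#define NODE_SOURCE_F 0x1u	/* assumption: stands in for RTE_NODE_SOURCE_F */

struct node {
	unsigned int flags;
	unsigned int lcore_id;	/* MAX_LCORE means "no affinity set" */
};

/* True if at least one source node may run on graph_lcore. */
bool
src_node_avail(const struct node *nodes, size_t nb_nodes,
	       unsigned int graph_lcore)
{
	size_t i;

	assert(nodes != NULL || nb_nodes == 0);
	for (i = 0; i < nb_nodes; i++) {
		if (!(nodes[i].flags & NODE_SOURCE_F))
			continue;
		if (nodes[i].lcore_id == MAX_LCORE ||
		    nodes[i].lcore_id == graph_lcore)
			return true;
	}

	return false;
}
```

In the patch, a false result makes the bind function set the graph head to 0, so the walk on that lcore starts at the pending streams and never visits the source nodes.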
[PATCH v5 07/15] graph: introduce graph clone API for other worker core
This patch adds graph API for supporting to clone the graph object for a specified worker core. The new graph will also clone all nodes. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph.c | 110 ++ lib/graph/graph_private.h | 2 + lib/graph/rte_graph.h | 20 +++ lib/graph/version.map | 1 + 4 files changed, 133 insertions(+) diff --git a/lib/graph/graph.c b/lib/graph/graph.c index b39a99aac6..90eaad0378 100644 --- a/lib/graph/graph.c +++ b/lib/graph/graph.c @@ -398,6 +398,7 @@ rte_graph_create(const char *name, struct rte_graph_param *prm) graph->src_node_count = src_node_count; graph->node_count = graph_nodes_count(graph); graph->id = graph_id; + graph->parent_id = RTE_GRAPH_ID_INVALID; graph->lcore_id = RTE_MAX_LCORE; graph->num_pkt_to_capture = prm->num_pkt_to_capture; if (prm->pcap_filename) @@ -462,6 +463,115 @@ rte_graph_destroy(rte_graph_t id) return rc; } +static int +clone_name(struct graph *graph, struct graph *parent_graph, const char *name) +{ + ssize_t sz, rc; + +#define SZ RTE_GRAPH_NAMESIZE + rc = rte_strscpy(graph->name, parent_graph->name, SZ); + if (rc < 0) + goto fail; + sz = rc; + rc = rte_strscpy(graph->name + sz, "-", RTE_MAX((int16_t)(SZ - sz), 0)); + if (rc < 0) + goto fail; + sz += rc; + sz = rte_strscpy(graph->name + sz, name, RTE_MAX((int16_t)(SZ - sz), 0)); + if (sz < 0) + goto fail; + + return 0; +fail: + rte_errno = E2BIG; + return -rte_errno; +} + +static rte_graph_t +graph_clone(struct graph *parent_graph, const char *name) +{ + struct graph_node *graph_node; + struct graph *graph; + + graph_spinlock_lock(); + + /* Don't allow to clone a node from a cloned graph */ + if (parent_graph->parent_id != RTE_GRAPH_ID_INVALID) + SET_ERR_JMP(EEXIST, fail, "A cloned graph is not allowed to be cloned"); + + /* Create graph object */ + graph = calloc(1, sizeof(*graph)); + if (graph == NULL) + SET_ERR_JMP(ENOMEM, fail, "Failed to calloc cloned graph object"); + + /* Naming ceremony of the new 
graph. name is node->name + "-" + name */ + if (clone_name(graph, parent_graph, name)) + goto free; + + /* Check for existence of duplicate graph */ + if (rte_graph_from_name(graph->name) != RTE_GRAPH_ID_INVALID) + SET_ERR_JMP(EEXIST, free, "Found duplicate graph %s", + graph->name); + + /* Clone nodes from parent graph firstly */ + STAILQ_INIT(&graph->node_list); + STAILQ_FOREACH(graph_node, &parent_graph->node_list, next) { + if (graph_node_add(graph, graph_node->node)) + goto graph_cleanup; + } + + /* Just update adjacency list of all nodes in the graph */ + if (graph_adjacency_list_update(graph)) + goto graph_cleanup; + + /* Initialize the graph object */ + graph->src_node_count = parent_graph->src_node_count; + graph->node_count = parent_graph->node_count; + graph->parent_id = parent_graph->id; + graph->lcore_id = parent_graph->lcore_id; + graph->socket = parent_graph->socket; + graph->id = graph_id; + + /* Allocate the Graph fast path memory and populate the data */ + if (graph_fp_mem_create(graph)) + goto graph_cleanup; + + /* Call init() of the all the nodes in the graph */ + if (graph_node_init(graph)) + goto graph_mem_destroy; + + /* All good, Lets add the graph to the list */ + graph_id++; + STAILQ_INSERT_TAIL(&graph_list, graph, next); + + graph_spinlock_unlock(); + return graph->id; + +graph_mem_destroy: + graph_fp_mem_destroy(graph); +graph_cleanup: + graph_cleanup(graph); +free: + free(graph); +fail: + graph_spinlock_unlock(); + return RTE_GRAPH_ID_INVALID; +} + +rte_graph_t +rte_graph_clone(rte_graph_t id, const char *name) +{ + struct graph *graph; + + GRAPH_ID_CHECK(id); + STAILQ_FOREACH(graph, &graph_list, next) + if (graph->id == id) + return graph_clone(graph, name); + +fail: + return RTE_GRAPH_ID_INVALID; +} + rte_graph_t rte_graph_from_name(const char *name) { diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index ad1d058945..d28a5af93e 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -98,6 +98,8 @@ 
struct graph { /**< Circular buffer mask for wrap around. */ rte_graph_t id; /**< Graph identifier. */ + rte_graph_t parent_id; + /**< Parent graph identifier. */ unsigned int lcore_id; /**< Lcore identifier where the graph prefer to run on. *
[PATCH v5 08/15] graph: add struct for stream moving between cores
Add graph_sched_wq_node to hold graph scheduling workqueue node. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph.c | 1 + lib/graph/graph_populate.c | 1 + lib/graph/graph_private.h | 12 lib/graph/rte_graph_worker_common.h | 21 + 4 files changed, 35 insertions(+) diff --git a/lib/graph/graph.c b/lib/graph/graph.c index 90eaad0378..dd3d69dbf7 100644 --- a/lib/graph/graph.c +++ b/lib/graph/graph.c @@ -284,6 +284,7 @@ rte_graph_model_dispatch_core_bind(rte_graph_t id, int lcore) break; graph->lcore_id = lcore; + graph->graph->lcore_id = graph->lcore_id; graph->socket = rte_lcore_to_socket_id(lcore); /* check the availability of source node */ diff --git a/lib/graph/graph_populate.c b/lib/graph/graph_populate.c index 2c0844ce92..7dcf1420c1 100644 --- a/lib/graph/graph_populate.c +++ b/lib/graph/graph_populate.c @@ -89,6 +89,7 @@ graph_nodes_populate(struct graph *_graph) } node->id = graph_node->node->id; node->parent_id = pid; + node->lcore_id = graph_node->node->lcore_id; nb_edges = graph_node->node->nb_edges; node->nb_edges = nb_edges; off += sizeof(struct rte_node); diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index d28a5af93e..b66b18ebbc 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -60,6 +60,18 @@ struct node { char next_nodes[][RTE_NODE_NAMESIZE]; /**< Names of next nodes. */ }; +/** + * @internal + * + * Structure that holds the graph scheduling workqueue node stream. + * Used for mcore dispatch model. 
+ */ +struct graph_sched_wq_node { + rte_graph_off_t node_off; + uint16_t nb_objs; + void *objs[RTE_GRAPH_BURST_SIZE]; +} __rte_cache_aligned; + /** * @internal * diff --git a/lib/graph/rte_graph_worker_common.h b/lib/graph/rte_graph_worker_common.h index 1526da6e2c..dc0a0b5554 100644 --- a/lib/graph/rte_graph_worker_common.h +++ b/lib/graph/rte_graph_worker_common.h @@ -30,6 +30,13 @@ extern "C" { #endif +/** + * @internal + * + * Singly-linked list head for graph schedule run-queue. + */ +SLIST_HEAD(rte_graph_rq_head, rte_graph); + /** * @internal * @@ -41,6 +48,15 @@ struct rte_graph { uint32_t cir_mask; /**< Circular buffer wrap around mask. */ rte_node_t nb_nodes; /**< Number of nodes in the graph. */ rte_graph_off_t *cir_start; /**< Pointer to circular buffer. */ + /* Graph schedule */ + struct rte_graph_rq_head *rq __rte_cache_aligned; /* The run-queue */ + struct rte_graph_rq_head rq_head; /* The head for run-queue list */ + + SLIST_ENTRY(rte_graph) rq_next; /* The next for run-queue list */ + unsigned int lcore_id; /**< The graph running Lcore. */ + struct rte_ring *wq;/**< The work-queue for pending streams. */ + struct rte_mempool *mp; /**< The mempool for scheduling streams. */ + /* Graph schedule area */ rte_graph_off_t nodes_start; /**< Offset at which node memory starts. */ rte_graph_t id; /**< Graph identifier. */ int socket; /**< Socket ID where memory is allocated. */ @@ -74,6 +90,11 @@ struct rte_node { /** Original process function when pcap is enabled. */ rte_node_process_t original_process; + RTE_STD_C11 + union { + /* Fast schedule area for mcore dispatch model */ + unsigned int lcore_id; /**< Node running lcore. */ + }; /* Fast path area */ #define RTE_NODE_CTX_SZ 16 uint8_t ctx[RTE_NODE_CTX_SZ] __rte_cache_aligned; /**< Node Context. */ -- 2.37.2
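The work-queue entry added here holds at most RTE_GRAPH_BURST_SIZE object pointers, so a node with more pending objects than that has to be split across several entries. A minimal sketch of that chunking follows. It is written under assumptions: `BURST_SIZE` is a deliberately tiny stand-in for `RTE_GRAPH_BURST_SIZE`, `struct wq_node` is a hypothetical mirror of `graph_sched_wq_node` without the cache alignment, and a caller-supplied array replaces the mempool and ring that the series uses for the real transfer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define BURST_SIZE 4	/* assumption: tiny stand-in for RTE_GRAPH_BURST_SIZE */

/* Hypothetical mirror of struct graph_sched_wq_node from the patch. */
struct wq_node {
	uint32_t node_off;	/* offset of the target node in the graph */
	uint16_t nb_objs;	/* number of valid entries in objs[] */
	void *objs[BURST_SIZE];
};

/*
 * Split nb_objs pending objects into burst-sized entries.
 * Returns the number of entries written to out[] (at most max_out).
 */
uint16_t
wq_split(uint32_t node_off, void **objs, uint16_t nb_objs,
	 struct wq_node *out, uint16_t max_out)
{
	uint16_t used = 0;
	uint16_t off = 0;

	assert(objs != NULL && out != NULL);
	while (off < nb_objs && used < max_out) {
		uint16_t size = nb_objs - off;

		if (size > BURST_SIZE)
			size = BURST_SIZE;

		out[used].node_off = node_off;
		out[used].nb_objs = size;
		memcpy(out[used].objs, &objs[off], size * sizeof(void *));

		off += size;
		used++;
	}

	return used;
}
```

With ten pending objects and a burst size of four, the sketch produces three entries holding four, four and two objects.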
[PATCH v5 09/15] graph: introduce stream moving cross cores
This patch introduces key functions to allow a worker thread to enable enqueue and move streams of objects to the next nodes over different cores. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph_private.h| 27 + lib/graph/meson.build| 2 +- lib/graph/rte_graph_model_dispatch.c | 145 +++ lib/graph/rte_graph_model_dispatch.h | 37 +++ lib/graph/version.map| 2 + 5 files changed, 212 insertions(+), 1 deletion(-) diff --git a/lib/graph/graph_private.h b/lib/graph/graph_private.h index b66b18ebbc..e1a2a4bfd8 100644 --- a/lib/graph/graph_private.h +++ b/lib/graph/graph_private.h @@ -366,4 +366,31 @@ void graph_dump(FILE *f, struct graph *g); */ void node_dump(FILE *f, struct node *n); +/** + * @internal + * + * Create the graph schedule work queue. And all cloned graphs attached to the + * parent graph MUST be destroyed together for fast schedule design limitation. + * + * @param _graph + * The graph object + * @param _parent_graph + * The parent graph object which holds the run-queue head. + * + * @return + * - 0: Success. + * - <0: Graph schedule work queue related error. + */ +int graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph); + +/** + * @internal + * + * Destroy the graph schedule work queue. 
+ * + * @param _graph + * The graph object + */ +void graph_sched_wq_destroy(struct graph *_graph); + #endif /* _RTE_GRAPH_PRIVATE_H_ */ diff --git a/lib/graph/meson.build b/lib/graph/meson.build index c729d984b6..e21affa280 100644 --- a/lib/graph/meson.build +++ b/lib/graph/meson.build @@ -20,4 +20,4 @@ sources = files( ) headers = files('rte_graph.h', 'rte_graph_worker.h') -deps += ['eal', 'pcapng'] +deps += ['eal', 'pcapng', 'mempool', 'ring'] diff --git a/lib/graph/rte_graph_model_dispatch.c b/lib/graph/rte_graph_model_dispatch.c index 4a2f99496d..a300fefb85 100644 --- a/lib/graph/rte_graph_model_dispatch.c +++ b/lib/graph/rte_graph_model_dispatch.c @@ -5,6 +5,151 @@ #include "graph_private.h" #include "rte_graph_model_dispatch.h" +int +graph_sched_wq_create(struct graph *_graph, struct graph *_parent_graph) +{ + struct rte_graph *parent_graph = _parent_graph->graph; + struct rte_graph *graph = _graph->graph; + unsigned int wq_size; + + wq_size = GRAPH_SCHED_WQ_SIZE(graph->nb_nodes); + wq_size = rte_align32pow2(wq_size + 1); + + graph->wq = rte_ring_create(graph->name, wq_size, graph->socket, + RING_F_SC_DEQ); + if (graph->wq == NULL) + SET_ERR_JMP(EIO, fail, "Failed to allocate graph WQ"); + + graph->mp = rte_mempool_create(graph->name, wq_size, + sizeof(struct graph_sched_wq_node), + 0, 0, NULL, NULL, NULL, NULL, + graph->socket, MEMPOOL_F_SP_PUT); + if (graph->mp == NULL) + SET_ERR_JMP(EIO, fail_mp, + "Failed to allocate graph WQ schedule entry"); + + graph->lcore_id = _graph->lcore_id; + + if (parent_graph->rq == NULL) { + parent_graph->rq = &parent_graph->rq_head; + SLIST_INIT(parent_graph->rq); + } + + graph->rq = parent_graph->rq; + SLIST_INSERT_HEAD(graph->rq, graph, rq_next); + + return 0; + +fail_mp: + rte_ring_free(graph->wq); + graph->wq = NULL; +fail: + return -rte_errno; +} + +void +graph_sched_wq_destroy(struct graph *_graph) +{ + struct rte_graph *graph = _graph->graph; + + if (graph == NULL) + return; + + rte_ring_free(graph->wq); + graph->wq = 
NULL; + + rte_mempool_free(graph->mp); + graph->mp = NULL; +} + +static __rte_always_inline bool +__graph_sched_node_enqueue(struct rte_node *node, struct rte_graph *graph) +{ + struct graph_sched_wq_node *wq_node; + uint16_t off = 0; + uint16_t size; + +submit_again: + if (rte_mempool_get(graph->mp, (void **)&wq_node) < 0) + goto fallback; + + size = RTE_MIN(node->idx, RTE_DIM(wq_node->objs)); + wq_node->node_off = node->off; + wq_node->nb_objs = size; + rte_memcpy(wq_node->objs, &node->objs[off], size * sizeof(void *)); + + while (rte_ring_mp_enqueue_bulk_elem(graph->wq, (void *)&wq_node, + sizeof(wq_node), 1, NULL) == 0) + rte_pause(); + + off += size; + node->idx -= size; + if (node->idx > 0) + goto submit_again; + + return true; + +fallback: + if (off != 0) + memmove(&node->objs[0], &node->objs[off], + node->idx * sizeof(void *)); + + return false; +} + +bool __rte_noinline +__rte_graph_sched_node_enqueue(struct rte_node *node, + struct rte_graph_rq_hea
[PATCH v5 10/15] graph: enable create and destroy graph scheduling workqueue
This patch adds creation and destruction of the scheduling workqueue
to the common graph operations.

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 lib/graph/graph.c | 9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/lib/graph/graph.c b/lib/graph/graph.c
index dd3d69dbf7..1f1ee9b622 100644
--- a/lib/graph/graph.c
+++ b/lib/graph/graph.c
@@ -443,6 +443,10 @@ rte_graph_destroy(rte_graph_t id)
 	while (graph != NULL) {
 		tmp = STAILQ_NEXT(graph, next);
 		if (graph->id == id) {
+			/* Destroy the schedule work queue if has */
+			if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+				graph_sched_wq_destroy(graph);
+
 			/* Call fini() of the all the nodes in the graph */
 			graph_node_fini(graph);
 			/* Destroy graph fast path memory */
@@ -537,6 +541,11 @@ graph_clone(struct graph *parent_graph, const char *name)
 	if (graph_fp_mem_create(graph))
 		goto graph_cleanup;
 
+	/* Create the graph schedule work queue */
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    graph_sched_wq_create(graph, parent_graph))
+		goto graph_mem_destroy;
+
 	/* Call init() of the all the nodes in the graph */
 	if (graph_node_init(graph))
 		goto graph_mem_destroy;
-- 
2.37.2
[PATCH v5 11/15] graph: introduce graph walk by cross-core dispatch
This patch introduces a task scheduler mechanism that dispatches tasks
to other worker cores. Currently, a graph has only a local work queue
to walk. This patch adds a scheduler work queue on each worker core for
dispatched tasks. The walk processes the scheduler work queue first,
then handles the local work queue.

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 lib/graph/rte_graph_model_dispatch.h | 42
 1 file changed, 42 insertions(+)

diff --git a/lib/graph/rte_graph_model_dispatch.h b/lib/graph/rte_graph_model_dispatch.h
index 18fa7ce0ab..65b2cc6d87 100644
--- a/lib/graph/rte_graph_model_dispatch.h
+++ b/lib/graph/rte_graph_model_dispatch.h
@@ -73,6 +73,48 @@ __rte_experimental
 int rte_graph_model_dispatch_lcore_affinity_set(const char *name,
 						unsigned int lcore_id);
 
+/**
+ * Perform graph walk on the circular buffer and invoke the process function
+ * of the nodes and collect the stats.
+ *
+ * @param graph
+ *   Graph pointer returned from rte_graph_lookup function.
+ *
+ * @see rte_graph_lookup()
+ */
+__rte_experimental
+static inline void
+rte_graph_walk_mcore_dispatch(struct rte_graph *graph)
+{
+	const rte_graph_off_t *cir_start = graph->cir_start;
+	const rte_node_t mask = graph->cir_mask;
+	uint32_t head = graph->head;
+	struct rte_node *node;
+
+	if (graph->wq != NULL)
+		__rte_graph_sched_wq_process(graph);
+
+	while (likely(head != graph->tail)) {
+		node = (struct rte_node *)RTE_PTR_ADD(graph, cir_start[(int32_t)head++]);
+
+		/* skip the src nodes which not bind with current worker */
+		if ((int32_t)head < 0 && node->lcore_id != graph->lcore_id)
+			continue;
+
+		/* Schedule the node until all task/objs are done */
+		if (node->lcore_id != RTE_MAX_LCORE &&
+		    graph->lcore_id != node->lcore_id && graph->rq != NULL &&
+		    __rte_graph_sched_node_enqueue(node, graph->rq))
+			continue;
+
+		__rte_node_process(graph, node);
+
+		head = likely((int32_t)head > 0) ? head & mask : head;
+	}
+
+	graph->tail = 0;
+}
+
 #ifdef __cplusplus
 }
 #endif
-- 
2.37.2
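The head arithmetic shared by the RTC and dispatch walks is easy to misread, so here is a small stand-alone model of it. It is a sketch, not library code: `walk_advance()` is a hypothetical helper that merges the `head++` done while indexing and the wrap done at the end of the loop body, and the mask is assumed to be of the form 2^n - 1. Negative positions (viewed as `int32_t`) address the source nodes placed before `cir_start`; once the position is positive it wraps inside the pending-stream ring.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Advance the walk position by one slot.
 * Source-node slots sit at negative offsets and are not masked;
 * pending-stream slots wrap around with the ring mask.
 */
uint32_t
walk_advance(uint32_t head, uint32_t mask)
{
	assert((mask & (mask + 1)) == 0);	/* assumption: mask is 2^n - 1 */

	head++;
	return ((int32_t)head > 0) ? (head & mask) : head;
}
```

Starting from a head of -2 with a mask of 3, successive calls visit -1, 0, 1, 2, 3 and then wrap back to 0.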
[PATCH v5 12/15] graph: enable graph multicore dispatch scheduler model
This patch enables choosing the new scheduler model.

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 lib/graph/rte_graph_worker.h | 9 ++++++++-
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/lib/graph/rte_graph_worker.h b/lib/graph/rte_graph_worker.h
index 7ea18ba80a..d608c7513e 100644
--- a/lib/graph/rte_graph_worker.h
+++ b/lib/graph/rte_graph_worker.h
@@ -10,6 +10,7 @@ extern "C" {
 #endif
 
 #include "rte_graph_model_rtc.h"
+#include "rte_graph_model_dispatch.h"
 
 /**
  * Perform graph walk on the circular buffer and invoke the process function
@@ -24,7 +25,13 @@ __rte_experimental
 static inline void
 rte_graph_walk(struct rte_graph *graph)
 {
-	rte_graph_walk_rtc(graph);
+	int model = rte_graph_worker_model_get();
+
+	if (model == RTE_GRAPH_MODEL_DEFAULT ||
+	    model == RTE_GRAPH_MODEL_RTC)
+		rte_graph_walk_rtc(graph);
+	else if (model == RTE_GRAPH_MODEL_MCORE_DISPATCH)
+		rte_graph_walk_mcore_dispatch(graph);
 }
 
 #ifdef __cplusplus
-- 
2.37.2
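As a side note on the model selection in rte_graph_walk(): RTE_GRAPH_MODEL_RTC is defined earlier in the series as an alias of RTE_GRAPH_MODEL_DEFAULT, so the two comparisons test the same value. The sketch below shows one possible way to express the same selection with a switch. It is an illustration only; the enum and the `walk_select()` helper are hypothetical stand-ins, and the function returns a tag instead of calling a walk function so that it can be compiled without DPDK.

```c
#include <assert.h>

/* Hypothetical stand-in for enum rte_graph_worker_model. */
enum worker_model {
	MODEL_DEFAULT,
	MODEL_RTC = MODEL_DEFAULT,
	MODEL_MCORE_DISPATCH,
	MODEL_LIST_END
};

/* Tag naming the walk routine that would be invoked. */
enum walk_kind { WALK_NONE, WALK_RTC, WALK_MCORE_DISPATCH };

/* RTC aliases the default, so a single case label covers both. */
static_assert(MODEL_RTC == MODEL_DEFAULT,
	      "RTC is expected to alias the default model");

enum walk_kind
walk_select(enum worker_model model)
{
	switch (model) {
	case MODEL_RTC:
		return WALK_RTC;
	case MODEL_MCORE_DISPATCH:
		return WALK_MCORE_DISPATCH;
	default:
		/* Unknown model: like the if/else chain, walk nothing. */
		return WALK_NONE;
	}
}
```

Whether a switch or the if/else chain is preferable in the fast path is a judgment call for the maintainers; the sketch only makes the aliasing explicit.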
[PATCH v5 13/15] graph: add stats for cross-core dispatching
Add stats for cross-core dispatching scheduler if stats collection is enabled. Signed-off-by: Haiyue Wang Signed-off-by: Cunming Liang Signed-off-by: Zhirun Yan --- lib/graph/graph_debug.c | 6 +++ lib/graph/graph_stats.c | 74 +--- lib/graph/rte_graph.h| 2 + lib/graph/rte_graph_model_dispatch.c | 3 ++ lib/graph/rte_graph_worker_common.h | 2 + 5 files changed, 79 insertions(+), 8 deletions(-) diff --git a/lib/graph/graph_debug.c b/lib/graph/graph_debug.c index b84412f5dd..7dcf07b080 100644 --- a/lib/graph/graph_debug.c +++ b/lib/graph/graph_debug.c @@ -74,6 +74,12 @@ rte_graph_obj_dump(FILE *f, struct rte_graph *g, bool all) fprintf(f, " size=%d\n", n->size); fprintf(f, " idx=%d\n", n->idx); fprintf(f, " total_objs=%" PRId64 "\n", n->total_objs); + if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) { + fprintf(f, " total_sched_objs=%" PRId64 "\n", + n->total_sched_objs); + fprintf(f, " total_sched_fail=%" PRId64 "\n", + n->total_sched_fail); + } fprintf(f, " total_calls=%" PRId64 "\n", n->total_calls); for (i = 0; i < n->nb_edges; i++) fprintf(f, " edge[%d] <%s>\n", i, diff --git a/lib/graph/graph_stats.c b/lib/graph/graph_stats.c index c0140ba922..aa22cc403c 100644 --- a/lib/graph/graph_stats.c +++ b/lib/graph/graph_stats.c @@ -40,13 +40,19 @@ struct rte_graph_cluster_stats { struct cluster_node clusters[]; } __rte_cache_aligned; +#define boarder_model_dispatch() \ + fprintf(f, "+---+---+" \ + "---+---+---+---+" \ + "---+---+-" \ + "--+\n") + #define boarder() \ fprintf(f, "+---+---+" \ "---+---+---+---+-" \ "--+\n") static inline void -print_banner(FILE *f) +print_banner_default(FILE *f) { boarder(); fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s\n", "|Node", "|calls", @@ -55,6 +61,27 @@ print_banner(FILE *f) boarder(); } +static inline void +print_banner_dispatch(FILE *f) +{ + boarder_model_dispatch(); + fprintf(f, "%-32s%-16s%-16s%-16s%-16s%-16s%-16s%-16s%-16s\n", + "|Node", "|calls", + "|objs", "|sched objs", "|sched fail", + "|realloc_count", 
"|objs/call", "|objs/sec(10E6)", + "|cycles/call|"); + boarder_model_dispatch(); +} + +static inline void +print_banner(FILE *f) +{ + if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) + print_banner_dispatch(f); + else + print_banner_default(f); +} + static inline void print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat) { @@ -76,11 +103,21 @@ print_node(FILE *f, const struct rte_graph_cluster_node_stats *stat) objs_per_sec = ts_per_hz ? (objs - prev_objs) / ts_per_hz : 0; objs_per_sec /= 100; - fprintf(f, - "|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64 - "|%-15.3f|%-15.6f|%-11.4f|\n", - stat->name, calls, objs, stat->realloc_count, objs_per_call, - objs_per_sec, cycles_per_call); + if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH) { + fprintf(f, + "|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64 + "|%-15" PRIu64 "|%-15" PRIu64 + "|%-15.3f|%-15.6f|%-11.4f|\n", + stat->name, calls, objs, stat->sched_objs, + stat->sched_fail, stat->realloc_count, objs_per_call, + objs_per_sec, cycles_per_call); + } else { + fprintf(f, + "|%-31s|%-15" PRIu64 "|%-15" PRIu64 "|%-15" PRIu64 + "|%-15.3f|%-15.6f|%-11.4f|\n", + stat->name, calls, objs, stat->realloc_count, objs_per_call, + objs_per_sec, cycles_per_call); + } } static int @@ -88,13 +125,20 @@ graph_cluster_stats_cb(bool is_first, bool is_last, void *cookie, const struct rte_graph_cluster_node_stats *stat) { FILE *f = cookie; + int model; + + model = rte_graph_worker_model_get(); if (unlikely(is_first)) print_banner(f); if (stat->objs) print_node(f, stat); -
[PATCH v5 14/15] examples/l3fwd-graph: introduce multicore dispatch worker model
Add a new parameter "model" to choose between the dispatch and rtc
worker models. In the dispatch model, nodes are given affinity to the
worker cores successively.

Note: the current implementation supports only one RX node for the
remote model.

./dpdk-l3fwd-graph -l 8,9,10,11 -n 4 -- -p 0x1 --config="(0,0,9)" -P --model="dispatch"

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 examples/l3fwd-graph/main.c | 236 +---
 1 file changed, 194 insertions(+), 42 deletions(-)

diff --git a/examples/l3fwd-graph/main.c b/examples/l3fwd-graph/main.c
index 5feeab4f0f..7078ed4c77 100644
--- a/examples/l3fwd-graph/main.c
+++ b/examples/l3fwd-graph/main.c
@@ -55,6 +55,9 @@
 
 #define NB_SOCKETS 8
 
+/* Graph module */
+#define WORKER_MODEL_RTC "rtc"
+#define WORKER_MODEL_MCORE_DISPATCH "dispatch"
 /* Static global variables used within this file. */
 static uint16_t nb_rxd = RX_DESC_DEFAULT;
 static uint16_t nb_txd = TX_DESC_DEFAULT;
@@ -88,6 +91,10 @@ struct lcore_rx_queue {
 	char node_name[RTE_NODE_NAMESIZE];
 };
 
+struct model_conf {
+	enum rte_graph_worker_model model;
+};
+
 /* Lcore conf */
 struct lcore_conf {
 	uint16_t n_rx_queue;
@@ -153,6 +160,19 @@ static struct ipv4_l3fwd_lpm_route ipv4_l3fwd_lpm_route_array[] = {
 	{RTE_IPV4(198, 18, 6, 0), 24, 6}, {RTE_IPV4(198, 18, 7, 0), 24, 7},
 };
 
+static int
+check_worker_model_params(void)
+{
+	if (rte_graph_worker_model_get() == RTE_GRAPH_MODEL_MCORE_DISPATCH &&
+	    nb_lcore_params > 1) {
+		printf("Exceeded max number of lcore params for remote model: %hu\n",
+		       nb_lcore_params);
+		return -1;
+	}
+
+	return 0;
+}
+
 static int
 check_lcore_params(void)
 {
@@ -276,6 +296,7 @@ print_usage(const char *prgname)
 		" --eth-dest=X,MM:MM:MM:MM:MM:MM: Ethernet destination for "
 		"port X\n"
 		" --max-pkt-len PKTLEN: maximum packet length in decimal (64-9600)\n"
+		" --model NAME: walking model name, dispatch or rtc(by default)\n"
 		" --no-numa: Disable numa awareness\n"
 		" --per-port-pool: Use separate buffer pool per port\n"
 		" --pcap-enable: Enables pcap capture\n"
@@ -318,6 +339,20 @@ parse_max_pkt_len(const char *pktlen)
 	return len;
 }
 
+static int
+parse_worker_model(const char *model)
+{
+	if (strcmp(model, WORKER_MODEL_MCORE_DISPATCH) == 0) {
+		rte_graph_worker_model_set(RTE_GRAPH_MODEL_MCORE_DISPATCH);
+		return RTE_GRAPH_MODEL_MCORE_DISPATCH;
+	} else if (strcmp(model, WORKER_MODEL_RTC) == 0)
+		return RTE_GRAPH_MODEL_RTC;
+
+	rte_exit(EXIT_FAILURE, "Invalid worker model: %s", model);
+
+	return RTE_GRAPH_MODEL_LIST_END;
+}
+
 static int
 parse_portmask(const char *portmask)
 {
@@ -434,6 +469,8 @@ static const char short_options[] = "p:" /* portmask */
 #define CMD_LINE_OPT_PCAP_ENABLE "pcap-enable"
 #define CMD_LINE_OPT_NUM_PKT_CAP "pcap-num-cap"
 #define CMD_LINE_OPT_PCAP_FILENAME "pcap-file-name"
+#define CMD_LINE_OPT_WORKER_MODEL "model"
+
 enum {
 	/* Long options mapped to a short option */
 
@@ -449,6 +486,7 @@ enum {
 	CMD_LINE_OPT_PARSE_PCAP_ENABLE,
 	CMD_LINE_OPT_PARSE_NUM_PKT_CAP,
 	CMD_LINE_OPT_PCAP_FILENAME_CAP,
+	CMD_LINE_OPT_WORKER_MODEL_TYPE,
 };
 
 static const struct option lgopts[] = {
@@ -460,6 +498,7 @@ static const struct option lgopts[] = {
 	{CMD_LINE_OPT_PCAP_ENABLE, 0, 0, CMD_LINE_OPT_PARSE_PCAP_ENABLE},
 	{CMD_LINE_OPT_NUM_PKT_CAP, 1, 0, CMD_LINE_OPT_PARSE_NUM_PKT_CAP},
 	{CMD_LINE_OPT_PCAP_FILENAME, 1, 0, CMD_LINE_OPT_PCAP_FILENAME_CAP},
+	{CMD_LINE_OPT_WORKER_MODEL, 1, 0, CMD_LINE_OPT_WORKER_MODEL_TYPE},
 	{NULL, 0, 0, 0},
 };
 
@@ -551,6 +590,11 @@ parse_args(int argc, char **argv)
 			printf("Pcap file name: %s\n", pcap_filename);
 			break;
 
+		case CMD_LINE_OPT_WORKER_MODEL_TYPE:
+			printf("Use new worker model: %s\n", optarg);
+			parse_worker_model(optarg);
+			break;
+
 		default:
 			print_usage(prgname);
 			return -1;
@@ -726,15 +770,15 @@ print_stats(void)
 static int
 graph_main_loop(void *conf)
 {
+	struct model_conf *mconf = conf;
 	struct lcore_conf *qconf;
 	struct rte_graph *graph;
 	uint32_t lcore_id;
 
-	RTE_SET_USED(conf);
-
 	lcore_id = rte_lcore_id();
 	qconf = &lcore_conf[lcore_id];
 	graph = qconf->graph;
+	rte_graph_worker_model_set(mconf->model);
 
 	if (!graph) {
 		RTE_LOG(INFO, L3FWD_GRAPH, "Lcore %u has nothing to do\n",
@@ -788,6 +832,139 @@ config_port_max_pkt_len(struct rte_eth_conf *conf,
 	return 0;
 }
 
+static void
+graph_config_mcore_dispatch(struct rte_graph_param gra
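The option handling in the patch above can be sketched in isolation as follows. This is a minimal, DPDK-free illustration and not the code from the patch: the enum `worker_model` and the helpers `worker_model_from_name()` and `worker_model_params_ok()` are hypothetical names standing in for `enum rte_graph_worker_model`, `parse_worker_model()` and `check_worker_model_params()`. Unlike the patch, the sketch reports an unknown name through its return value instead of calling `rte_exit()`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for enum rte_graph_worker_model (not the DPDK type). */
enum worker_model {
	WORKER_MODEL_RTC,
	WORKER_MODEL_MCORE_DISPATCH,
	WORKER_MODEL_INVALID,
};

/*
 * Map the value of --model to a worker model.
 * Returns WORKER_MODEL_INVALID for NULL or an unknown name, so the
 * caller decides how to report the error.
 */
static enum worker_model
worker_model_from_name(const char *name)
{
	if (name == NULL)
		return WORKER_MODEL_INVALID;
	if (strcmp(name, "dispatch") == 0)
		return WORKER_MODEL_MCORE_DISPATCH;
	if (strcmp(name, "rtc") == 0)
		return WORKER_MODEL_RTC;
	return WORKER_MODEL_INVALID;
}

/*
 * Mirror of the restriction stated in the commit message: with the
 * dispatch model only one lcore parameter (one RX node) is accepted.
 * Returns 0 when the combination is acceptable, -1 otherwise.
 */
static int
worker_model_params_ok(enum worker_model model, unsigned int nb_lcore_params)
{
	if (model == WORKER_MODEL_MCORE_DISPATCH && nb_lcore_params > 1)
		return -1;
	return 0;
}
```

Returning an "invalid" value keeps the parsing step free of side effects, which makes it easy to check separately from the EAL; whether the example application would prefer that over exiting immediately is a design choice left to the patch author.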
[PATCH v5 15/15] doc: update multicore dispatch model in graph guides
Update graph documentation to introduce new multicore dispatch model.

Signed-off-by: Haiyue Wang
Signed-off-by: Cunming Liang
Signed-off-by: Zhirun Yan
---
 doc/guides/prog_guide/graph_lib.rst | 59 +++--
 1 file changed, 55 insertions(+), 4 deletions(-)

diff --git a/doc/guides/prog_guide/graph_lib.rst b/doc/guides/prog_guide/graph_lib.rst
index 1cfdc86433..72e26f3a5a 100644
--- a/doc/guides/prog_guide/graph_lib.rst
+++ b/doc/guides/prog_guide/graph_lib.rst
@@ -189,14 +189,65 @@ In the above example, A graph object will be created with ethdev Rx node of
 port 0 and queue 0, all ipv4* nodes in the system,
 and ethdev tx node of all ports.
 
-Multicore graph processing
-~~~~~~~~~~~~~~~~~~~~~~~~~~
-In the current graph library implementation, specifically,
-``rte_graph_walk()`` and ``rte_node_enqueue*`` fast path API functions
+Graph model choosing
+~~~~~~~~~~~~~~~~~~~~
+Currently, there are two different walking models. Use
+``rte_graph_worker_model_set()`` to set the walking model.
+
+RTC (Run-To-Completion)
+^^^^^^^^^^^^^^^^^^^^^^^
+This is the default graph walking model. Specifically,
+``rte_graph_walk_rtc()`` and ``rte_node_enqueue*`` fast path API functions
 are designed to work on single-core to have better performance.
 The fast path API works on graph object, So the multi-core graph
 processing strategy would be to create graph object PER WORKER.
 
+Example:
+
+Graph: node-0 -> node-1 -> node-2 @Core0.
+
+.. code-block:: diff
+
+    + - - - - - - - - - - - - - - - - - - - - - +
+    '                  Core #0                  '
+    '                                           '
+    ' +--------+     +---------+     +--------+ '
+    ' | Node-0 | --> | Node-1  | --> | Node-2 | '
+    ' +--------+     +---------+     +--------+ '
+    '                                           '
+    + - - - - - - - - - - - - - - - - - - - - - +
+
+Dispatch model
+^^^^^^^^^^^^^^
+The dispatch model enables a cross-core dispatching mechanism which employs
+a scheduling work-queue to dispatch streams to other worker cores which
+are associated with the destination node.
+
+Use ``rte_graph_model_dispatch_lcore_affinity_set()`` to set lcore affinity
+with the node.
+Each worker core has its own copy of the graph. Use ``rte_graph_clone()`` to
+clone graph for each worker and use ``rte_graph_model_dispatch_core_bind()``
+to bind graph with the worker core.
+
+Example:
+
+Graph topo: node-0 -> node-1; node-1 -> node-2; node-2 -> node-3.
+Config graph: node-0 @Core0; node-1/3 @Core1; node-2 @Core2.
+
+.. code-block:: diff
+
+    + - - - - - -+     +- - - - - - - - - - - - - +     + - - - - - -+
+    '  Core #0   '     '          Core #1         '     '  Core #2   '
+    '            '     '                          '     '            '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    ' | Node-0 | - - - ->| Node-1 |    | Node-3 |<- - - - | Node-2 | '
+    ' +--------+ '     ' +--------+    +--------+ '     ' +--------+ '
+    '            '     '     |                    '     '      ^     '
+    + - - - - - -+     +- - -|- - - - - - - - - - +     + - - -|- - -+
+                             |                                 |
+                             + - - - - - - - - - - - - - - - - +
+
+
 In fast path
 ~~~~~~~~~~~~
 Typical fast-path code looks like below, where the application
-- 
2.37.2
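The cross-core hand-off described in the documentation patch above can be sketched as a small, self-contained model. This is an illustration under stated assumptions, not the graph library's implementation: `toy_graph`, `toy_dispatch()` and the fixed-size `work_queue` are hypothetical, and the real library uses its own scheduling work-queue and node structures. The sketch only shows the decision the text describes: a node whose lcore affinity matches the current core stays local, any other node is queued for its destination core, and a full queue counts as a scheduling failure.

```c
#include <stddef.h>

#define TOY_MAX_NODES 8
#define TOY_MAX_CORES 4
#define TOY_WQ_DEPTH  16

/* Outcome of handing a stream to the next node (hypothetical names). */
enum dispatch_result {
	DISPATCH_LOCAL,   /* next node runs on the current core */
	DISPATCH_REMOTE,  /* queued on another core's work queue */
	DISPATCH_FAIL,    /* bad node, bad core, or destination queue full */
};

struct toy_node {
	const char *name;
	unsigned int lcore;   /* lcore affinity of this node */
};

struct work_queue {
	size_t items[TOY_WQ_DEPTH];   /* indices of nodes waiting to run */
	size_t count;
};

struct toy_graph {
	struct toy_node nodes[TOY_MAX_NODES];
	size_t nb_nodes;
	struct work_queue wq[TOY_MAX_CORES];   /* one queue per worker core */
};

/*
 * Decide where the next node runs: locally when its affinity matches
 * cur_lcore, otherwise enqueue it on the destination core's queue.
 */
static enum dispatch_result
toy_dispatch(struct toy_graph *g, unsigned int cur_lcore, size_t next_node)
{
	struct work_queue *wq;
	unsigned int dst;

	if (next_node >= g->nb_nodes)
		return DISPATCH_FAIL;

	dst = g->nodes[next_node].lcore;
	if (dst >= TOY_MAX_CORES)
		return DISPATCH_FAIL;
	if (dst == cur_lcore)
		return DISPATCH_LOCAL;

	wq = &g->wq[dst];
	if (wq->count == TOY_WQ_DEPTH)
		return DISPATCH_FAIL;

	wq->items[wq->count++] = next_node;
	return DISPATCH_REMOTE;
}
```

With the configuration from the example (node-0 on core 0, node-1 and node-3 on core 1, node-2 on core 2), every edge of the chain crosses a core boundary, so each hand-off goes through the destination core's queue.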
[Bug 1207] testpmd: No probed ethernet devices when running testpmd with Intel E810XXVDA2G1P5
https://bugs.dpdk.org/show_bug.cgi?id=1207

Bug ID: 1207
Summary: testpmd: No probed ethernet devices when running testpmd with Intel E810XXVDA2G1P5
Product: DPDK
Version: 22.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: testpmd
Assignee: dev@dpdk.org
Reporter: wang.j...@nokia-sbell.com
Target Milestone: ---

Created attachment 248
  --> https://bugs.dpdk.org/attachment.cgi?id=248&action=edit
Test logs and debug logs

Hi DPDK support:

When I install DPDK and run testpmd, "No probed ethernet devices" is
always returned and no packets can be forwarded. The test NIC is an
Intel E810XXVDA2G1P5. All test logs are attached for details.

Please check what may cause the issue and what I can try to solve it.

Thanks a lot.

BR,
wangjing

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Bug 1208] testpmd: No probed ethernet devices when running testpmd with Intel E810XXVDA2G1P5
https://bugs.dpdk.org/show_bug.cgi?id=1208

Bug ID: 1208
Summary: testpmd: No probed ethernet devices when running testpmd with Intel E810XXVDA2G1P5
Product: DPDK
Version: 22.11
Hardware: x86
OS: Linux
Status: UNCONFIRMED
Severity: major
Priority: Normal
Component: testpmd
Assignee: dev@dpdk.org
Reporter: wang.j...@nokia-sbell.com
Target Milestone: ---

Hi DPDK support:

When I install DPDK and run testpmd, "No probed ethernet devices" is
always returned and no packets can be forwarded. The test NIC is an
Intel E810XXVDA2G1P5. All test logs are attached for details.

Please check what may cause the issue and what I can try to solve it.

Thanks a lot.

BR,
wangjing

-- 
You are receiving this mail because:
You are the assignee for the bug.
[Bug 1208] testpmd: No probed ethernet devices when running testpmd with Intel E810XXVDA2G1P5
https://bugs.dpdk.org/show_bug.cgi?id=1208

David Marchand (david.march...@redhat.com) changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |david.march...@redhat.com
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |DUPLICATE

--- Comment #1 from David Marchand (david.march...@redhat.com) ---
.

*** This bug has been marked as a duplicate of bug 1207 ***

-- 
You are receiving this mail because:
You are the assignee for the bug.