Hi,

> On Oct 21, 2024, at 09:52, Wathsala Vithanage <wathsala.vithan...@arm.com> wrote:
>
> DPDK applications benefit from Direct Cache Access (DCA) features like
> Intel DDIO and Arm's write-allocate-to-SLC. However, those features do
> not allow fine-grained control of direct cache access, such as stashing
> packets into upper-level caches (L2 caches) of a processor or the shared
> cache of a chiplet. PCIe TLP Processing Hints (TPH) addresses this need
> in a vendor-agnostic manner. TPH capability has existed since
> PCI Express Base Specification revision 3.0; today, numerous Network
> Interface Cards and interconnects from different vendors support it.
> TPH comprises a steering tag (ST) and a processing hint (PH). The ST
> specifies the cache level of a CPU at which the data should be written
> to (or DCAed into), while the PH is a hint provided by the PCIe
> requester to the completer about an upcoming traffic pattern. Some NIC
> vendors bundle TPH capability with fine-grained control over the type of
> objects that can be stashed into CPU caches, such as
>
> - Rx/Tx queue descriptors
> - Packet headers
> - Packet payloads
> - Data from a given offset from the start of a packet
>
> Note that stashable object types are outside the scope of the PCIe
> standard; therefore, vendors could support any combination of the above
> items as they see fit.
>
> To enable TPH and fine-grained packet stashing, this API extends the
> ethdev library, the PCI library, and the PCI driver. In this design, the
> application uses the ethdev stashing API to give the PMD hints that
> indicate to the underlying hardware at which processor and cache level
> it prefers a packet to end up. Once the PMD receives a CPU and a
> cache-level combination, it must extract the matching ST from the TPH
> ACPI _DSM of the PCIe root port to which the NIC is connected.
> To facilitate the extraction of STs, the PCI library and the PCI driver
> APIs are extended.
>
> The PMD's implementations of the eth_dev_ops function pointers
> stashing_rx_hints_set and stashing_tx_hints_set are responsible for
> extracting the ST. The PCI bus driver provides the generic TPH ST
> extraction API that can be used by any PMD that drives a PCIe device.
> The extraction process begins by calling the rte_pci_extract_tph_st()
> function in drivers/bus/pci/rte_bus_pci.h, which takes an initialized
> input object rte_tph_acpi__dsm_args and a pointer to
> rte_tph_acpi__dsm_return to store the ST returned by the TPH _DSM. The
> rte_tph_acpi__dsm_args and rte_tph_acpi__dsm_return objects are declared
> in lib/pci/rte_pci_tph.h as defined by the PCIe firmware specification
> and the associated ECN titled "Revised _DSM for Cache Locality TPH
> Features". rte_pci_extract_tph_st() uses the helper function
> rte_init_tph_acpi__dsm_args to convert the lcore_id and cache_level
> provided by the PMD into well-formatted rte_tph_acpi__dsm_args. The
> processor ID (or, in some cases, the container ID, which is synonymous
> with a core complex of a chiplet die) and the cache level in the
> rte_tph_acpi__dsm_args structure are not the same as the lcore_id and
> the cache_level that the application provides to the ethdev library and
> that the PMD passes down to the rte_pci_extract_tph_st() function. The
> rte_init_tph_acpi__dsm_args helper converts lcore_id to an APIC
> processor-id, or to a PPTT processor-container-id if the application
> requested the container of the lcore_id as the target. Similarly, it
> must convert cache_level to a PPTT cache-reference-id. These conversions
> are possible with the hwloc library or some other library DPDK may
> eventually provide. However, DPDK cannot execute the TPH _DSM directly,
> as it can only be done with kernel privileges.
> Therefore, appropriate mechanisms must be established in the supported
> operating systems (Linux, FreeBSD, and Windows) to expose the _DSM
> return value for a given argument. For instance, on Linux, this
> mechanism could be sysfs. Accordingly, rte_pci_extract_tph_st() is
> implemented in the OS-specific files
> drivers/bus/pci/{bsd,linux,windows}/pci.c.
>
> Once the ST is acquired from the OS-specific method described earlier,
> the stashing_rx_hints_set/stashing_tx_hints_set PMD implementations are
> ready to set the ST. As per the PCIe specification, hints can be placed
> in the MSI-X table or set using a device-specific method. Accordingly,
> many NICs that support TPH allow steering tags and processing hints to
> be set in the device's MSI-X table and in queue contexts. For PMDs,
> setting the ST in queue contexts is the only viable method of using TPH.
> Therefore, DPDK can only support setting the ST in queue contexts.
>
> An application uses the cache stashing ethdev API by first calling the
> rte_eth_dev_stashing_capabilities_get() function to find out which of
> the object types in the bulleted list above the NIC can stash into a
> processor cache. This function takes a port_id and a pointer to a
> uint16_t through which it reports the object type flags. The PMD
> implements the stashing_capabilities_get function pointer in
> eth_dev_ops. If the underlying platform or the NIC does not support TPH,
> this function returns -ENOTSUP and the application should consider any
> value stored through the objects pointer invalid.
>
> Once the application knows the supported object types that can be
> stashed, the next step is to set the steering tags for the packets
> associated with Rx and Tx queues via the
> rte_eth_dev_stashing_rx_config_set() and
> rte_eth_dev_stashing_tx_config_set() ethdev library functions,
> respectively. These functions execute rte_pci_extract_tph_st() via the
> eth_dev_ops pointers stashing_rx_hints_set and stashing_tx_hints_set.
> Both functions have an identical signature: a port_id, a queue_id,
> and a config object. The port_id and the queue_id are used to locate the
> device and the queue. The config object is of type struct
> rte_eth_stashing_config, which specifies the lcore_id and the
> cache_level, indicating where objects from this queue should be stashed.
> It also has the field 'container' to indicate whether the target should
> be the container of the processor specified by the lcore_id in a
> chiplet-based SoC. The 'objects' field in the config sets the types of
> objects the application wishes to stash based on the capabilities found
> earlier. If the objects field includes the flag
> RTE_ETH_DEV_STASH_OBJECT_OFFSET, the 'offset' field must be used to set
> the desired offset. These functions invoke the PMD implementations of
> the stashing functionality via the stashing_rx_hints_set and
> stashing_tx_hints_set function pointers in eth_dev_ops, respectively.
>
>
> Wathsala Vithanage (2):
>   pci: introduce the PCIe TLP Processing Hints API
>   ethdev: introduce the cache stashing hints API
>
>  drivers/bus/pci/bsd/pci.c     |  12 +++
>  drivers/bus/pci/linux/pci.c   |  12 +++
>  drivers/bus/pci/rte_bus_pci.h |  22 +++++
>  drivers/bus/pci/version.map   |   3 +
>  drivers/bus/pci/windows/pci.c |  14 +++
>  lib/ethdev/ethdev_driver.h    |  66 ++++++++++++++
>  lib/ethdev/rte_ethdev.c       | 120 ++++++++++++++++++++++++++
>  lib/ethdev/rte_ethdev.h       | 156 ++++++++++++++++++++++++++++++++++
>  lib/ethdev/version.map        |   4 +
>  lib/pci/meson.build           |   2 +
>  lib/pci/rte_pci.h             |   2 +
>  lib/pci/rte_pci_tph.c         |  20 +++++
>  lib/pci/rte_pci_tph.h         | 111 ++++++++++++++++++++++++
>  13 files changed, 544 insertions(+)
>  create mode 100644 lib/pci/rte_pci_tph.c
>  create mode 100644 lib/pci/rte_pci_tph.h
>
> --
> 2.34.1
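
To make the application-side flow in the cover letter concrete, here is a
rough sketch of how I read it: query the capabilities, mask the wanted
object types against them, then apply the hint to one Rx queue. It is only
an illustration under assumptions. The sketch_* functions are local
stand-ins for the proposed rte_eth_dev_stashing_capabilities_get() and
rte_eth_dev_stashing_rx_config_set(), and the field types, the flag bit
values, and every flag name other than RTE_ETH_DEV_STASH_OBJECT_OFFSET are
made up for the example rather than taken from the patches.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Object-type flags. Only RTE_ETH_DEV_STASH_OBJECT_OFFSET is named in the
 * cover letter; the other names and all bit values are assumptions. */
#define SKETCH_STASH_OBJECT_DESC        (UINT16_C(1) << 0)
#define SKETCH_STASH_OBJECT_HEADER      (UINT16_C(1) << 1)
#define SKETCH_STASH_OBJECT_PAYLOAD     (UINT16_C(1) << 2)
#define RTE_ETH_DEV_STASH_OBJECT_OFFSET (UINT16_C(1) << 3)

/* Field names follow the cover letter; the types are assumptions. */
struct sketch_stashing_config {
	unsigned int lcore_id;
	uint8_t cache_level;
	uint8_t container;
	uint16_t objects;
	uint32_t offset;
};

/* Stand-in for rte_eth_dev_stashing_capabilities_get(): pretend port 0
 * can stash descriptors and headers and that other ports lack TPH. */
static int
sketch_capabilities_get(uint16_t port_id, uint16_t *objects)
{
	if (port_id != 0)
		return -ENOTSUP;
	*objects = SKETCH_STASH_OBJECT_DESC | SKETCH_STASH_OBJECT_HEADER;
	return 0;
}

/* Stand-in for rte_eth_dev_stashing_rx_config_set(). A real PMD would
 * resolve the steering tag and program the queue context here. */
static int
sketch_rx_config_set(uint16_t port_id, uint16_t queue_id,
		     const struct sketch_stashing_config *cfg)
{
	(void)queue_id;
	if (port_id != 0)
		return -ENOTSUP;
	return cfg->objects != 0 ? 0 : -EINVAL;
}

/* Application flow: query capabilities, keep only the supported object
 * types, then apply the hint to one Rx queue. The object mask that was
 * actually requested is reported through *applied. */
static int
sketch_enable_rx_stashing(uint16_t port_id, uint16_t queue_id,
			  unsigned int lcore_id, uint16_t wanted,
			  uint32_t offset, uint16_t *applied)
{
	uint16_t caps = 0;
	struct sketch_stashing_config cfg = {0};
	int ret;

	assert(applied != NULL);

	ret = sketch_capabilities_get(port_id, &caps);
	if (ret != 0)
		return ret; /* caps must be treated as invalid */

	cfg.lcore_id = lcore_id;
	cfg.cache_level = 2; /* assumed here to mean "L2" */
	cfg.container = 0;   /* target the core, not its container */
	cfg.objects = wanted & caps;
	if (cfg.objects & RTE_ETH_DEV_STASH_OBJECT_OFFSET)
		cfg.offset = offset;

	*applied = cfg.objects;
	return sketch_rx_config_set(port_id, queue_id, &cfg);
}
```

Masking against the reported capabilities before calling the config
function keeps the application from requesting object types the NIC cannot
stash. Whether the real API would reject or silently ignore unsupported
flags is not stated in the cover letter.
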
Do you have any numbers showing how much this feature improves performance?

/Chenbo