Hi, > From: Yang Ming <ming.1.y...@nokia-sbell.com> > Sent: Wednesday, March 12, 2025 3:56 AM > To: Bruce Richardson <bruce.richard...@intel.com>; Stephen Hemminger > <step...@networkplumber.org> > Cc: Dariusz Sosnowski <dsosnow...@nvidia.com>; Slava Ovsiienko > <viachesl...@nvidia.com>; Bing Zhao <bi...@nvidia.com>; Ori Kam > <or...@nvidia.com>; > Suanming Mou <suanmi...@nvidia.com>; Matan Azrad > <ma...@nvidia.com>; dev@dpdk.org > Subject: Re: [PATCH 1/2] net/mlx5: improve socket file path > > External email: Use caution opening links or attachments > > > On 2025/1/3 10:51, Ming 1. Yang (NSB) wrote: > > On 2024/12/14 01:16, Bruce Richardson wrote: > On Fri, Dec 13, 2024 at 09:12:39AM -0800, Stephen Hemminger wrote: > On Fri, 13 Dec 2024 17:24:42 +0800 > Yang Ming mailto:ming.1.y...@nokia-sbell.com wrote: > > 1. /var/tmp is hard code which is not a good style > 2. /var/tmp may be not allowed to be written via container's > read only mode. > > Signed-off-by: Yang Ming mailto:ming.1.y...@nokia-sbell.com > Since this is a unix domain socket, why not use abstract socket > that doesn't have to be associated with filesystem? > In general, I think we should avoid abstract sockets in DPDK. Primary > reason is that they are linux-specific. Last time I checked other unixes, > like BSD, don't support them. A secondary concern is that having a > filesystem path allows permission checks, so for e.g. telemetry sockets, > only users with appropriate permissions can connect. With an abstract socket > we'd have to open up the area of user authentication. > > /Bruce > > Hi Stephen & Bruce, > I'm not sure whether abstract socket is a good idea. Maybe it can be improved > further or step by step. But we don't need to discuss it for this commit. > We do this improvement because "/var/tmp" and "/var/log" can't be write in > Readonly mode of container except that we add /var/ specfic for DPDK > > application in container's setting. But nearly all DPDK modules have already > used common runtime path returned from `rte_eal_get_runtime_dir()`. Why > not > we apply this common path for Mellanox NIC? > > > > Hi Stephen, > > I'm not entirely sure whether using an abstract socket is the best approach. > It might be possible to improve it further or incrementally. However, we > > don't need to discuss this for the current commit. > We made this improvement because the directories "/var/tmp" and "/var/log" > cannot be written to in a container with read-only mode, unless we > > specifically configure the /var/ directory for the DPDK application in the > container's settings. Nearly all DPDK modules already use the common runtime > > path returned by rte_eal_get_runtime_dir(). Therefore, it makes sense to > apply this common path for the Mellanox NIC as well. > Actually, the objective of this patch series is to prevent the DPDK Mellanox > driver from crashing when attempting to access the read-only directories "/> > var/" in a container. > > Brs, > Yang Ming
Let me provide the context for the functionality in question here. mlx5 PMD has an ability to dump HW flow rules using mlx_steering_dump tool (documented in https://doc.dpdk.org/guides/nics/mlx5.html#how-to-dump-flows and in https://github.com/Mellanox/mlx_steering_dump/tree/master/hws). Dumping itself is supported only on Linux. There are 2 ways to use that tool: 1. Application calls rte_flow_dev_dump(), providing FILE* handle. This saves the flow rules metadata (e.g., IDs of HW objects) in the file. mlx_steering_dump tool is then used to parse it and dump HW rules. 2. mlx_steering_dump communicates through the Unix socket with mlx5 PMD, sharing destination file descriptor of output metadata file with DPDK process through SCM_RIGHTS mechanism. mlx5 PMD internally calls rte_flow_dev_dump(), passing the provided file. After dumping is done, the tool parses the metadata file and extracts rules from the HW. In practice, 2nd option is the more frequently used one, because of its convenience since it does not require modification of application code. It has also the benefit that the tool itself owns the output dump file e.g., tool could be called from outside of the container and debug dump will be generated in host's context, not container context (assuming socket path is mounted on the host). This is the case for example when containers use VFs of mlx5 NICs. Changing the filesystem path of the Unix socket would require the update of all the tooling - there would be a need to copy the logic of runtime directory discovery (like it is done in dpdk-telemetry.py). Until it is done, this change would introduce a breakage for existing users. What do you think about fallback mechanism here? If "/var/tmp" is read-only or creation of the socket on this path fails, then socket will be created in EAL runtime directory. Best regards, Dariusz Sosnowski