Re: [dpdk-dev] [PATCH v5 2/6] eal: abstract away IPC socket path generation
On 3/8/2018 12:56 AM, Anatoly Burakov wrote: Signed-off-by: Anatoly Burakov Acked-by: Jianfeng Tan Thanks, Jianfeng --- Notes: v5: remove lock files, leaving only socket paths code v4: replace lock files with init files lib/librte_eal/common/eal_common_proc.c | 48 - 1 file changed, 23 insertions(+), 25 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index da7930f..1aab3ac 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -91,6 +91,17 @@ find_sync_request(const char *dst, const char *act_name) return r; } +static void +create_socket_path(const char *name, char *buf, int len) +{ + const char *prefix = eal_mp_socket_path(); + + if (strlen(name) > 0) + snprintf(buf, len, "%s_%s", prefix, name); + else + snprintf(buf, len, "%s", prefix); +} + int rte_eal_primary_proc_alive(const char *config_file_path) { @@ -290,8 +301,12 @@ mp_handle(void *arg __rte_unused) static int open_socket_fd(void) { + char peer_name[PATH_MAX] = {0}; struct sockaddr_un un; - const char *prefix = eal_mp_socket_path(); + + if (rte_eal_process_type() == RTE_PROC_SECONDARY) + snprintf(peer_name, sizeof(peer_name), + "%d_%"PRIx64, getpid(), rte_rdtsc()); mp_fd = socket(AF_UNIX, SOCK_DGRAM, 0); if (mp_fd < 0) { @@ -301,13 +316,11 @@ open_socket_fd(void) memset(&un, 0, sizeof(un)); un.sun_family = AF_UNIX; - if (rte_eal_process_type() == RTE_PROC_PRIMARY) - snprintf(un.sun_path, sizeof(un.sun_path), "%s", prefix); - else { - snprintf(un.sun_path, sizeof(un.sun_path), "%s_%d_%"PRIx64, -prefix, getpid(), rte_rdtsc()); - } + + create_socket_path(peer_name, un.sun_path, sizeof(un.sun_path)); + unlink(un.sun_path); /* May still exist since last run */ + if (bind(mp_fd, (struct sockaddr *)&un, sizeof(un)) < 0) { RTE_LOG(ERR, EAL, "failed to bind %s: %s\n", un.sun_path, strerror(errno)); @@ -342,20 +355,6 @@ unlink_sockets(const char *filter) return 0; } -static void -unlink_socket_by_path(const char *path) -{ - char *filename; - char *fullpath = strdup(path); - - if (!fullpath) - return; - filename = basename(fullpath); - unlink_sockets(filename); - free(fullpath); - RTE_LOG(INFO, EAL, "Remove socket %s\n", path); -} - int rte_mp_channel_init(void) { @@ -444,10 +443,9 @@ send_msg(const char *dst_path, struct rte_mp_msg *msg, int type) if (snd < 0) { rte_errno = errno; /* Check if it caused by peer process exits */ - if (errno == ECONNREFUSED) { - /* We don't unlink the primary's socket here */ - if (rte_eal_process_type() == RTE_PROC_PRIMARY) - unlink_socket_by_path(dst_path); + if (errno == ECONNREFUSED && + rte_eal_process_type() == RTE_PROC_PRIMARY) { + unlink(dst_path); return 0; } if (errno == ENOBUFS) {
Re: [dpdk-dev] [PATCH v5 4/6] eal: lock IPC directory on init and send
On 3/8/2018 12:56 AM, Anatoly Burakov wrote: When sending IPC messages, prevent new sockets from initializing. Signed-off-by: Anatoly Burakov Make sense. Acked-by: Jianfeng Tan Thanks!
Re: [dpdk-dev] [PATCH v5 5/6] eal: simplify IPC sync request timeout code
On 3/8/2018 12:56 AM, Anatoly Burakov wrote: Signed-off-by: Anatoly Burakov Why I did not find this simple way in the first place :-) Acked-by: Jianfeng Tan Thanks!
[dpdk-dev] Question on AESNI PMD
Hi, I'm working on an SPDK module that uses the DPDK cryptodev framework, initially I'm using the AESNI PMD and have a few questions. in the doc it says that only in-place is supported however I see code in set_mb_job_params() just after the comment "Mutable crypto operation parameters" it appears to support a separate src and dst m_buf so curious about that. For my use case (storage) I'm using external data buffers so I can't use that code anyways but I was able to make some minor changes and am able to pass in different src and dst m_bufs that point to my own data buffers (not in the packet) and it seems to be working fine. So my 2 questions are: (1) is the documented in-place limitation simply not correct? (2) would there be any upstream interest in supporting a patch that enables m_bufs using external data buffers for src and dst? Thanks! Paul
Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
Hi Andrew, On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote: > Size of memory chunk required to populate mempool objects depends > on how objects are stored in the memory. Different mempool drivers > may have different requirements and a new operation allows to > calculate memory size in accordance with driver requirements and > advertise requirements on minimum memory chunk size and alignment > in a generic way. > > Bump ABI version since the patch breaks it. > > Signed-off-by: Andrew Rybchenko > --- > RFCv2 -> v1: > - move default calc_mem_size callback to rte_mempool_ops_default.c > - add ABI changes to release notes > - name default callback consistently: rte_mempool_op__default() > - bump ABI version since it is the first patch which breaks ABI > - describe default callback behaviour in details > - avoid introduction of internal function to cope with depration >(keep it to deprecation patch) > - move cache-line or page boundary chunk alignment to default callback > - highlight that min_chunk_size and align parameters are output only > [...] > diff --git a/lib/librte_mempool/rte_mempool_ops_default.c > b/lib/librte_mempool/rte_mempool_ops_default.c > new file mode 100644 > index 000..57fe79b > --- /dev/null > +++ b/lib/librte_mempool/rte_mempool_ops_default.c > @@ -0,0 +1,38 @@ > +/* SPDX-License-Identifier: BSD-3-Clause > + * Copyright(c) 2016 Intel Corporation. > + * Copyright(c) 2016 6WIND S.A. > + * Copyright(c) 2018 Solarflare Communications Inc. > + */ > + > +#include > + > +ssize_t > +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp, > + uint32_t obj_num, uint32_t pg_shift, > + size_t *min_chunk_size, size_t *align) > +{ > + unsigned int mp_flags; > + int ret; > + size_t total_elt_sz; > + size_t mem_size; > + > + /* Get mempool capabilities */ > + mp_flags = 0; > + ret = rte_mempool_ops_get_capabilities(mp, &mp_flags); > + if ((ret < 0) && (ret != -ENOTSUP)) > + return ret; > + > + total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size; > + > + mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift, > + mp->flags | mp_flags); > + Looks ok to me except a nit: (mp->flags | mp_flags) style expression is to differentiate that mp_flags holds driver specific flag like BLK_ALIGN and mp->flags has appl specific flags.. is it so? If not then why not simply do like: mp->flags |= mp_flags. Thanks.
Re: [dpdk-dev] [dpdk-stable] [PATCH] net/mlx5: fix disabling Tx packet inlining
Tuesday, February 27, 2018 9:51 AM, NĂ©lio Laranjeiro > On Mon, Feb 26, 2018 at 09:50:57AM -0800, Yongseok Koh wrote: > > Adding 'txq_inline=0' to PMD parameter should disable Tx packet > > inlining but it doesn't work properly for Enhanced Multi-Packet Send. > > > > Fixes: 6ce84bd88919 ("net/mlx5: add enhanced multi-packet send for > > ConnectX-5") > > Cc: sta...@dpdk.org > > > > Signed-off-by: Yongseok Koh > > Acked-by: Shahaf Shuler > Acked-by: Nelio Laranjeiro Applied to next-net-mlx, thanks.
[dpdk-dev] Mellanox 18.05 roadmap
Hi, Sorry for sending it a little bit late. Below is what we are planning to implement in 18.05 release: Support Rx stateless ethdev offloads (RSS, checksum and packet type classification) for various tunnel protocols. Including VXLAN, VXLAN-GPE, MPLSoGRE and MPLSoUDP. This new API will be implemented in mlx5. Support Tx stateless ethdev offloads (TSO and checksum) for custom tunnel protocols. Custom tunnel protocol is defined by informing the PMD on the offsets and header types of the inner and outer headers inside the packet payload. This new API will be implemented in mlx5. API to support full HW offload including: Redirecting specific flows from one VM to another VM without SW processing. Manipulation of packet payload including encap/decap of headers, vlan push/pop, headers rewrite, etc Defining , in collaboration with Intel, API for VF representors and control API to configure VF from host. Participating in new devargs syntax development, which is bus agnostic and separate generic and driver specific properties. rte_flow support for bonding PMD. TSO support for TAP PMD. mlx5 and mlx4 support new dynamic memory allocation suggested by the series: http://dpdk.org/ml/archives/dev/2018-March/092070.html. mlx5 and mlx4 support non-contiguous memory pools. mlx5 performance improvements: - stride RQ to reach 100G with 64B message - OOB performance --- Olga Shern SW Director DPDK Mellanox Technologies, Raanana, Israel
Re: [dpdk-dev] [PATCH] net/octeontx: use the new offload APIs
On Friday 09 March 2018 12:37 AM, Pavan Nikhilesh wrote: > Use the new Rx/Tx offload APIs and remove the old style offloads. > > Signed-off-by: Pavan Nikhilesh > --- Acked-by: Santosh Shukla
Re: [dpdk-dev] [PATCH v3 18/18] net/axgbe: moved license headers to SPDX format
On Fri, 9 Mar 2018 03:42:34 -0500 Ravi Kumar wrote: > +.. Copyright (c) 2018 Advanced Micro Devices, Inc. All rights reserved. > +SPDX-License-Identifier: BSD-3-Clause > By convention SPDX tag is always first line of the file.
Re: [dpdk-dev] [PATCH v5 6/6] eal: ignore messages until init is complete
On 3/8/2018 12:56 AM, Anatoly Burakov wrote: If we receive messages that don't have a callback registered for them, and we haven't finished initialization yet, it can be reasonably inferred that we shouldn't have gotten the message in the first place. Therefore, send requester a special message telling them to ignore response to this request, as if this process wasn't there. Two nits: - I think we could add a note in commit log that "it only applies to primary to secondary request". - To make the change more simple, maybe we can just put a declaration of function mp_send. Signed-off-by: Anatoly Burakov Acked-by: Jianfeng Tan --- Notes: v5: add this patch No changes in mp_send and send_msg - just code move. lib/librte_eal/common/eal_common_proc.c | 280 +--- 1 file changed, 151 insertions(+), 129 deletions(-) diff --git a/lib/librte_eal/common/eal_common_proc.c b/lib/librte_eal/common/eal_common_proc.c index fe27d68..1ea6045 100644 --- a/lib/librte_eal/common/eal_common_proc.c +++ b/lib/librte_eal/common/eal_common_proc.c @@ -52,6 +52,7 @@ enum mp_type { MP_MSG, /* Share message with peers, will not block */ MP_REQ, /* Request for information, Will block for a reply */ MP_REP, /* Response to previously-received request */ + MP_IGN, /* Response telling requester to ignore this response */ }; struct mp_msg_internal { @@ -205,6 +206,130 @@ rte_mp_action_unregister(const char *name) free(entry); } +/** + * Return -1, as fail to send message and it's caused by the local side. + * Return 0, as fail to send message and it's caused by the remote side. + * Return 1, as succeed to send message. + * + */ +static int +send_msg(const char *dst_path, struct rte_mp_msg *msg, int type) +{ + int snd; + struct iovec iov; + struct msghdr msgh; + struct cmsghdr *cmsg; + struct sockaddr_un dst; + struct mp_msg_internal m; + int fd_size = msg->num_fds * sizeof(int); + char control[CMSG_SPACE(fd_size)]; + + m.type = type; + memcpy(&m.msg, msg, sizeof(*msg)); + + memset(&dst, 0, sizeof(dst)); + dst.sun_family = AF_UNIX; + snprintf(dst.sun_path, sizeof(dst.sun_path), "%s", dst_path); + + memset(&msgh, 0, sizeof(msgh)); + memset(control, 0, sizeof(control)); + + iov.iov_base = &m; + iov.iov_len = sizeof(m) - sizeof(msg->fds); + + msgh.msg_name = &dst; + msgh.msg_namelen = sizeof(dst); + msgh.msg_iov = &iov; + msgh.msg_iovlen = 1; + msgh.msg_control = control; + msgh.msg_controllen = sizeof(control); + + cmsg = CMSG_FIRSTHDR(&msgh); + cmsg->cmsg_len = CMSG_LEN(fd_size); + cmsg->cmsg_level = SOL_SOCKET; + cmsg->cmsg_type = SCM_RIGHTS; + memcpy(CMSG_DATA(cmsg), msg->fds, fd_size); + + do { + snd = sendmsg(mp_fd, &msgh, 0); + } while (snd < 0 && errno == EINTR); + + if (snd < 0) { + rte_errno = errno; + /* Check if it caused by peer process exits */ + if (errno == ECONNREFUSED && + rte_eal_process_type() == RTE_PROC_PRIMARY) { + unlink(dst_path); + return 0; + } + if (errno == ENOBUFS) { + RTE_LOG(ERR, EAL, "Peer cannot receive message %s\n", + dst_path); + return 0; + } + RTE_LOG(ERR, EAL, "failed to send to (%s) due to %s\n", + dst_path, strerror(errno)); + return -1; + } + + return 1; +} + +static int +mp_send(struct rte_mp_msg *msg, const char *peer, int type) +{ + int dir_fd, ret = 0; + DIR *mp_dir; + struct dirent *ent; + + if (!peer && (rte_eal_process_type() == RTE_PROC_SECONDARY)) + peer = eal_mp_socket_path(); + + if (peer) { + if (send_msg(peer, msg, type) < 0) + return -1; + else + return 0; + } + + /* broadcast to all secondary processes */ + mp_dir = opendir(mp_dir_path); + if (!mp_dir) { + RTE_LOG(ERR, EAL, "Unable to open directory %s\n", + mp_dir_path); + rte_errno = errno; + return -1; + } + + dir_fd = dirfd(mp_dir); + /* lock the directory to prevent processes spinning up while we send */ + if (flock(dir_fd, LOCK_EX)) { + RTE_LOG(ERR, EAL, "Unable to lock directory %s\n", + mp_dir_path); + rte_errno = errno; + closedir(mp_dir); + return -1; + } + + while ((ent = readdir(mp_dir))) { + char path[PATH_MAX]; + + if (fnmatch(mp_filter, ent->d_name, 0) != 0) +
Re: [dpdk-dev] [PATCH] vdev: fix name comparison in find_vdev
Hi, On 3/10/2018 6:28 AM, Nachiketa Prachanda wrote: Use strcmp to compare device names as the strncmp in original code causes find_vdev to return -EEXIST for names that are prefix of another. The creation of interfaces fails unpredictably based on the order of their creation. An easy way hit this bug is to create eth_vhost1 after eth_vhost11. Signed-off-by: Nachiketa Prachanda Except a nit bellow: Acked-by: Jianfeng Tan Fixes: dda987315ca2 ("vdev: make virtual bus use its device struct") Cc: jblu...@infradead.org Cc: sta...@dpdk.org AFAIK, we usually put above line before Signed-off-by. --- drivers/bus/vdev/vdev.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/bus/vdev/vdev.c b/drivers/bus/vdev/vdev.c index e4bc724..7eae319 100644 --- a/drivers/bus/vdev/vdev.c +++ b/drivers/bus/vdev/vdev.c @@ -188,7 +188,7 @@ find_vdev(const char *name) TAILQ_FOREACH(dev, &vdev_device_list, next) { const char *devname = rte_vdev_device_name(dev); - if (!strncmp(devname, name, strlen(name))) + if (!strcmp(devname, name)) return dev; }
Re: [dpdk-dev] [PATCH v3] doc: add driver limitation for vhost dequeue zero copy
On 3/9/2018 6:07 PM, Junjie Chen wrote: In vhost-switch example, when binding nic to vfio-pci with iommu enabled, dequeue zero copy cannot work in VM2NIC mode due to no iommu dma mapping is setup for guest memory currently. Signed-off-by: Junjie Chen --- Changes in V3: - update limitation to iommu Changes in V2: - add doc in vhost lib doc/guides/prog_guide/vhost_lib.rst | 5 + doc/guides/sample_app_ug/vhost.rst | 5 - 2 files changed, 9 insertions(+), 1 deletion(-) diff --git a/doc/guides/prog_guide/vhost_lib.rst b/doc/guides/prog_guide/vhost_lib.rst index 18227b6..06d214f 100644 --- a/doc/guides/prog_guide/vhost_lib.rst +++ b/doc/guides/prog_guide/vhost_lib.rst @@ -83,6 +83,11 @@ The following is an overview of some key Vhost API functions: of those segments, thus the fewer the segments, the quicker we will get the mapping. NOTE: we may speed it by using tree searching in future. +* zero copy does not work when using driver with iommu mode currently, this Considering FreeBSD driver nic_uio does not support iommu, we can make "driver" more explicitly. For example, "driver with iommu mode" -> "vfio-pci with iommu mode" + is because we don't setup iommu dma mapping for guest memory. For example, when you bind device to vfio-pci driver, you need to set driver to work in noiommu mode. Reword the above sentence a little bit: "If vfio-pci is a must in your case, insert vfio-pci kernel module in noiommu mode." + - ``RTE_VHOST_USER_IOMMU_SUPPORT`` IOMMU support will be enabled when this flag is set. It is disabled by diff --git a/doc/guides/sample_app_ug/vhost.rst b/doc/guides/sample_app_ug/vhost.rst index a4bdc6a..f0bb169 100644 --- a/doc/guides/sample_app_ug/vhost.rst +++ b/doc/guides/sample_app_ug/vhost.rst @@ -147,7 +147,10 @@ retries on an RX burst, it takes effect only when rx retry is enabled. The default value is 15. **--dequeue-zero-copy** -Dequeue zero copy will be enabled when this option is given. +Dequeue zero copy will be enabled when this option is given. it is worth to +note that if NIC is binded to driver with iommu enabled, dequeue zero copy +cannot work at VM2NIC mode (vm2vm=0) due to currently we don't setup iommu +dma mapping for guest memory. **--vlan-strip 0|1** VLAN strip option is removed, because different NICs have different behaviors
Re: [dpdk-dev] [PATCH 2/4] bus/vdev: bus scan by multi-process channel
On 3/7/2018 10:00 PM, Burakov, Anatoly wrote: On 04-Mar-18 3:30 PM, Jianfeng Tan wrote: To scan the vdevs in primary, we send request to primary process to obtain the names for vdevs. Only the name is shared from the primary. In probe(), the device driver is supposed to locate (or request more) the detail information from the primary. Signed-off-by: Jianfeng Tan --- General note - you probably want to syncrhonize access to the tailq. Multiple secondaries may initialize, a vdev hotplug event may be in process, etc. Make sense, will change it in next version. Thanks, Jianfeng
Re: [dpdk-dev] [PATCH 3/4] drivers/net: do not allocate rte_eth_dev_data privately
On 3/7/2018 2:10 PM, Matan Azrad wrote: Yes, we are removing all private rte_eth_dev_data. If I missed some device, welcome to point out. net/ring What is about next PCI device? net/cxgbe Will change these two in next version. Thank you, Matan.
Re: [dpdk-dev] [PATCH] eventdev: add timestamping to received packets
-Original Message- > Date: Tue, 20 Feb 2018 06:30:54 -0500 > From: Nikhil Rao > To: jerin.ja...@caviumnetworks.com > CC: dev@dpdk.org, gage.e...@intel.com, narender.vang...@intel.com, > abhinandan.guj...@intel.com, nikhil@intel.com > Subject: [PATCH] eventdev: add timestamping to received packets > X-Mailer: git-send-email 1.8.3.1 > > Add timestamp to received packets before enqueuing to > event device if the timestamp is not already set. Adding > timestamp in the Rx adapter avoids additional latency due > to the event device. > > Signed-off-by: Nikhil Rao Acked-by: Jerin Jacob
Re: [dpdk-dev] [PATCH] net/dpaa: prefer defines for link speed values
> -Original Message- > From: Ferruh Yigit [mailto:ferruh.yi...@intel.com] > Sent: Friday, March 9, 2018 11:29 PM > To: Hemant Agrawal ; Shreyansh Jain > > Cc: dev@dpdk.org; Ferruh Yigit > Subject: [PATCH] net/dpaa: prefer defines for link speed values > > Use existing defines instead of hardcoded vales. > > Signed-off-by: Ferruh Yigit > --- Acked-by: Shreyansh Jain
Re: [dpdk-dev] [PATCH v2 1/2] eventdev: add device stop flush callback
-Original Message- > When an event device is stopped, it drains all event queues. These events > may contain pointers, so to prevent memory leaks eventdev now supports a > user-provided flush callback that is called during the queue drain process. > This callback is stored in process memory, so the callback must be > registered by any process that may call rte_event_dev_stop(). > > This commit also clarifies the behavior of rte_event_dev_stop(). > > This follows this mailing list discussion: > http://dpdk.org/ml/archives/dev/2018-January/087484.html > > Signed-off-by: Gage Eads > --- > v2: allow a NULL callback pointer to unregister the callback > > lib/librte_eventdev/rte_eventdev.c | 17 + > lib/librte_eventdev/rte_eventdev.h | 55 > +++- > lib/librte_eventdev/rte_eventdev_version.map | 6 +++ > 3 files changed, 76 insertions(+), 2 deletions(-) > > diff --git a/lib/librte_eventdev/rte_eventdev.c > b/lib/librte_eventdev/rte_eventdev.c > index 851a119..1aacb7b 100644 > --- a/lib/librte_eventdev/rte_eventdev.c > +++ b/lib/librte_eventdev/rte_eventdev.c > @@ -1123,6 +1123,23 @@ rte_event_dev_start(uint8_t dev_id) > return 0; > } > > +typedef void (*eventdev_stop_flush_t)(uint8_t dev_id, struct rte_event event, > + void *arg); > +/**< Callback function called during rte_event_dev_stop(), invoked once per > + * flushed event. > + */ > + > #define RTE_EVENTDEV_NAME_MAX_LEN(64) > /**< @internal Max length of name of event PMD */ > > @@ -1176,6 +1194,11 @@ struct rte_eventdev { > event_dequeue_burst_t dequeue_burst; > /**< Pointer to PMD dequeue burst function. */ > > + eventdev_stop_flush_t dev_stop_flush; > + /**< Optional, user-provided event flush function */ > + void *dev_stop_flush_arg; > + /**< User-provided argument for event flush function */ > + I think, we can move this additions to the internal rte_eventdev_data structure. Reasons are 1) Changes to "struct rte_eventdev" would call for ABI change 2) We can keep "struct rte_eventdev" only for fast path functions, slow path functions can have additional redirection. > struct rte_eventdev_data *data; > /**< Pointer to device data */ > const struct rte_eventdev_ops *dev_ops; > @@ -1822,6 +1845,34 @@ rte_event_dev_xstats_reset(uint8_t dev_id, > */ > int rte_event_dev_selftest(uint8_t dev_id); > > +/** > + * Registers a callback function to be invoked during rte_event_dev_stop() > for > + * each flushed event. This function can be used to properly dispose of > queued > + * events, for example events containing memory pointers. > + * > + * The callback function is only registered for the calling process. The > + * callback function must be registered in every process that can call > + * rte_event_dev_stop(). > + * > + * To unregister a callback, call this function with a NULL callback pointer. > + * > + * @param dev_id > + * The identifier of the device. > + * @param callback > + * Callback function invoked once per flushed event. > + * @param userdata > + * Argument supplied to callback. > + * > + * @return > + * - 0 on success. > + * - -EINVAL if *dev_id* is invalid > + * > + * @see rte_event_dev_stop() > + */ > +int > +rte_event_dev_stop_flush_callback_register(uint8_t dev_id, > + eventdev_stop_flush_t callback, void *userdata); > + IMO, It would be better if we place this function near to rte_event_dev_stop(). Other than above minor changes, It looks good to me.
Re: [dpdk-dev] [PATCH v1 1/9] mempool: add op to calculate memory size to be allocated
On 03/11/2018 03:51 PM, santosh wrote: Hi Andrew, On Saturday 10 March 2018 09:09 PM, Andrew Rybchenko wrote: Size of memory chunk required to populate mempool objects depends on how objects are stored in the memory. Different mempool drivers may have different requirements and a new operation allows to calculate memory size in accordance with driver requirements and advertise requirements on minimum memory chunk size and alignment in a generic way. Bump ABI version since the patch breaks it. Signed-off-by: Andrew Rybchenko --- RFCv2 -> v1: - move default calc_mem_size callback to rte_mempool_ops_default.c - add ABI changes to release notes - name default callback consistently: rte_mempool_op__default() - bump ABI version since it is the first patch which breaks ABI - describe default callback behaviour in details - avoid introduction of internal function to cope with depration (keep it to deprecation patch) - move cache-line or page boundary chunk alignment to default callback - highlight that min_chunk_size and align parameters are output only [...] diff --git a/lib/librte_mempool/rte_mempool_ops_default.c b/lib/librte_mempool/rte_mempool_ops_default.c new file mode 100644 index 000..57fe79b --- /dev/null +++ b/lib/librte_mempool/rte_mempool_ops_default.c @@ -0,0 +1,38 @@ +/* SPDX-License-Identifier: BSD-3-Clause + * Copyright(c) 2016 Intel Corporation. + * Copyright(c) 2016 6WIND S.A. + * Copyright(c) 2018 Solarflare Communications Inc. + */ + +#include + +ssize_t +rte_mempool_op_calc_mem_size_default(const struct rte_mempool *mp, +uint32_t obj_num, uint32_t pg_shift, +size_t *min_chunk_size, size_t *align) +{ + unsigned int mp_flags; + int ret; + size_t total_elt_sz; + size_t mem_size; + + /* Get mempool capabilities */ + mp_flags = 0; + ret = rte_mempool_ops_get_capabilities(mp, &mp_flags); + if ((ret < 0) && (ret != -ENOTSUP)) + return ret; + + total_elt_sz = mp->header_size + mp->elt_size + mp->trailer_size; + + mem_size = rte_mempool_xmem_size(obj_num, total_elt_sz, pg_shift, +mp->flags | mp_flags); + Looks ok to me except a nit: (mp->flags | mp_flags) style expression is to differentiate that mp_flags holds driver specific flag like BLK_ALIGN and mp->flags has appl specific flags.. is it so? If not then why not simply do like: mp->flags |= mp_flags. In fact it does not mater a lot since the code is removed in the patch 3. Here it is required just for consistency. Also, mp argument is a const which will not allow to change its members.