On 8/14/24 15:55, Mina Almasry wrote:
On Wed, Aug 14, 2024 at 10:11 AM Pavel Begunkov wrote:
...
diff --git a/net/core/devmem.c b/net/core/devmem.c
index 301f4250ca82..2f2a7f4dee4c 100644
--- a/net/core/devmem.c
+++ b/net/core/devmem.c
@@ -17,6 +17,7 @@
#include
#include
#include
the devmem.
Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order != 0.
Signed-off-by: Willem de Bruijn
Signed-off-by: Kaiyuan Zhang
Signed-off-by: Mina Almasry
Reviewed-by: Pavel Begunkov
---
v19:
- Add PP_FLAG_ALLOW_UNREADABLE_NETMEM flag. It serves 2 purposes, (a) it
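For context, this is roughly how a driver opts in; a minimal sketch assuming the flag semantics above (ring_size/rxq/dev here are illustrative driver state, not names from the series):

/* Sketch: a driver that can handle unreadable netmem sets the flag
 * when creating the page_pool for an rx queue; core refuses the
 * dmabuf binding otherwise.
 */
struct page_pool_params pp = {
	.order		= 0,
	.pool_size	= ring_size,		/* illustrative */
	.nid		= NUMA_NO_NODE,
	.dev		= dev,			/* illustrative */
	.queue		= rxq,	/* struct netdev_rx_queue *, per this series */
	.flags		= PP_FLAG_ALLOW_UNREADABLE_NETMEM,
};
struct page_pool *pool = page_pool_create(&pp);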
On 8/13/24 15:39, Jakub Kicinski wrote:
On Tue, 13 Aug 2024 03:31:13 +0100 Pavel Begunkov wrote:
I'm getting lost, so repeating myself a bit. What I think
would be a good approach is if we get an error back from
the driver if it doesn't support netiov / providers.
netdev_rx_que
On 8/13/24 00:57, Jakub Kicinski wrote:
On Mon, 12 Aug 2024 20:10:39 +0100 Pavel Begunkov wrote:
1. Drivers need to be able to say "I support unreadable netmem".
Failure to report unreadable netmem support should cause the netlink
API to fail when the user tries to bind dmabuf/io ur
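A sketch of the core-side gate being argued for here, assuming the error is raised at bind time (the helper name is made up for illustration; only queue_mgmt_ops and -EOPNOTSUPP come from the thread):

/* Illustrative: reject a dmabuf/io_uring binding when the driver
 * has not declared support for memory providers / queue restarts.
 */
static int netdev_check_mp_support(struct net_device *dev)
{
	if (!dev->queue_mgmt_ops)
		return -EOPNOTSUPP;	/* driver can't restart queues */

	/* ... plus whatever explicit "I support unreadable netmem"
	 * signal the driver reports, per the discussion above.
	 */
	return 0;
}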
On 8/13/24 01:15, Jakub Kicinski wrote:
On Mon, 12 Aug 2024 20:04:41 +0100 Pavel Begunkov wrote:
Also, I don't see the upside of the explicit "non-capable" flag,
but maybe I haven't thought it through. Is there any use?
Or maybe I don't get what you're asking
On 8/12/24 19:55, Mina Almasry wrote:
On Mon, Aug 12, 2024 at 1:57 PM Jakub Kicinski wrote:
On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote:
I think we're talking about 2 slightly different flags, AFAIU.
Pavel and I are suggesting the driver reports "I support mem
On 8/12/24 19:57, Pavel Begunkov wrote:
On 8/12/24 18:57, Jakub Kicinski wrote:
On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote:
I think we're talking about 2 slightly different flags, AFAIU.
Pavel and I are suggesting the driver reports "I support memory
providers"
On 8/12/24 18:57, Jakub Kicinski wrote:
On Sun, 11 Aug 2024 22:51:13 +0100 Pavel Begunkov wrote:
I think we're talking about 2 slightly different flags, AFAIU.
Pavel and I are suggesting the driver reports "I support memory
providers" directly to core (via the queue-api or
On 8/11/24 03:21, Mina Almasry wrote:
On Fri, Aug 9, 2024 at 11:52 PM Jakub Kicinski wrote:
On Fri, 9 Aug 2024 16:45:50 +0100 Pavel Begunkov wrote:
I think this is good, and it doesn't seem hacky to me, because we can
check the page_pools of the netdev while we hold rtnl, so we can be
p_params flag explicitly telling
if pp should use providers. It's more explicit and feels a little
less hacky.
--
Pavel Begunkov
Zhang
Signed-off-by: Mina Almasry
Same, lost tag from v13
Reviewed-by: Pavel Begunkov
And, as a follow-up, it would be great to clean up the loop.
Helper functions and "continue" should help to bring the
indentation down.
--
Pavel Begunkov
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page. All pages are released when the socket is
destroyed.
Already gave it in v13, but it got lost
Reviewed-by: Pavel Begunkov
--
Pavel Begunkov
can be handled by follow-up patches.
Reviewed-by: Pavel Begunkov
diff --git a/net/core/sock.c b/net/core/sock.c
index 9abc4fe259535..040c66ac26244 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
...
+#ifdef CONFIG_PAGE_POOL
+static noinline_for_stack int
+sock_devmem_dontneed(struct sock
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped once the userspace indicates that it is
done reading this page. All pages are released when the socket is
destroyed.
Reviewed-by: Pavel Begunkov
--
Pavel Begunkov
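For reference, a userspace sketch of returning pages once done, assuming the SO_DEVMEM_DONTNEED uapi from this series (token layout per the series' uapi header; the fallback define is an assumption, check your headers):

#include <sys/socket.h>
#include <linux/uio.h>		/* struct dmabuf_token, per this series */

#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80	/* assumed value */
#endif

/* Tell the kernel we are done reading a run of tokens; it then drops
 * the get_page() references described above.
 */
static int devmem_dontneed(int fd, __u32 start, __u32 count)
{
	struct dmabuf_token tok = {
		.token_start = start,
		.token_count = count,
	};

	return setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			  &tok, sizeof(tok));
}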
On 6/21/24 21:31, Mina Almasry wrote:
On Mon, Jun 17, 2024 at 9:36 AM Pavel Begunkov wrote:
On 6/13/24 02:35, Mina Almasry wrote:
The pages awaiting freeing are stored in the newly added
sk->sk_user_frags, and each page passed to userspace is get_page()'d.
This reference is dropped
On 6/21/24 19:48, Mina Almasry wrote:
On Mon, Jun 17, 2024 at 7:17 AM Pavel Begunkov wrote:
...
static inline unsigned long netmem_to_pfn(netmem_ref netmem)
{
+ if (netmem_is_net_iov(netmem))
+ return 0;
IIRC 0 is a valid pfn. Not much of a concern since it's
used
On 6/18/24 07:43, Christoph Hellwig wrote:
On Mon, Jun 17, 2024 at 07:04:43PM +0100, Pavel Begunkov wrote:
There should be no other memory source other than the page allocator
and dmabuf. If you need different lifetime control for your
zero copy proposal don't mix that up with the cont
On 6/10/24 23:15, Jason Gunthorpe wrote:
On Mon, Jun 10, 2024 at 08:20:08PM +0100, Pavel Begunkov wrote:
On 6/10/24 16:16, David Ahern wrote:
There is no reason you shouldn't be able to use your fast io_uring
completion and lifecycle flow with DMABUF backed memory. Those are not
On 6/11/24 07:34, Christoph Hellwig wrote:
On Fri, Jun 07, 2024 at 02:45:55PM +0100, Pavel Begunkov wrote:
On 6/5/24 09:24, Christoph Hellwig wrote:
On Mon, Jun 03, 2024 at 03:52:32PM +0100, Pavel Begunkov wrote:
The question for Christoph is what exactly is the objection here? Why we
would
it's always a
struct page underneath. All the page pool internals are converted to
use struct netmem instead of struct page, and the page pool now exports
2 APIs:
1. The existing struct page API.
2. The new struct netmem API.
nits below,
Reviewed-by: Pavel Begunkov
Keeping the existing A
&sk->sk_user_frags);
trace_tcp_destroy_sock(sk);
diff --git a/net/ipv4/tcp_minisocks.c b/net/ipv4/tcp_minisocks.c
index bc67f6b9efae4..5d563312efe14 100644
--- a/net/ipv4/tcp_minisocks.c
+++ b/net/ipv4/tcp_minisocks.c
@@ -624,6 +624,8 @@ struct sock *tcp_create_openreq_child(const struct sock *sk,
__TCP_INC_STATS(sock_net(sk), TCP_MIB_PASSIVEOPENS);
+ xa_init_flags(&newsk->sk_user_frags, XA_FLAGS_ALLOC1);
+
return newsk;
}
EXPORT_SYMBOL(tcp_create_openreq_child);
--
Pavel Begunkov
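The xarray initialised above is what backs the token scheme; a rough kernel-side sketch of the alloc/erase pair, assuming the flow from the commit message (the shape is illustrative, not the exact code):

/* On recvmsg: each frag handed to userspace gets a token in
 * sk->sk_user_frags (XA_FLAGS_ALLOC1, so ids start at 1).
 */
struct page *frag;	/* net_iov in later revisions */
u32 token;
int err;

err = xa_alloc(&sk->sk_user_frags, &token, frag,
	       xa_limit_31b, GFP_KERNEL);
if (err)
	return err;
/* token is then reported to userspace in a cmsg */

/* Later, on SO_DEVMEM_DONTNEED: */
frag = xa_erase(&sk->sk_user_frags, token);
if (frag)
	put_page(frag);	/* pairs with the get_page() above */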
the devmem.
Support for PP_FLAG_DMA_SYNC_DEV is omitted for simplicity & p.order != 0.
Signed-off-by: Willem de Bruijn
Signed-off-by: Kaiyuan Zhang
Signed-off-by: Mina Almasry
Comments below, apart from them
Reviewed-by: Pavel Begunkov
diff --git a/net/core/devmem.c b/net/core/devmem.
access these common fields regardless of
whether the underlying type is page or net_iov.
Implement checks for net_iov in netmem helpers which delegate to mm
APIs, to ensure net_iov are never passed to the mm stack.
Signed-off-by: Mina Almasry
Apart from small comments below
Reviewed-by: Pavel Begunkov
memory region in a net_iov
struct.
Reviewed-by: Pavel Begunkov
Signed-off-by: Willem de Bruijn
Signed-off-by: Kaiyuan Zhang
Signed-off-by: Mina Almasry
---
--
Pavel Begunkov
e comment below
Reviewed-by: Pavel Begunkov # excluding netlink
diff --git a/include/net/devmem.h b/include/net/devmem.h
new file mode 100644
index 0..eaf3fd965d7a8
...
diff --git a/net/core/dev.c b/net/core/dev.c
index c361a7b69da86..84c9f96a6c9bf 100644
--- a/net/core/dev.c
+++ b
On 6/13/24 02:35, Mina Almasry wrote:
Add netdev_rx_queue_restart() function to netdev_rx_queue.h
see nit below
Reviewed-by: Pavel Begunkov
Signed-off-by: David Wei
Signed-off-by: Mina Almasry
---
v11:
- Fix not checking dev->queue_mgmt_ops (Pavel).
- Fix ndo_queue_mem_free call t
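For orientation, the restart sequence the helper implements looks roughly like this, assuming the queue_mgmt_ops shape referenced above (error paths trimmed; a sketch, not the final code):

/* Allocate replacement queue memory first, so a failed allocation
 * leaves the old queue untouched; then stop/start under rtnl.
 */
const struct netdev_queue_mgmt_ops *qops = dev->queue_mgmt_ops;

new_mem = kvzalloc(qops->ndo_queue_mem_size, GFP_KERNEL);
old_mem = kvzalloc(qops->ndo_queue_mem_size, GFP_KERNEL);

err = qops->ndo_queue_mem_alloc(dev, new_mem, rxq_idx);
if (err)
	goto err_free;

err = qops->ndo_queue_stop(dev, old_mem, rxq_idx);
if (err)
	goto err_free_new;

err = qops->ndo_queue_start(dev, new_mem, rxq_idx);
/* on failure: try to restart the queue with old_mem */

qops->ndo_queue_mem_free(dev, old_mem);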
On 6/10/24 16:41, Mina Almasry wrote:
On Mon, Jun 10, 2024 at 5:38 AM Christian König
wrote:
On 10.06.24 at 14:16, Jason Gunthorpe wrote:
On Mon, Jun 10, 2024 at 02:07:01AM +0100, Pavel Begunkov wrote:
On 6/10/24 01:37, David Wei wrote:
On 2024-06-07 17:52, Jason Gunthorpe wrote:
IMHO it
On 6/10/24 16:16, David Ahern wrote:
On 6/10/24 6:16 AM, Jason Gunthorpe wrote:
On Mon, Jun 10, 2024 at 02:07:01AM +0100, Pavel Begunkov wrote:
On 6/10/24 01:37, David Wei wrote:
On 2024-06-07 17:52, Jason Gunthorpe wrote:
IMHO it seems to compose poorly if you can only use the io_uring
On 6/7/24 17:59, Mina Almasry wrote:
On Fri, Jun 7, 2024 at 8:47 AM Pavel Begunkov wrote:
On 6/7/24 16:42, Pavel Begunkov wrote:
On 6/7/24 15:27, David Ahern wrote:
On 6/7/24 7:42 AM, Pavel Begunkov wrote:
I haven't seen any arguments against it from the (net) maintainers so
far. Nor
et,
for which a ring for returning buffers might even be a nuisance.
--
Pavel Begunkov
On 6/7/24 16:42, Pavel Begunkov wrote:
On 6/7/24 15:27, David Ahern wrote:
On 6/7/24 7:42 AM, Pavel Begunkov wrote:
I haven't seen any arguments against it from the (net) maintainers so
far. Nor do I see any objection to callbacks from them (considering
that either option adds an if).
I
On 6/7/24 15:27, David Ahern wrote:
On 6/7/24 7:42 AM, Pavel Begunkov wrote:
I haven't seen any arguments against it from the (net) maintainers so
far. Nor do I see any objection to callbacks from them (considering
that either option adds an if).
I have said before I do not understand wh
On 6/5/24 09:24, Christoph Hellwig wrote:
On Mon, Jun 03, 2024 at 03:52:32PM +0100, Pavel Begunkov wrote:
The question for Christoph is what exactly is the objection here? Why we
would not be using well defined ops when we know there will be more
users?
The point is that there should be no
On 6/3/24 16:43, Mina Almasry wrote:
On Mon, Jun 3, 2024 at 7:52 AM Pavel Begunkov wrote:
On 6/3/24 15:17, Mina Almasry wrote:
On Fri, May 31, 2024 at 10:35 PM Christoph Hellwig wrote:
On Thu, May 30, 2024 at 08:16:01PM +0000, Mina Almasry wrote:
I'm unsure if the discussion has
" in the ring buffer is useless.
netmem is a pointer with one bit serving as a flag; considering
the mangling, it might be better to %p it and perhaps also print
its type (page* vs iov) separately.
--
Pavel Begunkov
ges(pool, gfp);
+ if (unlikely(page_pool_is_dmabuf(pool)))
+ netmem = mp_dmabuf_devmem_alloc_pages(pool, gfp);
else
netmem = __page_pool_alloc_pages_slow(pool, gfp);
return netmem;
--
Pavel Begunkov
art)
+ return -EOPNOTSUPP;
+
+ DEBUG_NET_WARN_ON_ONCE(!rtnl_is_locked());
--
Pavel Begunkov
When it comes to concerns
of devmem + io_uring coexisting: if you're able to take care of it,
awesome; if not, I can look into squashing some fix.
Let it be this way then. It's not a problem while there is
only one such provider.
--
Pavel Begunkov
On 5/8/24 16:51, Christoph Hellwig wrote:
On Wed, May 08, 2024 at 12:35:52PM +0100, Pavel Begunkov wrote:
all these, because e.g. ttm internally does have a page pool because
depending upon allocator, that's indeed beneficial. Other drm drivers have
more buffer-based concept
On 5/8/24 16:58, Jason Gunthorpe wrote:
On Wed, May 08, 2024 at 04:44:32PM +0100, Pavel Begunkov wrote:
like a weird and indirect way to get there. Why can't io_uring just be
the entity that does the final free and not mess with the logic
of the allocator?
Then the user has to do a syscall
On 5/8/24 15:25, Jason Gunthorpe wrote:
On Wed, May 08, 2024 at 12:30:07PM +0100, Pavel Begunkov wrote:
I'm not going to pretend to know about page pool details, but dmabuf
is the way to get the bulk of pages into a pool within the net stack's
allocator and keep that bulk properly
On 5/8/24 08:16, Daniel Vetter wrote:
On Tue, May 07, 2024 at 08:32:47PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 08:35:37PM +0100, Pavel Begunkov wrote:
On 5/7/24 18:56, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote:
On 5/7/24 17:48
On 5/8/24 00:32, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 08:35:37PM +0100, Pavel Begunkov wrote:
On 5/7/24 18:56, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote:
On 5/7/24 17:48, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 09:42:05AM -0700
On 5/7/24 18:56, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 06:25:52PM +0100, Pavel Begunkov wrote:
On 5/7/24 17:48, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 09:42:05AM -0700, Mina Almasry wrote:
1. Align with devmem TCP to use udmabuf for your io_uring memory. I
think in the past
On 5/7/24 18:15, Mina Almasry wrote:
On Tue, May 7, 2024 at 9:55 AM Pavel Begunkov wrote:
On 5/7/24 17:23, Christoph Hellwig wrote:
On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them
t
a direct replacement for the ops, it'd mandate a uapi change in an
undesirable way.
--
Pavel Begunkov
On 5/7/24 17:42, Mina Almasry wrote:
On Tue, May 7, 2024 at 9:24 AM Christoph Hellwig wrote:
On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them enough rope, and they should not have
On 5/7/24 17:23, Christoph Hellwig wrote:
On Tue, May 07, 2024 at 01:18:57PM -0300, Jason Gunthorpe wrote:
On Tue, May 07, 2024 at 05:05:12PM +0100, Pavel Begunkov wrote:
even in tree if you give them enough rope, and they should not have
that rope when the only sensible options are page/folio
e a lot of cons to that:
No. Just have branches for page based vs dmabuf in a few places.
--
Pavel Begunkov
ussion.
do here, or if it is something more appropriate to be in the patches
you apply on top.
I additionally think you may need to run the
page_pool_benchmark_simple tests like I do in the cover letter to see
if you're affecting those.
--
Pavel Begunkov
On 4/5/24 21:04, Oliver Crumrine wrote:
Pavel Begunkov wrote:
On 4/4/24 23:17, Oliver Crumrine wrote:
In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
specifically disabled REQ_F_CQE_SKIP, as (at least from my
understanding) the userspace program wouldn't receiv
On 4/4/24 23:17, Oliver Crumrine wrote:
In his patch to enable zerocopy networking for io_uring, Pavel Begunkov
specifically disabled REQ_F_CQE_SKIP, as (at least from my
understanding) the userspace program wouldn't receive the
IORING_CQE_F_MORE flag in the result value.
No. IORING_CQE_F
--
tools/include/io_uring/mini_liburing.h     | 18 +
.../selftests/net/io_uring_zerocopy_tx.c  | 37 +--
.../selftests/net/io_uring_zerocopy_tx.sh |  7 +++-
4 files changed, 59 insertions(+), 10 deletions(-)
--
Pavel Begunkov
On 3/6/24 21:59, Mina Almasry wrote:
On Wed, Mar 6, 2024 at 11:14 AM Pavel Begunkov wrote:
On 3/6/24 17:04, Mina Almasry wrote:
On Wed, Mar 6, 2024 at 6:30 AM Pavel Begunkov wrote:
On 3/5/24 22:36, Mina Almasry wrote:
...
To be honest, I think it makes sense for the TCP stack to be
On 3/6/24 17:04, Mina Almasry wrote:
On Wed, Mar 6, 2024 at 6:30 AM Pavel Begunkov wrote:
On 3/5/24 22:36, Mina Almasry wrote:
On Tue, Mar 5, 2024 at 1:55 PM David Wei wrote:
On 2024-03-04 18:01, Mina Almasry wrote:
+struct memory_provider_ops {
+ int (*init)(struct page_pool *pool
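The struct is cut off above; reconstructing it from the dmabuf_devmem_ops members quoted later in this thread, it looked roughly like this (signatures drifted across revisions, so treat this as a sketch):

struct memory_provider_ops {
	int		(*init)(struct page_pool *pool);
	void		(*destroy)(struct page_pool *pool);
	struct page	*(*alloc_pages)(struct page_pool *pool, gfp_t gfp);
	bool		(*release_page)(struct page_pool *pool,
					struct page *page);
};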
not included here
struct pp_provider_params;
struct netdev_rx_queue {
...
struct pp_provider_params *pp_params;
};
--
Pavel Begunkov
for all requests to finish so there
is no step 4 in the meantime. Might change, can be useful, but it
was much easier to hook into the pp release loop.
Another concern is who and when can reset ifq / kill pp outside
of io_uring/devmem. I assume it can happen on a whim, which is
hard to handle gracefully.
--
Pavel Begunkov
On 2/13/24 21:11, Mina Almasry wrote:
On Tue, Feb 13, 2024 at 5:28 AM Pavel Begunkov wrote:
...
A bit of a churn with the padding and nesting net_iov but looks
sturdier. No duplication, and you can just check positions of the
structure instead of per-field NET_IOV_ASSERT_OFFSET, which you
EXPORT_SYMBOL(dmabuf_devmem_ops);
It might make sense to move all these functions together with
new code from core/dev.c into a new file
--
Pavel Begunkov
void *addr;
+ struct net_iov niov;
};
};
...
--
Pavel Begunkov
It should have been niov->dma_addr
+}
+
+static inline struct netdev_dmabuf_binding *
+net_iov_binding(const struct net_iov *niov)
+{
+ return net_iov_owner(niov)->binding;
+}
+
/* netmem */
struct netmem {
...
--
Pavel Begunkov
On 12/14/23 20:03, Mina Almasry wrote:
On Mon, Dec 11, 2023 at 12:37 PM Pavel Begunkov wrote:
...
If you remove the branch, let it fall into ->release and rely
on refcounting there, then the callback could also fix up
release_cnt or ask pp to do it, like in the patch I linked above
Sadl
On 12/11/23 02:30, Mina Almasry wrote:
On Sat, Dec 9, 2023 at 7:05 PM Pavel Begunkov wrote:
On 12/8/23 23:25, Mina Almasry wrote:
On Fri, Dec 8, 2023 at 2:56 PM Pavel Begunkov wrote:
On 12/8/23 00:52, Mina Almasry wrote:
...
+ if (pool->p.queue)
+ binding = READ_ONCE(pool->p.queue->binding);
On 12/8/23 23:25, Mina Almasry wrote:
On Fri, Dec 8, 2023 at 2:56 PM Pavel Begunkov wrote:
On 12/8/23 00:52, Mina Almasry wrote:
...
+ if (pool->p.queue)
+ binding = READ_ONCE(pool->p.queue->binding);
+
+ if (binding) {
+ pool->mp_ops = &dmabuf_devmem_ops;
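Putting the fragments together, the init-time pickup being discussed reads roughly as below (mp_priv naming per the series; a sketch, not the exact diff):

/* In page_pool_init(): if the rx queue this pool is attached to has
 * a dmabuf binding, switch the pool over to the devmem provider.
 */
struct netdev_dmabuf_binding *binding = NULL;

if (pool->p.queue)
	binding = READ_ONCE(pool->p.queue->binding);

if (binding) {
	pool->mp_ops = &dmabuf_devmem_ops;
	pool->mp_priv = binding;
}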
pool_iov(struct page *page)
+{
+ if (page_is_page_pool_iov(page))
+ return (struct page_pool_iov *)((unsigned long)page & ~PP_IOV);
+
+ DEBUG_NET_WARN_ON_ONCE(true);
+ return NULL;
+}
+
/**
* page_pool_dev_alloc_pages() - allocate a page.
* @pool: pool from which to allocate
--
Pavel Begunkov
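The decode above relies on low-bit tagging of the pointer; the matching encode side would be along these lines (the helper name is illustrative, only PP_IOV comes from the snippet):

/* A page_pool_iov is passed around as a tagged struct page pointer,
 * with the PP_IOV low bit marking the non-page case; the helper
 * above strips the bit, this one sets it.
 */
static inline struct page *page_pool_iov_to_page(struct page_pool_iov *ppiov)
{
	return (struct page *)((unsigned long)ppiov | PP_IOV);
}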
memory_provider_ops dmabuf_devmem_ops = {
+ .init = mp_dmabuf_devmem_init,
+ .destroy = mp_dmabuf_devmem_destroy,
+ .alloc_pages = mp_dmabuf_devmem_alloc_pages,
+ .release_page = mp_dmabuf_devmem_release_page,
+};
+EXPORT_SYMBOL(dmabuf_devmem_ops);
--
Pavel Begunkov
e this.
asynchronous ring-based API would be selected, io_uring or otherwise,
I think the concise notification encoding would remain as is.
Since this is an operation on a socket, I find a setsockopt the
fitting interface.
--
Pavel Begunkov
That way you can still have the "userland
directly fills the RX ring" behaviour even with TCP sockets.
--
Pavel Begunkov
heory, but the api wouldn't suit io_uring, internals
wouldn't be properly optimised, and we can't use it with some
important features like multishot recv because of cmsg.
I'm not really concerned with faster. I would prefer something cleaner :-)
Or maybe we should just have
On 11/11/23 17:19, David Ahern wrote:
On 11/10/23 7:26 AM, Pavel Begunkov wrote:
On 11/7/23 23:03, Mina Almasry wrote:
On Tue, Nov 7, 2023 at 2:55 PM David Ahern wrote:
On 11/7/23 3:10 PM, Mina Almasry wrote:
On Mon, Nov 6, 2023 at 3:44 PM David Ahern wrote:
On 11/5/23 7:44 PM, Mina
f code if we want to have a convenient
and performant api via io_uring.
Most (all?) of this patch set can work with any memory; only device
memory is unreadable.
--
Pavel Begunkov