On 1/27/26 06:48, Bobby Eshleman wrote:
> On Mon, Jan 26, 2026 at 10:00 PM Stanislav Fomichev
> <[email protected]> wrote:
>> On 01/26, Jakub Kicinski wrote:
>>> On Mon, 26 Jan 2026 10:45:22 -0800 Bobby Eshleman wrote:
>>>> I'm onboard with improving what we have since it helps all of us
>>>> currently using this API, though I'm not opposed to discussing a
>>>> redesign in another thread/RFC. I do see the attraction of locating
>>>> the core logic in one place and possibly reducing some complexity
>>>> around socket/binding relationships.
>>>>
>>>> FWIW regarding nl, I do see that it supports rtnl-lock-free
>>>> operations via commit 62256f98f244 ("rtnetlink: add
>>>> RTNL_FLAG_DOIT_UNLOCKED"), and routing was recently made lockless
>>>> with that. I don't see / know of any fast-path precedent, though.
>>>> There are also some things I'm not sure are relevant
>>>> performance-wise, like hitting the skb allocator an additional time
>>>> on every release batch. I'd want to do some minimal latency
>>>> comparisons between that path and setsockopt() before diving in
>>>> head-first.
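
For reference, the setsockopt() side of that comparison is the existing
SO_DEVMEM_DONTNEED release path. A minimal sketch of its shape, assuming
the devmem TCP uapi in include/uapi/linux/uio.h (error handling
trimmed):

#include <sys/socket.h>
#include <linux/uio.h>		/* struct dmabuf_token */

#ifndef SO_DEVMEM_DONTNEED
#define SO_DEVMEM_DONTNEED 80
#endif

/* Return a batch of devmem TCP tokens to the kernel. One syscall per
 * batch; the kernel walks the array and drops the references held for
 * this socket.
 */
static int release_tokens(int fd, const struct dmabuf_token *tokens,
			  size_t n)
{
	return setsockopt(fd, SOL_SOCKET, SO_DEVMEM_DONTNEED,
			  tokens, n * sizeof(*tokens));
}
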
>>> FTR I'm not really pushing Netlink specifically, it may work, it
>>> may not. Perhaps some other ioctl-y thing exists. Just in general,
>>> setsockopt() on a specific socket feels increasingly awkward for
>>> buffer flow. Maybe y'all disagree.
>>>
>>> I thought I'd clarify since I may be seen as "Mr Netlink
>>> Everywhere" :)
>> From my side, if we do a completely new uapi, my preference would be
>> an af_xdp-like set of mapped rings (presumably on a netlink socket?)
>> to completely avoid the user-kernel copies.
> I second liking that approach. No put_cmsg() or token alloc overhead
> (both jump up in my profiling).
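
To make the ring idea concrete: an af_xdp-style design would give
userspace an mmap'ed producer/consumer ring it writes tokens into
directly, with no token array copied across the user-kernel boundary
and no cmsg on the receive side. A rough sketch of the userspace half,
where the token_ring layout and all names are purely illustrative (no
such uapi exists yet):

#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical mmap'ed ring shared with the kernel; userspace is the
 * producer, the kernel the consumer. Size is a power of two.
 */
struct token_ring {
	_Atomic uint32_t producer;	/* advanced by userspace */
	_Atomic uint32_t consumer;	/* advanced by the kernel */
	uint32_t mask;			/* ring size - 1 */
	uint32_t entries[];		/* token slots */
};

static bool ring_put_token(struct token_ring *r, uint32_t token)
{
	uint32_t prod = atomic_load_explicit(&r->producer,
					     memory_order_relaxed);
	uint32_t cons = atomic_load_explicit(&r->consumer,
					     memory_order_acquire);

	if (prod - cons > r->mask)
		return false;	/* full: caller needs a fallback */

	r->entries[prod & r->mask] = token;
	/* Publish the slot before the index so the kernel never reads
	 * a half-written entry.
	 */
	atomic_store_explicit(&r->producer, prod + 1,
			      memory_order_release);
	return true;
}
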
Hmm, makes me wonder: why not use zcrx instead of reinventing it? It
doesn't bind net_iovs to sockets, just as you do in this series, and it
also returns buffers back via a shared ring. Otherwise you'll be facing
the same issues, like rings running out of space, so you will need to
have a fallback path. And user space will need to synchronise the ring
if it's shared with other threads, and there will be the question of
how to scale it further, possibly by creating multiple rings, as I'll
likely do soon for zcrx.
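
The out-of-space case is worth spelling out: whichever ring we end up
with, the producer needs a slow path. One sketch of what that could
look like, reusing the hypothetical ring_put_token() and the
SO_DEVMEM_DONTNEED release_tokens() helper from the earlier sketches:

#define BATCH_MAX 128

struct fallback_batch {
	struct dmabuf_token tok[BATCH_MAX];
	size_t n;
};

/* Try the shared ring first; if it is full, queue the token for a
 * batched syscall-based release instead of spinning on ring space.
 */
static void return_token(int fd, struct token_ring *r,
			 struct fallback_batch *b, uint32_t token)
{
	if (ring_put_token(r, token))
		return;

	b->tok[b->n].token_start = token;
	b->tok[b->n].token_count = 1;
	if (++b->n == BATCH_MAX) {
		release_tokens(fd, b->tok, b->n);
		b->n = 0;
	}
}
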
--
Pavel Begunkov