On 5/12/25 6:06 PM, Daniel Borkmann wrote: > Extend 'inhibit=on' setting with the option to specify a pinned XSK map > path along with a starting index (default 0) to push the created XSK > sockets into. Example usage: > > # ./build/qemu-system-x86_64 [...] \ > -netdev > af-xdp,ifname=enp2s0f0np0,id=net0,mode=native,queues=2,start-queue=14,inhibit=on,map-path=/sys/fs/bpf/xsks_map,map-start-index=14 > \ > -device virtio-net-pci,netdev=net0 [...] > > This is useful for the case where an existing XDP program with XSK map > is present on the AF_XDP supported phys device and the XSK map is not > yet populated. For example, the former could have been pre-loaded onto > the netdevice by a control plane, which later launches qemu to populate > it with XSK sockets. > > Normally, the main idea behind 'inhibit=on' is that the qemu instance > doesn't need to have a lot of privileges to use the pre-loaded program > and the pre-created sockets, but this mentioned use-case here is different > where qemu still needs privileges to create the sockets. > > The 'map-start-index' parameter is optional and defaults to 0. It allows > flexible placement of the XSK sockets, and is up to the user to specify > when the XDP program with XSK map was already preloaded. In the simplest > case the queue-to-map-slot mapping is just 1:1 based on ctx->rx_queue_index > but the user might as well have a different scheme (or smaller map size, > e.g. ctx->rx_queue_index % max_size) to push the inbound traffic to one > of the XSK sockets. > > Signed-off-by: Daniel Borkmann <dan...@iogearbox.net> > Cc: Ilya Maximets <i.maxim...@ovn.org> > Cc: Anton Protopopov <as...@isovalent.com> > --- > net/af-xdp.c | 65 +++++++++++++++++++++++++++++++++++++++++++------ > qapi/net.json | 24 ++++++++++++------ > qemu-options.hx | 3 +++ > 3 files changed, 77 insertions(+), 15 deletions(-)
Hi, Daniel. Thanks for v2 and sorry for delayed response. I tried it out for myself and the change does indeed work as expected. And it's quite nice! So I looked closely at the code this time. See some comments below. > > diff --git a/net/af-xdp.c b/net/af-xdp.c > index 01c5fb914e..494b894706 100644 > --- a/net/af-xdp.c > +++ b/net/af-xdp.c > @@ -51,6 +51,8 @@ typedef struct AFXDPState { > > uint32_t n_queues; > uint32_t xdp_flags; > + char *map_path; > + uint32_t map_start_index; > bool inhibit; > } AFXDPState; > > @@ -261,6 +263,7 @@ static void af_xdp_send(void *opaque) > static void af_xdp_cleanup(NetClientState *nc) > { > AFXDPState *s = DO_UPCAST(AFXDPState, nc, nc); > + int pin_fd, idx; > > qemu_purge_queued_packets(nc); > > @@ -282,6 +285,22 @@ static void af_xdp_cleanup(NetClientState *nc) > "af-xdp: unable to remove XDP program from '%s', ifindex: > %d\n", > s->ifname, s->ifindex); > } I'd add an empty line here. The statements above and below are not really related. > + if (s->map_path) { > + pin_fd = bpf_obj_get(s->map_path); > + if (pin_fd < 0) { > + fprintf(stderr, > + "af-xdp: unable to remove %s's XSK sockets from '%s', > ifindex: %d\n", > + s->ifname, s->map_path, s->ifindex); Maybe it makes more sense to just not close it? i.e. keep the open file descriptor in the AFXDPState and use and close it on cleanup. This way we can be sure that the map doesn't just go away underneath us and so we can guarantee a proper cleanup. WDYT? Also, the ifindex isn't really part of this operation, so it may be reasonable to not print it here saving some line length. > + } else { > + idx = nc->queue_index; > + if (s->map_start_index > 0) { Should probably check the has_ just in case. > + idx += s->map_start_index; > + } > + bpf_map_delete_elem(pin_fd, &idx); The uAPI says that this call can also fail. We should warn the user about this failure. > + close(pin_fd); > + } > + } > + g_free(s->map_path); > } > > static int af_xdp_umem_create(AFXDPState *s, int sock_fd, Error **errp) > @@ -343,7 +362,7 @@ static int af_xdp_socket_create(AFXDPState *s, > .bind_flags = XDP_USE_NEED_WAKEUP, > .xdp_flags = XDP_FLAGS_UPDATE_IF_NOEXIST, > }; > - int queue_id, error = 0; > + int queue_id, pin_fd, xsk_fd, idx, error = 0; > > s->inhibit = opts->has_inhibit && opts->inhibit; > if (s->inhibit) { > @@ -384,6 +403,23 @@ static int af_xdp_socket_create(AFXDPState *s, > } > } > > + if (!error && s->map_path) { > + pin_fd = bpf_obj_get(s->map_path); > + if (pin_fd < 0) { > + error = errno; > + } else { > + xsk_fd = xsk_socket__fd(s->xsk); > + idx = s->nc.queue_index; > + if (s->map_start_index) { > + idx += s->map_start_index; > + } > + if (bpf_map_update_elem(pin_fd, &idx, &xsk_fd, 0)) { > + error = errno; > + } > + close(pin_fd); > + } > + } > + This part doesn't seem to belong in this function. It may be better if we just create sockets here and create a separate function af_xdp_update_bpf_map or something like that and call it from the main net_init_af_xdp. Main reasons for that are mostly related to the error path: 1. The error message below is a bit misleading and doesn't really tell what happened. For example, if the map_path is incorrect, then we get: "failed to create AF_XDP socket ... : No such path or directory". There is no mention of the map path that we tried to open and failed and the information is kinda wrong as we did actually create the socket. 2. It's better if have xdp_flags properly initialized when the socket is actually created and not error out of this function early. This may not matter today, but seems cleaner and may save us from some headaches in the future. > if (error) { > error_setg_errno(errp, error, > "failed to create AF_XDP socket for %s queue_id: > %d", > @@ -465,8 +501,12 @@ int net_init_af_xdp(const Netdev *netdev, > return -1; > } > > - if ((opts->has_inhibit && opts->inhibit) != !!opts->sock_fds) { > - error_setg(errp, "'inhibit=on' requires 'sock-fds' and vice versa"); > + if ((opts->has_inhibit && opts->inhibit) != !!(opts->sock_fds || > opts->map_path)) { > + if (opts->has_inhibit && opts->inhibit) { > + error_setg(errp, "'inhibit=on' requires setting 'sock-fds' or > 'map-path'"); > + } else { > + error_setg(errp, "'sock-fds' or 'map-path' requires setting > 'inhibit=on'"); > + } > return -1; > } This is getting a little hard to navigate even after the re-wording, I think. And we also need to cover more cases now. So, I'd suggest to just bite the bullet and spell out every case separately, e.g.: inhibit = opts->has_inhibit && opts->inhibit; if (inhibit && !opts->sock_fds && !opts->map_path) { error_setg(errp, "'inhibit=on' requires 'sock-fds' or 'map-path'"); return -1; } if (!inhibit && (opts->sock_fds || opts->map_path)) { error_setg(errp, "'sock-fds' and 'map-path' require 'inhibit=on'"); return -1; } if (opts->sock_fds && opts->map_path) { error_setg(errp, "'sock-fds' and 'map-path' are mutually exclusive"); return -1; } This should be easier to understand and also saves some line lengths. WDYT? The exclusivity check here is because we expect the file descriptors in the sock-fds to be already in the map. It would be an awkward configuration if we have the fds and the map, so it might be better to just prohibit this. We'll also need a check that map-start-index requires the map-path. > > @@ -491,6 +531,12 @@ int net_init_af_xdp(const Netdev *netdev, > pstrcpy(s->ifname, sizeof(s->ifname), opts->ifname); > s->ifindex = ifindex; > s->n_queues = queues; I'd put an empty line here. > + if (opts->map_path) { This check doesn't seem necessary, g_strdup is NULL-safe. > + s->map_path = g_strdup(opts->map_path); > + if (opts->has_map_start_index && opts->map_start_index > 0) { > + s->map_start_index = opts->map_start_index; > + } > + } > > if (af_xdp_umem_create(s, sock_fds ? sock_fds[i] : -1, errp) > || af_xdp_socket_create(s, opts, errp)) { > @@ -504,10 +550,15 @@ int net_init_af_xdp(const Netdev *netdev, > if (nc0) { > s = DO_UPCAST(AFXDPState, nc, nc0); > if (bpf_xdp_query_id(s->ifindex, s->xdp_flags, &prog_id) || > !prog_id) { > - error_setg_errno(errp, errno, > - "no XDP program loaded on '%s', ifindex: %d", > - s->ifname, s->ifindex); > - goto err; > + if (!s->inhibit) { The s->inhibit is set as a side effect of af_xdp_socket_create(). It was a right decision when the socket creation and the cleanup were the only users. But since we're using it now here, we should move setting of this value to the loop above. > + error_setg_errno(errp, errno, > + "no XDP program loaded on '%s', ifindex: > %d", > + s->ifname, s->ifindex); > + goto err; > + } else { > + warn_report("no XDP program loaded yet on '%s', ifindex: %d", > + s->ifname, s->ifindex); > + } > } > } > > diff --git a/qapi/net.json b/qapi/net.json > index 310cc4fd19..66a1fcf6ae 100644 > --- a/qapi/net.json > +++ b/qapi/net.json > @@ -454,7 +454,7 @@ > # (default: 0). > # > # @inhibit: Don't load a default XDP program, use one already loaded > -# to the interface (default: false). Requires @sock-fds. > +# to the interface (default: false). Requires @sock-fds or @map-path. > # > # @sock-fds: A colon (:) separated list of file descriptors for > # already open but not bound AF_XDP sockets in the queue order. > @@ -462,17 +462,25 @@ > # into XDP socket map for corresponding queues. Requires > # @inhibit. > # > +# @map-path: The path to a pinned xsk map to push file descriptors > +# for bound AF_XDP sockets into. Requires @inhibit. (Since 10.1) > +# > +# @map-start-index: Use @map-path to insert xsk sockets starting from > +# this index number (default: 0). Requires @map-path. (Since 10.1) The doc uses double spaces between sentences. Also, may mention that map-path and sock-fds are mutually exclusive. For both options. > +# > # Since: 8.2 > ## > { 'struct': 'NetdevAFXDPOptions', > 'data': { > - 'ifname': 'str', > - '*mode': 'AFXDPMode', > - '*force-copy': 'bool', > - '*queues': 'int', > - '*start-queue': 'int', > - '*inhibit': 'bool', > - '*sock-fds': 'str' }, > + 'ifname': 'str', > + '*mode': 'AFXDPMode', > + '*force-copy': 'bool', > + '*queues': 'int', > + '*start-queue': 'int', > + '*inhibit': 'bool', > + '*sock-fds': 'str', > + '*map-path': 'str', > + '*map-start-index': 'int' }, > 'if': 'CONFIG_AF_XDP' } > > ## > diff --git a/qemu-options.hx b/qemu-options.hx > index dc694a99a3..50fc592f5d 100644 > --- a/qemu-options.hx > +++ b/qemu-options.hx > @@ -2909,6 +2909,7 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, > #ifdef CONFIG_AF_XDP > "-netdev > af-xdp,id=str,ifname=name[,mode=native|skb][,force-copy=on|off]\n" > " > [,queues=n][,start-queue=m][,inhibit=on|off][,sock-fds=x:y:...:z]\n" > + " [,map-path=/path/to/socket/map][,map-start-index=i]\n" > " attach to the existing network interface 'name' with > AF_XDP socket\n" > " use 'mode=MODE' to specify an XDP program attach mode\n" > " use 'force-copy=on|off' to force XDP copy mode even if > device supports zero-copy (default: off)\n" > @@ -2916,6 +2917,8 @@ DEF("netdev", HAS_ARG, QEMU_OPTION_netdev, > " with inhibit=on,\n" > " use 'sock-fds' to provide file descriptors for > already open AF_XDP sockets\n" > " added to a socket map in XDP program. One socket per > queue.\n" > + " use 'map-path' to provide the socket map location to > populate AF_XDP sockets with\n" > + " beginning from the 'map-start-index' specified index > (default: 0) (Since 10.1)\n" > " use 'queues=n' to specify how many queues of a > multiqueue interface should be used\n" > " use 'start-queue=m' to specify the first queue that > should be used\n" > #endif There is another section below with all the options listed and some examples. It will need an update. Not sure if we need an example for this new use case, but if you can make a small one with just using bpftool, that maight be useful. Best regards, Ilya Maximets.