On 6/17/25 1:59 PM, Ilya Maximets wrote:
On 6/4/25 1:29 PM, Daniel Borkmann wrote:
While testing, it turned out that upon error in the queue creation loop,
we never trigger the af_xdp_cleanup() handler. This is because we pass
errp instead of a local err pointer into the various AF_XDP setup functions
instead of a scheme like:
bool fn(..., Error **errp)
{
Error *err = NULL;
foo(arg, &err);
if (err) {
handle the error...
error_propagate(errp, err);
return false;
}
...
}
With a conversion into the above format, the af_xdp_cleanup() handler is
called as expected.
How exactly this prevents calling the cleanup function? I don't see the
errp being checked anywhere in the qemu_del_net_client() path.
Could you provide a more detailed call sequence description where the cleanup
is not called?
I agree thought that the local err variable is actually unused. We should
be able to just remove it and remove the error_propagate() call as well.
Ok, I basically manually injected an error in af_xdp_{umem_create,socket_create,
update_xsk_map} and noticed that in fact none of the af_xdp_cleanup() callback
was called and qemu was exiting right away.
From reading up on the qemu error handling patterns that should be used via
include/qapi/error.h I noticed that this was due to passing in errp directly
rather than a local error variable as done in many other places in qemu code.
Also, making sure the XDP program will be removed does
require to set s->n_queues to i + 1 since the test is nc->queue_index ==
s->n_queues - 1, where nc->queue_index was set to i earlier.
The idea behind 'i' instead of 'i + 1' was that if either af_xdp_umem_create()
or af_xdp_socket_create() fails, we do not have xdp_flags initialized on the
last queue. And without it we can't remove the program, so we remove it while
destroying the last actually configured queue. And this is OK, because the
failed queue was not added to the program, and if the af_xdp_socket_create()
fails for the very first queue, then we don't have a program loaded at all.
With the new changes in this patch set, we have an extra function that can fail,
which is a new af_xdp_update_xsk_map(), and only if this one fails, we need to
remove the program while cleaning up the current failed queue, since it was
already created and xdp_flags are available.
If we get this patch as-is and the af_xdp_socket_create() fails, we will not
remove the program, AFAICT.
I'll double check this concern and see if it can be solved (iirc we do test for
s->xdp_flags in the cleanup callback).. in case of xsk map, it should not detach
anything from an XDP program PoV (given inhibit) but rather it should remove
prior installed xsk sockets from the xsk map to not leave them around.
Best,
Daniel