On 4/28/25 9:46 PM, Eric Blake wrote:
> From: "Richard W.M. Jones" <rjo...@redhat.com>
> 
> Enable NBD multi-conn by spreading operations across multiple
> connections.
> 
> (XXX) This uses a naive round-robin approach which could be improved.
> For example we could look at how many requests are in flight and
> assign operations to the connections with fewest.  Or we could try to
> estimate (based on size of requests outstanding) the load on each
> connection.  But this implementation doesn't do any of that.
> 
> Signed-off-by: Richard W.M. Jones <rjo...@redhat.com>
> Message-ID: <20230309113946.1528247-5-rjo...@redhat.com>
> ---
>  block/nbd.c | 67 +++++++++++++++++++++++++++++++++++++++--------------
>  1 file changed, 49 insertions(+), 18 deletions(-)
> 
> diff --git a/block/nbd.c b/block/nbd.c
> index 19da1a7a1fe..bf5bc57569c 100644
> --- a/block/nbd.c
> +++ b/block/nbd.c
> 
> [...]


> @@ -2207,24 +2233,29 @@ static const char *const nbd_strong_runtime_opts[] = {
>  static void nbd_cancel_in_flight(BlockDriverState *bs)
>  {
>      BDRVNBDState *s = (BDRVNBDState *)bs->opaque;
> -    NBDConnState * const cs = s->conns[0];
> +    size_t i;
> +    NBDConnState *cs;
> 
> -    reconnect_delay_timer_del(cs);
> +    for (i = 0; i < MAX_MULTI_CONN; ++i) {
> +        cs = s->conns[i];
> 
> -    qemu_mutex_lock(&cs->requests_lock);
> -    if (cs->state == NBD_CLIENT_CONNECTING_WAIT) {
> -        cs->state = NBD_CLIENT_CONNECTING_NOWAIT;
> +        reconnect_delay_timer_del(cs);
> +

This code is causing iotests/{185,264,281} to segfault.  E.g.:

> (gdb) bt
> #0  0x000055bbaec58119 in reconnect_delay_timer_del (cs=0x0) at 
> ../block/nbd.c:205
> #1  0x000055bbaec5d8e4 in nbd_cancel_in_flight (bs=0x55bbb1458020) at 
> ../block/nbd.c:2242
> #2  0x000055bbaec4ff16 in bdrv_cancel_in_flight (bs=0x55bbb1458020) at 
> ../block/io.c:3737
> #3  0x000055bbaec54ec1 in mirror_cancel (job=0x55bbb21ce800, force=true) at 
> ../block/mirror.c:1335
> #4  0x000055bbaec18278 in job_cancel_async_locked (job=0x55bbb21ce800, 
> force=true) at ../job.c:893
> #5  0x000055bbaec18df2 in job_cancel_locked (job=0x55bbb21ce800, force=true) 
> at ../job.c:1143
> #6  0x000055bbaec18ef3 in job_force_cancel_err_locked (job=0x55bbb21ce800, 
> errp=0x7fff44f247a0) at ../job.c:1190
> #7  0x000055bbaec192a4 in job_finish_sync_locked (job=0x55bbb21ce800, 
> finish=0x55bbaec18ed2 <job_force_cancel_err_locked>, errp=0x0) at 
> ../job.c:1253
> #8  0x000055bbaec18f2e in job_cancel_sync_locked (job=0x55bbb21ce800, 
> force=true) at ../job.c:1196
> #9  0x000055bbaec19086 in job_cancel_sync_all () at ../job.c:1214
> #10 0x000055bbaed55177 in qemu_cleanup (status=0) at ../system/runstate.c:949
> #11 0x000055bbaedd0aad in qemu_default_main (opaque=0x0) at 
> ../system/main.c:51
> #12 0x000055bbaedd0b4f in main (argc=21, argv=0x7fff44f249d8) at 
> ../system/main.c:80

We're dereferencing a pointer to NBDConnState that was never
initialized.  So it should either be

> -    for (i = 0; i < MAX_MULTI_CONN; ++i) {
> +    for (i = 0; i < s->multi_conn; ++i) {

or

> -        cs = s->conns[i];
> +        if (s->conns[i]) {
> +            cs = s->conns[i];
> +        ...

[...]

Andrey

Reply via email to