Hi On Mon, Apr 20, 2020 at 5:10 AM Li Feng <fen...@smartx.com> wrote: > > Hi Lureau, > > Thanks for your comment. > I have spent some time to reproduce this crash, so that delay my > reply, apologies. > > This is the description of this bug using gdb: > 1. Set break points using gdb; > b chardev/char-socket.c:410 if s->state == TCP_CHARDEV_STATE_DISCONNECTED > b break vhost_user_write > > 2. hotplug a vhost-blk device: > echo "chardev-add > socket,id=spdk_vhost_blk2,path=/vhost-blk.0,reconnect=1 "| nc -U > /tmp/a.socket > echo "device_add > vhost-user-blk-pci,chardev=spdk_vhost_blk2,id=spdk_vhost_blk2,num-queues=4" > | nc -U /tmp/a.socket > > 3. Gdb will stop at vhost_user_write, and CTRL-C the vhost target. > 4. Put 'c' to let gdb continue. > You will see a crash: > > 192 login: qemu-system-x86_64: chardev/char-socket.c:125: > qemu_chr_socket_restart_timer: Assertion `!s->reconnect_timer' failed. > > The related code: > 394 static void tcp_chr_free_connection(Chardev *chr) > 395 { > 396 SocketChardev *s = SOCKET_CHARDEV(chr); > 397 int i; > 398 > 399 if (s->read_msgfds_num) { > 400 for (i = 0; i < s->read_msgfds_num; i++) { > 401 close(s->read_msgfds[i]); > 402 } > 403 g_free(s->read_msgfds); > 404 s->read_msgfds = NULL; > 405 s->read_msgfds_num = 0; > 406 } > 407 > 408 remove_hup_source(s); > 409 > 410 tcp_set_msgfds(chr, NULL, 0); > 411 remove_fd_in_watch(chr); > 412 object_unref(OBJECT(s->sioc)); > 413 s->sioc = NULL; > 414 object_unref(OBJECT(s->ioc)); > 415 s->ioc = NULL; > 416 g_free(chr->filename); > 417 chr->filename = NULL; > 418 tcp_chr_change_state(s, TCP_CHARDEV_STATE_DISCONNECTED); > 419 } > > #0 0x0000555555c1aae8 in tcp_chr_free_connection > (chr=chr@entry=0x55555751ee00) at chardev/char-socket.c:410 > #1 0x0000555555c1aec8 in tcp_chr_disconnect_locked > (chr=0x55555751ee00) at chardev/char-socket.c:479 > #2 0x0000555555c1af5d in tcp_chr_disconnect (chr=0x55555751ee00) at > chardev/char-socket.c:497 > #3 0x0000555555c16715 in qemu_chr_fe_set_handlers > (b=b@entry=0x5555575f3b48, fd_can_read=fd_can_read@entry=0x0, > fd_read=fd_ > read@entry=0x0, fd_event=fd_event@entry=0x55555588e740 > <vhost_user_blk_event>, be_change=be_change@entry=0x0, opaque=opaque@ > entry=0x5555575f3990, context=0x0, set_open=true) at chardev/char-fe.c:304 > #4 0x000055555588e5c0 in vhost_user_blk_device_realize > (dev=0x5555575f3990, errp=<optimized out>) > at /root/qemu-master/hw/block/vhost-user-blk.c:447 > #5 0x00005555558ca478 in virtio_device_realize (dev=0x5555575f3990, > errp=0x7fffffffbb90) > at /root/qemu-master/hw/virtio/virtio.c:3615 > #6 0x00005555559dc842 in device_set_realized (obj=<optimized out>, > value=<optimized out>, errp=0x7fffffffbd20) > at hw/core/qdev.c:891 > #7 0x0000555555b78e07 in property_set_bool (obj=0x5555575f3990, > v=<optimized out>, name=<optimized out>, opaque=0x555556453 > 040, errp=0x7fffffffbd20) at qom/object.c:2238 > #8 0x0000555555b7db3f in object_property_set_qobject > (obj=obj@entry=0x5555575f3990, value=value@entry=0x555556ab89c0, name= > name@entry=0x555555d15308 "realized", errp=errp@entry=0x7fffffffbd20) > at qom/qom-qobject.c:26 > #9 0x0000555555b7b2c5 in object_property_set_bool > (obj=0x5555575f3990, value=<optimized out>, name=0x555555d15308 > "realized > ", errp=0x7fffffffbd20) at qom/object.c:1390 > #10 0x0000555555af13b2 in virtio_pci_realize (pci_dev=0x5555575eb800, > errp=0x7fffffffbd20) at hw/virtio/virtio-pci.c:1807 > #11 0x0000555555a7a14b in pci_qdev_realize (qdev=<optimized out>, > errp=<optimized out>) at hw/pci/pci.c:2098 > #12 0x00005555559dc842 in device_set_realized (obj=<optimized out>, > value=<optimized out>, errp=0x7fffffffbef8) > at hw/core/qdev.c:891 > > (gdb) p s > $23 = (SocketChardev *) 0x55555751ee00 > (gdb) p s->state > $24 = TCP_CHARDEV_STATE_DISCONNECTED > > At 410 line of char-socket.c, the state has been changed to > TCP_CHARDEV_STATE_DISCONNECTED before tcp_chr_change_state(s, > TCP_CHARDEV_STATE_DISCONNECTED); > It means the tcp_chr_disconnect_locked is called by more than one > times, and the tcp socket has been freed before. > > The crash reason is s->reconnect_timer is set in the previous call of > tcp_chr_disconnect_locked. > The another approach fix maybe like this: > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c > index 185fe38dda..94957de367 100644 > --- a/chardev/char-socket.c > +++ b/chardev/char-socket.c > @@ -486,7 +486,7 @@ static void tcp_chr_disconnect_locked(Chardev *chr) > if (emit_close) { > qemu_chr_be_event(chr, CHR_EVENT_CLOSED); > } > - if (s->reconnect_time) { > + if (s->reconnect_time && !s->reconnect_timer) { > qemu_chr_socket_restart_timer(chr); > } > } > > The base code is here: > 474 static void tcp_chr_disconnect_locked(Chardev *chr) > 475 { > 476 SocketChardev *s = SOCKET_CHARDEV(chr); > 477 bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED; > 478 > 479 tcp_chr_free_connection(chr); > 480 > 481 if (s->listener) { > 482 qio_net_listener_set_client_func_full(s->listener, tcp_chr_accept, > 483 chr, NULL, chr->gcontext); > 484 } > 485 update_disconnected_filename(s); > 486 if (emit_close) { > 487 qemu_chr_be_event(chr, CHR_EVENT_CLOSED); > 488 } > 489 if (s->reconnect_time) { > 490 qemu_chr_socket_restart_timer(chr); > 491 } > 492 } > > Which one is better?
+ if (s->reconnect_time && !s->reconnect_timer) Looks like a good solution to me. > > I don't know how to inject an error at socket write when writing tests > code(tests/test-char.c ). > Any tips? You could check the patch I just sent fixing a related issue: https://patchew.org/QEMU/20200420112012.567284-1-marcandre.lur...@redhat.com/ In your case, it might be as easy as calling qemu_chr_fe_disconnect() 2 times on a reconnect socket, I'll let you figure out. Thanks! > > Thanks, > Feng Li > > Marc-André Lureau <marcandre.lur...@gmail.com> 于2020年4月15日周三 下午6:35写道: > > > > > Hi > > > > On Wed, Apr 15, 2020 at 5:31 AM Li Feng <fen...@smartx.com> wrote: > > > > > > double call tcp_chr_free_connection generates a crash. > > > > > > Signed-off-by: Li Feng <fen...@smartx.com> > > > > Could you describe how you reach the "double call". > > > > Even better would be to write a test for it in tests/test-char.c > > > > thanks > > > > > --- > > > chardev/char-socket.c | 5 +++++ > > > 1 file changed, 5 insertions(+) > > > > > > diff --git a/chardev/char-socket.c b/chardev/char-socket.c > > > index 185fe38dda..43aab8f24b 100644 > > > --- a/chardev/char-socket.c > > > +++ b/chardev/char-socket.c > > > @@ -476,6 +476,11 @@ static void tcp_chr_disconnect_locked(Chardev *chr) > > > SocketChardev *s = SOCKET_CHARDEV(chr); > > > bool emit_close = s->state == TCP_CHARDEV_STATE_CONNECTED; > > > > > > + /* avoid re-enter when socket read/write error and disconnect event. > > > */ > > > + if (s->state == TCP_CHARDEV_STATE_DISCONNECTED) { > > > + return; > > > + } > > > + > > > tcp_chr_free_connection(chr); > > > > > > if (s->listener) { > > > -- > > > 2.11.0 > > > > > > > > > -- > > > The SmartX email address is only for business purpose. Any sent message > > > that is not related to the business is not authorized or permitted by > > > SmartX. > > > 本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权. > > > > > > > > > > > > > > > -- > > Marc-André Lureau > > -- > The SmartX email address is only for business purpose. Any sent message > that is not related to the business is not authorized or permitted by > SmartX. > 本邮箱为北京志凌海纳科技有限公司(SmartX)工作邮箱. 如本邮箱发出的邮件与工作无关,该邮件未得到本公司任何的明示或默示的授权. > > -- Marc-André Lureau