29.01.2021 10:35, Roman Kagan wrote:
On Fri, Jan 29, 2021 at 08:51:39AM +0300, Vladimir Sementsov-Ogievskiy wrote:
28.01.2021 23:14, Roman Kagan wrote:
During the final phase of migration the NBD reconnection logic may
encounter situations it doesn't expect during regular operation.
This series addresses some of those that make QEMU crash. They are
reproducible when a VM with a secondary drive attached via NBD with a
non-zero "reconnect-delay" runs a stress load (fio with a big queue
depth) in the guest on that drive and is migrated (e.g. to a file),
while the NBD server is SIGKILL-ed and restarted every second.
See the individual patches for specific crash conditions and more
detailed explanations.
Roman Kagan (3):
  block/nbd: only detach existing iochannel from aio_context
  block/nbd: only enter connection coroutine if it's present
  nbd: make nbd_read* return -EIO on error

 include/block/nbd.h |  7 ++++---
 block/nbd.c         | 25 +++++++++++++++++--------
 2 files changed, 21 insertions(+), 11 deletions(-)
Thanks a lot for fixing!
Do you have some reproducer scripts? Could you post them or maybe add
an iotest?
I don't have it scripted, just ad hoc command lines. I'll look into
making up a test. Can you perhaps suggest which existing test to base
it on?
For now, the reconnect feature is covered by only two tests:
tests/qemu-iotests/264 and tests/qemu-iotests/277.
Also note that since "f203080bbd iotests: rewrite check into python" you
should add new iotests with human-readable file names to the
tests/qemu-iotests/tests subdirectory. You also don't need to update the
tests/qemu-iotests/group file (it no longer exists): test groups are now
defined in the tests themselves in a comment, like "# group: rw quick".
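
To illustrate the group comment, here is a minimal Python sketch of how
such a header could be read from a test file. It is not the actual code
of the check script: parse_groups, GROUP_RE and the example test file
name are hypothetical and only the "# group: rw quick" format comes from
the note above.

```python
import re
from typing import List

# Assumed header format, taken from the example above: "# group: rw quick".
GROUP_RE = re.compile(r'^#\s*group:\s*(.*)$')


def parse_groups(source: str) -> List[str]:
    """Return the groups named in the first '# group:' comment, or []."""
    for line in source.splitlines():
        match = GROUP_RE.match(line.strip())
        if match:
            # Groups are whitespace-separated words after the colon.
            return match.group(1).split()
    return []


# Hypothetical contents of a new test placed under
# tests/qemu-iotests/tests/ (the file name is made up for illustration).
EXAMPLE_TEST = """#!/usr/bin/env python3
# group: rw quick
#
# Hypothetical test: tests/qemu-iotests/tests/nbd-reconnect-on-migration
"""

print(parse_groups(EXAMPLE_TEST))  # prints ['rw', 'quick']
```

A file without such a comment yields an empty list in this sketch; how
the real check script treats that case is not covered here.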
--
Best regards,
Vladimir