Public bug reported:

Description
===========
A 100G **shared iSCSI** volume was attached to 3 instances scheduled on the
same node (node-2). I then deleted these 3 instances concurrently. All 3
instances were deleted, but the output of 'multipath -ll' showed the leftover
faulty paths below.

[root@node-2 ~]# multipath -ll
Jan 10 10:25:42 | sdj: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdl: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdk: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdn: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdi: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdo: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdm: prio = const (setting: emergency fallback - alua failed)
Jan 10 10:25:42 | sdp: prio = const (setting: emergency fallback - alua failed)
mpathaj (36001405acb21c8bbf33e1449b295c517) dm-2 ESSTOR,IBLOCK
size=100G features='1 queue_if_no_path' hwhandler='1 alua' wp=rw
|-+- policy='round-robin 0' prio=0 status=enabled
| |- 24:0:0:39 sdj 8:144 failed faulty running
| |- 17:0:0:39 sdl 8:176 failed faulty running
| |- 22:0:0:39 sdk 8:160 failed faulty running
| `- 19:0:0:39 sdn 8:208 failed faulty running
`-+- policy='round-robin 0' prio=0 status=enabled
  |- 23:0:0:39 sdi 8:128 failed faulty running
  |- 18:0:0:39 sdo 8:224 failed faulty running
  |- 21:0:0:39 sdm 8:192 failed faulty running
  `- 20:0:0:39 sdp 8:240 failed faulty running


Steps to reproduce
==================
1. Boot 3 instances using RBD as the root disk; the protocol type of the
root disk does not matter for this step.
2. Create a shared iSCSI volume to serve as the data disk; any commercial
storage array or other backend that speaks the iSCSI protocol will do.
3. Attach the shared volume to each of the 3 instances.
4. Make sure the volume is attached to all the instances successfully, then
delete the instances concurrently (a rough scripted version of these steps
follows this list).
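For reference, here is a minimal sketch of the steps above using
openstacksdk. It assumes the 3 instances from step 1 already exist as
vm1/vm2/vm3, that a multiattach-capable volume type named 'multiattach' is
configured in Cinder, and that 'mycloud' is a placeholder for your
clouds.yaml entry:

    import openstack
    from concurrent.futures import ThreadPoolExecutor

    conn = openstack.connect(cloud='mycloud')  # placeholder cloud name

    # Step 2: create the shared (multiattach) volume used as the data disk.
    volume = conn.block_storage.create_volume(
        size=100, volume_type='multiattach', name='shared-data-vol')
    conn.block_storage.wait_for_status(volume, status='available')

    # Step 3: attach the same volume to each of the 3 instances.
    servers = [conn.compute.find_server(name)
               for name in ('vm1', 'vm2', 'vm3')]
    for server in servers:
        conn.compute.create_volume_attachment(server, volume_id=volume.id)

    # Step 4: delete the 3 instances concurrently.
    with ThreadPoolExecutor(max_workers=3) as pool:
        list(pool.map(conn.compute.delete_server, servers))

Afterwards, run 'multipath -ll' on the compute node that hosted the
instances.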

Expected result
===============
The 3 instances are deleted completely, and 'multipath -ll' shows no
residual multipath devices.

Actual result
=============
The 3 instances were deleted, but the node was left with residual multipath
devices, as shown in the output in the Description above.

Environment
===========
1. Exact version of OpenStack you are running:
   Wallaby Nova & Cinder, connected to a commercial storage array over
iSCSI.

2. Which hypervisor did you use?
   Libvirt 8.0.0 + qemu-kvm 6.2.0

3. Which storage type did you use?
   RBD as the root disk, plus 1 shared iSCSI volume attached as a data disk
to 3 instances scheduled on the same node.

4. Which networking type did you use?
   Omitted.

Logs & Configs
==============
According to the deletion code path, nova does not disconnect a shared
volume from the host when the volume is still attached to other instances on
the same node; instead it logs 'Detected multiple connections on this host
for volume' and skips the target disconnect. node-2 nova-compute output:

2024-01-10 11:05:29.904 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 
2024-01-10T11:05:29.904196604+08:00 stdout F 2024-01-10 11:05:29.903 59580 INFO 
nova.virt.libvirt.driver [req-c9082d4c-457a-4859-a0be-c2c23953a17c 
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default 
default] Detected multiple connections on this host for volume: 
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.^[[00m
2024-01-10 11:05:30.143 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 
2024-01-10T11:05:30.143536178+08:00 stdout F 2024-01-10 11:05:30.143 59580 INFO 
nova.virt.libvirt.driver [req-065c2b2b-ae16-453f-abb7-a5756ed87f3a 
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default 
default] Detected multiple connections on this host for volume: 
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.^[[00m
2024-01-10 11:05:30.334 +0800 ¦ node-2 ¦ nova-compute-d94f6 ¦ nova-compute ¦ 
2024-01-10T11:05:30.334997487+08:00 stdout F 2024-01-10 11:05:30.334 59580 INFO 
nova.virt.libvirt.driver [req-41afd565-599f-4b35-b4cb-acf074332079 
fa0faf20c0e84275a5505eb6cb2673a8 793aac4869d643b19e60248715c3735b - default 
default] Detected multiple connections on this host for volume: 
f31b8fd2-1651-4667-af05-7364ac501cf9, skipping target disconnect.^[[00m

The residual multipath state left behind is the same 'multipath -ll' output
already shown in the Description above.
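The three identical INFO lines above suggest the likely root cause: each of
the three concurrent deletions checks for other connections on the host
before any of them has removed its own attachment, so all three decide to
skip the target disconnect and the iSCSI devices are never cleaned up. The
following is a hypothetical, simplified toy in Python (not nova's actual
code; the volume ID, instance names and helper are illustrative) of that
check-then-act race:

    import threading
    import time

    # Illustrative host-side state: volume ID -> instances attached on this
    # host. The volume ID and instance names are placeholders.
    attachments = {'f31b8fd2': {'vm1', 'vm2', 'vm3'}}

    def delete_instance(vm, volume):
        # nova-style check: do other instances on this host still use the
        # volume?
        others = attachments[volume] - {vm}
        time.sleep(0.1)  # widen the check-then-act window (API latency)
        if others:
            print('%s: Detected multiple connections on this host for '
                  'volume: %s, skipping target disconnect.' % (vm, volume))
        else:
            print('%s: disconnecting iSCSI target' % vm)
        attachments[volume].discard(vm)  # the attachment record goes away

    threads = [threading.Thread(target=delete_instance,
                                args=(vm, 'f31b8fd2'))
               for vm in ('vm1', 'vm2', 'vm3')]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

Run concurrently, all three workers observe "other connections" at check
time and print the skip message, matching the three INFO lines above; run
sequentially, the last deletion finds no other connection and performs the
disconnect.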

** Affects: nova
     Importance: Undecided
         Status: New
