Hi!
I am facing some issues with the CloudStack CSI driver (Leaseweb fork). In
general it works pretty well, but when draining a Kubernetes node, which
triggers a lot of detach/attach operations, something occasionally goes
wrong: I end up in an inconsistent state and can't attach devices to the
affected instance anymore.
Scenario…
* Instance A has a few block volumes, requested by the CSI driver; vda,
vdb, vdc, vdd, and vde show up in the libvirt XML
* vdd gets detached from instance A
* Instance A now has vda, vdb, vdc, and vde in its libvirt XML
* The CSI driver requests a new block volume for instance A, and ACS tries
to attach it as vde instead of reusing vdd, which has meanwhile become free
From that point on, no more devices can be attached to the instance. The
management server shows this:
2025-10-01 11:00:52,702 ERROR [c.c.a.ApiAsyncJobDispatcher]
(API-Job-Executor-85:[ctx-ee10aa59, job-629270]) (logid:5018a3b3) Unexpected
exception while executing
org.apache.cloudstack.api.command.user.volume.AttachVolumeCmd
com.cloud.utils.exception.CloudRuntimeException: Failed to attach volume
pvc-xxxx-b45f-4324-a85b-xxxx to VM kubetest-1; org.libvirt.LibvirtException:
XML error: target 'vde' duplicated for disk sources
'/mnt/xxxx-387c-3f14-aea7-0d19104d92dd/xxxx-c659-4699-8885-xxxx and
'/mnt/xxxx-387c-3f14-aea7-0d19104d92dd/xxxx-c659-4699-8885-xxxx
2025-10-01 11:00:52,702 DEBUG [o.a.c.f.j.i.AsyncJobManagerImpl]
(API-Job-Executor-85:[ctx-ee10aa59, job-629270]) (logid:5018a3b3) Complete
async job-629270, jobStatus: FAILED, resultCode: 530, result:
org.apache.cloudstack.api.response.ExceptionResponse/null/{"uuidList":[],"errorcode":"530","errortext":"Failed
to attach volume pvc-xxxx-b45f-4324-a85b-xxxx to VM kubetest-1;
org.libvirt.LibvirtException: XML error: target 'vde' duplicated for disk
sources /mnt/xxxx-387c-3f14-aea7-0d19104d92dd/xxxx-c659-4699-8885-xxxx ' and
'/mnt/xxxx-387c-3f14-aea7-0d19104d92dd/xxxx-c659-4699-8885-xxxx "}
If ACS tried to attach the new device as vdd (which became free), things
would work, I guess; a rough sketch of that logic is below. After a
shutdown/reboot of the affected VM, everything starts working again and new
block devices can be attached.
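
For illustration, here is a minimal sketch of the allocation I would expect
(not CloudStack's actual code; I'm assuming the allocator can see the set of
virtio slots already used in the domain XML). It picks the lowest free slot,
so the freed vdd gets reused instead of colliding on vde:

    import java.util.Set;
    import java.util.TreeSet;

    public class DeviceSlotPicker {
        // Hypothetical helper, not CloudStack's actual allocator:
        // pick the lowest free virtio slot instead of max(used) + 1.
        // Slot 0 = vda (the root disk), 1 = vdb, and so on.
        static int lowestFreeSlot(Set<Integer> usedSlots, int maxSlots) {
            for (int slot = 0; slot < maxSlots; slot++) {
                if (!usedSlots.contains(slot)) {
                    return slot;
                }
            }
            throw new IllegalStateException("no free device slot on instance");
        }

        public static void main(String[] args) {
            // State from the scenario above: vda, vdb, vdc, vde are present;
            // vdd (slot 3) was detached and is free again.
            Set<Integer> used = new TreeSet<>(Set.of(0, 1, 2, 4));
            int slot = lowestFreeSlot(used, 32);
            // Prints "next target: vdd" -- reusing the freed slot avoids the
            // duplicate 'vde' target that libvirt rejects.
            System.out.println("next target: vd" + (char) ('a' + slot));
        }
    }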
We are currently on ACS 4.20.1.0 on Ubuntu 24.04.
Cheers,
Juergen