BannerSmash opened a new issue, #13292:
URL: https://github.com/apache/cloudstack/issues/13292

   ### problem
   
   On a fresh deployment with XCP-ng 8.3 hosts, the secondary storage VM and 
console proxy VM never deploy. They fail and retry in a loop with ascending VM 
ids. The management log reports `Could not mount secondary storage 
<nfs>:/<export>`, which is misleading: NFS connectivity, exports, and 
permissions are all fine (the share mounts manually from both the hosts and the 
management server without issue).
   
   The real cause is that the host-side XAPI plugin 
`/etc/xapi.d/plugins/cloud-plugin-storage` fails to load on XCP-ng 8.3 because 
it does a top-level `import lvhdutil`, and `lvhdutil` no longer exists as an 
importable module on XCP-ng 8.3. Because the plugin crashes at import time, 
every storage operation that goes through it fails, which CloudStack surfaces 
as a mount failure and a "storage unreachable" condition.
   
   The plugin deployed to XCP-ng 8.3 hosts is sourced from the `xenserver84` 
directory:
   `.../xenserver/xcpserver83/../xenserver84/cloud-plugin-storage`.
   
   ### versions
   
   ~~~
   4.22.0.0.
   Note: system VM template reported version 4.22.0.0 in our deployment.]
   Fully patched XCP-ng 8.3 servers
   ~~~
   
   
   ### The steps to reproduce the bug
   
   System VMs cycle endlessly with ascending ids. The management log shows, per 
attempt:
   
   ~~~
   WARN  [c.c.h.x.r.XcpServer83Resource] callHostPlugin failed for cmd: 
mountNfsSecondaryStorage
         with args remoteDir: <nfs>:/<export>, localDir: 
/var/cloud_mount/<uuid>, nfsVersion: null,
         due to Task failed!
   WARN  [c.c.h.x.r.Xenserver625StorageProcessor] ... Could not mount secondary 
storage <nfs>:/<export> ...
   ERROR [o.a.c.e.o.VolumeOrchestrator] Unable to create volume [ROOT-xxx] ...
   WARN  [c.c.v.ClusteredVirtualMachineManagerImpl] Resource [StoragePool:x] is 
unreachable ...
   ~~~
   
   The underlying XAPI task error reveals the actual failure (host-side Python 
traceback):
   
   ~~~
   errorInfo: [XENAPI_PLUGIN_FAILURE, non-zero exit, , Traceback (most recent 
call last):
     File "/etc/xapi.d/plugins/cloud-plugin-storage", line 34, in <module>
       import lvhdutil
   ModuleNotFoundError: No module named 'lvhdutil'
   ]
   ~~~
   
   On the XCP-ng 8.3 host, the `lvhdutil` source module is absent — only a 
stale orphan bytecode file remains, which Python 3 will not import:
   
   ~~~
   # find /opt/xensource/sm /usr/lib/python* -name 'lvhdutil*'
   /opt/xensource/sm/__pycache__/lvhdutil.cpython-36.pyc
   
   # ls /opt/xensource/sm/ | grep -i lvhd
   (no source .py present)
   ~~~
   
   Note the other imports just above `import lvhdutil` (`SR`, `VDI`, 
`SRCommand`, `util`, `lvutil`, `vhdutil`) succeed, which confirms 
`/opt/xensource/sm` is on the path and that `lvhdutil` specifically has been 
removed/refactored out in XCP-ng 8.3.
   
   
   ### What to do about it?
   
   Make the import non-fatal in
   
`/usr/share/cloudstack-common/scripts/vm/hypervisor/xenserver/xenserver84/cloud-plugin-storage`
   on the management server, then push the patched plugin to the hosts (the 
management server only copies host plugins on host add, not on 
reconnect/restart, so a manual push or host re-add is required):
   
   ~~~python
   import vhdutil
   import shutil
   try:
       import lvhdutil
   except ImportError:
       lvhdutil = None
   import errno
   ~~~
   
   After this, the NFS mount succeeds and the system VMs deploy normally.
   
   ##### OPEN QUESTION / SUGGESTED FIX
   The guard above unblocks NFS and basic operation, but it does not restore 
any plugin code path that genuinely needs `lvhdutil` (relevant to LVM/iSCSI 
primary storage operations such as snapshots and template-from-volume). The 
proper fix is likely to import `lvhdutil` from its new location in XCP-ng 8.3's 
refactored `sm` stack (or call its replacement), rather than guarding it to 
`None`. Guidance on the correct module/path for 8.3 would be appreciated.
   
   ##### RELATED
   Possibly related to discussion #11346 (4.20.1 + XCP-ng 8.3 system VMs 
failing to deploy), though that case was resolved as an NFSv4 negotiation issue 
on the storage server — this is a distinct root cause (plugin import failure).
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

Reply via email to