Dear Yaniv, Please see my most recent response: https://www.dropbox.com/s/hlymmf9d6rn12tq/vdsm.vgpu.log?dl=0
I'm doing a clean install of the host right now to see if doing the exact same procedure a second time produces different results (this way lies madness, but our bosses are excited about vGPUs on oVirt).

Regards,
Callum

--
Callum Smith
Research Computing Core
Wellcome Trust Centre for Human Genetics
University of Oxford
e. [email protected]

On 17 May 2018, at 14:02, Yaniv Kaul <[email protected]> wrote:

It'd be easier if you could share the complete vdsm log. Perhaps file a bug and we can investigate it?
Y.

On Thu, May 17, 2018 at 11:25 AM, Callum Smith <[email protected]> wrote:

Some information that appears to be from around the time of installation to the cluster:

WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -X libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -F libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -L libvirt-O-vnet0' failed: Chain 'libvirt-O-vnet0' doesn't exist. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ebtables --concurrent -t nat -D POSTROUTING -o vnet0 -j libvirt-O-vnet0' failed: Illegal target name 'libvirt-O-vnet0'. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X HI-vnet0' failed: ip6tables: No chain/target/match by that name. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F HI-vnet0' failed: ip6tables: No chain/target/match by that name. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -X FI-vnet0' failed: ip6tables: No chain/target/match by that name. firewalld
WARNING: COMMAND_FAILED: '/usr/sbin/ip6tables -w2 -w -F FI-vnet0' failed: ip6tables: No chain/target/match by that name. firewalld

Regards,
Callum

On 17 May 2018, at 09:20, Callum Smith <[email protected]> wrote:

PS. Some other WARNs that come up on the host:

WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.org.qemu.guest_agent.0 already removed vdsm
WARN Attempting to remove a non existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 vdsm
WARN Attempting to remove a non existing network: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 vdsm
WARN File: /var/lib/libvirt/qemu/channels/1bc9dae8-a0ea-44b3-9103-5805100648d0.ovirt-guest-agent.0 already removed vdsm
WARN Attempting to add an existing net user: ovirtmgmt/1bc9dae8-a0ea-44b3-9103-5805100648d0 vdsm

Regards,
Callum

On 17 May 2018, at 09:16, Callum Smith <[email protected]> wrote:

The OVN network provider is used, and the node is running 4.2.3 (specifically 2018051606, clean install last night).

Regards,
Callum

On 17 May 2018, at 07:47, Ales Musil <[email protected]> wrote:

On Thu, May 17, 2018 at 12:01 AM, Callum Smith <[email protected]> wrote:

Dear All,

Our vGPU installation is progressing, though the VM is failing to start.

2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='1bc9dae8-a0ea-44b3-9103-5805100648d0') The vm start process failed (vm:943)
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 872, in _startUnderlyingVm
    self._run()
  File "/usr/lib/python2.7/site-packages/vdsm/virt/vm.py", line 2872, in _run
    dom.createWithFlags(flags)
  File "/usr/lib/python2.7/site-packages/vdsm/common/libvirtconnection.py", line 130, in wrapper
    ret = f(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/common/function.py", line 92, in wrapper
    return func(inst, *args, **kwargs)
  File "/usr/lib64/python2.7/site-packages/libvirt.py", line 1099, in createWithFlags
    if ret == -1: raise libvirtError ('virDomainCreateWithFlags() failed', dom=self)
libvirtError: Cannot get interface MTU on '': No such device

That's the specific error, plus some other information. It seems the GPU 'allocation' of a uuid against the nvidia-xx mdev type is proceeding correctly, and the device is being created by the VM instantiation, but the VM does not succeed in going up and fails with this error. Are there any other logs or information that would help diagnose this?

Regards,
Callum

_______________________________________________
Users mailing list -- [email protected]
To unsubscribe send an email to [email protected]

Hi Callum,

can you share your version of the setup? Also, do you use the OVS switch type in the cluster?

Regards,
Ales.

--
ALES MUSIL
INTERN - rhv network
Red Hat EMEA <https://www.redhat.com/>
[email protected]
IM: amusil
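For anyone triaging a similar failure before sharing the complete vdsm log, the entries for one VM can be pulled out with a short script. The following is only a minimal sketch: the line pattern is modelled on the single vdsm ERROR line quoted in this thread (timestamp, level, thread, logger, message), and the names `LOG_LINE`, `Entry`, `parse_entries` and `entries_for_vm` are hypothetical helpers, not part of vdsm. Other vdsm versions may format their lines differently.

```python
import re
from typing import Iterable, Iterator, List, NamedTuple, Tuple

# Assumption: lines look like the one quoted in this thread, e.g.
# 2018-05-16 22:57:34,328+0100 ERROR (vm/1bc9dae8) [virt.vm] (vmId='...') ... (vm:943)
LOG_LINE = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}[+-]\d{4}) "
    r"(?P<level>[A-Z]+)\s+"
    r"\((?P<thread>[^)]*)\) "
    r"\[(?P<logger>[^\]]*)\] "
    r"(?P<message>.*)$"
)


class Entry(NamedTuple):
    timestamp: str
    level: str
    thread: str
    logger: str
    message: str


def parse_entries(lines: Iterable[str]) -> Iterator[Entry]:
    """Group raw log lines into entries, attaching traceback lines
    (which do not start with a timestamp) to the preceding entry."""
    current = None
    for raw in lines:
        line = raw.rstrip("\n")
        match = LOG_LINE.match(line)
        if match:
            if current is not None:
                yield current
            current = Entry(**match.groupdict())
        elif current is not None:
            # Continuation line, e.g. part of a Python traceback.
            current = current._replace(message=current.message + "\n" + line)
    if current is not None:
        yield current


def entries_for_vm(
    lines: Iterable[str],
    vm_id: str,
    levels: Tuple[str, ...] = ("ERROR", "WARN"),
) -> List[Entry]:
    """Return entries at the given levels whose message mentions vm_id."""
    return [
        entry
        for entry in parse_entries(lines)
        if entry.level in levels and vm_id in entry.message
    ]
```

It could be used along these lines, assuming the log lives at `/var/log/vdsm/vdsm.log` on the host (adjust the path if your installation differs):

```python
with open("/var/log/vdsm/vdsm.log") as log:
    for entry in entries_for_vm(log, "1bc9dae8-a0ea-44b3-9103-5805100648d0"):
        print(entry.timestamp, entry.level, entry.message)
```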

