Re: [Openstack-operators] PCI Passthrough issues

2016-07-28 Thread Stig Telfer
Just out of interest, I saw this talk from DK Panda a few months ago which covers MPI developments, including for GPU-Direct and for running in virtualised environments: https://youtu.be/AsFakPJSplo Do you know if this means there is a version of MVAPICH2 that supports GPU-Direct optimised for

Re: [Openstack-operators] PCI Passthrough issues

2016-07-27 Thread Jonathan D. Proulx
On Tue, Jul 26, 2016 at 05:09:21PM +1000, Blair Bethwaite wrote:
: Next question - has anyone figured out how to make GPU P2P work? We haven't tried very hard yet, but with our current setup we're telling Nova to pass through the GK210GL "3D controller" and that results in the guest seeing indiv

Re: [Openstack-operators] PCI Passthrough issues

2016-07-26 Thread Blair Bethwaite
Hi Joe, Jon - We seem to be good now on both qemu 2.3 and 2.5 with kernel 3.19 (lowest we've tried). Also thanks to Jon we had an easy fix for the snapshot issues! Next question - has anyone figured out how to make GPU P2P work? We haven't tried very hard yet, but with our current setup we're tel

Re: [Openstack-operators] PCI Passthrough issues

2016-07-19 Thread Blair Bethwaite
Thanks for the confirmation Joe!
On 20 July 2016 at 12:19, Joe Topjian wrote:
> Hi Blair,
>
> We only updated qemu. We're running the version of libvirt from the Kilo cloudarchive.
>
> We've been in production with our K80s for around two weeks now and have had several users report success.
>

Re: [Openstack-operators] PCI Passthrough issues

2016-07-19 Thread Joe Topjian
Hi Blair,
We only updated qemu. We're running the version of libvirt from the Kilo cloudarchive.
We've been in production with our K80s for around two weeks now and have had several users report success.
Thanks, Joe
On Tue, Jul 19, 2016 at 5:06 PM, Blair Bethwaite wrote:
> Hilariously (or not

Re: [Openstack-operators] PCI Passthrough issues

2016-07-19 Thread Blair Bethwaite
Hilariously (or not!) we finally hit the same issue last week once folks actually started trying to do something (other than build and load drivers) with the K80s we're passing through. This https://devtalk.nvidia.com/default/topic/850833/pci-passthrough-kvm-for-cuda-usage/ is the best discussion o

Re: [Openstack-operators] PCI Passthrough issues

2016-07-07 Thread Jonathan Proulx
On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
: Jon,
:
: Awesome, thanks for sharing. We've just run into an issue with SRIOV VF passthrough that sounds like it might be the same problem (device disappearing after a reboot), but haven't yet investigated deeply - this will help w

Re: [Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Blair Bethwaite
Jon, Awesome, thanks for sharing. We've just run into an issue with SRIOV VF passthrough that sounds like it might be the same problem (device disappearing after a reboot), but haven't yet investigated deeply - this will help with somewhere to start! By the way, the nouveau mention was because we

Re: [Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Jonathan Proulx
On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
: I do have an odd remaining issue where I can run CUDA jobs in the VM but snapshots fail and after pause (for snapshotting) the PCI device can't be reattached (which is where I think it deletes the snapshot it took).
Got same

Re: [Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Jonathan D. Proulx
Joe, it seems to have been mostly solved with the qemu upgrade. Since I plan on being on Mitaka before blessing the GPU instances with the 'production' label, I'm OK with that. Blair, I reflexively blacklist nouveau drivers about five ways in my installer and six in puppet :) I do have an odd remainin

Re: [Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Blair Bethwaite
Hi Jon, Do you have the nouveau driver/module loaded in the host by any chance? If so, blacklist, reboot, repeat. Whilst we're talking about this, has anyone had any luck doing this with hosts having a PCI-e switch across multiple GPUs? Cheers, On 6 July 2016 at 23:27, Jonathan D. Proulx wrote
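
For anyone following the nouveau suggestion above, here is a minimal sketch of how one might check on a hypervisor whether the module is currently loaded. It is not taken from the thread: the helper names `module_loaded` and `nouveau_loaded_on_host` are hypothetical, and it assumes a Linux host that exposes the loaded-module list at /proc/modules.

```python
from pathlib import Path


def module_loaded(name: str, proc_modules_text: str) -> bool:
    """Return True if `name` is listed as a loaded module.

    `proc_modules_text` is the text of /proc/modules, where the first
    whitespace-separated field of each line is the module name.
    (Hypothetical helper, for illustration only.)
    """
    for line in proc_modules_text.splitlines():
        fields = line.split()
        # Match only the module-name column, not the "used by" column,
        # so a module that merely depends on nouveau does not count.
        if fields and fields[0] == name:
            return True
    return False


def nouveau_loaded_on_host(path: str = "/proc/modules") -> bool:
    """Check the running host; assumes /proc/modules is readable."""
    return module_loaded("nouveau", Path(path).read_text())


if __name__ == "__main__":
    print("nouveau loaded:", nouveau_loaded_on_host())
```

If this reports the module as loaded, the usual mechanism is a `blacklist nouveau` line in a file under /etc/modprobe.d/ followed by a reboot. The exact file name and any initramfs rebuild step vary by distribution and are not specified in the thread.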

Re: [Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Joe Topjian
Hi Jon, We were also running into issues with the K80s. For our GPU nodes, we've gone with a 4.2 or 4.4 kernel. PCI Passthrough works much better in those releases. (I ran into odd issues with 4.4 and NFS, downgraded to 4.2 after a few hours of banging my head, problems went away, not a scientifi

[Openstack-operators] PCI Passthrough issues

2016-07-06 Thread Jonathan D. Proulx
Hi All, Trying to pass through some Nvidia K80 GPUs to some instances and have gotten to the place where Nova seems to be doing the right thing: GPU instances are scheduled on the one GPU hypervisor I have, and inside the VM I see:
root@gpu-x1:~# lspci | grep -i k80
00:06.0 3D controller: NVIDIA Corp