Just out of interest, I saw this talk from DK Panda a few months ago which
covers MPI developments, including for GPU-Direct and for running in
virtualised environments:
https://youtu.be/AsFakPJSplo
Do you know if this means there is a version of MVAPICH2 that supports
GPU-Direct optimised for
On Tue, Jul 26, 2016 at 05:09:21PM +1000, Blair Bethwaite wrote:
:Next question - has anyone figured out how to make GPU P2P work? We
:haven't tried very hard yet, but with our current setup we're telling
:Nova to pass through the GK210GL "3D controller" and that results in
:the guest seeing indiv
Hi Joe, Jon -
We seem to be good now on both qemu 2.3 and 2.5 with kernel 3.19
(lowest we've tried). Also thanks to Jon we had an easy fix for the
snapshot issues!
Next question - has anyone figured out how to make GPU P2P work? We
haven't tried very hard yet, but with our current setup we're tel
Thanks for the confirmation Joe!
On 20 July 2016 at 12:19, Joe Topjian wrote:
> Hi Blair,
>
> We only updated qemu. We're running the version of libvirt from the Kilo
> cloudarchive.
>
> We've been in production with our K80s for around two weeks now and have had
> several users report success.
>
Hi Blair,
We only updated qemu. We're running the version of libvirt from the Kilo
cloudarchive.
We've been in production with our K80s for around two weeks now and have
had several users report success.
Thanks,
Joe
On Tue, Jul 19, 2016 at 5:06 PM, Blair Bethwaite wrote:
> Hilariously (or not
Hilariously (or not!) we finally hit the same issue last week once
folks actually started trying to do something (other than build and
load drivers) with the K80s we're passing through. This
https://devtalk.nvidia.com/default/topic/850833/pci-passthrough-kvm-for-cuda-usage/
is the best discussion o
On Thu, Jul 07, 2016 at 11:13:29AM +1000, Blair Bethwaite wrote:
:Jon,
:
:Awesome, thanks for sharing. We've just run into an issue with SRIOV
:VF passthrough that sounds like it might be the same problem (device
:disappearing after a reboot), but haven't yet investigated deeply -
:this will help w
Jon,
Awesome, thanks for sharing. We've just run into an issue with SRIOV
VF passthrough that sounds like it might be the same problem (device
disappearing after a reboot), but haven't yet investigated deeply -
this will help with somewhere to start!
By the way, the nouveau mention was because we
On Wed, Jul 06, 2016 at 12:32:26PM -0400, Jonathan D. Proulx wrote:
:
:I do have an odd remaining issue where I can run CUDA jobs in the VM
:but snapshots fail and after pause (for snapshotting) the PCI device
:can't be reattached (which is where I think it deletes the snapshot
:it took). Got same
Joe, seems to have been mostly solved with the qemu upgrade. Since I
plan on being on Mitaka before blessing the gpu instances with the
'production' label I'm OK with that.
Blair, I reflexively blacklist nouveau drivers about 5 ways in my
installer and six in puppet :)
I do have an odd remainin
Hi Jon,
Do you have the nouveau driver/module loaded in the host by any
chance? If so, blacklist, reboot, repeat.
Whilst we're talking about this, has anyone had any luck doing this
with hosts having a PCI-e switch across multiple GPUs?
Cheers,
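
A minimal sketch of the host-side check suggested above (is nouveau
loaded?), assuming a Linux host where /proc/modules is readable. The
function names here are hypothetical and only for illustration; the
blacklisting itself and the reboot are still done by hand.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each line of /proc/modules starts with the module name, followed by
    whitespace-separated fields (size, use count, dependents, state, address).
    """
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def nouveau_loaded(proc_modules_text):
    """True if a module named exactly 'nouveau' appears in the listing."""
    return "nouveau" in loaded_modules(proc_modules_text)


def read_proc_modules(path="/proc/modules"):
    """Read the kernel's module list; the default path is a Linux assumption."""
    with open(path) as fh:
        return fh.read()
```

On the hypervisor this would be called as
nouveau_loaded(read_proc_modules()); if it returns True, blacklist the
module and reboot before retrying the passthrough.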
On 6 July 2016 at 23:27, Jonathan D. Proulx wrote
Hi Jon,
We were also running into issues with the K80s.
For our GPU nodes, we've gone with a 4.2 or 4.4 kernel. PCI Passthrough
works much better in those releases. (I ran into odd issues with 4.4 and
NFS, downgraded to 4.2 after a few hours of banging my head, problems went
away, not a scientifi
Hi All,
Trying to pass through some Nvidia K80 GPUs to some instances and have
gotten to the place where Nova seems to be doing the right thing: GPU
instances are scheduled on the one GPU hypervisor I have, and inside the
VM I see:
root@gpu-x1:~# lspci | grep -i k80
00:06.0 3D controller: NVIDIA Corp