Chris,

> We do have the issue where the four free cores are on one socket,
> rather than being equally distributed across the sockets. When I
> solicited advice from SchedMD for our config it seems they are
> doing some work in this area that may hopefully surface in the next
> major release (though likely only as a "beta" proof of concept).
I think I am forced to wait, unfortunately. Thanks a lot for this
response. I will keep an eye on that bug report.

Thanks,

Barry

On Thu, Apr 19, 2018 at 09:58:16AM +1000, Christopher Samuel wrote:
> On 19/04/18 07:11, Barry Moore wrote:
>
> > My situation is similar. I have a GPU cluster with gres.conf entries
> > which look like:
> >
> > NodeName=gpu-XX Name=gpu File=/dev/nvidia[0-1] CPUs=[0-5]
> > NodeName=gpu-XX Name=gpu File=/dev/nvidia[2-3] CPUs=[6-11]
> >
> > However, as you can imagine, 8 cores sit idle on these machines for
> > no reason. Is there a way to easily set this up?
>
> We do this with overlapping partitions:
>
> PartitionName=skylake Default=YES State=DOWN [...] MaxCPUsPerNode=32
> PartitionName=skylake-gpu Default=NO State=DOWN [...] Priority=1000
>
> Our submit filter then forces jobs that request gres=gpu into the
> skylake-gpu partition and those that don't into the skylake partition.
>
> Our gres.conf has:
>
> NodeName=[...] Name=gpu Type=p100 File=/dev/nvidia0 Cores=0-17
> NodeName=[...] Name=gpu Type=p100 File=/dev/nvidia1 Cores=18-35
>
> But of course the Cores= spec is just advisory to the scheduler;
> the user can make that a hard requirement by specifying:
>
> --gres-flags=enforce-binding
>
> We do have the issue where the four free cores are on one socket,
> rather than being equally distributed across the sockets. When I
> solicited advice from SchedMD for our config it seems they are
> doing some work in this area that may hopefully surface in the next
> major release (though likely only as a "beta" proof of concept).
>
> https://bugs.schedmd.com/show_bug.cgi?id=4717
>
> All the best,
> Chris
>
> --
> Chris Samuel : http://www.csamuel.org/ : Melbourne, VIC

--
Barry E Moore II, PhD
E-mail: bmoor...@pitt.edu

Assistant Research Professor
Center for Research Computing
University of Pittsburgh
Pittsburgh, PA 15260
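
Concretely, mapping Chris's overlapping-partition scheme onto the
gres.conf quoted above might look something like the sketch below. The
partition names are made up, and it assumes the gpu-XX nodes have 12
cores, so that capping CPU-only jobs at 8 cores leaves one core per
GPU; that is a guess from the quoted config, not a tested setup.

    # gres.conf -- same GPU-to-core binding Barry already has, using the
    # Cores= spelling from Chris's example (verify the real topology with
    # "slurmd -C" or "nvidia-smi topo -m" before copying)
    NodeName=gpu-XX Name=gpu File=/dev/nvidia[0-1] Cores=0-5
    NodeName=gpu-XX Name=gpu File=/dev/nvidia[2-3] Cores=6-11

    # slurm.conf -- two overlapping partitions over the same nodes.
    # MaxCPUsPerNode=8 caps CPU-only jobs at 8 of the (assumed) 12 cores,
    # leaving one core per GPU for jobs sent to the gpu partition.
    PartitionName=cpu Nodes=gpu-XX Default=YES MaxCPUsPerNode=8
    PartitionName=gpu Nodes=gpu-XX Default=NO  Priority=1000

GPU jobs would then be submitted (or routed by a submit filter, as
Chris describes) along the lines of:

    sbatch -p gpu --gres=gpu:1 --gres-flags=enforce-binding job.sh

while CPU-only jobs land in the default cpu partition and can soak up
the cores that would otherwise sit idle.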