If everything is configured correctly, GridEngine will be aware that the GPU
on node1 is in use and will schedule around it, ensuring that the 8-GPU job
gets unused GPUs.

Ian
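A minimal sketch of that consumable setup (the complex name "gpu", the host
names, and the per-host counts below are examples, not taken from this
thread):

    # Define the consumable complex (qconf -mc), one line per resource:
    #name  shortcut  type  relop  requestable  consumable  default  urgency
    gpu    gpu       INT   <=     YES          YES         0        0

    # Give each execution host its device count (qconf -me node1, etc.):
    complex_values        gpu=4

    # Serial jobs request one GPU; the scheduler decrements the count:
    qsub -l gpu=1 run_gpu_job.sh

Note that with consumable=YES the request is counted per slot, so a PE job
submitted with "-pe <pe_name> 8 -l gpu=1" reserves one GPU for each of its
8 slots.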
On Mon, Apr 14, 2014 at 1:38 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
> Look at the info presented here:
>
> http://stackoverflow.com/questions/10557816/scheduling-gpu-resources-using-the-sun-grid-engine-sge
>
> Ian
>
> On Mon, Apr 14, 2014 at 1:29 PM, Feng Zhang <prod.f...@gmail.com> wrote:
>> Thanks, Ian and Gowtham!
>>
>> These are very nice instructions. One problem I still have, for example:
>>
>> node1, number of GPUs = 4
>> node2, number of GPUs = 4
>> node3, number of GPUs = 2
>>
>> So in total I have 10 GPUs.
>>
>> Right now, user A has a serial GPU job, which takes one GPU on
>> node1 (I don't know which GPU, though). So node1: 3, node2: 4, and
>> node3: 2 GPUs are still free for jobs.
>>
>> I submit one job with PE=8. SGE allocates all 3 nodes to me with 8
>> GPU slots. The problem now is: how does my job know which GPUs it
>> can use on node1?
>>
>> Best
>>
>> On Mon, Apr 14, 2014 at 4:13 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>> Again, look into using it as a consumable resource, as Gowtham
>>> posted above.
>>>
>>> Ian
>>>
>>> On Mon, Apr 14, 2014 at 11:57 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>> Thanks, Reuti,
>>>>
>>>> The socket solution looks like it only works for serial jobs, not
>>>> PE jobs, right?
>>>>
>>>> Our cluster has different nodes: some have 2 GPUs each, others have
>>>> 4 GPUs each. Most of the user jobs are PE jobs; some are serial.
>>>>
>>>> The socket solution might even work for PE jobs, but as I understand
>>>> it, it is not efficient: each node has, for example, 4 queues, and
>>>> if one user submits a PE job to one queue, he/she cannot use the
>>>> GPUs belonging to the other queues?
>>>>
>>>> On Mon, Apr 14, 2014 at 2:16 PM, Reuti <re...@staff.uni-marburg.de> wrote:
>>>>> On 14.04.2014 at 20:06, Feng Zhang wrote:
>>>>>
>>>>>> Thanks, Ian!
>>>>>>
>>>>>> I haven't checked the GPU load sensor in detail, either. It sounds
>>>>>> to me like it only tracks the number of GPUs allocated to a job;
>>>>>> the job doesn't know which GPUs it actually got, so it cannot set
>>>>>> CUDA_VISIBLE_DEVICES (some programs need this env variable to be
>>>>>> set). This can be done by writing some scripts/programs, but to me
>>>>>> it is not a robust solution, since jobs may still happen to
>>>>>> collide with each other on the same GPU of a multi-GPU node. If GE
>>>>>> could keep a record of the GPUs allocated to a job, that would be
>>>>>> perfect.
>>>>>
>>>>> Like the option to request sockets instead of cores which I posted
>>>>> in the last couple of days, you can use a similar approach to get
>>>>> the number of the granted GPU out of the queue name.
>>>>>
>>>>> -- Reuti
>>>>>
>>>>>> On Mon, Apr 14, 2014 at 1:46 PM, Ian Kaufman <ikauf...@eng.ucsd.edu> wrote:
>>>>>>> I believe there is already support for GPUs - there is a GPU load
>>>>>>> sensor in Open Grid Engine. You may have to build it yourself; I
>>>>>>> haven't checked whether it comes pre-packaged.
>>>>>>>
>>>>>>> Univa has Phi support, and I believe OGE/OGS has it as well, or
>>>>>>> at least has been working on it.
>>>>>>>
>>>>>>> Ian
>>>>>>>
>>>>>>> On Mon, Apr 14, 2014 at 10:35 AM, Feng Zhang <prod.f...@gmail.com> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> Is there any plan to implement GPU resource management in SGE in
>>>>>>>> the near future, like Slurm or Torque have? There are some ways
>>>>>>>> to do this using scripts/programs, but I wonder whether SGE
>>>>>>>> itself can recognize and manage GPUs (and Phi).
>>>>>>>> It doesn't need to be complicated or powerful; just the basic
>>>>>>>> work would do.
>>>>>>>>
>>>>>>>> Thanks,

--
Ian Kaufman
Research Systems Administrator
UC San Diego, Jacobs School of Engineering
ikaufman AT ucsd DOT edu

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users
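The open question in the thread -- once SGE grants the slots, which physical
devices should the job bind to? -- is usually handled at the site level by a
wrapper or prolog that probes for idle devices and exports
CUDA_VISIBLE_DEVICES itself. A rough sketch in Python follows (the script
name pick_gpus.py is made up for the example); it assumes nvidia-smi is on
the PATH and treats "no compute process on the device" as free, which is
racy: two jobs probing at the same instant can still pick the same GPU.
Univa Grid Engine's RSMAP complexes avoid that race by having the scheduler
itself grant concrete device ids.

    #!/usr/bin/env python
    # pick_gpus.py (hypothetical): print a CUDA_VISIBLE_DEVICES value
    # naming N currently idle GPUs. Heuristic only, not a reservation.
    import subprocess
    import sys

    def idle_gpus():
        """Return indices of GPUs that have no running compute process."""
        # UUIDs of GPUs that currently host a compute process.
        busy = set(subprocess.check_output(
            ["nvidia-smi", "--query-compute-apps=gpu_uuid",
             "--format=csv,noheader"]).decode().split())
        idle = []
        # Map every GPU index to its UUID; keep the ones not in use.
        for line in subprocess.check_output(
                ["nvidia-smi", "--query-gpu=index,uuid",
                 "--format=csv,noheader"]).decode().splitlines():
            index, uuid = [field.strip() for field in line.split(",")]
            if uuid not in busy:
                idle.append(index)
        return idle

    if __name__ == "__main__":
        wanted = int(sys.argv[1]) if len(sys.argv) > 1 else 1
        free = idle_gpus()
        if len(free) < wanted:
            sys.exit("wanted %d idle GPUs, found %d" % (wanted, len(free)))
        print(",".join(free[:wanted]))

A job script would then run something like
export CUDA_VISIBLE_DEVICES=$(python pick_gpus.py 2) before launching the
CUDA program, so the program only ever sees the devices the script chose.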