This is a known bug on GPFS; I have an open ticket with them through DDN for it. I can reproduce it at will.

Workarounds are:

1) Space out the job submissions with sleep()
2) Put the temp log directory outside GPFS.

Regards,
Juan

On Thu, 16 Feb 2017 at 13:43 -0000, Stuart Barkley wrote:

> Is there a way to throttle job starts on Grid Engine (we are using Son
> of Grid Engine)?
>
> i.e. I would like to limit the number of tasks started during each
> scheduling cycle and spread the startup of large array jobs over a
> longer (still short) period of time. I'm aware that this would be a
> tradeoff against task throughput for very short tasks.
>
> We appear to be having some filesystem (GPFS) problems when 2000+
> tasks on 350+ nodes all start creating grid engine log files in the
> same directory at the same time. These tasks are often for a single
> user hitting an idle system so I can't use maxujobs.
>
> Ideally we fix the filesystem and/or network communications. I'm
> looking for a workaround.
>
> These jobs tend to have the same runtime so I'm seeing periodic floods
> of simultaneous file creation. I can get the user to add some random
> sleep time in the jobs to spread later jobs out, but the idle->full
> spike will still exist.
>
> Thanks,
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never lost!
> -- Daniel Boone

Mfg,
Juan Jimenez
System Administrator, HPC
MDC Berlin / IT-Dept.
Tel.: +49 30 9406 2800
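
For illustration, the two workarounds listed above could be combined in a small submission wrapper. This is only a sketch under assumptions, not something taken from the ticket: it assumes the workload is one array job that can be split into smaller slices, that `qsub` is on the PATH, and that the log directory exists on every execution host. The function names, the script name `job.sh`, the slice size, the delay and the path `/tmp/sge-logs` are all hypothetical. Only the standard `qsub` options `-t` (task range), `-o` and `-e` (stdout/stderr location) are used.

```python
import subprocess
import time


def chunk_task_ranges(total_tasks, chunk_size):
    """Split task IDs 1..total_tasks into inclusive (start, end) ranges."""
    if total_tasks < 1 or chunk_size < 1:
        raise ValueError("total_tasks and chunk_size must be positive")
    return [
        (start, min(start + chunk_size - 1, total_tasks))
        for start in range(1, total_tasks + 1, chunk_size)
    ]


def build_qsub_command(script, start, end, log_dir):
    """Build the qsub call for one slice of the array job.

    log_dir is assumed to be a directory outside GPFS (workaround 2);
    when -o/-e point at a directory, Grid Engine writes its default
    log file names into it.
    """
    return [
        "qsub",
        "-t", f"{start}-{end}",  # array task range for this slice
        "-o", log_dir,           # stdout logs
        "-e", log_dir,           # stderr logs
        script,
    ]


def submit_staggered(script, total_tasks, chunk_size, delay_s, log_dir,
                     run=subprocess.run, sleep=time.sleep):
    """Submit the slices one by one, sleeping between them (workaround 1).

    run and sleep are injectable so the logic can be checked without a
    cluster. Returns the list of commands that were submitted.
    """
    commands = []
    ranges = chunk_task_ranges(total_tasks, chunk_size)
    for index, (start, end) in enumerate(ranges):
        if index > 0:
            sleep(delay_s)  # space out the submissions
        cmd = build_qsub_command(script, start, end, log_dir)
        run(cmd, check=True)  # raises if qsub exits non-zero
        commands.append(cmd)
    return commands
```

A call such as `submit_staggered("job.sh", 2000, 200, 30, "/tmp/sge-logs")` would submit ten slices of 200 tasks, 30 seconds apart. Each slice becomes a separate job with its own job ID, so anything that depends on a single job ID for the whole array would need adjusting. The delay only spaces out submissions; how the scheduler then dispatches each slice is not controlled by this sketch.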

_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users