I believe the problem is the job log files (-j y -o DIRECTORY), not Grid Engine's own files. Each node has a local, private copy of Grid Engine and local Grid Engine logs (nothing shared).
On Thu, 16 Feb 2017 at 13:43 -0000, Stuart Barkley wrote:

> Is there a way to throttle job starts on Grid Engine (we are using Son
> of Grid Engine)?
>
> i.e. I would like to limit the number of tasks started during each
> scheduling cycle and spread the startup of large array jobs over a
> longer (still short) period of time. I'm aware that this would be a
> tradeoff against task throughput for very short tasks.
>
> We appear to be having some filesystem (GPFS) problems when 2000+
> tasks on 350+ nodes all start creating grid engine log files in the
> same directory at the same time. These tasks are often for a single
> user hitting an idle system so I can't use maxujobs.
>
> Ideally we fix the filesystem and/or network communications. I'm
> looking for a workaround.
>
> These jobs tend to have the same runtime so I'm seeing periodic floods
> of simultaneous file creation. I can get the user to add some random
> sleep time in the jobs to spread later jobs out, but the idle->full
> spike will still exist.
>
> Thanks,
> Stuart
> --
> I've never been lost; I was once bewildered for three days, but never lost!
>                                         --  Daniel Boone
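
For the "random sleep time in the jobs" workaround mentioned in the quoted message, here is a minimal sketch of what the user could put at the top of a Python job script. It assumes the task runs as a Grid Engine array task with SGE_TASK_ID and JOB_ID set in the environment; the function names and the 60-second window are hypothetical choices for illustration only. As noted above, this only spreads out what happens after the task has started. It does nothing about the output files created at task start, so the idle->full spike remains.

```python
import os
import random
import time


def startup_delay(task_id: int, window_seconds: float = 60.0, salt: str = "") -> float:
    """Return a repeatable pseudo-random delay in [0, window_seconds).

    Seeding from the salt and task id gives each array task its own delay
    while keeping the delay reproducible if the task is rerun.
    """
    rng = random.Random(f"{salt}:{task_id}")
    return rng.random() * window_seconds


def sleep_before_work(window_seconds: float = 60.0) -> float:
    """Sleep for this task's delay and return the number of seconds slept."""
    try:
        # SGE_TASK_ID is the literal string "undefined" for non-array jobs.
        task_id = int(os.environ.get("SGE_TASK_ID", "0"))
    except ValueError:
        task_id = 0
    # Use the job id as a salt so different array jobs get different spreads.
    salt = os.environ.get("JOB_ID", "")
    delay = startup_delay(task_id, window_seconds, salt)
    time.sleep(delay)
    return delay
```

The job script would call sleep_before_work() once before doing any real work. A larger window spreads the later file-creation floods more thinly at the cost of throughput for very short tasks.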