Hi,

> On 25.08.2014 at 22:27, Michael Stauffer <mgsta...@gmail.com> wrote:
>
> Version OGS/GE 2011.11p1 (Rocks 6.1)
>
> Hi,
>
> I'm using h_vmem and s_vmem to limit memory usage for qsub and qlogin jobs. A
> user's got some analyses running on nearly identical data sets that are
> hitting memory limits and being killed, which is fine, but the messages are
> inconsistent. Some instances report an exception from the app in question
> saying that memory can't be allocated. This app (an in-house tool) sends
> exceptions to stdout. Other instances just dump core and there's no message
> about memory problems in either stdout or stderr logs.
>
> h_vmem is 6000M and s_vmem is 5900M. It might be that the instances are right
> up against the s_vmem limit when the failing memory allocation occurs, and in
> some cases the requested amount triggers only the soft limit, and in others it
> triggers both. So perhaps the instances where it triggers the hard limit are
> the ones without the exception messages? Unfortunately the stderr and stdout
> log filenames don't contain job ids.

But you can include the job id in the filenames of the generated stdout/stderr files, or dump the output of `ps -e f` to stdout in the job script. The shepherd process listed there also carries the job id as an argument.

Do you catch SIGXCPU in the job script?

When the loglevel in SGE is set to log_info, the execd on the node will also record the exceeded limits in its messages file. That is another place to look.

-- Reuti

> However, in my first tests anyway, a qsub script that runs out of memory
> shows an exception message, even when s_vmem is higher than h_vmem. So I'm
> not sure about this line of reasoning.
>
> We're trying to figure it out and will run more tests, but I thought I'd
> check here first to see if anyone's had this kind of experience. Thanks.
>
> -M
> _______________________________________________
> users mailing list
> users@gridengine.org
> https://gridengine.org/mailman/listinfo/users
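
As a rough sketch of the suggestions above, a job script along these lines would put the job id into the log filenames, dump the process tree, and report when SIGXCPU arrives. The `logs` directory, the `ANALYSIS_CMD` variable and the `on_soft_limit` function are made-up names for illustration, not anything from the original setup, and the directory should exist before the job is submitted. The limits are simply the values quoted in the question.

```shell
#!/bin/sh
#$ -S /bin/sh
#$ -o $HOME/logs/$JOB_NAME.$JOB_ID.out
#$ -e $HOME/logs/$JOB_NAME.$JOB_ID.err
#$ -l h_vmem=6000M,s_vmem=5900M

# The "#$" lines above are read by qsub; $JOB_NAME and $JOB_ID in the
# -o/-e paths are pseudo variables that qsub expands itself.

soft_limit_hit=0

# Hypothetical handler: note in stdout that SIGXCPU was received.
on_soft_limit() {
    soft_limit_hit=1
    echo "job ${JOB_ID:-unknown}: caught SIGXCPU (a soft limit such as s_vmem was reached)"
}
trap on_soft_limit XCPU

echo "job ${JOB_ID:-unknown} running on $(uname -n)"

# Dump the process tree; the shepherd entry shows the job id.
ps -e f 2>/dev/null || echo "ps -e f not available on this host"

# ANALYSIS_CMD is a placeholder for the real tool; it defaults to
# "true" so that the script also runs outside of the cluster.
${ANALYSIS_CMD:-true}
status=$?

echo "job ${JOB_ID:-unknown}: analysis exited with status $status"
if [ "$status" -gt 128 ]; then
    # An exit status above 128 means the command was killed by a signal.
    echo "job ${JOB_ID:-unknown}: terminated by signal $(kill -l "$status")"
fi
```

The trap only covers the shell itself. The default action for SIGXCPU is to terminate the process with a core dump, so an application that receives the signal without handling it may leave a core file and no message, which could be one explanation for the silent cases.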