Version OGS/GE 2011.11p1 (Rocks 6.1)

Hi,
I'm using h_vmem and s_vmem to limit memory usage for qsub and qlogin jobs. A user has analyses running on nearly identical data sets that are hitting the memory limits and being killed, which is fine, but the messages are inconsistent. Some instances report an exception from the app in question saying that memory can't be allocated (this in-house tool sends exceptions to stdout). Other instances just dump core, with no message about memory problems in either the stdout or stderr logs.

h_vmem is 6000M and s_vmem is 5900M. My guess is that the failing instances are right up against the s_vmem limit when the fatal allocation occurs: in some cases the requested amount trips only the soft limit, and in others it trips both. So perhaps the instances that trip the hard limit are the ones without the exception messages? Unfortunately the stderr and stdout log filenames don't contain job IDs, so I can't match messages back to jobs. And in my first tests anyway, a qsub script that runs out of memory shows an exception message even when s_vmem is higher than h_vmem, so I'm not sure about this line of reasoning.

We're trying to figure it out and will run more tests, but I thought I'd check here first to see if anyone's had this kind of experience. Thanks.

-M
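For what it's worth, my mental model (an assumption on my part, not verified against the OGS source) is that s_vmem lowers the soft RLIMIT_AS for the job, so an allocation past it fails cleanly (malloc returns NULL, C++ new throws bad_alloc) and the app gets a chance to print its exception, whereas blowing through h_vmem gets the process signalled with no chance to say anything. The soft-limit side can be sketched like this (the function name is mine, just for illustration):

```python
import resource

def probe_soft_as_limit(limit_bytes, alloc_bytes):
    """Lower the soft RLIMIT_AS, then try an allocation past it.

    Returns "MemoryError" if the allocation fails with a catchable
    exception (the soft-limit behaviour I suspect s_vmem produces),
    or "ok" if it succeeds. The original limits are restored.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_AS)
    # Lower only the soft limit, like (I assume) s_vmem < h_vmem does.
    resource.setrlimit(resource.RLIMIT_AS, (limit_bytes, hard))
    try:
        buf = bytearray(alloc_bytes)  # mmap fails before any page is touched
        return "ok"
    except MemoryError:
        return "MemoryError"
    finally:
        resource.setrlimit(resource.RLIMIT_AS, (soft, hard))

# On Linux I'd expect a 2 GiB request under a 1 GiB soft limit to fail
# with MemoryError rather than a signal:
print(probe_soft_as_limit(1 << 30, 2 << 30))
```

If that model is right, the jobs that die silently would be the ones where a single big request jumps straight past both limits, so the hard-limit enforcement kills them before the soft-limit failure can be reported.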
_______________________________________________
users mailing list
users@gridengine.org
https://gridengine.org/mailman/listinfo/users