I don't know much about Maui, but these lines from the log seem
relevant:
-
maui.log:05/29 09:27:21 INFO: job 2120 exceeds requested proc limit (3.72 > 1.00)
maui.log:05/29 09:27:21 MSysRegEvent(JOBRESVIOLATION: job '2120' in state 'Running' has exceeded PROC resource limit (372 >
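The INFO line appears to compare the job's measured processor usage (3.72) with the processor count it requested (1.00). That would mean the program runs more processes or threads than the single processor it asked for. Below is a minimal Python sketch for pulling such violations out of a log. It assumes every relevant line contains the phrase "exceeds requested proc limit (X > Y)" worded exactly as in the excerpt above. The regex, the `find_proc_violations` helper, and the `ProcViolation` type are hypothetical illustrations, not part of Maui, and other Maui versions may word the message differently.

```python
import re
from typing import Iterable, List, NamedTuple

# Hypothetical pattern modelled on the INFO line quoted above;
# it is not taken from Maui documentation.
PROC_LIMIT_RE = re.compile(
    r"job (?P<job>\S+) exceeds requested proc limit "
    r"\((?P<used>[\d.]+) > (?P<limit>[\d.]+)\)"
)


class ProcViolation(NamedTuple):
    job: str      # job id as printed in the log
    used: float   # processor usage reported by the log line
    limit: float  # processor limit the job requested


def find_proc_violations(lines: Iterable[str]) -> List[ProcViolation]:
    """Return one ProcViolation per matching log line, in input order."""
    found = []
    for line in lines:
        m = PROC_LIMIT_RE.search(line)
        if m:
            found.append(
                ProcViolation(m["job"], float(m["used"]), float(m["limit"]))
            )
    return found


if __name__ == "__main__":
    sample = [
        # First line copied from the log excerpt above.
        "maui.log:05/29 09:27:21 INFO: job 2120 exceeds requested proc limit (3.72 > 1.00)",
        # Hypothetical non-matching line, included to show it is skipped.
        "maui.log:05/29 09:01:45 some unrelated line",
    ]
    for v in find_proc_violations(sample):
        print(f"job {v.job}: using {v.used:.2f} procs, requested {v.limit:.2f}")
```

To scan a real log file, you could pass an open file handle, e.g. `find_proc_violations(open("maui.log"))`. A usage value well above the limit, as here, would point at the job spawning extra processes or threads rather than at a wallclock problem.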
I have verified that Maui is killing the job. I actually ran into
this with another user all of a sudden. I don't know why it's only
affecting a few users currently. Here's the Maui log extract for a current
run of this user's program:
---
[root@aeolus log]# grep 2120 *
maui.log:05/29 09:01:45
(I'm not a subscriber to the torqueusers or mauiusers lists -- I'm not
sure my post will get through)
I wonder if Jan's idea has merit -- if Torque is killing the job for
some other reason (i.e., not wallclock). The message printed by
mpirun ("mpirun: killing job...") is *only* displayed i
Yep. Wall time is nowhere near violation (the job dies about 2 minutes into
a 30-minute allocation). I ran ulimit -a both through qsub and directly on
the node (as the same user in both cases), and the results were
identical (most items were unlimited).
Any other ideas?
--Jim
On Tue, May 27, 2008 at 9:25