On Jul 12, 2018, at 11:45 AM, Noam Bernstein <noam.bernst...@nrl.navy.mil> 
wrote:
> 
>> E.g., if you "ulimit -c" in your interactive shell and see "unlimited", but 
>> if you "ulimit -c" in a launched job and see "0", then the job scheduler is 
>> doing that to your environment somewhere.
> 
> I am using a scheduler (torque), but as I also told Åke off list in our 
> side-discussion about VASP, I’m already doing that.  I mpirun a script which 
> does a few things like ulimit -c and ulimit -s, and then runs the actual 
> executable with $* arguments.

That may not be sufficient.

Remember that your job script only runs on the Mother Superior node (i.e., the 
first node in the job).  So while your job script may set the corefile size 
limit in that shell (and its children), the remote MPI processes are 
(effectively) launched via tm_spawn() -- not ssh.  I.e., Open MPI calls 
tm_spawn() to launch orted processes on all the nodes in your job.  The TM 
daemons on those nodes then fork/exec the orteds, which means the orteds 
inherit the environment (including corefile size restrictions) of the TM 
daemons.  The orteds eventually fork/exec your MPI processes.
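
A quick way to confirm what the remote processes actually inherit (a minimal 
sketch -- the exact mpirun options may differ across Open MPI versions, and 
<num_nodes> is just a placeholder) is to launch a trivial shell command on 
each node and print the limit it sees:

    # One process per node: each prints the corefile limit it inherited
    # via tm_spawn -> TM daemon -> orted, not from your job script's shell.
    mpirun --map-by node -np <num_nodes> sh -c \
        'echo "$(hostname): ulimit -c = $(ulimit -c)"'

If those report 0 while your job script reports "unlimited", the limit is 
being clamped somewhere outside your script.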

This is a long way of saying: your shell startup files may not be executed, and 
the "ulimit -c" you did in your job script may not be propagated out to the 
other nodes.  Instead, your MPI processes may be inheriting the corefile size 
limitations from the Torque daemons.
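
You can also check the daemon's own limits directly on a compute node 
(assuming the Torque node daemon is named pbs_mom on your install; Linux 
exposes per-process limits under /proc):

    # On a compute node: show the corefile limit of the running pbs_mom,
    # which is what its tm_spawn'ed children will inherit.
    grep "Max core file size" /proc/$(pgrep -o pbs_mom)/limits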

In my SLURM cluster here at Cisco (which is running a pretty ancient version 
at this point; I have no idea if things have changed), I had to put 
"ulimit -c unlimited" in the relevant /etc/sysconfig/slurmd file so that it 
runs before the slurmd (SLURM daemon) starts.  Then my MPI processes start 
with unlimited corefile size limits.
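
For reference, this is the kind of thing I mean (the path and mechanism vary 
by distro, SLURM version, and init system, so treat it as a sketch):

    # /etc/sysconfig/slurmd -- run before slurmd starts, so the daemon
    # (and everything it launches) inherits the raised limit.
    ulimit -c unlimited

On systemd-based systems the equivalent is usually a LimitCORE=infinity 
setting in the slurmd unit file; either way, restart slurmd afterward so the 
new limit takes effect.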

(You may have already done this; I just want to make sure we're on the same 
sheet of music here...)

-- 
Jeff Squyres
jsquy...@cisco.com

