Prentice Bisbal wrote:
Depending on which OMPI release you're using, I think you need something like 4*np up to 7*np (plus a few) descriptors. So, with 256 processes, you need 1000+ descriptors. You're quite possibly up against your limit, though I don't know for sure that that's the problem here.

Eugene Loh wrote:

Prentice Bisbal wrote:

Is there a limit on how many MPI processes can run on a single host?

You say you're running 1.2.8. That's "a while ago", so would you consider updating as a first step? Among other things, newer OMPI releases will generate a much clearer error message if the descriptor limit is the problem.

I have a user trying to test his code from the command line on a single host before running it on our cluster, like so:

    mpirun -np X foo

When he tries to run it with a large number of processes (X = 256, 512), the program fails, and I can reproduce this with a simple "Hello, World" program:

    $ mpirun -np 256 mpihello
    mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu exited on signal 15 (Terminated).
    252 additional processes aborted (not shown)

I've done some testing and found that X must be less than 155 for this program to work. Is this a bug, part of the standard, or a design/implementation decision?

One possible issue is the limit on the number of descriptors. The error message should be pretty helpful and descriptive, but perhaps you're using an older version of OMPI. If this is your problem, one workaround is something like this:

    unlimit descriptors
    mpirun -np 256 mpihello

though I guess the syntax depends on what shell you're running.

Looks like I'm not allowed to set that as a regular user:

    $ ulimit -n 2048
    -bash: ulimit: open files: cannot modify limit: Operation not permitted

Since I am the admin, I could change that elsewhere, but I'd rather not do that system-wide unless absolutely necessary.
Another is to set the MCA parameter opal_set_max_sys_limits to 1.

That didn't work either:

    $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
    mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu exited on signal 15 (Terminated).
    252 additional processes aborted (not shown)