Eugene Loh wrote:
> Prentice Bisbal wrote:
>
>> Is there a limit on how many MPI processes can run on a single host?
>>
>> I have a user trying to test his code on the command-line on a single
>> host before running it on our cluster like so:
>>
>> mpirun -np X foo
>>
>> When he tries to run it on large number of process (X = 256, 512), the
>> program fails, and I can reproduce this with a simple "Hello, World"
>> program:
>>
>> $ mpirun -np 256 mpihello
>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
>> exited on signal 15 (Terminated).
>> 252 additional processes aborted (not shown)
>>
>> I've done some testing and found that X <155 for this program to work.
>> Is this a bug, part of the standard, or design/implementation decision?
>>
>>
> One possible issue is the limit on the number of descriptors. The error
> message should be pretty helpful and descriptive, but perhaps you're
> using an older version of OMPI. If this is your problem, one workaround
> is something like this:
>
> unlimit descriptors
> mpirun -np 256 mpihello

Looks like I'm not allowed to set that as a regular user:

$ ulimit -n 2048
-bash: ulimit: open files: cannot modify limit: Operation not permitted

Since I am the admin, I could change that elsewhere, but I'd rather not
do that system-wide unless absolutely necessary.

>
> though I guess the syntax depends on what shell you're running. Another
> is to set the MCA parameter opal_set_max_sys_limits to 1.

That didn't work either:

$ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu
exited on signal 15 (Terminated).
252 additional processes aborted (not shown)

-- 
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ
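
P.S. On the ulimit failure above: in bash, "ulimit -n N" with neither -S nor -H tries to set both the soft and the hard limit, and a regular user cannot raise the hard limit. The soft limit can still be raised as far as the hard limit without root, so it may be worth looking at both values before picking a number. Below is a minimal sketch, assuming bash on Linux. The function name show_nofile_limits is made up for illustration, and whether descriptor exhaustion is really what kills the 256-process run is still unconfirmed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Open MPI or bash): print the soft and
# hard limits on open file descriptors for the current shell.
show_nofile_limits() {
    printf 'soft=%s hard=%s\n' "$(ulimit -Sn)" "$(ulimit -Hn)"
}

# Limits as inherited by this shell.
show_nofile_limits

# Raise only the soft limit, up to the current hard limit. This runs in a
# subshell so the change does not leak into the calling session.
(
    ulimit -Sn "$(ulimit -Hn)" && show_nofile_limits
    # The test run would be launched from here, e.g.:
    # mpirun -np 256 mpihello
)
```

If the hard limit itself turns out to be too low, this does not help, and the limit would have to be changed by root for that user or session.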