Ralph Castain wrote:
> On Mar 4, 2010, at 7:51 AM, Prentice Bisbal wrote:
>
>> Ralph Castain wrote:
>>> On Mar 4, 2010, at 7:27 AM, Prentice Bisbal wrote:
>>>
>>>> Ralph Castain wrote:
>>>>> On Mar 3, 2010, at 12:16 PM, Prentice Bisbal wrote:
>>>>>
>>>>>> Eugene Loh wrote:
>>>>>>> Prentice Bisbal wrote:
>>>>>>>> Eugene Loh wrote:
>>>>>>>>> Prentice Bisbal wrote:
>>>>>>>>>> Is there a limit on how many MPI processes can run on a single host?
>>>>>>>
>>>>>>> Depending on which OMPI release you're using, I think you need something like 4*np up to 7*np (plus a few) descriptors. So, with 256, you need 1000+ descriptors. You're quite possibly up against your limit, though I don't know for sure that that's the problem here.
>>>>>>>
>>>>>>> You say you're running 1.2.8. That's "a while ago", so would you consider updating as a first step? Among other things, newer OMPIs will generate a much clearer error message if the descriptor limit is the problem.
>>>>>>
>>>>>> While 1.2.8 might be "a while ago", upgrading software just because it's "old" is not a valid argument.
>>>>>>
>>>>>> I can install the latest version of OpenMPI, but it will take a little while.
>>>>>
>>>>> Maybe not because it is "old", but Eugene is correct. The old versions of OMPI required more file descriptors than the newer versions.
>>>>>
>>>>> That said, you'll still need a minimum of 4x the number of procs on the node even with the latest release. I suggest talking to your sys admin about getting the limit increased. It sounds like it has been set unrealistically low.
>>>>
>>>> I *am* the system admin! ;)
>>>>
>>>> The file descriptor limit is the RHEL default, 1024, so I would not characterize it as "unrealistically low". I assume someone with much more knowledge of OS design and administration than me came up with this default, so I'm hesitant to change it without good reason. If there were good reason, I'd have no problem changing it. I have read that setting it to more than 8192 can lead to system instability.
>>>
>>> Never heard that, and most HPC systems have it set a great deal higher without trouble.
>>
>> I just read that the other day. Not sure where, though. Probably a forum posting somewhere. I'll take your word for it that it's safe to increase if necessary.
>>
>>> However, the choice is yours. If you have a large SMP system, you'll eventually be forced to change it or severely limit its usefulness for MPI. RHEL sets it that low arbitrarily as a way of saving memory by keeping the fd table small, not because the OS can't handle it.
>>>
>>> Anyway, that is the problem. Nothing we (or any MPI) can do about it, as the fd's are required for socket-based communications and to forward I/O.
>>
>> Thanks, Ralph, that's exactly the answer I was looking for - where this limit was coming from.
>>
>> I can see how on a large SMP system the fd limit would have to be increased. In normal circumstances, my cluster nodes should never have more than 8 MPI processes running at once (per node), so I shouldn't be hitting that limit on my cluster.
>
> Ah, okay! That helps a great deal in figuring out what to advise you. In your earlier note, it sounded like you were running all 512 procs on one node, so I assumed you had a large single-node SMP.
>
> In this case, though, the problem is solely that you are using the 1.2 series.
> In that series, mpirun and each process opened many more sockets to all processes in the job. That's why you are overrunning your limit.
>
> Starting with 1.3, the number of sockets being opened on each node is only 3 times the number of procs on the node, plus a couple for the daemon. If you are using TCP for MPI communications, then each MPI connection will open another socket, as these messages are direct and not routed.
>
> Upgrading to the 1.4 series should resolve the problem you saw.
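That squares with the numbers below: at the ~4 descriptors per local process you quoted earlier, 4 * 253 = 1012, which together with a couple of descriptors for the daemon and the stdio forwarding lands right about at the RHEL default of 1024. For anyone who wants to sanity-check a node before picking -np, here is a rough sketch of mine (not anything shipped with Open MPI; it only assumes POSIX getrlimit and the ~4-descriptors-per-process figure above):

/* fdcheck.c -- rough sketch, not part of Open MPI: print the per-process
 * open-file limit and estimate how many local MPI processes would fit
 * under it, using the ~4-descriptors-per-local-process figure above. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }

    /* rl.rlim_cur is the soft limit actually enforced on this process */
    unsigned long soft = (unsigned long) rl.rlim_cur;
    unsigned long hard = (unsigned long) rl.rlim_max;

    /* Reserve a couple dozen descriptors for the daemon, stdio forwarding,
     * libraries, etc., then assume ~4 descriptors per local process. */
    unsigned long reserve  = 24;
    unsigned long per_proc = 4;
    unsigned long estimate = (soft > reserve) ? (soft - reserve) / per_proc : 0;

    printf("open-file soft limit: %lu (hard limit: %lu)\n", soft, hard);
    printf("rough estimate of local MPI procs that fit: ~%lu\n", estimate);
    return 0;
}

Compile it with cc (or mpicc) and run it in the same environment mpirun would inherit; if it reports a soft limit of 1024, a ceiling in the mid-250s of local processes is about what to expect.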
After upgrading to 1.4.1, I can start up to 253 processes on one host:

mpirun -np 253 mpihello

This is an increase of ~100 over 1.2.8. When it does fail, it gives a more useful error message:

$ mpirun -np 254 mpihello
[juno.sns.ias.edu:22862] [[6399,0],0] ORTE_ERROR_LOG: The system limit on number of network connections a process can open was reached in file ../../../../../orte/mca/oob/tcp/oob_tcp.c at line 447
--------------------------------------------------------------------------
Error: system limit exceeded on number of network connections that can be open
This can be resolved by setting the mca parameter opal_set_max_sys_limits to 1,
increasing your limit descriptor setting (using limit or ulimit commands),
or asking the system administrator to increase the system limit.
--------------------------------------------------------------------------

Case closed, court adjourned. Thanks for all the help and explanations. (A sketch of the mpihello test program is at the bottom of this mail for anyone who wants to reproduce this.)

Prentice

> HTH
> Ralph
>
>>>> This is an admittedly unusual situation - in normal use, no one would ever want to run that many processes on a single system - so I don't see any justification for modifying that setting.
>>>>
>>>> Yesterday I spoke to the researcher who originally asked me about this limit - he just wanted to know what the limit was, and doesn't actually plan to do any "real" work with that many processes on a single node, rendering this whole discussion academic.
>>>>
>>>> I did install OpenMPI 1.4.1 yesterday, but I haven't had a chance to test it yet. I'll post the results of testing here.
>>>>
>>>>>>>>>> I have a user trying to test his code on the command line on a single host before running it on our cluster, like so:
>>>>>>>>>>
>>>>>>>>>> mpirun -np X foo
>>>>>>>>>>
>>>>>>>>>> When he tries to run it on a large number of processes (X = 256, 512), the program fails, and I can reproduce this with a simple "Hello, World" program:
>>>>>>>>>>
>>>>>>>>>> $ mpirun -np 256 mpihello
>>>>>>>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu exited on signal 15 (Terminated).
>>>>>>>>>> 252 additional processes aborted (not shown)
>>>>>>>>>>
>>>>>>>>>> I've done some testing and found that X < 155 is required for this program to work. Is this a bug, part of the standard, or a design/implementation decision?
>>>>>>>>>
>>>>>>>>> One possible issue is the limit on the number of descriptors. The error message should be pretty helpful and descriptive, but perhaps you're using an older version of OMPI. If this is your problem, one workaround is something like this:
>>>>>>>>>
>>>>>>>>> unlimit descriptors
>>>>>>>>> mpirun -np 256 mpihello
>>>>>>>>
>>>>>>>> Looks like I'm not allowed to set that as a regular user:
>>>>>>>>
>>>>>>>> $ ulimit -n 2048
>>>>>>>> -bash: ulimit: open files: cannot modify limit: Operation not permitted
>>>>>>>>
>>>>>>>> Since I am the admin, I could change that elsewhere, but I'd rather not do that system-wide unless absolutely necessary.
>>>>>>>>
>>>>>>>>> though I guess the syntax depends on what shell you're running. Another is to set the MCA parameter opal_set_max_sys_limits to 1.
>>>>>>>>
>>>>>>>> That didn't work either:
>>>>>>>>
>>>>>>>> $ mpirun -mca opal_set_max_sys_limits 1 -np 256 mpihello
>>>>>>>> mpirun noticed that job rank 0 with PID 0 on node juno.sns.ias.edu exited on signal 15 (Terminated).
>>>>>>>> 252 additional processes aborted (not shown)
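P.S. For anyone finding this thread in the archives: the mpihello used above is just a trivial MPI "Hello, World" test. Its actual source isn't posted here, but a minimal sketch along those lines would be something like:

/* mpihello.c -- hypothetical reconstruction of the trivial test program
 * used above (the real source is not shown in this thread). */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank = 0, size = 0;

    MPI_Init(&argc, &argv);                /* start the MPI runtime        */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);  /* this process's rank          */
    MPI_Comm_size(MPI_COMM_WORLD, &size);  /* total number of processes    */

    printf("Hello, World from rank %d of %d\n", rank, size);

    MPI_Finalize();                        /* shut down cleanly            */
    return 0;
}

Build it with mpicc mpihello.c -o mpihello and launch with mpirun -np N mpihello. Getting past the descriptor ceiling means raising the open-files limit (which, as shown above, may need the administrator's help, e.g. via the nofile entries in /etc/security/limits.conf) or setting the MCA parameter opal_set_max_sys_limits to 1 as the 1.4.1 error message above suggests.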
--
Prentice Bisbal
Linux Software Support Specialist/System Administrator
School of Natural Sciences
Institute for Advanced Study
Princeton, NJ