> On Sep 20, 2015, at 2:30 PM, Lev Givon <l...@columbia.edu> wrote:
>
> Received from Ralph Castain on Sun, Sep 20, 2015 at 05:08:10PM EDT:
>>> On Sep 20, 2015, at 12:57 PM, Lev Givon <l...@columbia.edu> wrote:
>>>
>>> While debugging a problem that is causing emission of a non-fatal OpenMPI
>>> error message to stderr, the error message is followed by a line similar
>>> to the following (I have help message aggregation turned on):
>>>
>>> [myhost:10008] 17 more processes have sent help message some_file.txt / blah blah failed
>>>
>>> The job that I am running is started as a single process (via SLURM using
>>> PMI) that spawns 2 processes via MPI_Spawn; the number of processes
>>> reported in the above line, however, is much larger than 2. Why would the
>>> number of processes reporting an error be so big? When I examine the MPI
>>> processes in real time as they run (e.g., via top), there never appear to
>>> be that many processes running.
>>>
>>> I'm using OpenMPI 1.10.0 built on Ubuntu 14.04.3; as indicated by
>>> ompi_info, I don't have multiple MPI threads enabled:
>>>
>>> posix (MPI_THREAD_MULTIPLE: no, OPAL support: yes, OMPI progress: no, ORTE progress: yes, Event lib: yes)
>
>> Just to be clear: you are starting the single process using
>> “srun -n 1 ./app”, and the app calls MPI_Comm_spawn?
>
> Yes.
>
>> I’m not sure that’s really supported…I think there might be something in
>> Slurm behind that call, but I have no idea if it really works.
>
> Well, the same question applies if I don't use SLURM and launch with
> mpiexec -np 1.
>
> On a closer look, it seems that the "17" corresponds to the number of times
> the error was emitted after its occurrence regardless of how many actual MPI
> processes were running (each of the MPI processes spawned by my program
> iterates a certain number of times and causes the error to occur during
> each iteration).

That is correct: the number in that line counts how many times the message was
emitted, not how many distinct processes are running. If you tell us what the
error is, we’d be happy to help diagnose it.

> --
> Lev Givon
> Bionet Group | Neurokernel Project
> http://www.columbia.edu/~lev/
> http://lebedov.github.io/
> http://neurokernel.github.io/
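
To make the counting behaviour discussed above concrete, here is a toy Python model of help-message aggregation. It is only a sketch and not Open MPI's actual implementation: the `HelpAggregator` class, the `simulate` helper, and the figure of 9 iterations per process are all hypothetical, chosen so that 2 spawned processes reproduce the "17 more" line quoted in the thread.

```python
from collections import Counter


class HelpAggregator:
    """Toy model of help-message aggregation (NOT Open MPI's real code)."""

    def __init__(self):
        # Number of times each (file, topic) pair has been reported.
        self.seen = Counter()

    def report(self, filename, topic, text):
        """Record one message; return its text only on the first occurrence."""
        key = (filename, topic)
        self.seen[key] += 1
        return text if self.seen[key] == 1 else None

    def summary(self, host="myhost", pid=10008):
        """Build the 'N more processes ...' lines for repeated messages."""
        lines = []
        for (filename, topic), count in self.seen.items():
            if count > 1:
                lines.append(
                    f"[{host}:{pid}] {count - 1} more processes have sent "
                    f"help message {filename} / {topic}"
                )
        return lines


def simulate(num_procs, iterations):
    """Hypothetical job: every process hits the same error once per iteration."""
    agg = HelpAggregator()
    for _rank in range(num_procs):
        for _step in range(iterations):
            agg.report("some_file.txt", "blah blah failed", "full error text")
    return agg.summary()


if __name__ == "__main__":
    # Assumed numbers: 2 spawned processes, 9 iterations each.
    for line in simulate(num_procs=2, iterations=9):
        print(line)
```

In this model the counter is keyed on the message, not on the sender, so 2 processes that each emit the error 9 times yield 18 reports: one shown in full and 17 aggregated. In the 1.x series the aggregated line is normally followed by a hint about the `orte_base_help_aggregate` MCA parameter; setting it to 0 should print every message individually, which is a quick way to check where the repeats come from.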