Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-06 Thread Roberto Fichera
Ralph Castain wrote: > Hi Roberto > > My time is somewhat limited, so I couldn't review the code in detail. > However, I think I got the gist of it. > > A few observations: > > 1. the code is rather inefficient, if all you want to do is spawn a > pattern of slave processes based on a file. Unl

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-06 Thread Ralph Castain
Hi Roberto My time is somewhat limited, so I couldn't review the code in detail. However, I think I got the gist of it. A few observations: 1. the code is rather inefficient, if all you want to do is spawn a pattern of slave processes based on a file. Unless there is some overriding reas

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Roberto Fichera
Ralph Castain wrote: > Interesting. I ran a loop calling comm_spawn 1000 times without a > problem. I suspect it is the threading that is causing the trouble here. I think so! My guess is that at a low level there is some trouble when handling *concurrent* orted spawning. Maybe > You are welc

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Ralph Castain
Interesting. I ran a loop calling comm_spawn 1000 times without a problem. I suspect it is the threading that is causing the trouble here. You are welcome to send me the code. You can find my loop code in your code distribution under orte/test/mpi - look for loop_spawn and loop_child. Ral

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Roberto Fichera
Ralph Castain wrote: > > On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote: > >> Ralph Castain wrote: >>> I committed something to the trunk yesterday. Given the complexity of >>> the fix, I don't plan to bring it over to the 1.3 branch until >>> sometime mid-to-end next week so it can be

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Ralph Castain
On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote: Ralph Castain wrote: I committed something to the trunk yesterday. Given the complexity of the fix, I don't plan to bring it over to the 1.3 branch until sometime mid-to-end next week so it can be adequately tested. Ok! So it means that

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Roberto Fichera
Ralph Castain wrote: > I committed something to the trunk yesterday. Given the complexity of > the fix, I don't plan to bring it over to the 1.3 branch until > sometime mid-to-end next week so it can be adequately tested. Ok! So it means that I can check out from the SVN/trunk to get your fix, r

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Ralph Castain
I committed something to the trunk yesterday. Given the complexity of the fix, I don't plan to bring it over to the 1.3 branch until sometime mid-to-end next week so it can be adequately tested. Ralph On Oct 3, 2008, at 5:02 AM, Roberto Fichera wrote: Ralph Castain wrote: Actually, i

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-03 Thread Roberto Fichera
Ralph Castain wrote: > Actually, it just occurred to me that you may be seeing a problem in > comm_spawn itself that I am currently chasing down. It is in the 1.3 > branch and has to do with comm_spawning procs on subsets of nodes > (instead of across all nodes). Could be related to this - you

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Actually, it just occurred to me that you may be seeing a problem in > comm_spawn itself that I am currently chasing down. It is in the 1.3 > branch and has to do with comm_spawning procs on subsets of nodes > (instead of across all nodes). Could be related to this - you

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Afraid I am somewhat at a loss. The logs indicate that mpirun itself > is having problems, likely caused by the threading. Only thing I can > suggest is that you "unthread" the spawning loop and try it that way > first so we can see if some underlying problem exists. > >

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Actually, it just occurred to me that you may be seeing a problem in comm_spawn itself that I am currently chasing down. It is in the 1.3 branch and has to do with comm_spawning procs on subsets of nodes (instead of across all nodes). Could be related to this - you might want to give me a c

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Afraid I am somewhat at a loss. The logs indicate that mpirun itself is having problems, likely caused by the threading. Only thing I can suggest is that you "unthread" the spawning loop and try it that way first so we can see if some underlying problem exists. FWIW: I have run a loop over

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > 3. remove the threaded launch scenario and just call comm_spawn in a > loop. > Below you can see how openmpi behaves if I put MPI_Comm_spawn() in a loop and drive the rest of the communication in a thread. Basically it freezes in the same place as I see [roberto@master

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Okay, I believe I understand the problem. What this error is telling > you is that the Torque MOM is refusing our connection request because > it is already busy. So we cannot spawn another process. > > If I understand your application correctly, you are spinning off > m

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Ralph Castain
Okay, I believe I understand the problem. What this error is telling you is that the Torque MOM is refusing our connection request because it is already busy. So we cannot spawn another process. If I understand your application correctly, you are spinning off multiple threads, each attempti

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-10-01 Thread Roberto Fichera
Ralph Castain wrote: > Hi Roberto > > There is something wrong with this cmd line - perhaps it wasn't copied > correctly? > > mpirun --verbose --debug-daemons --mca obl -np 1 -wdir `pwd` > testmaster 1 $PBS_NODEFILE > > Specifically, the following is incomplete: --mca obl > > I'm not sure

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-09-30 Thread Ralph Castain
Hi Roberto There is something wrong with this cmd line - perhaps it wasn't copied correctly? mpirun --verbose --debug-daemons --mca obl -np 1 -wdir `pwd` testmaster 1 $PBS_NODEFILE Specifically, the following is incomplete: --mca obl I'm not sure if this is the problem or not, but I

Re: [OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-09-30 Thread Roberto Fichera
Roberto Fichera wrote: > Hi All on the list, > > I'm trying to execute dynamic MPI applications using MPI_Comm_spawn(). > The application I'm using for tests, basically is > composed by a master, which spawn a slave in each assigned node in a > multithreading fashion. The master is started wit

[OMPI users] Running application with MPI_Comm_spawn() in multithreaded environment

2008-09-30 Thread Roberto Fichera
Hi All on the list, I'm trying to execute dynamic MPI applications using MPI_Comm_spawn(). The application I'm using for tests is basically composed of a master, which spawns a slave on each assigned node in a multithreaded fashion. The master is started with a number of jobs to perform and a fil