Ralph Castain wrote:
> Hi Roberto
>
> My time is somewhat limited, so I couldn't review the code in detail.
> However, I think I got the gist of it.
>
> A few observations:
>
> 1. The code is rather inefficient if all you want to do is spawn a
> pattern of slave processes based on a file. Unl
Hi Roberto
My time is somewhat limited, so I couldn't review the code in detail.
However, I think I got the gist of it.
A few observations:
1. The code is rather inefficient if all you want to do is spawn a
pattern of slave processes based on a file. Unless there is some
overriding reas
Ralph Castain wrote:
> Interesting. I ran a loop calling comm_spawn 1000 times without a
> problem. I suspect it is the threading that is causing the trouble here.
I think so! My guess is that, at a low level, there is some trouble when
handling *concurrent* orted spawning. Maybe
> You are welc
Interesting. I ran a loop calling comm_spawn 1000 times without a
problem. I suspect it is the threading that is causing the trouble here.
You are welcome to send me the code. You can find my loop code in your
code distribution under orte/test/mpi - look for loop_spawn and
loop_child.
Ral
Ralph Castain wrote:
>
> On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote:
>
>> Ralph Castain wrote:
>>> I committed something to the trunk yesterday. Given the complexity of
>>> the fix, I don't plan to bring it over to the 1.3 branch until
>>> sometime mid-to-end next week so it can be
On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote:
Ralph Castain wrote:
I committed something to the trunk yesterday. Given the complexity of
the fix, I don't plan to bring it over to the 1.3 branch until
sometime mid-to-end next week so it can be adequately tested.
Ok! So it means that
Ralph Castain wrote:
> I committed something to the trunk yesterday. Given the complexity of
> the fix, I don't plan to bring it over to the 1.3 branch until
> sometime mid-to-end next week so it can be adequately tested.
OK! So it means that I can check out the SVN trunk to get your fix,
r
I committed something to the trunk yesterday. Given the complexity of
the fix, I don't plan to bring it over to the 1.3 branch until
sometime mid-to-end next week so it can be adequately tested.
Ralph
On Oct 3, 2008, at 5:02 AM, Roberto Fichera wrote:
Ralph Castain wrote:
Actually, i
Ralph Castain wrote:
> Actually, it just occurred to me that you may be seeing a problem in
> comm_spawn itself that I am currently chasing down. It is in the 1.3
> branch and has to do with comm_spawning procs on subsets of nodes
> (instead of across all nodes). Could be related to this - you
Ralph Castain wrote:
> Afraid I am somewhat at a loss. The logs indicate that mpirun itself
> is having problems, likely caused by the threading. Only thing I can
> suggest is that you "unthread" the spawning loop and try it that way
> first so we can see if some underlying problem exists.
>
>
Actually, it just occurred to me that you may be seeing a problem in
comm_spawn itself that I am currently chasing down. It is in the 1.3
branch and has to do with comm_spawning procs on subsets of nodes
(instead of across all nodes). Could be related to this - you might
want to give me a c
Afraid I am somewhat at a loss. The logs indicate that mpirun itself
is having problems, likely caused by the threading. Only thing I can
suggest is that you "unthread" the spawning loop and try it that way
first so we can see if some underlying problem exists.
FWIW: I have run a loop over
Ralph Castain wrote:
> 3. remove the threaded launch scenario and just call comm_spawn in a
> loop.
>
Below you can see how Open MPI behaves if I put MPI_Comm_spawn() in a
loop and drive the rest of the communication in a thread. Basically, it
freezes in the same place as I see
[roberto@master
Ralph Castain wrote:
> Okay, I believe I understand the problem. What this error is telling
> you is that the Torque MOM is refusing our connection request because
> it is already busy. So we cannot spawn another process.
>
> If I understand your application correctly, you are spinning off
> m
Okay, I believe I understand the problem. What this error is telling
you is that the Torque MOM is refusing our connection request because
it is already busy. So we cannot spawn another process.
If I understand your application correctly, you are spinning off
multiple threads, each attempti
Ralph Castain wrote:
> Hi Roberto
>
> There is something wrong with this cmd line - perhaps it wasn't copied
> correctly?
>
> mpirun --verbose --debug-daemons --mca obl -np 1 -wdir `pwd`
> testmaster 1 $PBS_NODEFILE
>
> Specifically, the following is incomplete: --mca obl
>
> I'm not sure
Hi Roberto
There is something wrong with this cmd line - perhaps it wasn't copied
correctly?
mpirun --verbose --debug-daemons --mca obl -np 1 -wdir `pwd`
testmaster 1 $PBS_NODEFILE
Specifically, the following is incomplete: --mca obl
I'm not sure if this is the problem or not, but I
Roberto Fichera wrote:
> Hi All on the list,
>
> I'm trying to execute dynamic MPI applications using MPI_Comm_spawn().
> The application I'm using for tests is basically composed of a master,
> which spawns a slave on each assigned node in a multithreaded fashion.
> The master is started wit
Hi All on the list,
I'm trying to execute dynamic MPI applications using MPI_Comm_spawn().
The application I'm using for tests is basically composed of a master,
which spawns a slave on each assigned node in a multithreaded fashion.
The master is started with a number of jobs to perform and a fil