[...] paths are set there.
Basically I need to start *one* master application which will handle
everything involved in managing the slave applications. The communication is
*only* master <-> slave and never collective, at the moment.
The test program is available on request.
Does anyone have an idea what might be going wrong?
Roberto Fichera wrote:
> Hi All on the list,
>
> I'm trying to execute dynamic MPI applications using MPI_Comm_spawn().
> The application I'm using for the tests is basically
> composed of a master, which spawns a slave on each assigned node in a
> multithreading fashion.
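
Since the actual test program is only available on request, the following is
a minimal sketch of the pattern being described, not the original code: a
master that spawns one slave per assigned node with MPI_Comm_spawn() and then
talks to each slave only over its own intercommunicator, with no collectives.
The node names and the "slave" executable name are placeholders.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* placeholder node names; a real run would use the Torque-assigned nodes */
    char *hosts[] = { "cluster1", "cluster2", "cluster3" };
    enum { NHOSTS = 3 };
    MPI_Comm slave[NHOSTS];

    for (int i = 0; i < NHOSTS; i++) {
        MPI_Info info;
        MPI_Info_create(&info);
        MPI_Info_set(info, "host", hosts[i]);   /* pin this slave to one node */

        int errcode = MPI_SUCCESS;
        MPI_Comm_spawn("slave", MPI_ARGV_NULL, 1, info,
                       0, MPI_COMM_SELF, &slave[i], &errcode);
        MPI_Info_free(&info);
        if (errcode != MPI_SUCCESS)
            fprintf(stderr, "spawn on %s failed\n", hosts[i]);

        /* master <-> slave only: hand work to rank 0 of the remote group;
           the slave side would MPI_Comm_get_parent(), recv, and send back */
        int work = i;
        MPI_Send(&work, 1, MPI_INT, 0, 0, slave[i]);
    }

    for (int i = 0; i < NHOSTS; i++) {
        int result;
        MPI_Recv(&result, 1, MPI_INT, 0, 0, slave[i], MPI_STATUS_IGNORE);
        MPI_Comm_disconnect(&slave[i]);
    }

    MPI_Finalize();
    return 0;
}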
[master.tekno-soft.it:05407] [[11340,0],0] node[3].name cluster2 daemon
INVALID arch ffc91200
[master.tekno-soft.it:05407] [[11340,0],0] node[4].name cluster1 daemon
INVALID arch ffc91200
[cluster3.tekno-soft.it:09487] [[11340,0],1] orted: up and running -
waiting for commands!
>
> Ralph
>
[...] not thread safe.
> 3. remove the threaded launch scenario and just call comm_spawn in a
> loop.
>
> In truth, the threaded approach to spawning all these procs isn't
> gaining you anything. Torque will only do one launch at a time anyway,
> so you will launch them serially no matter what.
Ralph Castain wrote:
> 3. remove the threaded launch scenario and just call comm_spawn in a
> loop.
>
Below you can see how Open MPI behaves if I put the MPI_Comm_spawn() in a
loop and drive the rest of the communication in a thread. Basically it
freezes in the same place, as I see:
[roberto@master
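
For reference, a minimal sketch of the pattern described above:
MPI_Comm_spawn() called serially in a loop from the main thread, with the
per-slave communication driven from a separate thread. The "slave" binary,
the slave count and the payload are assumptions, not taken from the real
program; the one detail worth copying is the explicit check of the thread
level returned by MPI_Init_thread(), since driving MPI calls from several
threads is only legal when the library actually provides
MPI_THREAD_MULTIPLE (which, as noted in this thread, Open MPI did not really
support at the time).

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

static void *drive_slave(void *arg)
{
    /* all traffic for one slave handled by one thread, over its intercomm */
    MPI_Comm inter = *(MPI_Comm *)arg;
    int work = 42, result;
    MPI_Send(&work, 1, MPI_INT, 0, 0, inter);
    MPI_Recv(&result, 1, MPI_INT, 0, 0, inter, MPI_STATUS_IGNORE);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not provided (got %d), "
                        "refusing to drive communication from threads\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    enum { NSLAVES = 4 };              /* placeholder count */
    MPI_Comm inter[NSLAVES];
    pthread_t tid[NSLAVES];

    for (int i = 0; i < NSLAVES; i++) {
        /* spawn serially, from the main thread only */
        MPI_Comm_spawn("slave", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &inter[i], MPI_ERRCODES_IGNORE);
        pthread_create(&tid[i], NULL, drive_slave, &inter[i]);
    }

    for (int i = 0; i < NSLAVES; i++) {
        pthread_join(tid[i], NULL);
        MPI_Comm_disconnect(&inter[i]);
    }

    MPI_Finalize();
    return 0;
}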
Ralph Castain wrote:
> Afraid I am somewhat at a loss. The logs indicate that mpirun itself
> is having problems, likely caused by the threading. Only thing I can
> suggest is that you "unthread" the spawning loop and try it that way
> first so we can see if some underlying problem exists.
>
>
>> [...] hit those at some point - we try to report
>> that as a separate error when we see it, but it isn't always easy to
>> catch.
>>
>> Like I said, we really don't support threaded operations like this
>> right now, so I have no idea what your app may be triggering
[...] to get your fix, right?
> Ralph
>
> On Oct 3, 2008, at 5:02 AM, Roberto Fichera wrote:
>
>> Ralph Castain wrote:
>>> Actually, it just occurred to me that you may be seeing a problem in
>>> comm_spawn itself that I am currently chasing down. It is in the 1.3
Ralph Castain wrote:
>
> On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote:
>
>> Ralph Castain wrote:
>>> I committed something to the trunk yesterday. Given the complexity of
>>> the fix, I don't plan to bring it over to the 1.3 branch until
nnect()
concurrently with a MPI_Comm_spawn().
>
> Ralph
>
> On Oct 3, 2008, at 9:11 AM, Roberto Fichera wrote:
>
>> Ralph Castain wrote:
>>>
>>> On Oct 3, 2008, at 7:14 AM, Roberto Fichera wrote:
>>>
>>>> Ralph Castain wrote:
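
The truncated line above seems to refer to performing MPI_Comm_connect()
while another thread is inside MPI_Comm_spawn(); that reading is an
assumption. For context, this is roughly what the accept side of such a
port-based connection looks like on the master; the port handling and the
way the port string reaches the slaves are simplified placeholders, not the
actual scheme used here.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    char port[MPI_MAX_PORT_NAME];
    MPI_Open_port(MPI_INFO_NULL, port);
    printf("master listening on %s\n", port);  /* slaves learn the port out of band */

    /* blocks until a slave calls
       MPI_Comm_connect(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &server)
       with the same port string */
    MPI_Comm client;
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);

    int hello;
    MPI_Recv(&hello, 1, MPI_INT, 0, 0, client, MPI_STATUS_IGNORE);

    MPI_Comm_disconnect(&client);
    MPI_Close_port(port);
    MPI_Finalize();
    return 0;
}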
[...] at least until we locally decide that we need more than one slave for
crunching data; in that case the spawn will be instrumented to start, say,
10 nodes for a single computation. So only in that case do we "fall back" to
the "standard usage" ;-)!
>
> Hope that helps
> Ralph
>
> On Oct 3
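
As a short illustration of the "fall back to standard usage" case described
above, a single MPI_Comm_spawn() can ask for several slaves at once, so that
one computation gets, say, 10 processes behind a single intercommunicator;
the spawned processes then share their own MPI_COMM_WORLD and appear as
remote ranks 0-9 on the master's side. The executable name and the count are
placeholders.

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* one spawn, ten workers for a single computation */
    MPI_Comm crunchers;
    MPI_Comm_spawn("slave", MPI_ARGV_NULL, 10, MPI_INFO_NULL,
                   0, MPI_COMM_SELF, &crunchers, MPI_ERRCODES_IGNORE);

    /* distribute the work piecewise to remote ranks 0..9 */
    for (int r = 0; r < 10; r++) {
        int chunk = r;
        MPI_Send(&chunk, 1, MPI_INT, r, 0, crunchers);
    }

    /* collect the partial results before tearing the group down */
    for (int r = 0; r < 10; r++) {
        int result;
        MPI_Recv(&result, 1, MPI_INT, r, 0, crunchers, MPI_STATUS_IGNORE);
    }

    MPI_Comm_disconnect(&crunchers);
    MPI_Finalize();
    return 0;
}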