[OMPI users] Broadcast problem

2013-04-30 Thread Randolph Pullen
I have a number of processes split into senders and receivers.
Senders read large quantities of randomly organised data into buffers for 
transmission to receivers.
When a buffer is full it needs to be transmitted to all receivers; this repeats 
until all the data has been transmitted.

The problem is that MPI_Bcast must know the root it is to receive from, and 
therefore can't receive 'blind' from whichever sender fills its buffer first.
Scatter would be inefficient because a few senders won't have anything to send, 
so it's wasteful to transmit those empty buffers repeatedly.

Any ideas?
Can Bcast receivers be promiscuous?

Thanks Randolph

Re: [OMPI users] Broadcast problem

2013-04-30 Thread Randolph Pullen
Oops, I think I meant gather, not scatter...

Re: [OMPI users] Broadcast and root process

2013-04-30 Thread giggzounet
OK, thanks for your answer. The documentation was not clear on this subject.

Cheers
Guillaume

On 29/04/2013 at 17:49, George Bosilca wrote:
> No, the root processor can be different for every broadcast, but for a given 
> broadcast every process involved must know who the root is. That's the only 
> condition MPI imposes.
> 
>   George.
> 
> On Apr 29, 2013, at 13:15 , giggzounet  wrote:
> 
>> Hi,
>>
>> I'm new on this list. I've been using MPI for years, but I haven't written
>> much MPI code myself, so my question is perhaps naive:
>>
>> I'm using a Computational Fluid Dynamics (CFD) solver. This solver uses
>> MPI to exchange data between the different partitions. In this solver the
>> "root processor" is always processor 1, so this proc reads the input,
>> broadcasts a lot of data, and writes the output.
>>
>> During a time step the solver computes the reference pressure at a
>> point. This computation is done on one processor, which may not be the
>> root processor, so after the computation the value has to be broadcast.
>> At the moment the code uses the processor where the reference pressure is
>> computed as the root of that broadcast (and not the standard "root
>> processor").
>>
>> Is that wrong? Must the root processor be the same for all broadcasts
>> during a computation?
>>
>> Best regards,
>> Guillaume
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
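
[A minimal sketch of what George describes, added for illustration; it is not
the solver's actual code. Every rank knows locally whether it owns the
reference point, the owner's rank is agreed on with an MPI_Allreduce, and that
rank is then used as the broadcast root. The pressure value and the choice of
the last rank as owner are placeholders.]

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Pretend the last rank owns the reference point this time step. */
    int i_own_point = (rank == size - 1);
    double p_ref = i_own_point ? 101325.0 : 0.0;   /* hypothetical value */

    /* Every rank contributes its own rank if it owns the point, -1 otherwise;
     * the maximum is the agreed-upon root. */
    int my_claim = i_own_point ? rank : -1;
    int root;
    MPI_Allreduce(&my_claim, &root, 1, MPI_INT, MPI_MAX, MPI_COMM_WORLD);

    /* All ranks pass the same root, so the broadcast matches even though
     * the root can change from one time step to the next. */
    MPI_Bcast(&p_ref, 1, MPI_DOUBLE, root, MPI_COMM_WORLD);

    printf("rank %d: p_ref = %g (root was %d)\n", rank, p_ref, root);

    MPI_Finalize();
    return 0;
}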




Re: [OMPI users] Broadcast problem

2013-04-30 Thread George Bosilca
You can't use gather either. Same for gatherv, as you need to know the amount 
you will receive in advance.

If I understand your scenario correctly (a random process is doing a broadcast 
at random time steps), using MPI collectives is not the best approach, as they 
need global knowledge (every broadcast has a known root). Even if you implement 
your own protocol to broadcast the root of the next operation, without an 
agreement protocol this might lead to unmatched collectives. The most 
straightforward way to reach an agreement with MPI today is to funnel all 
broadcast decisions through a single process (a leader), which will decide in 
which order the available broadcast operations are executed.

  George.
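
[A minimal sketch of that leader scheme, added for illustration; it is an
assumption about one way it could look, not code from the thread. Rank 0
serializes the "buffer ready" requests, announces the root of the next
broadcast with a small broadcast of its own, and the data broadcast then
follows from that root. Here each sender broadcasts exactly one buffer; a real
implementation would loop per full buffer and add a termination message.]

#include <mpi.h>
#include <stdio.h>

#define TAG_REQUEST 100
#define BUF_LEN     4              /* hypothetical buffer length */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumption for the sketch: ranks 1..size/2 are senders, everyone
     * else (including the leader, rank 0) only receives. */
    int num_senders = size / 2;
    int i_am_sender = (rank >= 1 && rank <= num_senders);

    double buf[BUF_LEN], scratch[BUF_LEN];
    MPI_Request req = MPI_REQUEST_NULL;

    if (i_am_sender) {
        for (int i = 0; i < BUF_LEN; i++)      /* "fill" the buffer */
            buf[i] = rank * 100.0 + i;
        /* Tell the leader this buffer is ready to be broadcast. */
        MPI_Isend(&rank, 1, MPI_INT, 0, TAG_REQUEST, MPI_COMM_WORLD, &req);
    }

    /* One round per sender; every rank takes part in every round. */
    for (int round = 0; round < num_senders; round++) {
        int next_root;
        if (rank == 0) {
            /* Leader: take the next request in whatever order they arrive. */
            MPI_Recv(&next_root, 1, MPI_INT, MPI_ANY_SOURCE, TAG_REQUEST,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        /* Announce the agreed root to everybody... */
        MPI_Bcast(&next_root, 1, MPI_INT, 0, MPI_COMM_WORLD);

        /* ...then run the data broadcast from that root.  Senders that are
         * not the current root receive into scratch so their own pending
         * buffer is not overwritten. */
        double *dst = (i_am_sender && next_root == rank) ? buf : scratch;
        MPI_Bcast(dst, BUF_LEN, MPI_DOUBLE, next_root, MPI_COMM_WORLD);

        if (!i_am_sender)
            printf("rank %d got a buffer from rank %d\n", rank, next_root);
    }

    if (i_am_sender)
        MPI_Wait(&req, MPI_STATUS_IGNORE);

    MPI_Finalize();
    return 0;
}

[The small announcement broadcast each round is the price of the agreement;
the ordering itself is whatever the leader sees first.]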

On Apr 30, 2013, at 09:40 , Randolph Pullen  
wrote:

> Oops,I think I meant gather not scatter...
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] QLogic HCA random crash after prolonged use

2013-04-30 Thread Dave Love
Ralph Castain  writes:

>> Dropped CR is definitely a reason not to use OMPI past 1.6.  [By the way,
>> the release notes are confusing, saying that DMTCP is supported, but CR
>> is dropped.]  I'd have hoped a vendor who needs to support CR would
>> contribute, but I suppose changes just become proprietary if they move
>> the base past 1.6 :-(.
>
> Not necessarily

It looks so from here, but I'd be glad if not.

>> For general information, what makes the CR support difficult to maintain
>> -- is it just a question of effort?
>
> Largely a lack of interest. Very few people (i.e., a handful) around the
> world use it,

That's surprising; I've certainly felt the lack of it from using PSM.

> and it is hard to justify putting in the
> effort for that small a user group.

Perhaps it's chicken and egg with use and support.




Re: [OMPI users] Problem with Openmpi-1.4.0 and qlogic-ofed-1.5.4.1

2013-04-30 Thread Dave Love
Padma Pavani  writes:

> Hi Team,
>
> I am facing a problem while running the HPL benchmark.
>
>
>
> I am using Intel MPI 4.0.1 with Qlogic-OFED-1.5.4.1 to run the benchmark, and
> also tried openmpi-1.4.0, but I get the same error.
>
>
> Error File :
>
> [compute-0-1.local:06936] [[14544,1],25] ORTE_ERROR_LOG: A message is
> attempting to be sent to a process whose contact information is unknown in
> file rml_oob_send.c at line 105
> [compute-0-1.local:06936] [[14544,1],25] could not get route to
> [[INVALID],INVALID]

I'm not sure, but that looks like what I think you get from not running
the binary under mpirun.



Re: [OMPI users] multithreaded jobs

2013-04-30 Thread Dave Love
Ralph Castain  writes:

> On Apr 25, 2013, at 5:33 PM, Vladimir Yamshchikov  wrote:
>
>> $NSLOTS is what is requested by -pe openmpi  in the script; my 
>> understanding is that by default it is threads.

Is there something in the documentation
 that suggests that?  [It
currently incorrectly says processes, rather than slots, in at least one
place I'll fix.]

> What you want to do is:
>
> 1. request a number of slots = the number of application processes * the 
> number of threads each process will run

[If really necessary, maybe use a job submission verifier to fiddle what
the user supplies.]

> 2. execute mpirun with the --cpus-per-proc N option, where N = the number of 
> threads each process will run.
>
> This will ensure you have one core for each thread. Note, however,
> that we don't actually bind a thread to the core - so having more
> threads than there are cores on a socket can cause a thread to bounce
> across sockets and (therefore) potentially across NUMA regions.

Does that mean that binding is suppressed in that case, as opposed to
binding N cores per process, which is what I thought it did?  (I can't
immediately test it.)

I don't understand the problem in this specific case which causes
over-subscription.  However, if the program's runtime needs instruction,
you can do things like setting OMP_NUM_THREADS with an SGE JSV; see
archives of the gridengine list.  (The SGE_BINDING variable that recent
SGE provides to the job can be converted to GOMP_CPU_AFFINITY etc., but
that's probably only useful for single-process jobs.)

There may be a case for OMPI to support this sort of thing for DRMs like
SGE which don't start the MPI processes themselves; you potentially need
to export the binding information per-process.

-- 
Community Grid Engine:  http://arc.liv.ac.uk/SGE/
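
[As an aside, a small hybrid MPI+OpenMP check, added for illustration (it is
not from this thread, and the build line is an assumption): each rank reports
where it runs and how many threads it will start, which helps confirm that a
"slots = processes x threads" request plus --cpus-per-proc gives the layout
you expect.]

/* check_layout.c -- build with something like: mpicc -fopenmp check_layout.c */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size, namelen;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(host, &namelen);

    /* omp_get_max_threads() honours OMP_NUM_THREADS; without it most
     * runtimes default to however many cores the process can see. */
    printf("rank %d of %d on %s: up to %d OpenMP threads\n",
           rank, size, host, omp_get_max_threads());

    #pragma omp parallel
    {
        #pragma omp single
        printf("rank %d actually started %d threads\n",
               rank, omp_get_num_threads());
    }

    MPI_Finalize();
    return 0;
}

[Run it the same way the real job would run, e.g. something like
mpirun -np 8 --cpus-per-proc 4 ./a.out after requesting 32 slots; the numbers
here are purely illustrative.]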


Re: [OMPI users] multithreaded jobs

2013-04-30 Thread Ralph Castain

On Apr 30, 2013, at 7:52 AM, Dave Love  wrote:

> Ralph Castain  writes:
> 
>> On Apr 25, 2013, at 5:33 PM, Vladimir Yamshchikov  wrote:
>> 
>>> $NSLOTS is what requested by -pe openmpi  in the script, my 
>>> understanding that by default it is threads.
> 
> Is there something in the documentation
>  that suggests that?  [It
> currently incorrectly says processes, rather than slots, in at least one
> place I'll fix.]
> 
>> What you want to do is:
>> 
>> 1. request a number of slots = the number of application processes * the 
>> number of threads each process will run
> 
> [If really necessary, maybe use a job submission verifier to fiddle what
> the user supplies.]

?? We have no way of knowing how many threads a process will start, so the user 
has to take that responsibility

> 
>> 2. execute mpirun with the --cpus-per-proc N option, where N = the number of 
>> threads each process will run.
>> 
>> This will ensure you have one core for each thread. Note, however,
>> that we don't actually bind a thread to the core - so having more
>> threads than there are cores on a socket can cause a thread to bounce
>> across sockets and (therefore) potentially across NUMA regions.
> 
> Does that mean that binding is suppressed in that case, as opposed to
> binding N cores per process, which is what I thought it did?  (I can't
> immediately test it.)

No, we do what the user requests. We will bind the process to the N cores - if 
those cores span sockets, that is the responsibility of the user. We try to 
keep it all together, but if you ask for too many...

> 
> I don't understand the problem in this specific case which causes
> over-subscription.  However, if the program's runtime needs instruction,
> you can do things like setting OMP_NUM_THREADS with an SGE JSV; see
> archives of the gridengine list.  (The SGE_BINDING variable that recent
> SGE provides to the job can be converted to GOMP_CPU_AFFINITY etc., but
> that's probably only useful for single-process jobs.)
> 
> There may be a case for OMPI to support this sort of thing for DRMs like
> SGE which don't start the MPI processes themselves; you potentially need
> to export the binding information per-process.

I'm unaware of any OS that currently binds at the process thread level. Can you 
refer us to something?

> 
> -- 
> Community Grid Engine:  http://arc.liv.ac.uk/SGE/
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
Hello,



My recent job started normally but after a few hours of running died with
the following message:



--
A daemon (pid 19390) died unexpectedly with status 137 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--
--
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.

The scheduling script is below:

#$ -S /bin/bash
#$ -cwd
#$ -N SC3blastx_64-96thr
#$ -pe openmpi* 64-96
#$ -l h_rt=24:00:00,vf=3G
#$ -j y
#$ -M yaxi...@gmail.com
#$ -m eas
#
# Load the appropriate module files
# Should be loaded already
#$ -V

mpirun -np $NSLOTS blastx -query
$UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta
-db nr -out
$UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out
-evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20
-lcase_masking -num_threads $NSLOTS

What caused this termination? It does not seem to be a scheduling problem, as
the program ran for several hours with 96 threads. My $LD_LIBRARY_PATH does have
the /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?

Vladimir


Re: [OMPI users] job termination on grid

2013-04-30 Thread Reuti
Hi,

On 30.04.2013 at 21:26, Vladimir Yamshchikov wrote:

> My recent job started normally but after a few hours of running died with the 
> following message:
>  
> --
> A daemon (pid 19390) died unexpectedly with status 137 while attempting
> to launch so we are aborting.

I wonder why it raised the failure only after running for hours. As 137 = 128 + 9, 
it was killed, maybe by the queuing system due to the set time limit? If you 
check the accounting, what is the output of:

$ qacct -j 

-- Reuti


> There may be more information reported by the environment (see above).
>  
> This may be because the daemon was unable to find all the needed shared
> libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> location of the shared libraries on the remote nodes and this will
> automatically be forwarded to the remote nodes.
> --
> --
> mpirun noticed that the job aborted, but has no info as to the process
> that caused that situation.
>  
> The scheduling script is below:
>  
> #$ -S /bin/bash
> #$ -cwd
> #$ -N SC3blastx_64-96thr
> #$ -pe openmpi* 64-96
> #$ -l h_rt=24:00:00,vf=3G
> #$ -j y
> #$ -M yaxi...@gmail.com
> #$ -m eas
> #
> # Load the appropriate module files
> # Should be loaded already
> #$ -V
>  
> mpirun -np $NSLOTS blastx -query 
> $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta 
> -db nr -out 
> $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out 
> -evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20 
> -lcase_masking -num_threads $NSLOTS
>  
> What caused this termination? It does not seem scheduling problem as the 
> program run several hours with 96 threads. My $LD_LIBRARY_PATH does have 
> /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?
>  
> Vladimir
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users




Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
I asked the grid IT staff and they said they had to kill it as the job was
overloading nodes. They saw loads up to 180 instead of close to 12 on
12-core nodes. They think that blastx is not an Open MPI application, so Open
MPI is spawning between 64 and 96 blastx processes, each of which is then
starting up 96 worker threads. Or, if blastx can work with Open MPI, my mpirun
syntax is wrong. Any advice?

I was advised earlier to use -pe openmpi [ARG], where ARG =
number_of_processes x number_of_threads, and then pass the desired number of
threads as 'mpirun -np $NSLOTS --cpus-per-proc [number_of_threads]'. When I
did that, I got an error that more threads were requested than the number of
physical cores.





On Tue, Apr 30, 2013 at 2:35 PM, Reuti  wrote:

> Hi,
>
> Am 30.04.2013 um 21:26 schrieb Vladimir Yamshchikov:
>
> > My recent job started normally but after a few hours of running died
> with the following message:
> >
> >
> --
> > A daemon (pid 19390) died unexpectedly with status 137 while attempting
> > to launch so we are aborting.
>
> I wonder why it rose the failure only after running for hours. As 137 =
> 128 + 9 it was killed, maybe by the queuing system due to the set time
> limit? If you check the accouting, what is the output of:
>
> $ qacct -j 
>
> -- Reuti
>
>
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
> the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> >
> --
> >
> --
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> >
> > The scheduling script is below:
> >
> > #$ -S /bin/bash
> > #$ -cwd
> > #$ -N SC3blastx_64-96thr
> > #$ -pe openmpi* 64-96
> > #$ -l h_rt=24:00:00,vf=3G
> > #$ -j y
> > #$ -M yaxi...@gmail.com
> > #$ -m eas
> > #
> > # Load the appropriate module files
> > # Should be loaded already
> > #$ -V
> >
> > mpirun -np $NSLOTS blastx -query
> $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta
> -db nr -out
> $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out
> -evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20
> -lcase_masking -num_threads $NSLOTS
> >
> > What caused this termination? It does not seem scheduling problem as the
> program run several hours with 96 threads. My $LD_LIBRARY_PATH does have
> /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?
> >
> > Vladimir
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain

On Apr 30, 2013, at 1:34 PM, Vladimir Yamshchikov  wrote:

> I asked grid IT and they said they had to kill it as the job was overloading 
> nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. They 
> think that blastx is not an openmpi application, so openMPI is spawning 
> between 64-96 blastx processes, each of which is then starting up 96 worker 
> threads. Or if blastx can work with openmpi, my blastx synthax mpirun syntax 
> is wrong. Any advice?
> I was advised earlier to use –pe openmpi [ARG} , where  ARG = 
> number_of_processes x number_of_threads , and then pass desired number of 
> threads as ‘ mpirun –np $NSLOTS cpus-per-proc [ number_of_threads]’. When I 
> did that, I got an error that more threads were requested than number of 
> physical cores.
> 

How many threads are you trying to launch?? If it is a 12-core node, then you 
can't have more than 12 - it sounds like you are trying to start up 96!

>  
> 
> 
> 
> 
> On Tue, Apr 30, 2013 at 2:35 PM, Reuti  wrote:
> Hi,
> 
> Am 30.04.2013 um 21:26 schrieb Vladimir Yamshchikov:
> 
> > My recent job started normally but after a few hours of running died with 
> > the following message:
> >
> > --
> > A daemon (pid 19390) died unexpectedly with status 137 while attempting
> > to launch so we are aborting.
> 
> I wonder why it rose the failure only after running for hours. As 137 = 128 + 
> 9 it was killed, maybe by the queuing system due to the set time limit? If 
> you check the accouting, what is the output of:
> 
> $ qacct -j 
> 
> -- Reuti
> 
> 
> > There may be more information reported by the environment (see above).
> >
> > This may be because the daemon was unable to find all the needed shared
> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
> > location of the shared libraries on the remote nodes and this will
> > automatically be forwarded to the remote nodes.
> > --
> > --
> > mpirun noticed that the job aborted, but has no info as to the process
> > that caused that situation.
> >
> > The scheduling script is below:
> >
> > #$ -S /bin/bash
> > #$ -cwd
> > #$ -N SC3blastx_64-96thr
> > #$ -pe openmpi* 64-96
> > #$ -l h_rt=24:00:00,vf=3G
> > #$ -j y
> > #$ -M yaxi...@gmail.com
> > #$ -m eas
> > #
> > # Load the appropriate module files
> > # Should be loaded already
> > #$ -V
> >
> > mpirun -np $NSLOTS blastx -query 
> > $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta 
> > -db nr -out 
> > $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out 
> > -evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20 
> > -lcase_masking -num_threads $NSLOTS
> >
> > What caused this termination? It does not seem scheduling problem as the 
> > program run several hours with 96 threads. My $LD_LIBRARY_PATH does have 
> > /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?
> >
> > Vladimir
> > ___
> > users mailing list
> > us...@open-mpi.org
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] job termination on grid

2013-04-30 Thread Vladimir Yamshchikov
This is the question I am trying to answer: how many threads can I use
with blastx on the grid? If I could request resources by node, I would use the
-pernode option to have one process per node and then specify the correct
number of threads for each node. But I cannot; resources (slots) are requested
per core (per process), so I was instructed to request the total number of
slots. However, as the allocated cores are spread across the nodes, it looks
like this messes up the scheduling and causes the overload.


On Tue, Apr 30, 2013 at 3:46 PM, Ralph Castain  wrote:

>
> On Apr 30, 2013, at 1:34 PM, Vladimir Yamshchikov 
> wrote:
>
> I asked grid IT and they said they had to kill it as the job was
> overloading nodes. They saw loads up to 180 instead of close to 12 on
> 12-core nodes. They think that blastx is not an openmpi application, so 
> openMPI
> is spawning between 64-96 blastx processes, each of which is then starting
> up 96 worker threads. Or if blastx can work with openmpi, my blastx synthax
> mpirun syntax is wrong. Any advice?
>
> I was advised earlier to use –pe openmpi [ARG} , where  ARG =
> number_of_processes x number_of_threads , and then pass desired number of
> threads as ‘ mpirun –np $NSLOTS cpus-per-proc [ number_of_threads]’. When I
> did that, I got an error that more threads were requested than number of
> physical cores.
>
>
> How many threads are you trying to launch?? If it is a 12-core node, then
> you can't have more than 12 - sounds like you are trying to startup 96!
>
>
>
>
>
>
> On Tue, Apr 30, 2013 at 2:35 PM, Reuti  wrote:
>
>> Hi,
>>
>> Am 30.04.2013 um 21:26 schrieb Vladimir Yamshchikov:
>>
>> > My recent job started normally but after a few hours of running died
>> with the following message:
>> >
>> >
>> --
>> > A daemon (pid 19390) died unexpectedly with status 137 while attempting
>> > to launch so we are aborting.
>>
>> I wonder why it rose the failure only after running for hours. As 137 =
>> 128 + 9 it was killed, maybe by the queuing system due to the set time
>> limit? If you check the accouting, what is the output of:
>>
>> $ qacct -j 
>>
>> -- Reuti
>>
>>
>> > There may be more information reported by the environment (see above).
>> >
>> > This may be because the daemon was unable to find all the needed shared
>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have
>> the
>> > location of the shared libraries on the remote nodes and this will
>> > automatically be forwarded to the remote nodes.
>> >
>> --
>> >
>> --
>> > mpirun noticed that the job aborted, but has no info as to the process
>> > that caused that situation.
>> >
>> > The scheduling script is below:
>> >
>> > #$ -S /bin/bash
>> > #$ -cwd
>> > #$ -N SC3blastx_64-96thr
>> > #$ -pe openmpi* 64-96
>> > #$ -l h_rt=24:00:00,vf=3G
>> > #$ -j y
>> > #$ -M yaxi...@gmail.com
>> > #$ -m eas
>> > #
>> > # Load the appropriate module files
>> > # Should be loaded already
>> > #$ -V
>> >
>> > mpirun -np $NSLOTS blastx -query
>> $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta
>> -db nr -out
>> $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out
>> -evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20
>> -lcase_masking -num_threads $NSLOTS
>> >
>> > What caused this termination? It does not seem scheduling problem as
>> the program run several hours with 96 threads. My $LD_LIBRARY_PATH does
>> have /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify
>> it?
>> >
>> > Vladimir
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>


Re: [OMPI users] job termination on grid

2013-04-30 Thread Ralph Castain

On Apr 30, 2013, at 1:54 PM, Vladimir Yamshchikov  wrote:

> This is the question I am trying to answer - how many threads I can use with 
> blastx on a grid? If I could request resources by_node, use -pernode option 
> to have one process per node, and then specify the correct number of threads 
> for each node. But I cannot, resurces (slots) are requested per-core 
> (per_process),

I don't believe that is true - resources are requested for the entire job, not 
for each process

> so I was instructed to request total number of slots. However, as allocated 
> cores are spread across the nodes, looks like it messes scheduling up causing 
> overload.

I suggest you look at the SGE documentation - I don't think you are using it 
correctly


> 
> 
> On Tue, Apr 30, 2013 at 3:46 PM, Ralph Castain  wrote:
> 
> On Apr 30, 2013, at 1:34 PM, Vladimir Yamshchikov  wrote:
> 
>> I asked grid IT and they said they had to kill it as the job was overloading 
>> nodes. They saw loads up to 180 instead of close to 12 on 12-core nodes. 
>> They think that blastx is not an openmpi application, so openMPI is spawning 
>> between 64-96 blastx processes, each of which is then starting up 96 worker 
>> threads. Or if blastx can work with openmpi, my blastx synthax mpirun syntax 
>> is wrong. Any advice?
>> I was advised earlier to use –pe openmpi [ARG} , where  ARG = 
>> number_of_processes x number_of_threads , and then pass desired number of 
>> threads as ‘ mpirun –np $NSLOTS cpus-per-proc [ number_of_threads]’. When I 
>> did that, I got an error that more threads were requested than number of 
>> physical cores.
>> 
> 
> How many threads are you trying to launch?? If it is a 12-core node, then you 
> can't have more than 12 - sounds like you are trying to startup 96!
> 
>>  
>> 
>> 
>> 
>> 
>> On Tue, Apr 30, 2013 at 2:35 PM, Reuti  wrote:
>> Hi,
>> 
>> Am 30.04.2013 um 21:26 schrieb Vladimir Yamshchikov:
>> 
>> > My recent job started normally but after a few hours of running died with 
>> > the following message:
>> >
>> > --
>> > A daemon (pid 19390) died unexpectedly with status 137 while attempting
>> > to launch so we are aborting.
>> 
>> I wonder why it rose the failure only after running for hours. As 137 = 128 
>> + 9 it was killed, maybe by the queuing system due to the set time limit? If 
>> you check the accouting, what is the output of:
>> 
>> $ qacct -j 
>> 
>> -- Reuti
>> 
>> 
>> > There may be more information reported by the environment (see above).
>> >
>> > This may be because the daemon was unable to find all the needed shared
>> > libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
>> > location of the shared libraries on the remote nodes and this will
>> > automatically be forwarded to the remote nodes.
>> > --
>> > --
>> > mpirun noticed that the job aborted, but has no info as to the process
>> > that caused that situation.
>> >
>> > The scheduling script is below:
>> >
>> > #$ -S /bin/bash
>> > #$ -cwd
>> > #$ -N SC3blastx_64-96thr
>> > #$ -pe openmpi* 64-96
>> > #$ -l h_rt=24:00:00,vf=3G
>> > #$ -j y
>> > #$ -M yaxi...@gmail.com
>> > #$ -m eas
>> > #
>> > # Load the appropriate module files
>> > # Should be loaded already
>> > #$ -V
>> >
>> > mpirun -np $NSLOTS blastx -query 
>> > $UABGRID_SCRATCH/SC/AdQ30/fasta/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.fasta 
>> > -db nr -out 
>> > $UABGRID_SCRATCH/SC/blastx/SC/SC1-IS4-Ind1-153ngFr1sep1run1R1AdQ30.out 
>> > -evalue 0.001 -max_intron_length 10 -outfmt 5 -num_alignments 20 
>> > -lcase_masking -num_threads $NSLOTS
>> >
>> > What caused this termination? It does not seem scheduling problem as the 
>> > program run several hours with 96 threads. My $LD_LIBRARY_PATH does have 
>> > /share/apps/openmpi/1.6.4-gcc/lib entry, so how else should I modify it?
>> >
>> > Vladimir
>> > ___
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>> 
>> ___
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
> 
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users



Re: [OMPI users] Strange "All-to-All" behavior

2013-04-30 Thread Number Cruncher
This sounds a bit like the MPI_Alltoallv algorithm change I complained 
about when 1.6.1 was released.


Original post: 
http://www.open-mpi.org/community/lists/users/2012/11/20722.php
Everything waits for "rank 0" observation: 
http://www.open-mpi.org/community/lists/users/2013/01/21219.php


Does switching to the older algorithm help?:
mpiexec --mca coll_tuned_use_dynamic_rules 1 --mca 
coll_tuned_alltoallv_algorithm 1


Simon

On 26/04/2013 23:14, Stephan Wolf wrote:

Hi,

I have encountered really bad performance when all the nodes send data
to all the other nodes. I use Isend and Irecv with multiple
outstanding sends per node. I debugged the behavior and came to the
following conclusion: It seems that one sender locks out all other
senders for one receiver. This sender releases the receiver only when
there are no more sends posted or a node with lower rank, wants to
send to this node (deadlock prevention). As a consequence, node 0
sends all its data to all nodes, while all others are waiting, then
node 1 sends all the data, …

What is the rationale behind this behaviour and can I change it by
some MCA parameter?

Thanks

Stephan

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
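
[For reference, a minimal sketch of the kind of exchange Stephan describes,
added for illustration under an assumed per-peer message size; it is not his
actual code. Every rank posts non-blocking receives from all peers and
non-blocking sends to all peers, then waits on everything at once, leaving the
progression order to the MPI library.]

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define MSG_LEN 1024   /* hypothetical per-peer message length */

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    double *sendbuf = malloc((size_t)size * MSG_LEN * sizeof(double));
    double *recvbuf = malloc((size_t)size * MSG_LEN * sizeof(double));
    MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));
    int nreq = 0;

    for (int i = 0; i < size * MSG_LEN; i++)
        sendbuf[i] = rank;                       /* dummy payload */

    /* Post all receives first so incoming messages find a matching buffer. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Irecv(recvbuf + (size_t)peer * MSG_LEN, MSG_LEN, MPI_DOUBLE,
                  peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    }
    /* Then post all sends; several are outstanding per rank at once. */
    for (int peer = 0; peer < size; peer++) {
        if (peer == rank) continue;
        MPI_Isend(sendbuf + (size_t)peer * MSG_LEN, MSG_LEN, MPI_DOUBLE,
                  peer, 0, MPI_COMM_WORLD, &reqs[nreq++]);
    }

    MPI_Waitall(nreq, reqs, MPI_STATUSES_IGNORE);

    printf("rank %d finished its exchange\n", rank);

    free(sendbuf); free(recvbuf); free(reqs);
    MPI_Finalize();
    return 0;
}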




Re: [OMPI users] Strange "All-to-All" behavior

2013-04-30 Thread Number Cruncher
Sorry, I seem to have misread your post. You're not actually invoking 
MPI_Alltoall or MPI_Alltoallv. Please disregard my last post.


Simon.

On 26/04/2013 23:14, Stephan Wolf wrote:

Hi,

I have encountered really bad performance when all the nodes send data
to all the other nodes. I use Isend and Irecv with multiple
outstanding sends per node. I debugged the behavior and came to the
following conclusion: It seems that one sender locks out all other
senders for one receiver. This sender releases the receiver only when
there are no more sends posted or a node with lower rank, wants to
send to this node (deadlock prevention). As a consequence, node 0
sends all its data to all nodes, while all others are waiting, then
node 1 sends all the data, …

What is the rationale behind this behaviour and can I change it by
some MCA parameter?

Thanks

Stephan

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users