Network saturation could produce arbitrarily long delays, but the total data load
we are talking about is really small. It is the responsibility of an MPI
library to do one of the following:
1) Use a reliable message protocol for each message (e.g. InfiniBand RC or
TCP/IP)
2) Detect lost packets and
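Either way, the practical guarantee to the application is the same: a collective like MPI_Bcast either delivers the complete payload to every rank or the library reports an error; receivers never see silently missing data. A minimal C sketch of that expectation (the 512 KiB payload matches the half-meg messages discussed in this thread; everything else is illustrative):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Illustrative sketch only: 512 KiB payload, matching the
     * half-meg broadcast size discussed in this thread. */
    const int count = 524288;
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(count);
    if (rank == 0) {
        for (int i = 0; i < count; i++) {
            buf[i] = (char)(i & 0x7f);
        }
    }

    /* Whatever transport is underneath (InfiniBand RC, TCP, ...),
     * MPI_Bcast must hand every rank the whole buffer or fail loudly. */
    MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* So a non-root rank can verify the contents directly. */
    if (rank != 0) {
        for (int i = 0; i < count; i++) {
            if (buf[i] != (char)(i & 0x7f)) {
                fprintf(stderr, "rank %d: corrupt byte at offset %d\n", rank, i);
                MPI_Abort(MPI_COMM_WORLD, 1);
            }
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}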
Nope - none of them will work with 1.4.2. Sorry - bug not discovered until
after release
On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
> Hi Jeff,
> thanks for the quick reply.
>
> Would using '--cpus-per-proc N' in place of '-npernode N' or just '-bynode'
> do the trick?
>
> It
Hi Jeff,
thanks for the quick reply.
Would using '--cpus-per-proc N' in place of '-npernode N' or just
'-bynode' do the trick?
It seems that using '--loadbalance' also crashes mpirun.
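For concreteness, the alternatives under discussion would be launched roughly like this (the benchmark binary and process counts are placeholders for illustration, not our actual job):

mpirun -np 16 -npernode 8 ./IMB-MPI1 Bcast        # the form that segfaults on 1.4.2
mpirun -np 16 -bynode ./IMB-MPI1 Bcast
mpirun -np 16 --cpus-per-proc 2 ./IMB-MPI1 Bcast
mpirun -np 16 --loadbalance ./IMB-MPI1 Bcast      # also reported to crash mpirun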
best ...
Michael
On 08/23/10 19:30, Jeff Squyres wrote:
Yes, the -npernode segv is a known issue.
W
I have had a similar load-related problem with Bcast, though I don't know what
caused it. With this one, what about the possibility of a buffer overrun or
network saturation?
--- On Tue, 24/8/10, Richard Treumann wrote:
From: Richard Treumann
Subject: Re: [OMPI users] IMB-MPI broadcast tes
Yes, the -npernode segv is a known issue.
We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see
if that fixes your problem?
http://www.open-mpi.org/nightly/v1.4/
On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
> Hello OMPI:
>
> We have installed OMPI
Hello OMPI:
We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4.
OMPI was built using Intel compilers 11.1.072. I am attaching the
configuration log and output from ompi_info -a.
The problem we are encountering is that whenever we use the option
'-npernode N' in the mpirun comm
It is hard to imagine how a total data load of 41,943,040 bytes could be a
problem. That is really not much data. By the time the BCAST is done, each
task (except root) will have received a single half-meg message from one
sender. That is not much.
IMB does shift the root so some tasks may be i
On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <
randolph_pul...@yahoo.com.au> wrote:
> It's a long shot, but could it be related to the total data volume?
> i.e. 524288 * 80 = 41,943,040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more
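A sketch of how that "smaller message repeated more" experiment could be driven, keeping the volume above that figure (the 64 KiB size and 800 repetitions are illustrative; even a single receiving rank then sees 65536 * 800 = 52,428,800 bytes, more than the 41,943,040 bytes of the failing run):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Illustrative numbers only: 64 KiB per broadcast, repeated 800 times. */
    const int msg_bytes = 65536;
    const int reps = 800;
    int rank, size;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = calloc(msg_bytes, 1);

    for (int r = 0; r < reps; r++) {
        /* IMB rotates the root between repetitions, so do the same
         * to keep the traffic pattern comparable. */
        int root = r % size;
        MPI_Bcast(buf, msg_bytes, MPI_CHAR, root, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}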
Sure. I took a guess at ppn and nodes for the case where 2 processes
are on the same node... I don't claim these are the right values ;-)
c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports
Can you send me the values you are using for the relevant envars? That way I
can try to replicate it here.
On Aug 23, 2010, at 1:15 PM, Philippe wrote:
> I took a look at the code, but I'm afraid I don't see anything wrong.
>
> p.
>
> On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
>> Yes, t
I took a look at the code, but I'm afraid I don't see anything wrong.
p.
On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is returning the wrong value
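For readers following the port discussion, the arithmetic being described works roughly like the sketch below: the OOB is given a static TCP port range, slot 0 of the range is reserved for a daemon when one exists, and each local process takes the slot given by its node rank. This is only an illustration of that mapping; the function and variable names are made up and it is not the actual Open MPI code. A wrong node rank would make two processes compute the same port, which fits the failure being debugged.

#include <stdio.h>

/* Hypothetical helper, not Open MPI source: base_port is the first
 * port of the configured static range, node_rank is the process's
 * rank among the processes on its own node. */
static int pick_static_port(int base_port, int node_rank, int have_daemon)
{
    /* Slot 0 is reserved for the daemon when one exists, so local
     * processes start at the next port in the range. */
    int offset = have_daemon ? node_rank + 1 : node_rank;
    return base_port + offset;
}

int main(void)
{
    /* Example: range starting at 10000, two local processes, daemon present. */
    for (int nr = 0; nr < 2; nr++) {
        printf("node rank %d -> port %d\n", nr, pick_static_port(10000, nr, 1));
    }
    return 0;
}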