Network saturation could produce arbitrarily long delays, but the total data load
we are talking about is really small. It is the responsibility of an MPI
library to do one of the following:
1) Use a reliable message protocol for each message (e.g. InfiniBand RC or
TCP/IP)
2) Detect lost packets and
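Either way, the practical guarantee to the application is the same: a collective like MPI_Bcast either delivers the complete payload to every rank or the library reports an error; receivers never see silently missing data. A minimal C sketch of that expectation (the 512 KiB payload matches the half-meg messages discussed in this thread; everything else is illustrative):

#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Illustrative sketch only: 512 KiB payload, matching the
     * half-meg broadcast size discussed in this thread. */
    const int count = 524288;
    int rank;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    buf = malloc(count);
    if (rank == 0) {
        for (int i = 0; i < count; i++) {
            buf[i] = (char)(i & 0x7f);
        }
    }

    /* Whatever transport is underneath (InfiniBand RC, TCP, ...),
     * MPI_Bcast must hand every rank the whole buffer or fail loudly. */
    MPI_Bcast(buf, count, MPI_CHAR, 0, MPI_COMM_WORLD);

    /* So a non-root rank can verify the contents directly. */
    if (rank != 0) {
        for (int i = 0; i < count; i++) {
            if (buf[i] != (char)(i & 0x7f)) {
                fprintf(stderr, "rank %d: corrupt byte at offset %d\n", rank, i);
                MPI_Abort(MPI_COMM_WORLD, 1);
            }
        }
    }

    free(buf);
    MPI_Finalize();
    return 0;
}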
Nope - none of them will work with 1.4.2. Sorry - bug not discovered until
after release
On Aug 23, 2010, at 7:45 PM, Michael E. Thomadakis wrote:
> Hi Jeff,
> thanks for the quick reply.
>
> Would using '--cpus-per-proc N' in place of '-npernode N' or just '-bynode'
> do the trick?
>
> It
Hi Jeff,
thanks for the quick reply.
Would using '--cpus-per-proc N' in place of '-npernode N' or just
'-bynode' do the trick?
It seems that using '--loadbalance' also crashes mpirun.
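For concreteness, the alternatives under discussion would be launched roughly like this (the benchmark binary and process counts are placeholders for illustration, not our actual job):

mpirun -np 16 -npernode 8 ./IMB-MPI1 Bcast        # the form that segfaults on 1.4.2
mpirun -np 16 -bynode ./IMB-MPI1 Bcast
mpirun -np 16 --cpus-per-proc 2 ./IMB-MPI1 Bcast
mpirun -np 16 --loadbalance ./IMB-MPI1 Bcast      # also reported to crash mpirun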
best ...
Michael
On 08/23/10 19:30, Jeff Squyres wrote:
Yes, the -npernode segv is a known issue.
W
I have had a similar load-related problem with Bcast, though I don't know what
caused it. With this one, what about the possibility of a buffer overrun or
network saturation?
--- On Tue, 24/8/10, Richard Treumann wrote:
From: Richard Treumann
Subject: Re: [OMPI users] IMB-MPI broadcast tes
Yes, the -npernode segv is a known issue.
We have it fixed in the 1.4.x nightly tarballs; can you give it a whirl and see
if that fixes your problem?
http://www.open-mpi.org/nightly/v1.4/
On Aug 23, 2010, at 8:20 PM, Michael E. Thomadakis wrote:
> Hello OMPI:
>
> We have installed OMPI
Hello OMPI:
We have installed OMPI V1.4.2 on a Nehalem cluster running CentOS5.4.
OMPI was built using Intel compilers 11.1.072. I am attaching the
configuration log and output from ompi_info -a.
The problem we are encountering is that whenever we use the option
'-npernode N' in the mpirun comm
It is hard to imagine how a total data load of 41,943,040 bytes could be a
problem. That is really not much data. By the time the BCAST is done, each
task (except root) will have received a single half-meg message from one
sender. That is not much.
IMB does shift the root so some tasks may be i
On Sun, Aug 22, 2010 at 9:57 PM, Randolph Pullen <
randolph_pul...@yahoo.com.au> wrote:
> It's a long shot, but could it be related to the total data volume?
> i.e. 524288 * 80 = 41,943,040 bytes active in the cluster
>
> Can you exceed this 41943040 data volume with a smaller message repeated
> more
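A sketch of how that "smaller message repeated more" experiment could be driven, keeping the volume above that figure (the 64 KiB size and 800 repetitions are illustrative; even a single receiving rank then sees 65536 * 800 = 52,428,800 bytes, more than the 41,943,040 bytes of the failing run):

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    /* Illustrative numbers only: 64 KiB per broadcast, repeated 800 times. */
    const int msg_bytes = 65536;
    const int reps = 800;
    int rank, size;
    char *buf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    buf = calloc(msg_bytes, 1);

    for (int r = 0; r < reps; r++) {
        /* IMB rotates the root between repetitions, so do the same
         * to keep the traffic pattern comparable. */
        int root = r % size;
        MPI_Bcast(buf, msg_bytes, MPI_CHAR, root, MPI_COMM_WORLD);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}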
Sure. I took a guess at ppn and nodes for the case where 2 processes
are on the same node... I don't claim these are the right values ;-)
c0301b10e1 ~/mpi> env|grep OMPI
OMPI_MCA_orte_nodes=c0301b10e1
OMPI_MCA_orte_rank=0
OMPI_MCA_orte_ppn=2
OMPI_MCA_orte_num_procs=2
OMPI_MCA_oob_tcp_static_ports
Can you send me the values you are using for the relevant envars? That way I
can try to replicate it here.
On Aug 23, 2010, at 1:15 PM, Philippe wrote:
> I took a look at the code, but I'm afraid I don't see anything wrong.
>
> p.
>
> On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
>> Yes, t
I took a look at the code, but I'm afraid I don't see anything wrong.
p.
On Thu, Aug 19, 2010 at 2:32 PM, Ralph Castain wrote:
> Yes, that is correct - we reserve the first port in the range for a daemon,
> should one exist.
> The problem is clearly that get_node_rank is returning the wrong value
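For readers following the port discussion, the arithmetic being described works roughly like the sketch below: the OOB is given a static TCP port range, slot 0 of the range is reserved for a daemon when one exists, and each local process takes the slot given by its node rank. This is only an illustration of that mapping; the function and variable names are made up and it is not the actual Open MPI code. A wrong node rank would make two processes compute the same port, which fits the failure being debugged.

#include <stdio.h>

/* Hypothetical helper, not Open MPI source: base_port is the first
 * port of the configured static range, node_rank is the process's
 * rank among the processes on its own node. */
static int pick_static_port(int base_port, int node_rank, int have_daemon)
{
    /* Slot 0 is reserved for the daemon when one exists, so local
     * processes start at the next port in the range. */
    int offset = have_daemon ? node_rank + 1 : node_rank;
    return base_port + offset;
}

int main(void)
{
    /* Example: range starting at 10000, two local processes, daemon present. */
    for (int nr = 0; nr < 2; nr++) {
        printf("node rank %d -> port %d\n", nr, pick_static_port(10000, nr, 1));
    }
    return 0;
}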