Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
Thank you for the help so far. Here is the information that the debugging gives me. It looks like the daemon on the non-local node never makes contact. If I step NP back by two, though, it does. Dan [root@compute-2-1 etc]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -v -

Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Ralph Castain
Sorry - I forgot that you built from a tarball, and so debug isn't enabled by default. You need to configure with --enable-debug. On Dec 14, 2012, at 1:52 PM, Daniel Davidson wrote: > Oddly enough, adding this debugging info lowered the number of processes > that can be used from 46 to 42. W

Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
Oddly enough, adding this debugging info lowered the number of processes that can be used from 46 to 42. When I run the MPI job, it fails, giving only the information that follows: [root@compute-2-1 ssh]# /home/apps/openmpi-1.6.3/bin/mpirun -host compute-2-0,compute-2-1 -v -np 44 --leave-se

[OMPI users] Possible memory error

2012-12-14 Thread Handerson, Steven
Folks, I'm trying to track down an instance of Open MPI writing to a freed block of memory. This occurs with the most recent release (1.6.3) as well as 1.6, on a 64-bit Intel architecture running Fedora 14. It occurs with a very simple reduction (allreduce minimum) over a single int value. Has anyone

Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Ralph Castain
It wouldn't be ssh - in both cases, only one ssh is being done to each node (to start the local daemon). The only difference is the number of fork/exec's being done on each node, and the number of file descriptors being opened to support those fork/exec's. It certainly looks like your limits ar
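
The per-process limits mentioned above (open file descriptors and the number of processes) can be inspected on each node. The following is a minimal sketch, assuming the nodes run Linux with Python available; the function names `describe_limit` and `report_limits` are illustrative and not part of Open MPI.

```python
import resource


def describe_limit(name, limit_id):
    """Return a one-line description of the soft and hard value of a limit."""
    soft, hard = resource.getrlimit(limit_id)

    def fmt(value):
        return "unlimited" if value == resource.RLIM_INFINITY else str(value)

    return f"{name}: soft={fmt(soft)} hard={fmt(hard)}"


def report_limits():
    """Collect the limits most relevant to launching many local processes."""
    lines = [describe_limit("open files (RLIMIT_NOFILE)", resource.RLIMIT_NOFILE)]
    # RLIMIT_NPROC is not defined on every platform, so check before using it.
    if hasattr(resource, "RLIMIT_NPROC"):
        lines.append(describe_limit("processes (RLIMIT_NPROC)", resource.RLIMIT_NPROC))
    return lines


if __name__ == "__main__":
    for line in report_limits():
        print(line)
```

Limits seen in an interactive shell may differ from those inherited by a daemon started over ssh, so it is worth running a check like this through ssh on each node as well as locally.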

[OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
I have had to cobble together two machines in our Rocks cluster without using the standard installation. They have EFI-only BIOS, which Rocks doesn't like, so this is the only workaround. Everything works great now, except for one thing. MPI jobs (Open MPI or MPICH) fail when started fr

Re: [OMPI users] Problems with shared libraries while launching jobs

2012-12-14 Thread Ralph Castain
Add -mca plm_base_verbose 5 --leave-session-attached to the cmd line - that will show the ssh command being used to start each orted. On Dec 14, 2012, at 12:17 PM, "Blosch, Edwin L" wrote: > I am having a weird problem launching cases with OpenMPI 1.4.3. It is most > likely a problem with a p

[OMPI users] Problems with shared libraries while launching jobs

2012-12-14 Thread Blosch, Edwin L
I am having a weird problem launching cases with OpenMPI 1.4.3. It is most likely a problem with a particular node of our cluster, as the jobs run fine on some submissions but not on others. It seems to depend on the node list. I am just having trouble diagnosing which node, and

Re: [OMPI users] questions to some open problems

2012-12-14 Thread Ralph Castain
Hi Siegmar On Dec 14, 2012, at 5:54 AM, Siegmar Gross wrote: > Hi, > > some weeks ago (mainly in the beginning of October) I reported > several problems and I would be grateful if you could tell me whether, > and if so when, somebody will try to solve them. > > 1) I don't get the expected results,

Re: [OMPI users] problem with data transfer in a heterogeneous environment

2012-12-14 Thread Ralph Castain
Disturbing, but I don't know if/when someone will address it. The problem really is that few, if any, of the developers have access to hetero systems. So developing and testing hetero support is difficult to impossible. I'll file a ticket about it and direct it to the attention of the person who

[OMPI users] questions to some open problems

2012-12-14 Thread Siegmar Gross
Hi, some weeks ago (mainly in the beginning of October) I reported several problems and I would be grateful if you could tell me whether, and if so when, somebody will try to solve them. 1) I don't get the expected results when I try to send or scatter the columns of a matrix in Java. The received

[OMPI users] problem with data transfer in a heterogeneous environment

2012-12-14 Thread Siegmar Gross
Hi, some weeks ago I reported a problem with my matrix multiplication program in a heterogeneous environment (little-endian and big-endian machines). The problem occurs in openmpi-1.6.x, openmpi-1.7, and openmpi-1.9. I have now implemented a small program which only scatters the columns of an integer m
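
For readers unfamiliar with why mixing little-endian and big-endian machines is delicate, the sketch below shows what an integer looks like when its bytes are read with the wrong byte order. It is only an illustration of a byte-order mismatch in general, not a diagnosis of the problem reported in this thread; the function name `reinterpret_without_conversion` is made up for the example.

```python
import struct


def reinterpret_without_conversion(value):
    """Pack a 32-bit int as big-endian, then read the same bytes as little-endian."""
    raw = struct.pack(">i", value)          # bytes as a big-endian sender would lay them out
    (misread,) = struct.unpack("<i", raw)   # the same bytes read by a little-endian receiver
    return misread


if __name__ == "__main__":
    for value in (1, 2, 256):
        print(value, "->", reinterpret_without_conversion(value))
```

Applying the function twice returns the original value, since swapping the byte order is its own inverse.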