[OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
I am using openmpi 1.7.4 on Ubuntu 12.04 x64 and I have a very odd problem. I have 4 nodes, all of which are defined in the hostfile and in /etc/hosts. I can log into each node using ssh and certificate method from the shell that is running the mpi job, by using their name as defined in /etc/hosts

[OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Ross Boylan
I took the advice here and built a personal copy of the current openmpi, to see if the problems I was having with Rmpi were a result of the old version on the system. When I do ldd on the relevant libraries (Rmpi.so is loaded dynamically by R) everything looks fine; path references that should be

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
I "fixed it" by finding the message regarding tree spawn in a thread from November 2013. When I run the job with -mca plm_rsh_no_tree_spawn 1 the job works over 4 nodes. I cannot identify any errors in ssh key setup and since I am only using 4 nodes I am not concerned about somewhat slower launch
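For reference, the workaround described in this message can be sketched as below. The hostfile path and application name are hypothetical placeholders, not taken from the thread. The script sets the same MCA parameter through the OMPI_MCA_ environment prefix, which Open MPI treats as equivalent to passing it with -mca, and falls back to printing the command when mpirun or the placeholder files are not present.

```shell
#!/bin/sh
# Hypothetical names: the hostfile path and application binary are placeholders.
HOSTFILE="${HOSTFILE:-./hostfile}"
APP="${APP:-./my_mpi_app}"

# Same effect as passing "-mca plm_rsh_no_tree_spawn 1" on the mpirun command line:
# mpirun then starts every remote daemon itself instead of via a launch tree.
export OMPI_MCA_plm_rsh_no_tree_spawn=1

if command -v mpirun >/dev/null 2>&1 && [ -f "$HOSTFILE" ] && [ -x "$APP" ]; then
    mpirun -hostfile "$HOSTFILE" -np 4 "$APP"
else
    # Nothing to launch here, so only show the equivalent explicit command.
    echo "dry run: mpirun -mca plm_rsh_no_tree_spawn 1 -hostfile $HOSTFILE -np 4 $APP"
fi
```

Disabling tree spawn only changes how the daemons are started, so it sidesteps node-to-node ssh problems rather than diagnosing them.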

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Reuti
Hi, Am 12.03.2014 um 07:37 schrieb Victor: > I am using openmpi 1.7.4 on Ubuntu 12.04 x64 and I have a very odd problem. > > I have 4 nodes, all of which are defined in the hostfile and in /etc/hosts. > > I can log into each node using ssh and certificate method from the shell that > is runnin

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
Hostname no I use lower case, but for some reason while I was writing the email I thought that upper case is clearer... The same version of Ubuntu (12.04 x64) is on all nodes and openmpi and the executable are shared via nfs. On 12 March 2014 16:01, Reuti wrote: > Hi, > > Am 12.03.2014 um

[OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Hi Ralph, I installed openmpi-1.7.5rc2 and applied r31019 to it. As far as I confirmed, rmaps framework worked fine. However, by chance, I noticed that single ctrl+c typing could not terminate a running job. Twice typing was necessary. Is this your expected behavior? I didn't use ctrl+c to abo

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Jeff Squyres (jsquyres)
Are all names resolvable from all servers? I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work? On Mar 12, 2014, at 4:07 AM, Victor wrote: > Hostname no I use lower case, but for some reason while I was writing the > email I thought that upper case is clearer... > > The s

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread Jeff Squyres (jsquyres)
This all seems to be a side-effect of r30942 -- see: https://svn.open-mpi.org/trac/ompi/ticket/4365 On Mar 12, 2014, at 5:13 AM, wrote: > > > Hi Ralph, > > I installed openmpi-1.7.5rc2 and applied r31019 to it. > As far as I confirmed, rmaps framework worked fine. > > However, by chanc

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Thanks, Jeff. I really understood the situation. Tetsuya > This all seems to be a side-effect of r30942 -- see: > > https://svn.open-mpi.org/trac/ompi/ticket/4365 > > > On Mar 12, 2014, at 5:13 AM, wrote: > > > > > > > Hi Ralph, > > > > I installed openmpi-1.7.5rc2 and applied r31019 to it. >

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Jeff Squyres (jsquyres)
Generally, all you need to ensure that your personal copy of OMPI is used is to set the PATH and LD_LIBRARY_PATH to point to your new Open MPI installation. I do this all the time on my development cluster (where I have something like 6 billion different installations of OMPI available... mmm..
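A minimal sketch of the environment setup described in this message, assuming a personal install under a hypothetical prefix ($HOME/openmpi-1.7.4 is an illustration, not a path from the thread):

```shell
#!/bin/sh
# Hypothetical install prefix; substitute the --prefix used when building Open MPI.
OMPI_PREFIX="${OMPI_PREFIX:-$HOME/openmpi-1.7.4}"

# Put the personal copy ahead of any system-wide installation.
export PATH="$OMPI_PREFIX/bin:$PATH"
# Append the old value only if it was set, to avoid a trailing empty entry.
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Show which mpirun now comes first on the PATH, if any.
command -v mpirun || echo "mpirun not found; check that $OMPI_PREFIX/bin exists"
```

For multi-node jobs these settings usually also have to be in effect for non-interactive logins on the remote nodes, for example through the shell startup files.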

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Reuti
Am 12.03.2014 um 11:39 schrieb Jeff Squyres (jsquyres): > Generally, all you need to ensure that your personal copy of OMPI is used is > to set the PATH and LD_LIBRARY_PATH to point to your new Open MPI > installation. I do this all the time on my development cluster (where I have > something

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Victor
Yes they are. Can resolve and log into each node, from each node, using their "friendly" name, not IP. On 12 March 2014 18:15, Jeff Squyres (jsquyres) wrote: > Are all names resolvable from all servers? > > I.e., if you "ssh Node4" from Node1, Node2, and Node3, does it work? > > > On Mar 12, 20

Re: [OMPI users] Cannot run a job with more than 3 nodes

2014-03-12 Thread Jeff Squyres (jsquyres)
Can you verify that for all 4 nodes? I.e., something like this: foreach node (Node1 Node2 Node3 Node4) foreach other (Node1 Node2 Node3 Node4) echo from $node to $other ssh $node ssh $other hostname On Mar 12, 2014, at 7:34 AM, Victor wrote: > Yes they are. Can resolve and log i
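The all-pairs check suggested in this message is written in csh syntax. A rough POSIX sh equivalent might look like the sketch below. The function name check_pairs and the dry-run convention (passing echo as the runner) are assumptions made for illustration, and the node names are the placeholders used in the thread.

```shell
#!/bin/sh
# check_pairs RUNNER NODE...
# For every ordered pair of nodes, log into the first and from there run
# "ssh <second> hostname". RUNNER is the command used for the first hop;
# passing "echo" prints the commands instead of executing them.
check_pairs() {
    runner=$1
    shift
    for node in "$@"; do
        for other in "$@"; do
            echo "from $node to $other"
            # $runner is left unquoted on purpose so it may carry options.
            $runner "$node" ssh "$other" hostname </dev/null \
                || echo "FAILED: $node -> $other"
        done
    done
}

# Dry run: only print what would be executed.
check_pairs echo Node1 Node2 Node3 Node4

# Real check (BatchMode makes ssh fail instead of prompting for a password):
# check_pairs "ssh -o BatchMode=yes" Node1 Node2 Node3 Node4
```

Any pair reported as FAILED is a candidate explanation for a tree-spawn launch that hangs, since with tree spawn some daemons are started from other compute nodes rather than from the node running mpirun.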

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Dave Goodell (dgoodell)
Perhaps there's an RPATH issue here? I don't fully understand the structure of Rmpi, but is there both an app and a library (or two separate libraries) that are linking against MPI? I.e., what we want is: app -> ~ross/OMPI \ / --> library -- But what we'r

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Bennet Fauber
My experience with Rmpi and OpenMPI is that it doesn't seem to do well with the dlopen or dynamic loading. I recently installed R 3.0.3, and Rmpi, which failed when built against our standard OpenMPI but succeeded using the following 'secret recipe'. Perhaps there is something here that will be h

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Ross Boylan
On Wed, 2014-03-12 at 11:50 +0100, Reuti wrote: > Am 12.03.2014 um 11:39 schrieb Jeff Squyres (jsquyres): > > > Generally, all you need to ensure that your personal copy of OMPI is used > > is to set the PATH and LD_LIBRARY_PATH to point to your new Open MPI > > installation. I do this all the

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Ralph Castain
I remember having a conversation with someone from R at Supercomputing last year, and this was one of the issues we discussed. The problem is that you have to ensure that R is built against the OMPI you are going to use, and it is usually better to have configured OMPI --disable-dlopen --enable-

Re: [OMPI users] trying to use personal copy of 1.7.4

2014-03-12 Thread Ross Boylan
On Wed, 2014-03-12 at 14:34 +, Dave Goodell (dgoodell) wrote: > Perhaps there's an RPATH issue here? I don't fully understand the structure > of Rmpi, but is there both an app and a library (or two separate libraries) > that are linking against MPI? > > I.e., what we want is: > > app -

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread tmishima
Hi Ralph, this problem is not fixed completely by today's latest ticket #4383, I guess ... https://svn.open-mpi.org/trac/ompi/ticket/4383 For example, in case of returning with ORTE_ERR_SILENT from the line 514 in rmaps_rr_mapper.c file, the problem still occurs. I executed the job under the unm

Re: [OMPI users] ctrl+c to abort a job with openmpi-1.7.5rc2

2014-03-12 Thread Ralph Castain
Yes, I know - I am just finishing the fix now. On Mar 12, 2014, at 8:48 PM, tmish...@jcity.maeda.co.jp wrote: > > > Hi Ralph, this problem is not fixed completely by today's latest > ticket #4383, I guess ... > > https://svn.open-mpi.org/trac/ompi/ticket/4383 > > For example, in case of retur