Re: [OMPI users] mpi problems/many cpus per node

2012-12-19 Thread Daniel Davidson
I figured this out: ssh was working, but scp was not, due to an MTU mismatch between the systems. Adding MTU=1500 to my /etc/sysconfig/network-scripts/ifcfg-eth2 fixed the problem. Dan On 12/17/2012 04:12 PM, Daniel Davidson wrote: Yes, it does. Dan [root@compute-2-1 ~]# ssh compute-2-0 …
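
A minimal sketch of the fix described above, assuming a RHEL/CentOS-style ifcfg file as used on Rocks nodes; every field except MTU=1500 is a placeholder, not a value from the thread:

    # /etc/sysconfig/network-scripts/ifcfg-eth2
    DEVICE=eth2
    ONBOOT=yes
    BOOTPROTO=static
    IPADDR=10.1.255.226    # hypothetical address, for illustration only
    NETMASK=255.255.0.0    # placeholder
    MTU=1500               # pin the MTU so both ends of the link agree

    # apply the change without a full reboot
    ifdown eth2 && ifup eth2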

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
… Does passwordless ssh work? You need to make sure that it does. Doug On Dec 17, 2012, at 2:24 PM, Daniel Davidson wrote: I would also add that scp seems to be creating the file in the /tmp directory of compute-2-0, and that /var/log/secure is showing ssh connections being accepted. Is there …
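
A quick way to set up and verify passwordless ssh between the nodes (a sketch; assumes root-to-root logins as shown elsewhere in the thread):

    # on compute-2-1: create a key if one does not exist, then push it to the peer
    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa
    ssh-copy-id root@compute-2-0

    # should print the remote hostname with no password prompt
    ssh root@compute-2-0 hostname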

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
…:01 compute-2-0 sshd[24868]: pam_unix(sshd:session): session opened for user root by (uid=0) On 12/17/2012 11:16 AM, Daniel Davidson wrote: After a very long time (15 minutes or so), I finally received the following in addition to what I just sent earlier: [compute-2-0.local:24659] [[32341,0],1 …

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
… we are going to attempt to send a message from node 2-0 to node 2-1 on the 10.1.255.226 address. Is that going to work? Anything preventing it? On Dec 17, 2012, at 8:56 AM, Daniel Davidson wrote: These nodes have not been locked down yet so that jobs cannot be launched from the backend, at …
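
One way to check that directly before rerunning the MPI test (a sketch, run from compute-2-0; assumes 10.1.255.226 is the compute-2-1 address mentioned above):

    # basic reachability on that interface, then an ssh login to it
    ping -c 3 10.1.255.226
    ssh 10.1.255.226 hostname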

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
… it could be that launch from a backend node isn't allowed (e.g., on gridengine). On Dec 17, 2012, at 8:28 AM, Daniel Davidson wrote: This looks to be having issues as well, and I cannot get any number of processors to give me a different result with the new version. [root@compute-2-1 …

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
… Daniel Davidson wrote: I will give this a try, but wouldn't that be an issue as well if the process were run on the head node or another node? So long as the MPI job is not started on either of these two nodes, it works fine. Dan On 12/14/2012 11:46 PM, Ralph Castain wrote: It must be making co…

Re: [OMPI users] mpi problems/many cpus per node

2012-12-17 Thread Daniel Davidson
… might try running this with the 1.7 release candidate, or even the developer's nightly build. Both use a different timing mechanism intended to resolve such situations. On Dec 14, 2012, at 2:49 PM, Daniel Davidson wrote: Thank you for the help so far. Here is the information that the debugg…
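
Before retesting with a different version, it is worth confirming which installation actually resolves on each node (a sketch; ompi_info ships with Open MPI):

    # run on every node involved in the job
    which mpirun
    mpirun --version
    ompi_info | head -n 5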

Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
…] odls:kill_local_proc working on WILDCARD On 12/14/2012 04:11 PM, Ralph Castain wrote: Sorry - I forgot that you built from a tarball, and so debug isn't enabled by default. You need to configure --enable-debug. On Dec 14, 2012, at 1:52 PM, Daniel Davidson wrote: Oddly enough, adding this debugging …
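
A sketch of rebuilding the tarball with debug support, per the advice above (the install prefix is a placeholder):

    ./configure --prefix=/opt/openmpi-debug --enable-debug
    make -j 4 all install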

Re: [OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
…d line - this will report all the local proc launch debug and hopefully show you a more detailed error report. On Dec 14, 2012, at 12:29 PM, Daniel Davidson wrote: I have had to cobble together two machines in our Rocks cluster without using the standard installation; they have EFI-only BIOS …
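
A sketch of what such a command line could look like; the hostnames mirror the thread, and odls_base_verbose (the MCA parameter controlling verbosity of Open MPI's local launch framework) is assumed here to be the knob meant above:

    # report local process-launch debug from the daemon on each node
    mpirun --mca odls_base_verbose 5 -np 4 -host compute-2-0,compute-2-1 hostname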

[OMPI users] mpi problems/many cpus per node

2012-12-14 Thread Daniel Davidson
I have had to cobble together two machines in our Rocks cluster without using the standard installation; they have EFI-only BIOSes, and Rocks doesn't like that, so this is the only workaround. Everything works great now, except for one thing: MPI jobs (Open MPI or MPICH) fail when started from …
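
A minimal reproduction along those lines (a sketch; hostname stands in for a real MPI program):

    # reportedly fine from the head node, but fails when launched from compute-2-0 or compute-2-1
    mpirun -np 2 -host compute-2-0,compute-2-1 hostname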