Hooray!! Great to hear - I was running out of ideas :-)
On Dec 19, 2012, at 2:01 PM, Daniel Davidson wrote:
> I figured this out.
>
> ssh was working, but scp was not due to an mtu mismatch between the systems.
> Adding MTU=1500 to my /etc/sysconfig/network-scripts/ifcfg-eth2 fixed the
> problem.
I figured this out.
ssh was working, but scp was not due to an mtu mismatch between the
systems. Adding MTU=1500 to my
/etc/sysconfig/network-scripts/ifcfg-eth2 fixed the problem.
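For reference, the check and the fix looked roughly like this (interface name
eth2 and the 1500-byte value are the ones from this setup; other systems may
need a different interface name or value):

  # Compare the MTU currently set on the interface on each node
  ip link show eth2 | grep mtu

  # /etc/sysconfig/network-scripts/ifcfg-eth2 (RHEL/CentOS style): add or edit
  MTU=1500

  # Re-read the config so the new MTU takes effect
  ifdown eth2 && ifup eth2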
Dan
On 12/17/2012 04:12 PM, Daniel Davidson wrote:
Yes, it does.
Dan
[root@compute-2-1 ~]# ssh compute-2-0
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
Last login: Mon Dec 17 16:13:00 2012 from compute-2-1.local
[root@compute-2-0 ~]# ssh co
Daniel,
Does passwordless ssh work? You need to make sure that it does.
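A quick way to confirm it, assuming root keys between the two compute nodes as
in the transcript above:

  # Generate a key if one does not exist yet, then push it to the other node
  ssh-keygen -t rsa
  ssh-copy-id root@compute-2-0

  # BatchMode fails instead of prompting, so this proves it is truly passwordless
  ssh -o BatchMode=yes compute-2-0 hostname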
Doug
On Dec 17, 2012, at 2:24 PM, Daniel Davidson wrote:
> I would also add that scp seems to be creating the file in the /tmp directory
> of compute-2-0, and that /var/log/secure is showing ssh connections being
> accepted.
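For the record, one way to watch that log while retrying the copy, assuming
the RHEL-style location mentioned above:

  # On the receiving node (compute-2-0), follow new entries while the scp is retried
  tail -f /var/log/secure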
I would also add that scp seems to be creating the file in the /tmp
directory of compute-2-0, and that /var/log/secure is showing ssh
connections being accepted. Is there anything in ssh that can limit
connections that I need to look out for? My guess is that it is part of
the client prefs an
After a very long time (15 minutes or so), I finally received the following in
addition to what I just sent earlier:
[compute-2-0.local:24659] [[32341,0],1] odls:kill_local_proc working on
WILDCARD
[compute-2-0.local:24659] [[32341,0],1] odls:kill_local_proc working on
WILDCARD
[compute-2-0.local:246
Hmmm...and that is ALL the output? If so, then it never succeeded in sending a
message back, which leads one to suspect some kind of firewall in the way.
Looking at the ssh line, we are going to attempt to send a message from node
2-0 to node 2-1 on the 10.1.255.226 address. Is that going to work?
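A couple of quick things to try along those lines (the address is the one
quoted above; the 1472-byte payload assumes a standard 1500-byte MTU, i.e.
1500 minus 28 bytes of IP/ICMP headers):

  # Is a firewall filtering traffic between the nodes? (CentOS/Rocks)
  service iptables status
  iptables -L -n

  # Do full-size packets make it across? -M do forbids fragmentation
  ping -c 3 -M do -s 1472 10.1.255.226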
These nodes have not been locked down to prevent jobs from being launched
from the backend, at least not on purpose. The added logging returns the
information below:
[root@compute-2-1 /]# /home/apps/openmpi-1.7rc5/bin/mpirun -host
compute-2-0,compute-2-1 -v -np 10 --leave-session-attached
?? That was all the output? If so, then something is indeed quite wrong as it
didn't even attempt to launch the job.
Try adding -mca plm_base_verbose 5 to the cmd line.
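Something like this, reusing the paths and hosts from your earlier command
(hostname as the test program, matching the odls_base_verbose run elsewhere
in the thread):

  /home/apps/openmpi-1.7rc5/bin/mpirun -host compute-2-0,compute-2-1 \
      -np 10 --leave-session-attached \
      -mca plm_base_verbose 5 hostname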
I was assuming you were using ssh as the launcher, but I wonder if you are in
some managed environment? If so, then it could b
This looks to be having issues as well, and I cannot get any number of
processors to give me a different result with the new version.
[root@compute-2-1 /]# /home/apps/openmpi-1.7rc5/bin/mpirun -host
compute-2-0,compute-2-1 -v -np 50 --leave-session-attached -mca
odls_base_verbose 5 hostname
[
I will give this a try, but wouldn't that be an issue as well if the
process were run on the head node or another node? So long as the MPI
job is not started on either of these two nodes, it works fine.
Dan
On 12/14/2012 11:46 PM, Ralph Castain wrote:
It must be making contact or ORTE wouldn't be attempting to launch your
application's procs. Looks more like it never received the launch command.
Looking at the code, I suspect you're getting caught in a race condition that
causes the message to get "stuck".
Just to see if that's the case, you
Thank you for the help so far. Here is the information that the
debugging gives me. Looks like the daemon on the non-local node
never makes contact. If I step np back by two, though, it does.
Dan
[root@compute-2-1 etc]# /home/apps/openmpi-1.6.3/bin/mpirun -host
compute-2-0,compute-2-1 -v -
Sorry - I forgot that you built from a tarball, and so debug isn't enabled by
default. You need to configure --enable-debug.
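For reference, rebuilding the tarball with debug enabled would look roughly
like this (install prefix taken from the paths used in this thread; adjust to
taste):

  cd openmpi-1.6.3
  ./configure --prefix=/home/apps/openmpi-1.6.3 --enable-debug
  make all
  make install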
On Dec 14, 2012, at 1:52 PM, Daniel Davidson wrote:
> Oddly enough, adding this debugging info lowered the number of processes
> that can be used down to 42 from 46.
Oddly enough, adding this debugging info lowered the number of
processes that can be used down to 42 from 46. When I run the MPI, it
fails giving only the information that follows:
[root@compute-2-1 ssh]# /home/apps/openmpi-1.6.3/bin/mpirun -host
compute-2-0,compute-2-1 -v -np 44 --leave-se
It wouldn't be ssh - in both cases, only one ssh is being done to each node (to
start the local daemon). The only difference is the number of fork/exec's being
done on each node, and the number of file descriptors being opened to support
those fork/exec's.
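A sketch of how one might check those limits on each node (nofile/nproc are
the standard limits.conf fields; the values actually needed depend on how many
procs you launch per node):

  # Per-process limits for the shell that will start the daemons
  ulimit -n   # open file descriptors
  ulimit -u   # max user processes

  # System-wide PAM limits on RHEL/CentOS live here
  cat /etc/security/limits.conf /etc/security/limits.d/*.conf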
It certainly looks like your limits ar
I have had to cobble together two machines in our Rocks cluster without
using the standard installation; they have EFI-only BIOS on them and
Rocks doesn't like that, so it is the only workaround.
Everything works great now, except for one thing. MPI jobs (openmpi or
mpich) fail when started fr