OK,
Trying
mpirun -v -np 2 --debug-daemons --host talisker4 hostname
yields the error
[talisker4.phy.queensu.ca:00682] [0,0,1]-[0,0,0] mca_oob_tcp_peer_try_connect: connect to 130.15.29.85:33821 failed, connecting over all interfaces failed!
[talisker2.phy.queensu.ca:28538] ERROR: A daemon on node talisker4 failed to start as expected.
[talisker2.phy.queensu.ca:28538] ERROR: There may be more information available from
[talisker2.phy.queensu.ca:28538] ERROR: the remote shell (see above).
[talisker2.phy.queensu.ca:28538] ERROR: The daemon exited unexpectedly with status 255.
So apparently the error is caused by talisker4 (remote) being unable to open a connection to talisker2 (local). Trying the reverse,
mpirun -v -np 2 --debug-daemons --host talisker2 hostname
executed from talisker4 yields the same error message reversed (i.e., talisker2 can't connect to talisker4). This makes me think it's a firewall problem...
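If it is a firewall problem, it should be reproducible without MPI at all. Below is a minimal sketch, assuming Python 3 is available on both nodes. The script name tcpcheck.py and its listen/probe subcommands are hypothetical and not part of Open MPI. One node listens on a TCP port and the other attempts a plain TCP connect to it. Port 33821 is borrowed from the log above purely as an example. That port looks ephemeral, so a firewall rule that opens only that single port probably would not be enough for mpirun.

```python
#!/usr/bin/env python3
"""Hypothetical TCP reachability check (not part of Open MPI).

Usage:
  tcpcheck.py listen PORT        # run on the node that should accept
  tcpcheck.py probe HOST PORT    # run on the other node
If the probe fails while the listener is running, something between
the nodes (e.g. a host firewall) is blocking TCP, independent of MPI.
"""
import socket
import sys


def probe(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an
        # OSError subclass) and unreachable hosts.
        return False


def listen(port: int) -> None:
    """Accept connections on all interfaces until interrupted."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as srv:
        srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
        srv.bind(("", port))
        srv.listen()
        print(f"listening on port {port}; Ctrl-C to stop")
        while True:
            conn, addr = srv.accept()
            print(f"connection from {addr[0]}:{addr[1]}")
            conn.close()


def main(argv: list) -> int:
    """Dispatch on the subcommand; return a process exit status."""
    if len(argv) == 3 and argv[1] == "listen":
        listen(int(argv[2]))
        return 0
    if len(argv) == 4 and argv[1] == "probe":
        ok = probe(argv[2], int(argv[3]))
        print("reachable" if ok else "NOT reachable")
        return 0 if ok else 1
    print(__doc__)
    return 2


if __name__ == "__main__":
    # With no arguments, just print the usage text.
    if len(sys.argv) > 1:
        sys.exit(main(sys.argv))
    print(__doc__)
```

For example, run "python3 tcpcheck.py listen 33821" on talisker2 and then "python3 tcpcheck.py probe talisker2 33821" on talisker4. If the probe reports NOT reachable in both directions, the fix most likely belongs in the firewall configuration (iptables on Fedora Core 5) rather than in Open MPI.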
- Dave
Tim Prins wrote:
David,
Have you tried something like
mpirun -np 1 --host talisker4 hostname
If that hangs, try adding '--debug-daemons' to the command line and
see if the output from that helps. If not, please send the output to
the list.
Thanks,
Tim
On Mar 19, 2007, at 1:59 AM, David Burns wrote:
I neglected to mention that the test is currently running on 100 Mbps
Ethernet. I have also tested the setup using a simple "hello world, my
rank is _" program and get the same hanging problem.
3d...@qlink.queensu.ca wrote:
If anyone could help me out with this I would greatly appreciate it. I
have already read through the entire FAQ and haven't seen anyone with a
similar problem.
I have successfully tested and run the Open MPI application I've coded
locally on both computers, talisker2 and talisker4:
mpirun -np 1 --host localhost fdtd : -np 2 --host localhost rnode
However, when attempting to execute processes remotely, e.g.
mpirun -np 1 --host localhost fdtd : -np 2 --host talisker4 rnode
nothing happens. The shell just sits there, nothing prints (even though
the programs write to stdout), and it does not return until I kill it. I
have set up ssh with RSA authentication and no passphrase. The paths are
all set; I have tried purposely mis-setting them, and the error is
reported and mpirun returns as expected (so it isn't that).
More info about the system: Fedora Core 5, Open MPI 1.1.4. The
config.log and ompi_info output are attached. Any help or ideas of where
to go next would be greatly appreciated.
Thanks,
David
_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users