Let me explain in detail. When we had only 2 nodes, 1 master (192.168.67.18)
+ 1 compute node (192.168.45.65), my openmpi-default-hostfile looked like:

    192.168.67.18 slots=2
    192.168.45.65 slots=2

After this, on running the command mpirun /work/Pi on the master node we got:

    root@192.168.45.65's password:

and after entering the password the program ran on both the nodes.

Now, after connecting a second compute node and editing the hostfile to:

    192.168.67.18 slots=2
    192.168.45.65 slots=2
    192.168.67.241 slots=2

running the command mpirun /work/Pi on the master node gives:

    root@192.168.45.65's password: root@192.168.67.241's password:

which does not accept the password (although we are actually trying to
implement a passwordless cluster). I would like to know why this problem
is occurring.
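For context on why these prompts appear at all: mpirun's rsh/ssh launcher
opens an ssh session to each remote host listed in the hostfile in order to
start its daemons there, so the same behaviour can be reproduced by hand.
A quick check from the master node, assuming root is the account used on
all nodes (as the prompts above suggest):

    # Each of these should log in and print the remote hostname WITHOUT
    # asking for a password once passwordless ssh is in place.
    ssh root@192.168.45.65 hostname
    ssh root@192.168.67.241 hostname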
On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Ankush
>
> You need to set up passwordless connections with ssh to the node you just
> added. You (or somebody else) probably did this already on the first
> compute node, otherwise the MPI programs wouldn't run across the network.
>
> See the very last sentence on this FAQ:
>
> http://www.open-mpi.org/faq/?category=running#run-prereqs
>
> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
> by "rsa"):
>
> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
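A condensed sketch of the kind of key-based setup the HOWTO linked above
describes, run from the master node, assuming DSA keys and that root is the
account used on every node (the host addresses below just mirror the ones
in this thread):

    # 1. Generate a key pair on the master; accept the default file and an
    #    empty passphrase so logins become non-interactive.
    ssh-keygen -t dsa

    # 2. Append the public key to root's authorized_keys on EACH compute
    #    node, including the newly added 192.168.67.241.
    cat ~/.ssh/id_dsa.pub | ssh root@192.168.45.65 \
        'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'
    cat ~/.ssh/id_dsa.pub | ssh root@192.168.67.241 \
        'mkdir -p ~/.ssh && cat >> ~/.ssh/authorized_keys'

    # 3. sshd is strict about permissions: ~/.ssh should be 700 and
    #    authorized_keys 600 on each node.
    ssh root@192.168.45.65 'chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys'
    ssh root@192.168.67.241 'chmod 700 ~/.ssh; chmod 600 ~/.ssh/authorized_keys'

Once both compute nodes accept key-based logins without prompting, the plain
mpirun /work/Pi run over the default hostfile should no longer ask for
passwords.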
>
> Ankush Kaul wrote:
>
>> Thank you, I am reading up on the tools you suggested.
>>
>> I am facing another problem: my cluster is working fine with 2 hosts
>> (1 master + 1 compute node), but when I tried to add another node
>> (1 master + 2 compute nodes) it is not working. It works fine when I
>> give the command
>>
>>     mpirun -host <hostname> /work/Pi
>>
>> but when I try to run
>>
>>     mpirun /work/Pi
>>
>> it gives the following error:
>>
>> root@192.168.45.65's password: root@192.168.67.241's password:
>> Permission denied, please try again.  (the password I provide is correct)
>> root@192.168.45.65's password:
>> Permission denied, please try again.
>> root@192.168.45.65's password:
>> Permission denied (publickey,gssapi-with-mic,password).
>> Permission denied, please try again.
>> root@192.168.67.241's password:
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> errmgr_hnp.c at line 90
>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>> [ccomp1.cluster:03503] ERROR: There may be more information available from
>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>> 255.
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>>
>> What is the problem here?
>>
>> --------------------------------------------------------------------------
>> mpirun was unable to cleanly terminate the daemons for this job.
>> Returned value Timeout instead of ORTE_SUCCESS
>>
>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene....@sun.com> wrote:
>>
>>     Ankush Kaul wrote:
>>
>>         Finally, after mentioning the hostfiles the cluster is working
>>         fine. We downloaded a few benchmarking programs, but I would
>>         like to know if there is any GUI-based benchmarking software,
>>         so that it is easier to demonstrate the working of our cluster
>>         while displaying it.
>>
>>     I'm confused about what you're looking for here, but thought I'd
>>     venture a suggestion.
>>
>>     There are GUI-based performance analysis and tracing tools. E.g.,
>>     run a program, [[semi-]automatically] collect performance data, run
>>     a GUI-based analysis tool on the data, and visualize what happened
>>     on your cluster. Would this suit your purposes?
>>
>>     If so, there are a variety of tools out there you could try. Some
>>     are platform-specific or cost money. Some are widely/freely
>>     available. Examples of these tools include Intel Trace Analyzer,
>>     Jumpshot, Vampir, TAU, etc. I do know that Sun Studio (Performance
>>     Analyzer) is available via free download on x86 and SPARC and Linux
>>     and Solaris and works with OMPI. Possibly the same with Jumpshot.
>>     VampirTrace instrumentation is already in OMPI, but then you need
>>     to figure out the analysis-tool part. (I think the Vampir GUI tool
>>     requires a license, but I'm not sure. Maybe you can convert to TAU,
>>     which is probably available for free download.)
>>
>>     Anyhow, I don't even know if that sort of thing fits your
>>     requirements. Just an idea.
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users