Let me explain in detail.

When we had only 2 nodes, 1 master (192.168.67.18) + 1 compute node
(192.168.45.65), my openmpi-default-hostfile looked like this:

192.168.67.18 slots=2
192.168.45.65 slots=2

After this, on running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password:

After entering the password, the program ran on both the nodes.
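
For reference, this should be roughly equivalent to pointing mpirun at the
hostfile explicitly and asking for all four slots (the path below is only an
assumption about where the file sits on our install; on a default build it is
usually under the Open MPI etc directory):

mpirun --hostfile /path/to/openmpi-default-hostfile -np 4 /work/Pi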

Now, after connecting a second compute node and editing the hostfile to:

192.168.67.18 slots=2
192.168.45.65 slots=2
192.168.67.241 slots=2

and then running the command mpirun /work/Pi on the master node, we got:

# root@192.168.45.65's password: root@192.168.67.241's password:

which does not accept the password.

Although we are trying to implement a passwordless cluster, I would like to
know why this problem is occurring.
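
If I understand Gus's recipe below correctly, the missing step for the new
node is roughly this (a sketch only, assuming DSA keys and that mpirun is run
as root, which is what the prompts above show):

# on the master, generate a key pair once (skip if ~/.ssh/id_dsa already exists)
ssh-keygen -t dsa
# copy the public key to every compute node, including the new one
ssh-copy-id root@192.168.45.65
ssh-copy-id root@192.168.67.241
# each of these should now log in without asking for a password
ssh root@192.168.45.65 hostname
ssh root@192.168.67.241 hostname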


On Sat, Apr 18, 2009 at 3:40 AM, Gus Correa <g...@ldeo.columbia.edu> wrote:

> Ankush
>
> You need to setup passwordless connections with ssh to the node you just
> added.  You (or somebody else) probably did this already on the first
> compute node, otherwise the MPI programs wouldn't run
> across the network.
>
> See the very last sentence on this FAQ:
>
> http://www.open-mpi.org/faq/?category=running#run-prereqs
>
> And try this recipe (if you use RSA keys instead of DSA, replace all "dsa"
> by "rsa"):
>
>
> http://www.sshkeychain.org/mirrors/SSH-with-Keys-HOWTO/SSH-with-Keys-HOWTO-4.html#ss4.3
>
> I hope this helps.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> Ankush Kaul wrote:
>
>> Thank you, I am reading up on the tools you suggested.
>>
>> I am facing another problem: my cluster is working fine with 2 hosts (1
>> master + 1 compute node), but when I tried to add another node (1 master +
>> 2 compute nodes) it is not working. It works fine when I give the command
>> mpirun -host <hostname> /work/Pi
>>
>> but when I try to run
>> mpirun /work/Pi it gives the following error:
>>
>> root@192.168.45.65's password:
>> root@192.168.67.241's password:
>>
>> Permission denied, please try again.  (The password I provide is correct.)
>>
>> root@192.168.45.65's password:
>>
>> Permission denied, please try again.
>>
>> root@192.168.45.65's password:
>>
>> Permission denied (publickey,gssapi-with-mic,password).
>>
>>
>> Permission denied, please try again.
>>
>> root@192.168.67.241's password:
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 275
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1166
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> errmgr_hnp.c at line 90
>>
>> [ccomp1.cluster:03503] ERROR: A daemon on node 192.168.45.65 failed to
>> start as expected.
>>
>> [ccomp1.cluster:03503] ERROR: There may be more information available from
>>
>> [ccomp1.cluster:03503] ERROR: the remote shell (see above).
>>
>> [ccomp1.cluster:03503] ERROR: The daemon exited unexpectedly with status
>> 255.
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> base/pls_base_orted_cmds.c at line 188
>>
>> [ccomp1.cluster:03503] [0,0,0] ORTE_ERROR_LOG: Timeout in file
>> pls_rsh_module.c at line 1198
>>
>>
>> What is the problem here?
>>
>> --------------------------------------------------------------------------
>>
>> mpirun was unable to cleanly terminate the daemons for this job. Returned
>> value Timeout instead of ORTE_SUCCESS
>>
>>
>> On Tue, Apr 14, 2009 at 7:15 PM, Eugene Loh <eugene....@sun.com> wrote:
>>
>>    Ankush Kaul wrote:
>>
>>        Finally, after specifying the hostfiles the cluster is working
>>        fine. We downloaded a few benchmarking packages, but I would like
>>        to know if there is any GUI-based benchmarking software so that
>>        it is easier to demonstrate the working of our cluster while
>>        presenting it.
>>
>>
>>    I'm confused what you're looking for here, but thought I'd venture a
>>    suggestion.
>>
>>    There are GUI-based performance analysis and tracing tools.  E.g.,
>>    run a program, [[semi-]automatically] collect performance data, run
>>    a GUI-based analysis tool on the data, visualize what happened on
>>    your cluster.  Would this suit your purposes?
>>
>>    If so, there are a variety of tools out there you could try.  Some
>>    are platform-specific or cost money.  Some are widely/freely
>>    available.  Examples of these tools include Intel Trace Analyzer,
>>    Jumpshot, Vampir, TAU, etc.  I do know that Sun Studio (Performance
>>    Analyzer) is available via free download on x86 and SPARC and Linux
>>    and Solaris and works with OMPI.  Possibly the same with Jumpshot.
>>     VampirTrace instrumentation is already in OMPI, but then you need
>>    to figure out the analysis-tool part.  (I think the Vampir GUI tool
>>    requires a license, but I'm not sure.  Maybe you can convert to TAU,
>>    which is probably available for free download.)
>>
>>    Anyhow, I don't even know if that sort of thing fits your
>>    requirements.  Just an idea.
>>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
