On Apr 19, 2007, at 11:27 PM, Babu Bhai wrote:

I have already seen this FAQ. Nodes in the cluster do not have multiple IP addresses. One thing I forgot to mention is that the systems in the cluster do not have static IPs; they get their IP addresses through DHCP.

Ok, that should be fine.

Also, if there is a print statement (printf("hello world\n");) in the slave, it is correctly printed on the master's console, but none of the MPI commands work.

I'm not sure I follow -- which MPI commands are you referring to, mpirun? Something else?

I think you're saying that the MPI job starts up, printf works fine, but then something goes bad...? Are you saying that MPI *functions* don't seem to work (like MPI_SEND)? (I'm a little confused by your use of the word "command")

If that is the case, then this is a bit more odd because it means that OMPI started up, launched your job, and did some "out of band" communication, but then failed the first time it tried to establish MPI communications.
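
For concreteness, here is a minimal sketch of the kind of master/slave test you describe -- rank 0 sending one integer to rank 1 (the tag and the value sent are illustrative, not taken from your code):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, value = 42;                 /* illustrative value */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* stdout is forwarded over Open MPI's out-of-band channel,
       so this works even if the TCP BTL cannot connect */
    printf("hello world from rank %d\n", rank);

    if (rank == 0) {
        /* the first MPI point-to-point call is where a TCP (BTL)
           connection between the hosts actually gets established */
        MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("rank 1 received %d\n", value);
    }

    MPI_Finalize();
    return 0;
}

If a program like this prints the "hello world" lines but hangs or aborts at the send/receive, that matches the symptom: the out-of-band startup worked, but the first MPI communication did not.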

Are you running any firewall or port-blocking software on either of the nodes? Is each node routable from the other? (in Linux, at least, errno 113 is "no route to host", which would tend to imply that one host could not open a socket to another because it couldn't route there)
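
To take Open MPI out of the picture entirely, you could also test plain TCP reachability between the two nodes with something like the sketch below (the port number is illustrative; the address is one of the two from your mpirun command line):

#include <stdio.h>
#include <string.h>
#include <errno.h>
#include <unistd.h>
#include <netinet/in.h>
#include <arpa/inet.h>
#include <sys/socket.h>

int main(void)
{
    struct sockaddr_in peer;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    memset(&peer, 0, sizeof(peer));
    peer.sin_family = AF_INET;
    peer.sin_port   = htons(12345);                    /* illustrative port */
    inet_pton(AF_INET, "199.63.34.36", &peer.sin_addr);

    if (connect(fd, (struct sockaddr *) &peer, sizeof(peer)) < 0) {
        /* ECONNREFUSED means the host is reachable (nothing is listening
           on that port, which is fine for this test); EHOSTUNREACH is
           errno 113 on Linux -- the same "no route to host" condition
           the TCP BTL is reporting */
        printf("connect() failed: errno=%d (%s)\n", errno, strerror(errno));
    } else {
        printf("connected\n");
    }
    close(fd);
    return 0;
}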


regards,

Abhishek

>I need to make that error string be google-able -- I'll add it to the
>faq. :-)

>The problem is likely that you have multiple IP addresses, some of
>which are not routable to each other (but fail OMPI's routability
>assumptions). Check out these FAQ entries:

>http://www.open-mpi.org/faq/?category=tcp#tcp-routability
>http://www.open-mpi.org/faq/?category=tcp#tcp-selection

>Does this help?

>On Apr 19, 2007, at 11:07 AM, Babu Bhai wrote:

>> I have migrated from LAM/MPI to Open MPI. I am not able to
>> execute a simple MPI code in which the master sends an integer to the slave.
>> If I execute the code on a single machine, i.e. start 2 instances on the
>> same machine (mpirun -np 2 hello), this works fine.
>>
>> If I execute it in the cluster using mpirun --prefix /usr/local
>> -np 2 --host 199.63.34.154,199.63.34.36 hello
>> it gives the following error: "[btl_tcp_endpoint.c:
>> 572:mca_btl_tcp_endpoint_complete_connect] connect() failed with
>> errno=113"
>>
>> I am using openmpi-1.2
>>
>> regards,
>> Abhishek

>--
>Jeff Squyres
>Cisco Systems


--
Jeff Squyres
Cisco Systems
