Thank you, Ralph. I found the problem: I had misconfigured SELinux on the second node (it was set to enforcing). After I disabled it, the parallel hello program works fine.

regards,
-andria

On Tue, Mar 17, 2009 at 8:08 PM, Ralph Castain <r...@lanl.gov> wrote:

> Hi Andria
>
> The problem is a permissions one - your system has been set up so that
> only root has permission to open a TCP socket. I don't know what system
> you are running - you might want to talk to your system admin or someone
> knowledgeable on that operating system to ask them how to revise the
> required permissions.
>
> Ralph
>
> On Mar 17, 2009, at 3:12 AM, -andria- wrote:
>
>> Dear all,
>>
>> I am still learning how to create a parallel program with Open MPI.
>>
>> I am trying to run an mpihello program on my cluster, but it gives an
>> error when it is executed as an ordinary (public) user. However, it
>> gives the correct result when it is run by the root user.
>>
>> Why does this happen? How can it be solved?
>>
>> Attached you can find the ompi_info --all output.
>>
>> the code:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char** argv) {
>>     int numprocs, rank, namelen;
>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Get_processor_name(processor_name, &namelen);
>>     printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>>     MPI_Finalize();
>>
>>     return 0;
>> }
>>
>> output:
>> [public@cisitu01 ~]$ mpicc mpihello.c -o mpihello
>>
>> ### as public ###
>> [public@cisitu01 ~]$ mpirun -np 4 -hostfile nodes.lst mpihello
>> [cisitu02:02897] mca_oob_tcp_create_listen: bind() failed: Permission denied (13)
>> [cisitu02:02897] mca_oob_tcp_init: unable to create listen socket
>> [cisitu02:02898] mca_oob_tcp_create_listen: bind() failed: Permission denied (13)
>> [cisitu02:02898] mca_oob_tcp_init: unable to create listen socket
>> [cisitu02][0,1,1][btl_tcp_component.c:412:mca_btl_tcp_component_create_listen] bind() failed with errno=13
>> [cisitu02][0,1,3][btl_tcp_component.c:412:mca_btl_tcp_component_create_listen] bind() failed with errno=13
>> [cisitu02:02897] [0,1,1] ORTE_ERROR_LOG: Not found in file gpr_proxy_deliver_notify_msg.c at line 139
>> [cisitu02:02898] [0,1,3] ORTE_ERROR_LOG: Not found in file gpr_proxy_deliver_notify_msg.c at line 139
>> ^Cmpirun: killing job...
>>
>> mpirun noticed that job rank 0 with PID 2976 on node cisitu01 exited on signal 15 (Terminated).
>> 3 additional processes aborted (not shown)
>>
>> ### as root ###
>> -bash-3.2# mpirun -np 4 -hostfile nodes.lst mpihello
>> Process 0 on cisitu01 out of 4
>> Process 1 on cisitu02 out of 4
>> Process 3 on cisitu02 out of 4
>> Process 2 on cisitu01 out of 4
>> -bash-3.2#
>>
>> Thank you in advance,
>>
>> regards,
>> -andria
>> <ompi_info.all>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users