Thank you, Ralph,

I found the problem. I had misconfigured SELinux on the second node
(cisitu02): it was set to enforcing.
After disabling it, the parallel hello program (mpihello) works fine.
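
Checking each node's SELinux mode before and after the change makes this kind of mismatch easier to spot. The `getenforce` command reports the mode. The sketch below does the same thing from C by reading the selinuxfs `enforce` file, which contains 1 in enforcing mode and 0 in permissive mode. It is a minimal sketch under stated assumptions: the two mount points are typical locations rather than an exhaustive list, and the function names and the `SELINUX_CHECK_MAIN` macro are hypothetical.

```c
#include <stddef.h>
#include <stdio.h>

/* Read an SELinux "enforce" file and return 1 (enforcing), 0 (permissive),
 * or -1 if the file is missing or unreadable, which usually means SELinux
 * is disabled or selinuxfs is not mounted at that path. */
int selinux_enforce_mode(const char *enforce_path)
{
    FILE *f = fopen(enforce_path, "r");
    if (f == NULL)
        return -1;

    int mode = -1;
    if (fscanf(f, "%d", &mode) != 1)
        mode = -1;
    fclose(f);

    return (mode == 0 || mode == 1) ? mode : -1;
}

/* Try the usual selinuxfs mount points (assumed, not exhaustive):
 * /sys/fs/selinux on newer distributions, /selinux on older ones. */
int selinux_enforce_mode_default(void)
{
    static const char *const paths[] = {
        "/sys/fs/selinux/enforce",
        "/selinux/enforce",
    };
    for (size_t i = 0; i < sizeof paths / sizeof paths[0]; i++) {
        int mode = selinux_enforce_mode(paths[i]);
        if (mode != -1)
            return mode;
    }
    return -1;
}

#ifdef SELINUX_CHECK_MAIN
int main(void)
{
    static const char *const names[] = { "permissive", "enforcing" };
    int mode = selinux_enforce_mode_default();
    if (mode < 0)
        printf("SELinux: disabled or selinuxfs not mounted\n");
    else
        printf("SELinux: %s\n", names[mode]);
    return 0;
}
#endif
```

Build it with `cc -DSELINUX_CHECK_MAIN selinux_check.c -o selinux_check` and run it on cisitu01 and cisitu02. A node whose mode differs from the others is then easy to see.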

Regards,
-andria


On Tue, Mar 17, 2009 at 8:08 PM, Ralph Castain <r...@lanl.gov> wrote:

> Hi Andria,
>
> The problem is a permissions one: your system has been set up so that only
> root has permission to bind a TCP socket. I don't know what system you are
> running, so you might want to ask your system administrator or someone
> knowledgeable about that operating system how to change the required
> permissions.
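
You can confirm this diagnosis on the failing node with a small stand-alone test that repeats the step failing in the log. It opens a TCP socket, binds it, and listens on it. Run it first as the ordinary user and then as root. This is a minimal sketch and not Open MPI's own code. Binding INADDR_ANY to port 0 (any free ephemeral port) is an assumption about what the mca_oob_tcp and btl_tcp listeners do by default. The `probe_tcp_listen` function and the `BIND_PROBE_MAIN` macro are hypothetical names.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a TCP socket, bind it to INADDR_ANY:port and listen on it.
 * Port 0 asks the kernel for any free ephemeral port. Returns 0 on
 * success, otherwise the errno of the first call that failed
 * (EACCES, which is 13 on Linux, prints as "Permission denied"). */
int probe_tcp_listen(unsigned short port)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_addr.s_addr = htonl(INADDR_ANY);
    addr.sin_port = htons(port);

    int err = 0;
    if (bind(fd, (struct sockaddr *)&addr, sizeof addr) < 0)
        err = errno;          /* save errno before close() can change it */
    else if (listen(fd, 16) < 0)
        err = errno;

    close(fd);
    return err;
}

#ifdef BIND_PROBE_MAIN
int main(void)
{
    int err = probe_tcp_listen(0);
    if (err == 0) {
        printf("bind()/listen() on an ephemeral TCP port succeeded\n");
        return 0;
    }
    printf("TCP listen socket failed: %s (%d)\n", strerror(err), err);
    return 1;
}
#endif
```

Build it with `cc -DBIND_PROBE_MAIN bindprobe.c -o bindprobe` and run it on cisitu02. If it reports "Permission denied (13)" for the ordinary user but succeeds as root, the cause most likely lies in the node's security configuration rather than in Open MPI. Security policies such as SELinux can treat processes differently depending on how they are launched, so a clean run here does not rule out every cause.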
>
> Ralph
>
>
>
> On Mar 17, 2009, at 3:12 AM, -andria- wrote:
>
>> Dear all,
>>
>> I am still learning how to write parallel programs with Open MPI.
>>
>> I am trying to run an mpihello program on my cluster. It fails with an error
>> when run as an ordinary user (public), but gives the correct result when run
>> as root.
>>
>> Why does this happen, and how can it be solved?
>>
>> Attached is the output of ompi_info --all.
>>
>> The code:
>>
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char** argv) {
>>   int numprocs, rank, namelen;
>>   char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>   MPI_Init(&argc, &argv);
>>   MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>   MPI_Get_processor_name(processor_name, &namelen);
>>   printf("Process %d on %s out of %d\n", rank, processor_name, numprocs);
>>   MPI_Finalize();
>>
>>   return 0;
>> }
>>
>> Output:
>> [public@cisitu01 ~]$ mpicc mpihello.c -o mpihello
>>
>> ### as public ###
>> [public@cisitu01 ~]$ mpirun -np 4 -hostfile nodes.lst mpihello
>> [cisitu02:02897] mca_oob_tcp_create_listen: bind() failed: Permission denied (13)
>> [cisitu02:02897] mca_oob_tcp_init: unable to create listen socket
>> [cisitu02:02898] mca_oob_tcp_create_listen: bind() failed: Permission denied (13)
>> [cisitu02:02898] mca_oob_tcp_init: unable to create listen socket
>> [cisitu02][0,1,1][btl_tcp_component.c:412:mca_btl_tcp_component_create_listen] bind() failed with errno=13
>> [cisitu02][0,1,3][btl_tcp_component.c:412:mca_btl_tcp_component_create_listen] bind() failed with errno=13
>> [cisitu02:02897] [0,1,1] ORTE_ERROR_LOG: Not found in file gpr_proxy_deliver_notify_msg.c at line 139
>> [cisitu02:02898] [0,1,3] ORTE_ERROR_LOG: Not found in file gpr_proxy_deliver_notify_msg.c at line 139
>> ^Cmpirun: killing job...
>>
>> mpirun noticed that job rank 0 with PID 2976 on node cisitu01 exited on signal 15 (Terminated).
>> 3 additional processes aborted (not shown)
>>
>> ### as root ###
>> -bash-3.2# mpirun -np 4 -hostfile nodes.lst mpihello
>> Process 0 on cisitu01 out of 4
>> Process 1 on cisitu02 out of 4
>> Process 3 on cisitu02 out of 4
>> Process 2 on cisitu01 out of 4
>> -bash-3.2#
>>
>> Thank you in advance,
>>
>> Regards,
>> -andria
>> <ompi_info.all>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>
