Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine
Hello,

I figured out the problem, which is described below; it might be useful for someone else. The problem stems from the ompi_local_slave option being set on its own in the MPI_Info structure. It seems that MPI_Info_create uses a shift or, more likely, a masking operation (depending on the size of some type, which in turn depends on the underlying architecture), which sets the ompi_local_slave bit high. As a result, "jdata->controls" has its ORTE_JOB_CONTROL_LOCAL_SLAVE bit set high; see plm_rsh_module.c (line 1065) for the problem. I took the easy solution and explicitly set ompi_local_slave to "no" in the Info structure, which solves the problem. Maybe this needs further investigation.

Regards,

On 1/21/11 7:22 PM, Avinash Malik wrote:
> Hello, I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit architecture. [...]

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
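For reference, the workaround described above can be sketched roughly as follows. This is a minimal sketch, not the code from the original attachment; the child binary name "./child" is a placeholder:

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Workaround: after creating the Info object, explicitly set
     * ompi_local_slave to "no" so the stray ORTE_JOB_CONTROL_LOCAL_SLAVE
     * bit described above is never left high. */
    MPI_Info_create(&info);
    MPI_Info_set(info, "ompi_local_slave", "no");

    /* "./child" is a placeholder for the spawned binary. */
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

The example must be launched under mpirun against an MPI installation, so it is not runnable standalone.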
[OMPI users] A problem with running a 32-bit program on a 64-bit machine
Hello,

I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit architecture. I have a problem using MPI_Comm_spawn and MPI_Comm_spawn_multiple when MPI_Info is used as a non-NULL (i.e., not MPI_INFO_NULL) parameter: I get a segmentation fault. I have the exact same code running fine on a 32-bit machine. I cannot use the 64-bit openmpi due to problems with other software, which uses openmpi but can only be compiled in 32-bit mode.

I am attaching all the information in a .tgz file. The .tgz file consists of:
(1) The C code for a small example: two files, parent.c and child.c
(2) compile_command: the compile command that I ran on the 64-bit machine
(3) The run command used to run the system on which openmpi-1.5.1 was compiled
(4) ompi_info_all
(5) The error that I get; it is a segmentation fault

Regards,

information.tgz (attachment: binary data)

--
Avinash Malik
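A minimal parent along the lines described above might look like the following. This is a sketch of the reported failure pattern, not the attached parent.c; "./child" stands in for the attached child binary:

```c
#include <mpi.h>

int main(int argc, char *argv[])
{
    MPI_Comm intercomm;
    MPI_Info info;

    MPI_Init(&argc, &argv);

    /* Passing a freshly created (non-NULL) Info object is what
     * reportedly segfaults on the 32-bit build running on a
     * 64-bit machine. */
    MPI_Info_create(&info);
    MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, info, 0,
                   MPI_COMM_SELF, &intercomm, MPI_ERRCODES_IGNORE);

    /* Passing MPI_INFO_NULL in place of info reportedly works
     * on both 32-bit and 64-bit machines. */

    MPI_Info_free(&info);
    MPI_Finalize();
    return 0;
}
```

As with any spawn example, this must be compiled with mpicc and launched under mpirun, so it is not runnable standalone.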
[OMPI users] Problem with running MPI programs on machines with multiple interfaces
Hello,

I have two machines, each having 3 live interfaces: lo, eth0 (intranet), and usb0 (internet). eth0 cannot access usb0 on the other machine (and vice-versa). Now, when I try to run the MPI program with these two hosts I cannot get any output; even --mca btl_base_verbose 30 does not give any output. If I set the hostfile to contain only localhost, then everything runs fine.

I tried out the same code and hostfile with two other machines with two interfaces, lo and eth1, which can access each other. The program runs fine on these machines.

Next, I set btl_tcp_if_exclude to lo,usb0 (on the first setup) and also tried the ip-address/mask form, but this does not work either. When I run the program on one machine and do "ps aux | grep mpi" on the other, I can see --hnp-uri being set to usb0's ip-address, which it should not be, because I have set usb0 to be excluded in the btl_tcp_if_exclude list. So, what exactly am I doing wrong here?

I read the optimization FAQ and saw how openmpi builds the bipartite graphs for connections. But, as I said before, eth0 cannot access usb0's ip and vice-versa. How can I get rid of the usb0 ip-address showing up in --hnp-uri? This is the only difference between the working and the non-working setups.

Regards,

--
Avinash Malik
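One possible explanation for the symptom above: btl_tcp_if_exclude only restricts the MPI point-to-point (BTL) transport, while the --hnp-uri address comes from the runtime's out-of-band (OOB) channel, which has its own interface parameters. Under that assumption, excluding the interface at both layers may be worth trying; hostfile and program names here are placeholders:

```shell
# Exclude lo and usb0 from both the MPI transport (btl_tcp) and the
# runtime's out-of-band channel (oob_tcp), which is what --hnp-uri uses.
mpirun --hostfile hosts \
       --mca btl_tcp_if_exclude lo,usb0 \
       --mca oob_tcp_if_exclude lo,usb0 \
       ./my_mpi_program
```

The commands require a working Open MPI installation and the two-host setup described above, so they are illustrative rather than runnable here.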
Re: [OMPI users] Problem with running MPI programs on machines with multiple interfaces
Hello,

Please don't worry about this for now; the problem stems from iptables rules. But I still think putting usb0 into the reject list should disable the ip-address associated with it.

Regards,

>>>>> "Avinash" == Avinash Malik writes:
Avinash> Hello, I have two machines, each having 3 live interfaces: lo, eth0 (intranet), and usb0 (internet). [...]

--
Avinash Malik
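For anyone hitting the same symptom, the firewall angle mentioned above can be checked along these lines. The subnet is purely illustrative and needs root privileges to change; Open MPI uses ephemeral TCP ports by default, so a restrictive INPUT chain on the intranet interface can silently block the connection:

```shell
# List current filter rules with counters to spot DROP/REJECT
# entries that match traffic arriving on eth0.
iptables -L -n -v

# Illustrative: accept TCP traffic from the other node's intranet
# subnet (192.168.1.0/24 is a placeholder, not from the thread).
iptables -A INPUT -i eth0 -s 192.168.1.0/24 -p tcp -j ACCEPT
```

These commands modify live firewall state, so they are shown only as a sketch of where to look.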