Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine

2011-01-22 Thread Avinash

Hello,
I figured out the problem and describe it here in case it is useful to
someone else. The problem stems from the ompi_local_slave option being
set on its own in the MPI_Info structure. It seems that MPI_Info_create
uses a shift or, more likely, a masking operation (depending on the size
of some type, which in turn depends on the underlying architecture) that
sets the ompi_local_slave bit high. As a result, "jdata->controls" has
its ORTE_JOB_CONTROL_LOCAL_SLAVE bit set; see plm_rsh_module.c (line
1065) for where the problem shows up. I took the easy solution and set
ompi_local_slave to "no" in the Info structure, which solves the
problem. This may need further investigation.


Regards,

On 1/21/11 7:22 PM, Avinash Malik wrote:


Hello,

 I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit
 architecture. MPI_Comm_spawn and MPI_Comm_spawn_multiple fail
 with a segmentation fault when I pass a non-null MPI_Info (that
 is, anything other than MPI_INFO_NULL). The exact same code runs
 fine on a 32-bit machine. I cannot use the 64-bit openmpi due to
 problems with other software, which uses openmpi but can only be
 compiled in 32-bit mode.

 I am attaching all the information in a .tgz file. The .tgz
 file consists of:

 (1) The C code for a small example: two files, parent.c and
     child.c
 (2) The compile_command that I ran on a 64-bit machine.
 (3) The run command to run the system
     compiling openmpi-1.5.1.
 (4) ompi_info_all
 (5) The error that I get: a segmentation fault.

Regards,








___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users


[OMPI users] A problem with running a 32-bit program on a 64-bit machine

2011-01-21 Thread Avinash Malik

Hello,

I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit
architecture. MPI_Comm_spawn and MPI_Comm_spawn_multiple fail
with a segmentation fault when I pass a non-null MPI_Info (that
is, anything other than MPI_INFO_NULL). The exact same code runs
fine on a 32-bit machine. I cannot use the 64-bit openmpi due to
problems with other software, which uses openmpi but can only be
compiled in 32-bit mode.

I am attaching all the information in a .tgz file. The .tgz
file consists of:

(1) The C code for a small example: two files, parent.c and
    child.c
(2) The compile_command that I ran on a 64-bit machine.
(3) The run command to run the system
    compiling openmpi-1.5.1.
(4) ompi_info_all
(5) The error that I get: a segmentation fault.

Regards,



information.tgz
Description: Binary data


-- 
Avinash Malik


[OMPI users] Problem with running MPI programs on machines with multiple interfaces

2011-01-24 Thread Avinash Malik

Hello,

I have two machines, each with 3 live interfaces: lo, eth0
(intranet) and usb0 (internet). eth0 cannot access usb0 on the
other machine (and vice versa). Now, when I try to run the MPI
program with these two hosts, I cannot get any output; even --mca
btl_base_verbose 30 does not give any output. If I set the
hostfile to have only localhost, then everything runs fine.

I tried out the same code and hostfile with two other machines
with two interfaces: lo and eth1, which can access each
other. The program runs fine on these machines.

Next, I set btl_tcp_if_exclude to lo,usb0 (on the first pair of
machines) and also tried the IP address/mask form, but this does
not work either. When I run the program on one machine and do
"ps aux | grep mpi" on the other, I can see --hnp-uri being set
to usb0's IP address, which it should not be, because I have put
usb0 in the btl_tcp_if_exclude list. So, what exactly am I doing
wrong here?

I read the optimization FAQ and saw how Open MPI builds the
bipartite graphs for connections. But, as I said before, eth0
cannot access usb0's IP address and vice versa. How can I get rid
of the usb0 IP address showing up in --hnp-uri? This is the only
difference between the working and the non-working setups.

Regards,
-- 
Avinash Malik


Re: [OMPI users] Problem with running MPI programs on machines with multiple interfaces

2011-01-24 Thread Avinash Malik

Hello,

Please don't worry about this for now; the problem stems from
iptables rules. However, I still think that putting usb0 into the
exclude list (btl_tcp_if_exclude) should disable the IP address
associated with it.

Regards,

>>>>> "Avinash" == Avinash Malik  writes:

Avinash> Hello,

Avinash> I have two machines, each with 3 live interfaces:
Avinash> lo, eth0 (intranet) and usb0 (internet). eth0 cannot
Avinash> access usb0 on the other machine (and vice versa). Now,
Avinash> when I try to run the MPI program with these two hosts, I
Avinash> cannot get any output; even --mca btl_base_verbose 30 does
Avinash> not give any output. If I set the hostfile to have only
Avinash> localhost, then everything runs fine.

Avinash> I tried out the same code and hostfile with two
Avinash> other machines with two interfaces: lo and eth1, which can
Avinash> access each other. The program runs fine on these machines.

Avinash> Next, I set btl_tcp_if_exclude to lo,usb0 (on the first
Avinash> pair of machines) and also tried the IP address/mask form,
Avinash> but this does not work either. When I run the program on
Avinash> one machine and do "ps aux | grep mpi" on the other, I can
Avinash> see --hnp-uri being set to usb0's IP address, which it
Avinash> should not be, because I have put usb0 in the
Avinash> btl_tcp_if_exclude list. So, what exactly am I doing wrong
Avinash> here?

Avinash> I read the optimization FAQ and saw how Open MPI
Avinash> builds the bipartite graphs for connections. But, as I said
Avinash> before, eth0 cannot access usb0's IP address and vice
Avinash> versa. How can I get rid of the usb0 IP address showing up
Avinash> in --hnp-uri? This is the only difference between the
Avinash> working and the non-working setups.

Avinash> Regards,
Avinash> -- Avinash Malik

-- 
Avinash Malik