Re: [OMPI users] MPI_Comm_Spawn intercommunication

2011-01-22 Thread Jeff Squyres
Try using MPI_COMM_REMOTE_SIZE to get the size of the remote group in an 
intercommunicator.  MPI_COMM_SIZE returns the size of the local group.
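
For example (a minimal Fortran sketch, assuming the child branch of the program quoted below), the two sizes of an intercommunicator can be queried like this:

```fortran
! Sketch: querying both groups of an intercommunicator returned by
! MPI_COMM_GET_PARENT. MPI_COMM_SIZE gives the size of the local group;
! MPI_COMM_REMOTE_SIZE gives the size of the remote group.
integer :: lsize, rsize, code

call MPI_COMM_SIZE(intercomm, lsize, code)         ! local group size
call MPI_COMM_REMOTE_SIZE(intercomm, rsize, code)  ! remote group size
```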


On Jan 7, 2011, at 6:22 PM, Pierre Chanial wrote:

> Hello,
> 
> When I run this code:
> 
> program testcase
> 
>     use mpi
>     implicit none
> 
>     integer :: rank, lsize, rsize, code
>     integer :: intercomm
> 
>     call MPI_INIT(code)
> 
>     call MPI_COMM_GET_PARENT(intercomm, code)
>     if (intercomm == MPI_COMM_NULL) then
>         call MPI_COMM_SPAWN("./testcase", MPI_ARGV_NULL, 1, MPI_INFO_NULL, &
>             0, MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, code)
>         call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
>         call MPI_COMM_SIZE(MPI_COMM_WORLD, lsize, code)
>         call MPI_COMM_SIZE(intercomm, rsize, code)
>         if (rank == 0) then
>             print *, 'from parent: local size is ', lsize
>             print *, 'from parent: remote size is ', rsize
>         end if
>     else
>         call MPI_COMM_SIZE(MPI_COMM_WORLD, lsize, code)
>         call MPI_COMM_SIZE(intercomm, rsize, code)
>         print *, 'from child: local size is ', lsize
>         print *, 'from child: remote size is ', rsize
>     end if
> 
>     call MPI_FINALIZE(code)
> 
> end program testcase
> 
> I get the following results with openmpi 1.4.1 and two processes:
>  from parent: local size is  2
>  from parent: remote size is  2
>  from child: local size is  1
>  from child: remote size is  1
> 
> I would have expected:
>  from parent: local size is  2
>  from parent: remote size is  1
>  from child: local size is  1
>  from child: remote size is  2
> 
> Could anyone tell me what's going on? It's not a Fortran issue; I can also 
> replicate it using mpi4py.
> It's probably related to the universe size: I haven't found a way to pass it 
> to mpirun.
> 
> Cheers,
> Pierre
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine

2011-01-22 Thread Avinash

Hello,
	I figured out the problem; it is described herein and might be 
useful for someone else. The problem stems from the ompi_local_slave 
option being set on its own in the MPI_Info structure. It seems that 
MPI_Info_create uses a shift or, more likely, a masking operation 
(depending on the size of some type, which in turn depends on the 
underlying architecture) that sets the ompi_local_slave bit high. 
As a result, "jdata->controls" has its ORTE_JOB_CONTROL_LOCAL_SLAVE bit 
set high; see plm_rsh_module.c (line 1065) for the problem. I took the 
easy solution and set ompi_local_slave to "no" in the Info structure, 
and that solves the problem. Maybe this needs further investigation.
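
For reference, the workaround described above can be sketched as follows. This is a minimal Fortran fragment, not the poster's actual code (which was C); the filename "./child" and the spawn arguments are illustrative assumptions:

```fortran
! Sketch of the workaround: explicitly set ompi_local_slave to "no"
! in the MPI_Info object passed to MPI_COMM_SPAWN, rather than relying
! on the (apparently uninitialized) default.
integer :: info, intercomm, code

call MPI_INFO_CREATE(info, code)
call MPI_INFO_SET(info, "ompi_local_slave", "no", code)
call MPI_COMM_SPAWN("./child", MPI_ARGV_NULL, 1, info, &
                    0, MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, code)
call MPI_INFO_FREE(info, code)
```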


Regards,

On 1/21/11 7:22 PM, Avinash Malik wrote:


Hello,

 I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit
 architecture. I have a problem using MPI_Comm_spawn and
 MPI_Comm_spawn_multiple when MPI_Info is passed as a non-null
 parameter (i.e., not MPI_INFO_NULL): I get a segmentation fault.
 The exact same code runs fine on a 32-bit machine. I cannot
 use the 64-bit openmpi due to problems with other software,
 which uses openmpi but can only be compiled in 32-bit mode.

 I am attaching all the information in a .tgz file, which
 consists of:

 (1) The c-code for a small example: two files, parent.c and
 child.c.
 (2) The compile_command that I ran on a 64-bit machine when
 compiling openmpi-1.5.1.
 (3) The run command to run the system.
 (4) ompi_info_all.
 (5) The error that I get; it's a segmentation fault.

Regards,










Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine

2011-01-22 Thread Jeff Squyres
Thanks for that info!

I was literally just digging into this myself; I am able to replicate the 
problem on a 1.5.1 tarball, but not on a nightly 1.5.2 snapshot tarball.  Would 
you mind trying to replicate the issue on a recent 1.5.2 snapshot?

http://www.open-mpi.org/nightly/v1.5/








Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine

2011-01-22 Thread Jeff Squyres
Ah, wait -- based on your mail, I checked the MPI_Info-checking code in our 
underlying spawn implementation and basically found an uninitialized variable. 
Hence, it's probably non-deterministic behavior.

Patch coming to the development trunk soon; I'll get it QA checked by someone 
more expert in that code area than me, and if correct, I'll get it into the 
next releases of 1.5.x and 1.4.x.




