Re: [OMPI users] MPI_Comm_Spawn intercommunication
Try using MPI_COMM_REMOTE_SIZE to get the size of the remote group in an intercommunicator. MPI_COMM_SIZE returns the size of the local group.

On Jan 7, 2011, at 6:22 PM, Pierre Chanial wrote:

> Hello,
>
> When I run this code:
>
>     program testcase
>
>         use mpi
>         implicit none
>
>         integer :: rank, lsize, rsize, code
>         integer :: intercomm
>
>         call MPI_INIT(code)
>
>         call MPI_COMM_GET_PARENT(intercomm, code)
>         if (intercomm == MPI_COMM_NULL) then
>             call MPI_COMM_SPAWN("./testcase", MPI_ARGV_NULL, 1, MPI_INFO_NULL, &
>                 0, MPI_COMM_WORLD, intercomm, MPI_ERRCODES_IGNORE, code)
>             call MPI_COMM_RANK(MPI_COMM_WORLD, rank, code)
>             call MPI_COMM_SIZE(MPI_COMM_WORLD, lsize, code)
>             call MPI_COMM_SIZE(intercomm, rsize, code)
>             if (rank == 0) then
>                 print *, 'from parent: local size is ', lsize
>                 print *, 'from parent: remote size is ', rsize
>             end if
>         else
>             call MPI_COMM_SIZE(MPI_COMM_WORLD, lsize, code)
>             call MPI_COMM_SIZE(intercomm, rsize, code)
>             print *, 'from child: local size is ', lsize
>             print *, 'from child: remote size is ', rsize
>         end if
>
>         call MPI_FINALIZE(code)
>
>     end program testcase
>
> I get the following results with openmpi 1.4.1 and two processes:
>
>     from parent: local size is 2
>     from parent: remote size is 2
>     from child: local size is 1
>     from child: remote size is 1
>
> I would have expected:
>
>     from parent: local size is 2
>     from parent: remote size is 1
>     from child: local size is 1
>     from child: remote size is 2
>
> Could anyone tell me what's going on? It's not a Fortran issue; I can also
> replicate it using mpi4py.
> Probably related to the universe size: I haven't found a way to hand it to
> mpirun.
>
> Cheers,
> Pierre

--
Jeff Squyres
jsquy...@cisco.com
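To make the distinction concrete, here is a minimal sketch of the corrected size queries. It is written against the C binding (the posted program is Fortran, but the calls map one-to-one), and the program/file name is made up for illustration:

    /* remote_size.c -- sketch: local vs. remote group size on an
       intercommunicator returned by MPI_Comm_spawn. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        int lsize, rsize;

        MPI_Init(&argc, &argv);
        MPI_Comm_get_parent(&intercomm);
        if (intercomm == MPI_COMM_NULL) {
            /* We are a parent: spawn one child running this same binary. */
            MPI_Comm_spawn("./remote_size", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);
        }

        /* On an intercommunicator, MPI_Comm_size reports the LOCAL group... */
        MPI_Comm_size(intercomm, &lsize);
        /* ...while MPI_Comm_remote_size reports the group on the other side. */
        MPI_Comm_remote_size(intercomm, &rsize);

        printf("local size is %d, remote size is %d\n", lsize, rsize);
        MPI_Finalize();
        return 0;
    }

Run with two parents (mpirun -np 2 ./remote_size), the parent side should report local 2 / remote 1 and the child local 1 / remote 2, matching the expected output above.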
Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine
Hello,

I figured out the problem, which is described herein; it might be useful for someone else. The problem stems from the ompi_local_slave option being set on its own in the MPI_Info structure. It seems that MPI_Info_create uses a shift or, more likely, a masking operation (depending on the size of some type, which in turn depends on the underlying architecture) that sets the ompi_local_slave bit high. As a result, "jdata->controls" has its ORTE_JOB_CONTROL_LOCAL_SLAVE bit set high; see plm_rsh_module.c (line 1065) for the problem. I took the easy solution and set ompi_local_slave to "no" in the Info structure, and that solves the problem. Maybe this needs further investigation.

Regards,

On 1/21/11 7:22 PM, Avinash Malik wrote:

> Hello,
>
> I have compiled openmpi-1.5.1 as a 32-bit binary on a 64-bit
> architecture. I have a problem using MPI_Comm_spawn and
> MPI_Comm_spawn_multiple when MPI_Info is used as a non-null
> (i.e., not MPI_INFO_NULL) parameter: I get a segmentation fault.
> I have the exact same code running fine on a 32-bit machine. I
> cannot use the 64-bit openmpi due to problems with other software,
> which uses openmpi but can only be compiled in 32-bit mode.
>
> I am attaching all the information in a .tgz file, which consists of:
>
> (1) The C code for a small example: two files, parent.c and child.c.
> (2) The compile_command that I ran on a 64-bit machine.
> (3) The run command to run the system compiled with openmpi-1.5.1.
> (4) ompi_info_all.
> (5) The error that I get; it's a segmentation fault.
>
> Regards,
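For reference, here is a minimal C sketch of the workaround described above. The key name "ompi_local_slave" comes from the post; the spawned command and process count are placeholders:

    /* Workaround sketch: explicitly set ompi_local_slave to "no" so the
       flag is never left at whatever state MPI_Info_create produced. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Comm intercomm;
        MPI_Info info;

        MPI_Init(&argc, &argv);

        MPI_Info_create(&info);
        MPI_Info_set(info, "ompi_local_slave", "no");

        MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, info, 0,
                       MPI_COMM_WORLD, &intercomm, MPI_ERRCODES_IGNORE);

        MPI_Info_free(&info);
        MPI_Finalize();
        return 0;
    }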
Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine
Thanks for that info!

I was literally just digging into this myself; I am able to replicate the problem on a 1.5.1 tarball, but not on a nightly 1.5.2 snapshot tarball. Would you mind trying to replicate the issue on a recent 1.5.2 snapshot?

    http://www.open-mpi.org/nightly/v1.5/

On Jan 22, 2011, at 10:58 AM, Avinash wrote:

> [quoted text from the previous message snipped]

--
Jeff Squyres
jsquy...@cisco.com
Re: [OMPI users] A problem with running a 32-bit program on a 64-bit machine
Ah, wait -- based on your mail, I checked the MPI_Info-checking code in our underlying spawn implementation and basically found an uninitialized variable. Hence, it's probably non-deterministic behavior.

Patch coming to the development trunk soon; I'll get it QA checked by someone more expert in that code area than me, and if correct, I'll get it into the next releases of 1.5.x and 1.4.x.

On Jan 22, 2011, at 11:12 AM, Jeff Squyres wrote:

> [quoted text from the previous messages snipped]

--
Jeff Squyres
jsquy...@cisco.com
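For readers wondering how an uninitialized variable produces symptoms like this, here is a purely hypothetical C sketch of the bug class (not the actual Open MPI code): a flag that is only assigned when an info-key lookup runs, but is read on every path.

    #include <mpi.h>
    #include <stdio.h>

    /* HYPOTHETICAL illustration, not Open MPI source. */
    static int wants_local_slave(MPI_Info info)
    {
        int flag;                                /* BUG: never initialized */
        char value[MPI_MAX_INFO_VAL + 1];

        if (info != MPI_INFO_NULL) {
            MPI_Info_get(info, "ompi_local_slave", MPI_MAX_INFO_VAL,
                         value, &flag);
        }
        /* With MPI_INFO_NULL, 'flag' still holds stack garbage here, so the
           result depends on leftover memory contents -- which can easily
           differ between 32-bit and 64-bit builds of the same code. */
        return flag;
    }

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        /* With MPI_INFO_NULL the result is unpredictable from run to run. */
        printf("local slave? %d\n", wants_local_slave(MPI_INFO_NULL));
        MPI_Finalize();
        return 0;
    }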