Re: [OMPI users] Oversubscribing a subset of a machine's cores
Hi,

Thanks for the heads up, Joseph, you sent me in the right direction. Very helpful indeed, although the command that seems to be doing the trick on my system is

  $ taskset -c X ...

Best regards,

Torje Henriksen

On Feb 7, 2008, at 2:47 PM, Joe Landman wrote:

| Torje Henriksen wrote:
| > [...] Still, all eight cores are being used. I can see why you would
| > want to use all cores, and I can see that oversubscribing a sub-set of
| > the cores might seem silly. My question is, is it possible to do what
| > I want to do without hacking the open mpi code?
|
| Could you get numactl to help you do what you want? That is, for the
| code, somehow tweak the launcher to run
|
|   numactl --physcpubind=X ...
|
| or similar?
|
| --
| Joseph Landman, Ph.D
| Founder and CEO
| Scalable Informatics LLC
| email: land...@scalableinformatics.com
| web  : http://www.scalableinformatics.com
|        http://jackrabbit.scalableinformatics.com
| phone: +1 734 786 8423
| fax  : +1 866 888 3112
| cell : +1 734 612 4615
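For completeness, the subset binding that taskset and numactl provide from the launcher can also be sketched inside the program itself. The C snippet below is only an illustration along those lines, not something posted in this thread: it assumes a hypothetical, hard-coded set of two cores and binds each rank to one of them with Linux's sched_setaffinity(), so an oversubscribed run packs all ranks onto just those cores.

  /* Minimal sketch (not from the original thread): bind each MPI rank to
   * one of a small, hard-coded set of cores so that oversubscribed ranks
   * share only those cores.  The core IDs are assumptions for illustration. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <stdio.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      int rank;
      /* Hypothetical subset of cores to oversubscribe: cores 0 and 1. */
      const int cores[] = {0, 1};
      const int ncores  = sizeof(cores) / sizeof(cores[0]);

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      cpu_set_t set;
      CPU_ZERO(&set);
      CPU_SET(cores[rank % ncores], &set);      /* pick a core round-robin   */
      if (sched_setaffinity(0, sizeof(set), &set) != 0)
          perror("sched_setaffinity");          /* binding is best-effort    */

      printf("rank %d bound to core %d\n", rank, cores[rank % ncores]);

      MPI_Finalize();
      return 0;
  }

Launching this with something like mpirun -np 8 ./a.out would put all eight ranks on cores 0 and 1; wrapping the launch in taskset -c ... or numactl --physcpubind=..., as discussed above, achieves the same effect without touching the code at all.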
Re: [OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5
Hi Tim,

Many thanks for the fix! Everything works fine now with the current trunk version.

Best regards,
stefan

| Tim Prins wrote:
| The fix I previously sent to the list has been committed in r17400.
|
| Thanks,
|
| Tim
|
| Tim Prins wrote:
| Hi Stefan,
|
| I was able to verify the problem. Turns out this is a problem with other
| onesided operations as well. Attached is a simple test case I made in C
| using MPI_Put that also fails.
|
| The problem is that the target count and displacements are both sent as
| signed 32 bit integers. Then, the receiver multiplies them together and
| adds them to the window base. However, this multiplication is done using
| the signed 32 bit integers, which overflows. This is then added to the
| 64 bit pointer. This, of course, results in a bad address.
|
| I have attached a patch against a recent development version that fixes
| this for me. I am also copying Brian Barrett, who did all the work on
| the onesided code.
|
| Brian: if possible, please take a look at the attached patch and test case.
|
| Thanks for the report!
|
| Tim Prins
|
| Stefan Knecht wrote:
| Hi all,
|
| I encounter a problem with the routine MPI_ACCUMULATE trying to sum up
| MPI_REAL8's on a large memory window with a large offset. My program,
| running on a single processor (x86_64 architecture), crashes with an
| error message like:
|
| [node14:16236] *** Process received signal ***
| [node14:16236] Signal: Segmentation fault (11)
| [node14:16236] Signal code: Address not mapped (1)
| [node14:16236] Failing at address: 0x2aaa32b16000
| [node14:16236] [ 0] /lib64/libpthread.so.0 [0x32e080de00]
| [node14:16236] [ 1] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(ompi_mpi_op_sum_double+0x10) [0x2af15530]
| [node14:16236] [ 2] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_process_op+0x2d7) [0x2aaab1a47257]
| [node14:16236] [ 3] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a45432]
| [node14:16236] [ 4] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0x93) [0x2aaab1a48243]
| [node14:16236] [ 5] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a43436]
| [node14:16236] [ 6] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0xff) [0x2aaab1a42e0f]
| [node14:16236] [ 7] /home/stefan/bin/openmpi-1.2.5/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2b3dfa0a]
| [node14:16236] [ 8] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_module_unlock+0x2a9) [0x2aaab1a48629]
| [node14:16236] [ 9] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(PMPI_Win_unlock+0xe1) [0x2af4a291]
| [node14:16236] [10] /home/stefan/bin/openmpi-1.2.5/lib/libmpi_f77.so.0(mpi_win_unlock_+0x25) [0x2acdd8c5]
| [node14:16236] [11] /home/stefan/calc/mpi2_test/a.out(MAIN__+0x809) [0x401851]
| [node14:16236] [12] /home/stefan/calc/mpi2_test/a.out(main+0xe) [0x401bbe]
| [node14:16236] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32dfc1dab4]
| [node14:16236] [14] /home/stefan/calc/mpi2_test/a.out [0x400f99]
| [node14:16236] *** End of error message ***
|
| mpirun noticed that job rank 0 with PID 16236 on node node14 exited on
| signal 11 (Segmentation fault).
| The relevant part of my FORTRAN source code reads as:
|
|       program accumulate_test
|       IMPLICIT REAL*8 (A-H,O-Z)
|       include 'mpif.h'
|       INTEGER(KIND=MPI_OFFSET_KIND) MX_SIZE_M
| C     dummy size parameter
|       PARAMETER (MX_SIZE_M = 1 000 000)
|       INTEGER MPIerr, MYID, NPROC
|       INTEGER ITARGET, MY_X_WIN, JCOUNT, JCOUNT_T
|       INTEGER(KIND=MPI_ADDRESS_KIND) MEM_X, MEM_Y
|       INTEGER(KIND=MPI_ADDRESS_KIND) IDISPL_WIN
|       INTEGER(KIND=MPI_ADDRESS_KIND) PTR1, PTR2
|       INTEGER(KIND=MPI_INTEGER_KIND) ISIZE_REAL8
|       INTEGER*8 NELEMENT_X, NELEMENT_Y
|       POINTER (PTR1, XMAT(MX_SIZE_M))
|       POINTER (PTR2, YMAT(MX_SIZE_M))
| C
|       CALL MPI_INIT( MPIerr )
|       CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, MPIerr)
|       CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROC, MPIerr)
| C
|       NELEMENT_X = 400 000 000
|       NELEMENT_Y = 10 000
| C
|       CALL MPI_TYPE_EXTENT(MPI_REAL8, ISIZE_REAL8, MPIerr)
|       MEM_X = NELEMENT_X * ISIZE_REAL8
|       MEM_Y = NELEMENT_Y * ISIZE_REAL8
| C
| C     allocate memory
| C
|       CALL MPI_ALLOC_MEM( MEM_X, MPI_INFO_NULL, PTR1, MPIerr)
|       CALL MPI_ALLOC_MEM( MEM_Y, MPI_INFO_NULL, PTR2, MPIerr)
| C
| C     fill vectors with 0.0D0 and 1.0D0
| C
|       CALL DZERO(XMAT,NELEMENT_X)
|       CALL DONE(YMAT,NELEMENT_Y)
| C
| C     open memory window
| C
|       CALL MPI_WIN_CREATE( XMAT, MEM_X, ISIZE_REAL8,
|      &                     MPI_INFO_NULL, MPI_COMM_WORLD,
|      &                     MY_X_WIN, MPIerr )
| C     lock window (MPI_LOCK_SHARED mode)
| C     select target ==
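As an aside on Tim's explanation above, the arithmetic behind the crash is easy to see in isolation. The C sketch below is purely illustrative (it is neither the attached test case nor the committed patch): it forms the byte offset once with 32-bit arithmetic, which wraps for Stefan's displacement of 400,000,000 REAL*8 elements (3,200,000,000 bytes, larger than 2**31 - 1), and once with the product widened to 64 bits first.

  /* Illustrative sketch only -- not the attached test case or the committed
   * patch.  Shows how a (displacement * disp_unit) product computed in 32
   * bits wraps for windows larger than 2**31 - 1 bytes, while a 64-bit
   * product does not. */
  #include <stdint.h>
  #include <stdio.h>

  int main(void)
  {
      int32_t target_disp = 400000000;  /* displacement in elements, as reported */
      int32_t disp_unit   = 8;          /* bytes per MPI_REAL8                    */

      /* Buggy flavour: product formed in 32 bits, then widened.  The true
       * value, 3,200,000,000, exceeds INT32_MAX (2,147,483,647); on common
       * two's-complement systems it wraps to a negative offset, which is
       * then added to the 64-bit window base -> bad address. */
      int32_t off32 = (int32_t)((uint32_t)target_disp * (uint32_t)disp_unit);

      /* Fixed flavour: widen to a 64-bit type *before* multiplying. */
      int64_t off64 = (int64_t)target_disp * disp_unit;

      printf("32-bit byte offset: %d (wrapped)\n", (int)off32);
      printf("64-bit byte offset: %lld\n", (long long)off64);
      return 0;
  }

Presumably the committed fix amounts to the second form: doing the multiplication in a 64-bit type before the result is added to the 64-bit window base.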
[OMPI users] Info needed for building Open-MPI against external ROMIO
We have a number of patches and files to be added to ROMIO to make it work with recent releases of the Panasas file system. We have reached a point where the stock ROMIO included in Open MPI no longer works for what we need. I know that the version of ROMIO forged into the bowels of OMPI is a beast to try and patch or mend, so that is something we won't attempt.

Thus we have two choices here at LANL: either we drop support and no longer provide OMPI to our user community, switching to MVAPICH2 as our only MPI on our systems, or we try to build OMPI against an externally maintained ROMIO. In an August 2007 email Jeff Squyres hinted that there is a way to do the latter:

| Continual re-integration of ROMIO is definitely a logistics problem
| that we have not solved. And it's becoming a bigger problem. :-(
|
| Normally, we're quite open to accepting patches to Open MPI to put
| them into the main distribution to ease the whole "millions of MPI
| distros" issue, but with ROMIO it becomes quite difficult because we
| have to source from Argonne's copy. Trying to manage what patches
| need to go in is already quite difficult because:
|
| - ROMIO is not on our release schedule
| - OMPI adds its own integration patches to ROMIO
| - All the OMPI developers have other work to do ;-)
|
| Adding 3rd party patches in there for something that we already know
| is complex and understaffed has unfortunately been a low priority. :-(
|
| One thing that may make things a little better is that Brian recently
| integrated some work onto the OMPI trunk that allows ROMIO to be
| built outside of OMPI. Hence, if you have a standalone ROMIO, OMPI
| can use it. I don't know the details (i.e., if you can still use
| mpi.h / MPI_Request / MPI_Test / MPI_Wait like you can with the
| default OMPI ROMIO integration) -- Brian will have to chime in here...
|
| So I don't know what the real solution is here -- I'm just trying to
| give some of the OMPI perspective. Suggestions are welcome.
| Probably the best solution would be someone to volunteer to actually
| spend the cycles to maintain ROMIO in Open MPI (I am pretty sure that
| Brian simply does not have them)...
|
| --
| Jeff Squyres
| Cisco Systems

Since Brian no longer works on these issues, I'm wondering if and how this is possible.

Thanks,
david

--
David Gunter
HPC-3: Parallel Tools Team
Los Alamos National Laboratory
Re: [OMPI users] Info needed for building Open-MPI against external ROMIO
David -

It looks like the code to do this was never pushed into the v1.2 release (although it is in the trunk). I have no idea what time frame you're looking at, but if you need an updated ROMIO before 1.3 is available, someone would need to bring over the changes and do a 1.2.6 release.

In v1.3, you'll be able to use the --disable-mpi-io option to configure to completely remove any traces of MPI I/O support from the stock Open MPI build (so that you could have an external ROMIO package).

Brian

On Mon, 11 Feb 2008, David Gunter wrote:

| We have a number of patches and files to be added to ROMIO to make it
| work with recent releases of the Panasas file system. [...]
[OMPI users] trouble building
Hi,

I checked the FAQ very carefully, twice, googled extensively, and cannot find any mention of the error I am seeing. It is a vanilla Linux system with the ifort compiler, so it really should work. I configure with

  F77=ifort FC=ifort FFLAGS=-O FCFLAGS=-O ./configure --prefix=/home/slosar/local/ --enable-static

which works fine, no errors reported. Then on make it tells me:

  make[3]: Entering directory `/group/cosmos/slosar/openmpi-1.2.4/opal/libltdl'
  make[3]: *** No rule to make target `lt__strl/home/slosar/util.lo', needed by `libltdlc.la'.  Stop.

This looks essentially like a makefile problem, or am I missing something? Any help much appreciated.

Best,
a

phone: +1 (510) 495 2488, mobile: +1 (510) 289 9395, fax: +1 (510) 486 7149
--
"Laugh and the world laughs with you. Smile and they wonder what you are up to."