[OMPI users] Oversubscribing a subset of a machine's cores
Hi, I have a slightly odd problem that you might not think is important at all. Anyways, here it goes: I'm using a single eight-core machine. I want to oversubscribe four of the cores and leave the other four idle. My approach is to make a hostfile: localhost slot=4 # shouldn't this limit the core count to 4? and run the command: $mpirun -np 8 --hostfile my_hostfile ./my_mpiprog or the command: $mpirun -np 8 --host localhost,localhost,localhost,localhost ./my_mpiprog Still, all eight cores are being used. I can see why you would want to use all cores, and I can see that oversubscribing a sub-set of the cores might seem silly. My question is, is it possible to do what I want to do without hacking the Open MPI code? Guess I just wanted to know if there is a solution I overlooked before I start hacking like a madman :) Thanks Torje Henriksen
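A minimal sketch of the hostfile form being discussed, assuming the documented Open MPI hostfile keyword "slots" (a slot count only limits how many processes are scheduled on the host before it counts as oversubscribed; it does not bind the resulting processes to particular cores):

  # my_hostfile
  localhost slots=4

  $mpirun -np 8 --hostfile my_hostfile ./my_mpiprog

Without any processor affinity, the operating system is still free to spread the eight ranks over all eight cores, which matches what Torje observes; the numactl suggestion later in the thread addresses exactly that.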
Re: [OMPI users] MPI_CART_CREATE and Fortran 90 Interface
Michal - You are absolutely right; sorry about that. I have fixed the bug in the OMPI development trunk which means that it will be incorporated in the upcoming v1.3 series (see https://svn.open-mpi.org/trac/ompi/changeset/17395) . I also filed a change request for the v1.2 branch; if we ever do a v1.2.6 release, this fix will be included in it. Thanks for reporting this problem! On Feb 6, 2008, at 8:49 AM, Michal Charemza wrote: Hi, I'm having a bit of trouble getting MPI_CART_CREATE to work with the Fortran 90 Interface, i.e. if I "use mpi", I get an error at compilation time (of my program) of: There is no specific subroutine for the generic 'mpi_cart_create' However, if I include mpif.h this error does not occur. After looking into the source, I see that in my mpi-f90-interfaces.h, there is a part: interface MPI_Cart_create subroutine MPI_Cart_create(old_comm, ndims, dims, periods, reorder, & comm_cart, ierr) integer, intent(in) :: old_comm integer, intent(in) :: ndims integer, dimension(*), intent(in) :: dims integer, dimension(*), intent(in) :: periods integer, intent(in) :: reorder integer, intent(out) :: comm_cart integer, intent(out) :: ierr end subroutine MPI_Cart_create end interface MPI_Cart_create I thought according to the MPI specs, periods should be a logical array, and reorder should be a logical scalar. Is this a bug in the Fortran 90 interface? Michal. ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
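For comparison, a small usage sketch (not from the thread) of the C binding, where the "logical" arguments really are plain ints; in the Fortran bindings, periods and reorder must be LOGICAL, which is what the generated f90 interface got wrong:

  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Comm cart;
      int size, dims[2] = {0, 0}, periods[2] = {1, 0}; /* periodic in dim 0 only */

      MPI_Init(&argc, &argv);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      MPI_Dims_create(size, 2, dims);      /* factor the job size into a 2-D grid */
      MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1 /* reorder */, &cart);
      /* ... use the Cartesian communicator ... */
      MPI_Comm_free(&cart);
      MPI_Finalize();
      return 0;
  }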
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
The whole question of how to invoke xterms for gdb via mpirun keeps coming up, so when this thread is done, I'll add a pile of this information to the FAQ. More below. On Feb 6, 2008, at 10:52 AM, jody wrote: I now solved the "ssh" part of my problem. The XServer is being started with the nolisten option (thanks Allen). In Fedora (Gnome) this can easily be changed by choosing the "Login Screen" tool from the System|Administration Menu. There, under the "Security" tab, remove the checkmark from "Deny TCP connections from xserver". Of course, this needs root access - fortunately, I am the boss of my computer ;) Additionally, at least port 6000 should be open. This leaves me with my second problem: $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e ./MPITest opens 2 xterms from nano (remote) and 3 xterms from plankton (local). The local screens display the message: ./MPITest: error while loading shared libraries: libmpi_cxx.so.0: cannot open shared object file: No such file or directory This is unbelievably strange, since for all xterms (local & remote) the output of $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e printenv contains the PATH variable containing the path to openmpi/bin and the LD_LIBRARY_PATH containing the path to openmpi/lib The results of these two commands do seem to contradict each other; hmm. Just to be absolutely sure, did you cut-n-paste the LD_LIBRARY_PATH directory output from printenv and try to "ls" it to ensure that it's completely spelled right, etc.? I suspect that it's right since your other commands work, but at this point, it's worth checking the "obvious" things as well... What shell are you using? You might want to add some echo statements to your shell startup scripts to ensure that all the right parts are being run in each of the cases -- perhaps, for some weird reason, they aren't in the problematic cases...? [shrug] Doing $mpirun -np 5 -hostfile testhosts -x DISPLAY=plankton:0.0 xterm -hold -e locate libmpi_cxx returns on all xterms (local & remote) /opt/openmpi/lib/libmpi_cxx.la /opt/openmpi/lib/libmpi_cxx.so /opt/openmpi/lib/libmpi_cxx.so.0 /opt/openmpi/lib/libmpi_cxx.so.0.0.0 On the other hand, the application has no problem when being called without xterms: $mpirun -np 5 -hostfile testhosts ./MPITest Does anybody have an idea why that should happen? Thanks Jody -- Jeff Squyres Cisco Systems
Re: [OMPI users] Infinipath context limit
On Wed, 6 Feb 2008, Christian Bell wrote: > Hi Daniel -- > > PSM should determine your node setup and enable shared contexts > accordingly, but it looks like something isn't working right. You > can apply the patch I've attached to this e-mail and things should > work again. Alas, it doesn't compile (patch was applied to OpenMPI 1.2.5): mtl_psm.c(109): error: struct "orte_proc_info_t" has no field "num_local_procs" if (orte_process_info.num_local_procs > 0) { ^ mtl_psm.c(111): error: struct "orte_proc_info_t" has no field "num_local_procs" snprintf(buf, sizeof buf - 1, "%d", orte_process_info.num_local_procs); ^ mtl_psm.c(113): error: struct "orte_proc_info_t" has no field "local_rank" snprintf(buf, sizeof buf - 1, "%d", orte_process_info.local_rank); ^ compilation aborted for mtl_psm.c (code 2) > However, it would be useful to identify what's going wrong. Can > you compile a hello world program and run it with the machinefile > you're trying to use. Send me the output from: > > mpirun -machinefile env PSM_TRACEMASK=0x101 ./hello_world > > I understand your failure mode only if somehow the 8-core node is > detected to be a 4-core node. The output should tell us this. Attached. It seems it does try to enable context sharing but for some reason /dev/ipath still returns a busy code. Daniëlnode017.23692env IPATH_DISABLE_MMAP_MALLOC Disable mmap for malloc() => NO node017.23692env IPATH_NO_CPUAFFINITY Prevent PSM from setting affinity => NO node017.23692env IPATH_UNITDevice Unit number (-1 autodetects) => -1 node017.23692env PSM_DEVICES Ordered list of PSM-level devices => shm,ipath (default was self,shm,ipath) node017.23692psmi_parse_devices: PSM Device allocation order: amsh,ips node017.23692env PSM_MEMORYMemory usage mode (normal or large) => normal node017.23692env PSM_SHAREDCONTEXTSEnable shared contexts => YES (default was YES) node017.23692ipath_setaffinity: PORT_INFO returned unit_id=0/1,port=1/4,hwports=4,subport=0/0,nproc=8 node017.23692ipath_setaffinity: Set CPU affinity to 0, port 0:1:0 (1 active chips) node017.23692ipath_userinit: Driver is not QLogic-built node017.23692ipath_userinit: Runtime flags are 0x46, explicit mallopt mmap disable in malloc is off node017.23692psmi_port_open: Opened port 1.0 on device /dev/ipath (LID=14,epid=e0001,flags=46) node017.23692env PSM_RCVTHREAD Recv thread flags (0 disables thread)=> 0x1 node017:1.0.env PSM_MQ_SENDREQS_MAX Max num of isend requests in flight => 1048576 node017:1.0.env PSM_MQ_RECVREQS_MAX Max num of irecv requests in flight => 1048576 node017:1.0.env PSM_MQ_RNDV_IPATH_THRESH ipath eager-to-rendezvous switchover => 64000 node017:1.0.env PSM_MQ_RNDV_SHM_THRESHshm eager-to-rendezvous switchover => 16000 node017:1.0.ips_spio_init: PIO copy uses forced ordering node017:1.0.env PSM_TID Tid proto flags (0 disables protocol)=> 0x1 node017:1.0.ips_protoexp_init: Tid control message settings: timeout min=200us/max=1000us, interrupt when trying attempt #2 node017:1.0.ips_proto_init: Tid error control: warning every 30 secs, fatal error after 250 tid errors node017:1.0.ips_proto_init: Ethernet Host IP=10.141.0.17 and PID=23692 node017:1.0.psmi_shm_attach: Registered as master to key /psm_shm.d999e196-868e-c6e6-0d4a-bc2c78de85f1 node017:1.0.psmi_shm_attach: Mapped shm control object at 0x2b25a000 node017:1.0.psmi_shm_attach: Mapped and initalized shm object control page at 0x2b25a000,size=4096 node017:1.0.psmi_shm_attach: Grabbed shmidx 0 node017:1.0.amsh_init_segment: Grew shared segment for 1 procs, size=5.93 MB node017:1.0.am_remap_segment: 
shm segment remap from 0x2b25a000..4096 to 0x2aaab26b3000..6217728 (relocated=YES) node017:1.0.ips_ptl_pollintr: Enabled communication thread on URG packets node017.23691env IPATH_DISABLE_MMAP_MALLOC Disable mmap for malloc() => NO node017.23691env IPATH_NO_CPUAFFINITY Prevent PSM from setting affinity => NO node017.23691env IPATH_UNITDevice Unit number (-1 autodetects) => -1 node017.23691env PSM_DEVICES Ordered list of PSM-level devices => shm,ipath (default was self,shm,ipath) node017.23691psmi_parse_devices: PSM Device allocation order: amsh,ips node017.23691env PSM_MEMORYMemory usage mode (normal or large) => normal node017.23691env PSM_SHAREDCONTEXTSEnable shared contexts => YES (default was YES) node017.23691ipath_setaffinity: PORT_INFO returned unit_id=0/1,port=2/4,hwports=4,subport=0/0,nproc=8 node017.2
Re: [OMPI users] Oversubscribing a subset of a machine's cores
Torje Henriksen wrote: [...] Still, all eight cores are being used. I can see why you would want to use all cores, and I can see that oversubscribing a sub-set of the cores might seem silly. My question is, is it possible to do what I want to do without hacking the open mpi code? Could you get numactl to help you do what you want? That is, for the code, somehow tweak the launcher to run numactl --physcpubind=X ... or similar? -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: land...@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615
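If a numactl wrapper is awkward to arrange, roughly the same effect can be had from inside the program on Linux; a minimal sketch (not from the thread, and assuming cores 0-3 are the ones you want to keep busy):

  /* Restrict the calling rank to cores 0-3; the other cores stay idle. */
  #define _GNU_SOURCE
  #include <sched.h>
  #include <mpi.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      cpu_set_t set;
      int c, rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      CPU_ZERO(&set);
      for (c = 0; c < 4; c++)
          CPU_SET(c, &set);
      if (sched_setaffinity(0, sizeof(set), &set) != 0)
          perror("sched_setaffinity");
      printf("rank %d restricted to cores 0-3\n", rank);

      /* ... the real work of the application goes here ... */

      MPI_Finalize();
      return 0;
  }

The OS scheduler will then multiplex all eight ranks over the four allowed cores, which is the oversubscription pattern Torje is after.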
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
Hi Jeff > The results of these two commands do seem to contradict each other; > hmm. Just to be absolutely sure, did you cut-n-paste the > LD_LIBRARY_PATH directory output from printenv and try to "ls" it to > ensure that it's completely spelled right, etc.? I suspect that it's > right since your other commands work, but at this point, it's worth > checking the "obvious" things as well... I wrote a little command called envliblist which consists of this line: printenv | grep PATH | gawk -F "_PATH=" '{ print $2 }' | gawk -F ":" '{ print $1 }' | xargs ls -al When I do mpirun -np 5 -hostfile testhosts -x DISPLAY xterm -hold -e ./envliblist all xterms (local & remote) display the contents of the openmpi/lib directory. Another strange result: I have a shell script for launching the debugger in an xterm: [jody]:/mnt/data1/neander:$cat run_gdb.sh #!/bin/sh # # save the program name export PROG="$1" # shift away program name (leaves program params) shift # create a command file for gdb, to start it automatically echo run $* > gdb.cmd # do the term xterm -e gdb -x gdb.cmd $PROG exit 0 When I run mpirun -np 5 --hostfile testhosts -x DISPLAY ./run_gdb.sh ./MPITest it works! Just to compare, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e ./MPITest does not work. I notice the only difference between the two commands above is that in the run_gdb script xterm has no "-hold" parameter! Indeed, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -e ./MPITest does work. To actually see that it works (MPITest is a simple Hello MPI app) I had to do mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e "./MPITest >> output.txt" and check output.txt. Does anybody have an explanation for this weird behavior? Jody
Re: [OMPI users] mpirun, paths and xterm again (xserver problem solved; library problem still there)
On Feb 7, 2008, at 10:07 AM, jody wrote: I wrote a little command called envliblist which consists of this line: printenv | grep PATH | gawk -F "_PATH=" '{ print $2 }' | gawk -F ":" '{ print $1 }' | xargs ls -al When i do mpirun -np 5 -hostfile testhosts -x DISPLAY xterm -hold -e ./ envliblist all xterms (local & remote) display the contents of the openmpi/lib directory. Ok, good. Another strange result: I have a shell script for launching the debugger in an xterm: [jody]:/mnt/data1/neander:$cat run_gdb.sh #!/bin/sh # # save the program name export PROG="$1" # shift away program name (leaves program params) shift # create a command file for gdb, to start it automatically echo run $* > gdb.cmd # do the term xterm -e gdb -x gdb.cmd $PROG exit 0 When i run mpirun -np 5 --hostfile testhosts -x DISPLAY ./run_gdb.sh ./MPITest it works! Just to compare mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e ./MPITest does not work. It seems that if you launch shell scripts, things work. But if you run xterm without a shell script, it does not work. I do not think it is a difference of -hold vs. no -hold. Indeed, I can run both of these commands just fine on my system: % mpirun -np 1 --hostfile h -x DISPLAY=.cisco.com:0 xterm - hold -e gdb ~/mpi/hello % mpirun -np 1 --hostfile h -x DISPLAY=.cisco.com:0 xterm -e gdb ~/mpi/hello Note that my setup is a little different than yours; I'm using a Mac laptop and ssh'ing to a server where I'm invoking mpirun. The hostfile "h" contains a 2nd server where xterm/gdb/hello are running. I notice the only difference between the to above commands is that in the run_gdb script xterm has no "-hold" parameter! Indeed, mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -e ./MPITest does work. To actually see that it works (MPITest is simple Hello MPI app) i had to do mpirun -np 5 --hostfile testhosts -x DISPLAY xterm -hold -e "./MPITest >> output.txt" and check output.txt. Does anybody have an explanation for this weird happening? Jody ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users -- Jeff Squyres Cisco Systems
Re: [OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5
Hi Stefan, I was able to verify the problem. Turns out this is a problem with other onesided operations as well. Attached is a simple test case I made in c using MPI_Put that also fails. The problem is that the target count and displacements are both sent as signed 32 bit integers. Then, the receiver multiplies them together and adds them to the window base. However, this multiplication is done using the signed 32 bit integers, which overflows. This is then added to the 64 bit pointer. This, of course, results in a bad address. I have attached a patch against a recent development version that fixes this for me. I am also copying Brian Barrett, who did all the work on the onesided code. Brian: if possible, please take a look at the attached patch and test case. Thanks for the report! Tim Prins Stefan Knecht wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Hi all, I encounter a problem with the routine MPI_ACCUMULATE trying to sum up MPI_REAL8's on a large memory window with a large offset. My program running (on a single processor, x86_64 architecture) crashes with an error message like: [node14:16236] *** Process received signal *** [node14:16236] Signal: Segmentation fault (11) [node14:16236] Signal code: Address not mapped (1) [node14:16236] Failing at address: 0x2aaa32b16000 [node14:16236] [ 0] /lib64/libpthread.so.0 [0x32e080de00] [node14:16236] [ 1] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(ompi_mpi_op_sum_double+0x10) [0x2af15530] [node14:16236] [ 2] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_process_op+0x2d7) [0x2aaab1a47257] [node14:16236] [ 3] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a45432] [node14:16236] [ 4] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_passive_unlock+0x93) [0x2aaab1a48243] [node14:16236] [ 5] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so [0x2aaab1a43436] [node14:16236] [ 6] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_progress+0xff) [0x2aaab1a42e0f] [node14:16236] [ 7] /home/stefan/bin/openmpi-1.2.5/lib/libopen-pal.so.0(opal_progress+0x4a) [0x2b3dfa0a] [node14:16236] [ 8] /home/stefan/bin/openmpi-1.2.5/lib/openmpi/mca_osc_pt2pt.so(ompi_osc_pt2pt_module_unlock+0x2a9) [0x2aaab1a48629] [node14:16236] [ 9] /home/stefan/bin/openmpi-1.2.5/lib/libmpi.so.0(PMPI_Win_unlock+0xe1) [0x2af4a291] [node14:16236] [10] /home/stefan/bin/openmpi-1.2.5/lib/libmpi_f77.so.0(mpi_win_unlock_+0x25) [0x2acdd8c5] [node14:16236] [11] /home/stefan/calc/mpi2_test/a.out(MAIN__+0x809) [0x401851] [node14:16236] [12] /home/stefan/calc/mpi2_test/a.out(main+0xe) [0x401bbe] [node14:16236] [13] /lib64/libc.so.6(__libc_start_main+0xf4) [0x32dfc1dab4] [node14:16236] [14] /home/stefan/calc/mpi2_test/a.out [0x400f99] [node14:16236] *** End of error message *** mpirun noticed that job rank 0 with PID 16236 on node node14 exited on signal 11 (Segmentation fault). 
The relevant part of my FORTRAN source code reads as: ~ program accumulate_test ~ IMPLICIT REAL*8 (A-H,O-Z) ~ include 'mpif.h' ~ INTEGER(KIND=MPI_OFFSET_KIND) MX_SIZE_M C dummy size parameter ~ PARAMETER (MX_SIZE_M = 1 000 000) ~ INTEGER MPIerr, MYID, NPROC ~ INTEGER ITARGET, MY_X_WIN, JCOUNT, JCOUNT_T ~ INTEGER(KIND=MPI_ADDRESS_KIND) MEM_X, MEM_Y ~ INTEGER(KIND=MPI_ADDRESS_KIND) IDISPL_WIN ~ INTEGER(KIND=MPI_ADDRESS_KIND) PTR1, PTR2 ~ INTEGER(KIND=MPI_INTEGER_KIND) ISIZE_REAL8 ~ INTEGER*8 NELEMENT_X, NELEMENT_Y ~ POINTER (PTR1, XMAT(MX_SIZE_M)) ~ POINTER (PTR2, YMAT(MX_SIZE_M)) C ~ CALL MPI_INIT( MPIerr ) ~ CALL MPI_COMM_RANK( MPI_COMM_WORLD, MYID, MPIerr) ~ CALL MPI_COMM_SIZE( MPI_COMM_WORLD, NPROC, MPIerr) C ~ NELEMENT_X = 400 000 000 ~ NELEMENT_Y = 10 000 C ~ CALL MPI_TYPE_EXTENT(MPI_REAL8, ISIZE_REAL8, MPIerr) ~ MEM_X = NELEMENT_X * ISIZE_REAL8 ~ MEM_Y = NELEMENT_Y * ISIZE_REAL8 C C allocate memory C ~ CALL MPI_ALLOC_MEM( MEM_X, MPI_INFO_NULL, PTR1, MPIerr) ~ CALL MPI_ALLOC_MEM( MEM_Y, MPI_INFO_NULL, PTR2, MPIerr) C C fill vectors with 0.0D0 and 1.0D0 C ~ CALL DZERO(XMAT,NELEMENT_X) ~ CALL DONE(YMAT,NELEMENT_Y) C C open memory window C ~ CALL MPI_WIN_CREATE( XMAT, MEM_X, ISIZE_REAL8, ~ & MPI_INFO_NULL, MPI_COMM_WORLD, ~ & MY_X_WIN, MPIerr ) C lock window (MPI_LOCK_SHARED mode) C select target ==> if itarget == myid: no 1-sided communication C ~ ITARGET = MYID ~ CALL MPI_WIN_LOCK( MPI_LOCK_SHARED, ITARGET, MPI_MODE_NOCHECK, ~ & MY_X_WIN, MPIerr) C C transfer data to target ITARGET C ~ JCOUNT_T = 10 000 ~ JCOUNT = JCOUNT_T C set displacement in memory window ~ IDISPL_WIN = 300 000 000 C ~ CALL MPI_ACCUMULATE( YMAT, JCOUNT, MPI_REAL8, ITARGET, IDISPL_WIN, ~ &
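The arithmetic Tim describes is easy to see in isolation; a minimal sketch (not the actual osc_pt2pt code) using the displacement and element size from Stefan's test case:

  #include <stdio.h>
  #include <stdint.h>

  int main(void)
  {
      int target_disp = 300000000;   /* IDISPL_WIN in the Fortran program */
      int disp_unit   = 8;           /* bytes per MPI_REAL8               */

      /* Widened to 64 bits, the byte offset added to the window base is fine. */
      int64_t right = (int64_t)target_disp * disp_unit;   /* 2400000000 */

      /* Truncating to 32 bits shows what a signed 32-bit multiply produces. */
      int32_t wrong = (int32_t)right;

      printf("64-bit offset: %lld bytes, 32-bit result: %d\n",
             (long long)right, wrong);
      return 0;
  }

Since 2,400,000,000 does not fit in a signed 32-bit integer, the 32-bit product is garbage, and adding it to the 64-bit window base yields the unmapped address seen in the segfault.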
Re: [OMPI users] process placement with torque and OpenMP
Hi Brock, As far as I know there is no way to do this with Open MPI and torque. I believe people usually use hostfiles to do this sort of thing, but hostfiles do not work with torque. You may want to look into the launcher commands to see if torque will do it for you. Slurm has an option '--cpus-per-task', but I just realized we don't handle that properly... Tim Brock Palen wrote: Ok so I ask the mpirun masters how would you do the following: I submit a job with torque (we use --with-tm) like the following: nodes=4:ppn=2 My desired outcome is to have 1 MPI process per 2 cpus and use threaded BLAS (or my own OpenMP, take your pick). Our cluster has some 4-core machines, thus the above job sometimes ends up looking like nodes=1:ppn=4+nodes=2:ppn=2 The mpirun -bynode command will work in the case where I get 4 nodes with only 2 cpus free. But if any machine other than the first machine is my node with 4 cores free given to me by moab, I would end up starting an extra process on the first node, where mpirun thinks another cpu is free, but that cpu is really to be used by OpenMP, and the last process should be placed on the node that has 4 cpus free. I hope that wasn't too confusing. It comes down to: how do I launch hybrid jobs and make sure the processes started by mpirun go where I want when my nodes have different core counts, given that I am running via torque so using -H won't work? Also, I would prefer that all processes be started via TM. Is this possible? Brock Palen Center for Advanced Computing bro...@umich.edu (734)936-1985
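One way to check what placement a given torque/mpirun combination actually produced is a tiny hybrid "where am I" program; a sketch (not from the thread), built with mpicc plus the compiler's OpenMP flag:

  #include <mpi.h>
  #include <omp.h>
  #include <stdio.h>

  int main(int argc, char **argv)
  {
      int rank, len;
      char host[MPI_MAX_PROCESSOR_NAME];

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Get_processor_name(host, &len);

      /* Each OpenMP thread reports which rank and host it belongs to. */
      #pragma omp parallel
      printf("rank %d, thread %d of %d, on %s\n",
             rank, omp_get_thread_num(), omp_get_num_threads(), host);

      MPI_Finalize();
      return 0;
  }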
Re: [OMPI users] Bad behavior in Allgatherv when a count is 0
Kenneth, Have you tried the 1.2.5 version? There were some fixes to the vector collectives that could have resolved your problem. On Feb 4, 2008 5:36 PM, George Bosilca wrote: > Kenneth, > > I cannot replicate this weird behavior with the current version in the > trunk. I guess it has been fixed since 1.2.4. > >Thanks, > george. > > > On Dec 13, 2007, at 6:58 PM, Moreland, Kenneth wrote: > > > I have found that on rare occasion Allgatherv fails to pass the data > > to > > all processes. Given some magical combination of receive counts and > > displacements, one or more processes are missing some or all of some > > arrays in their receive buffer. A necessary, but not sufficient, > > condition seems to be that one of the receive counts is 0. Beyond > > that > > I have not figured out any real pattern, but the example program > > listed > > below demonstrates the failure. I have tried it on OpenMPI version > > 1.2.3 and 1.2.4; it fails on both. However, it works fine with > > version > > 1.1.2, so the problem must have been introduced since then. > > > > -Ken > > > > Kenneth Moreland > >*** Sandia National Laboratories > > *** > > *** *** *** email: kmo...@sandia.gov > > ** *** ** phone: (505) 844-8919 > >*** fax: (505) 845-0833 > > > > > > > > #include > > > > #include > > #include > > > > int main(int argc, char **argv) > > { > > int rank; > > int size; > > MPI_Comm smallComm; > > int senddata[5], recvdata[100]; > > int lengths[3], offsets[3]; > > int i, j; > > > > MPI_Init(&argc, &argv); > > > > MPI_Comm_rank(MPI_COMM_WORLD, &rank); > > MPI_Comm_size(MPI_COMM_WORLD, &size); > > if (size != 3) > >{ > >printf("Need 3 processes."); > >MPI_Abort(MPI_COMM_WORLD, 1); > >} > > > > for (i = 0; i < 100; i++) recvdata[i] = -1; > > for (i = 0; i < 5; i++) senddata[i] = rank*10 + i; > > lengths[0] = 5; lengths[1] = 0; lengths[2] = 5; > > offsets[0] = 3; offsets[1] = 9; offsets[2] = 10; > > MPI_Allgatherv(senddata, lengths[rank], MPI_INT, > > recvdata, lengths, offsets, MPI_INT, MPI_COMM_WORLD); > > > > for (i = 0; i < size; i++) > >{ > >for (j = 0; j < lengths[i]; j++) > > { > > if (recvdata[offsets[i]+j] != 10*i+j) > >{ > >printf("%d: Got bad data from rank %d, index %d: %d\n", rank, > > i, > > j, > > recvdata[offsets[i]+j]); > >break; > >} > > } > >} > > > > MPI_Finalize(); > > > > return 0; > > } > > > > > > > > ___ > > users mailing list > > us...@open-mpi.org > > http://www.open-mpi.org/mailman/listinfo.cgi/users > > > ___ > users mailing list > us...@open-mpi.org > http://www.open-mpi.org/mailman/listinfo.cgi/users > -- Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/ tmat...@gmail.com || timat...@open-mpi.org I'm a bright... http://www.the-brights.net/
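Note that the #include lines in Kenneth's program were stripped by the archive (the header names inside angle brackets are gone). To compile it, the program needs at least the following two headers; the exact original set is a guess:

  #include <mpi.h>
  #include <stdio.h>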
Re: [OMPI users] bug in MPI_ACCUMULATE for window offsets > 2**31 - 1 bytes? openmpi v1.2.5
The fix I previously sent to the list has been committed in r17400. Thanks, Tim Tim Prins wrote: [...]
Re: [OMPI users] openmpi credits for eager messages
What I missed in this whole conversation is that the pieces of text that Ron and Dick are citing are *on the same page* in the MPI spec; they're not disparate parts of the spec that accidentally overlap in discussion scope. Specifically, it says: Resource limitations Any pending communication operation consumes system resources that are limited. Errors may occur when lack of resources prevent the execution of an MPI call. A quality implementation will use a (small) fixed amount of resources for each pending send in the ready or synchronous mode and for each pending receive. However, buffer space may be consumed to store messages sent in standard mode, and must be consumed to store messages sent in buffered mode, when no matching receive is available. The amount of space available for buffering will be much smaller than program data memory on many systems. Then, it will be easy to write programs that overrun available buffer space. ...12 lines down on that page, on the same page, in the same section... Consider a situation where a producer repeatedly produces new values and sends them to a consumer. Assume that the producer produces new values faster than the consumer can consume them. ...skip 2 sentences about buffered sends... If standard sends are used, then the producer will be automatically throttled, as its send operations will block when buffer space is unavailable. I find that to be unambiguous. 1. A loop of MPI_ISENDs on a producer can cause a malloc failure (can't malloc a new MPI_Request), and that's an error. Tough luck. 2. A loop of MPI_SENDs on a producer can run a slow-but-MPI-active consumer out of buffer space if all the incoming messages are queued up (e.g., in the unexpected queue). The language above is pretty clear about this: MPI_SEND on the producer is supposed to block at this point. FWIW: Open MPI does support this mode of operation, as George and Gleb noted (by setting the eager size to 0, therefore forcing *all* sends to be synchronous -- a producer cannot "run ahead" for a while and eventually be throttled when receive buffering is exhausted), but a) it's not the default, and b) it's not documented this way. On Feb 4, 2008, at 1:29 PM, Richard Treumann wrote: Hi Ron - I am well aware of the scaling problems related to the standard send requirements in MPI. I t is a very difficult issue. However, here is what the standard says: MPI 1.2, page 32 lines 29-37 === a standard send operation that cannot complete because of lack of buffer space will merely block, waiting for buffer space to become available or for a matching receive to be posted. This behavior is preferable in many situations. Consider a situation where a producer repeatedly produces new values and sends them to a consumer. Assume that the producer produces new values faster than the consumer can consume them. If buffered sends are used, then a buffer overflow will result. Additional synchronization has to be added to the program so as to prevent this from occurring. If standard sends are used, then the producer will be automatically throttled, as its send operations will block when buffer space is unavailable. If there are people who want to argue that this is unclear or that it should be changed, the MPI Forum can and should take up the discussion. I think this particular wording is pretty clear. The piece of MPI standard wording you quote is somewhat ambiguous: The amount of space available for buffering will be much smaller than program data memory on many systems. 
Then, it will be easy to write programs that overrun available buffer space. But note that this wording mentions a problem that an application can create but does not say the MPI implementation can fail the job. The language I have pointed to is where the standard says what the MPI implementation must do. The "lack of resource" statement is more about send and receive descriptors than buffer space. If I write a program with an infinite loop of MPI_IRECV postings the standard allows that to fail. Dick Dick Treumann - MPI Team/TCEM IBM Systems & Technology Group Dept 0lva / MS P963 -- 2455 South Road -- Poughkeepsie, NY 12601 Tele (845) 433-7846 Fax (845) 433-8363 users-boun...@open-mpi.org wrote on 02/04/2008 12:24:11 PM: > > > Is what George says accurate? If so, it sounds to me like OpenMPI > > does not comply with the MPI standard on the behavior of eager > > protocol. MPICH is getting dinged in this discussion because they > > have complied with the requirements of the MPI standard. IBM MPI > > also complies with the standard. > > > > If there is any debate about whether the MPI standard does (or > > should) require the behavior I describe below then we should move > > the discussion to the MPI 2.1 Forum and get a clarification. > > [...] > > The MPI S
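The producer/consumer situation both passages describe is easy to put into code; a minimal sketch (not from the thread) in which rank 0 can only run ahead of rank 1 for as long as eager buffering lasts, and is throttled on every message once the eager size is set to 0 as Jeff describes:

  #include <mpi.h>
  #include <stdio.h>
  #include <unistd.h>

  #define NMSG 10000
  #define LEN  1024

  int main(int argc, char **argv)   /* run with at least two ranks */
  {
      int rank, i;
      double buf[LEN] = {0.0};

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);

      if (rank == 0) {                       /* fast producer */
          for (i = 0; i < NMSG; i++)
              MPI_Send(buf, LEN, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
      } else if (rank == 1) {                /* slow consumer */
          for (i = 0; i < NMSG; i++) {
              MPI_Recv(buf, LEN, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
              usleep(1000);                  /* artificial delay */
          }
      }

      MPI_Finalize();
      return 0;
  }

With standard sends and a zero eager threshold, every MPI_Send blocks until the matching receive is posted, which is the throttling the standard text describes; with a nonzero eager threshold, messages up to that size land in the consumer's unexpected queue instead.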
Re: [OMPI users] Can't compile C++ program with extern "C" { #include mpi.h }
On Wed, 2008-01-30 at 21:21 -0500, Jeff Squyres wrote: > On Jan 30, 2008, at 5:35 PM, Adam C Powell IV wrote: > > > With no reply in a couple of weeks, I'm wondering if my previous > > message > > got dropped. (Then again, my previous message was a couple of weeks > > late in replying to its predecessor...) > > No, it didn't get dropped -- it was exactly your admission of low > priority that had me put this issue as low priority as well. :-) I understand. > > I'm recommending a change to mpi.h which would let C headers > > included by > > C++ programs do: > > #define OMPI_SKIP_MPICXX > > #include > > #undef OMPI_SKIP_MPICXX > > without preventing the C++ headers from being included at another > > time. > > See below for the recommended change. > > I really don't think that's a good solution. The real problem is that > some of Salome's header files are doing things that they should not be > doing (including a C++-safe header file inside extern "C" {}). > > IMHO, the real fix should be to fix the code that is doing the Wrong > Thing. I'm reluctant to provide a subtle workaround in our software > that enables a Wrong Thing -- know what I mean? > > FWIW, I just downloaded HDF5 1.6.6 and I took a [quick] look: it does > indeed look like HDF5's header files are C++-safe. Specifically: they > do not include in an extern "C" block, and all of their > declarations are within extern "C" blocks. Hence, Salome should not > be including inside of an extern "C" block because > is already C++-safe. > > This should fix your problem, right? Sort of, though it will require a large patch to Salomé to get around this, vs. a small patch to OpenMPI to provide a simple workaround to this class a problems. Basically, I'll need to patch every .hh file to make sure it #includes mpi.h or hdf5.h before #including any .h file with an mpi.h or hdf5.h #include in it. Given that Salomé does this, it must have worked with another MPI implementation. And that means that there is likely other software which will try this. As I understand it, your only objection to the change now is "programs shouldn't be doing this", not "this will break something". But then, why wouldn't programs expect to be able to include C headers in a C++ extern C block? Or rather, why shouldn't they be able to do so with mpi.h -- or hdf5.h, which isn't mpi.h -- when numerous other C header files allow it, possibly including other MPI implementations? After all, it's called mpi.h not mpi.hh or .hxx or mpi_cxx.h, right? And isn't the patched version cleaner, in that it separates the C and C++ prototypes into different #ifdef/#define regions? Thanks for the reply, sorry about the delay in getting back to you. -Adam -- GPG fingerprint: D54D 1AEE B11C CE9B A02B C5DD 526F 01E8 564E E4B6 Engineering consulting with open source tools http://www.opennovation.com/