Re: [OMPI users] Problem with openmpi and infiniband
Another thing to try is a change that we made late in the Open MPI v1.2 series with regards to IB: http://www.open-mpi.org/faq/?category=openfabrics#v1.2-use-early-completion

On Dec 24, 2008, at 10:07 PM, Tim Mattox wrote:

For your runs with Open MPI over InfiniBand, try using openib,sm,self for the BTL setting, so that shared-memory communication is used within a node. It would give us another data point to help diagnose the problem. As for other things we would need to help diagnose the problem, please follow the advice in this FAQ entry, and the help page:

http://www.open-mpi.org/faq/?category=openfabrics#ofa-troubleshoot
http://www.open-mpi.org/community/help/

On Wed, Dec 24, 2008 at 5:55 AM, Biagio Lucini wrote:

Pavel Shamis (Pasha) wrote:

Biagio Lucini wrote:

Hello, I am new to this list, where I hope to find a solution for a problem that I have been having for quite a long time. I run various versions of Open MPI (from 1.1.2 to 1.2.8) on a cluster with InfiniBand interconnects that I use and administer at the same time. The OpenFabrics stack is OFED-1.2.5, the compilers gcc 4.2 and Intel. The queue manager is SGE 6.0u8.

Do you use the Open MPI version that is included in OFED? Were you able to run basic OFED/OMPI tests/benchmarks between two nodes?

Hi, yes to both questions: the OMPI version is the one that comes with OFED (1.1.2-1) and the basic tests run fine. For instance, IMB-MPI1 (which is more than basic, as far as I can see) reports for the last test:

    #---
    # Benchmarking Barrier
    # #processes = 6
    #---
    #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            1000        22.93        22.95        22.94

for the openib,self BTL (6 processes, all processes on different nodes) and

    #---
    # Benchmarking Barrier
    # #processes = 6
    #---
    #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
            1000       191.30       191.42       191.34

for the tcp,self BTL (same test). No anomalies for other tests (ping-pong, all-to-all, etc.)

Thanks,
Biagio

--
Dr.
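Tim's suggested BTL settings would be passed to mpirun via its --mca flag; a sketch of the two runs being compared above (the process count and benchmark binary path are just examples, and these commands of course need the cluster to run):

```shell
# IB between nodes, shared memory within a node, loopback to self:
mpirun --mca btl openib,sm,self -np 6 ./IMB-MPI1 Barrier

# TCP for comparison:
mpirun --mca btl tcp,self -np 6 ./IMB-MPI1 Barrier
```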
Biagio Lucini
Department of Physics, Swansea University
Singleton Park, SA2 8PP Swansea (UK)
Tel. +44 (0)1792 602284

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Tim Mattox, Ph.D. - http://homepage.mac.com/tmattox/
tmat...@gmail.com || timat...@open-mpi.org
I'm a bright... http://www.the-brights.net/

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] sending message to the source(0) from other processors
FWIW: you might want to take an MPI tutorial; they're really helpful for learning MPI's capabilities and how to use the primitives. The NCSA has 2 excellent MPI tutorials (intro and advanced); they both require free registration: http://ci-tutor.ncsa.uiuc.edu/login.php

On Dec 24, 2008, at 10:52 PM, Win Than Aung wrote:

I got the solution. I just needed to set the appropriate tags on the sends and receives. Sorry for asking.

thanks
winthan

On Wed, Dec 24, 2008 at 10:36 PM, Win Than Aung wrote:

Thanks Eugene for your example, it helps me a lot. I've bumped into one more problem. I have a total of six files, all of which contain real and imaginary values, with content like this:

    1.001212 1.0012121 //0th
    1.001212 1.0012121 //1st
    1.001212 1.0012121 //2nd
    1.001212 1.0012121 //3rd
    1.001212 1.0012121 //4th
    1.001212 1.0012121 //5th
    1.001212 1.0012121 //6th

I use "mpirun -np 6 a.out", which means I let each processor access one file. Each processor adds the even-numbered values ("0th and 2nd"); those sums are sent to the root processor and saved as "file_even_add.dat". Each processor also adds the odd-numbered values ("1st and 3rd"); those sums are sent to the root processor (here rank 0) and saved as "file_odd_add.dat".

    char send_buffer[1000];
    char recv_buffer[1000];
    FILE* filepteven;
    FILE* fileptodd;
    MPI_Status status;

    if (mpi_my_id == root) {
        filepteven = fopen("C:\\fileeven.dat", "w");
        fileptodd  = fopen("C:\\fileodd.dat", "w");
        int peer;
        for (peer = 0; peer < np; peer++) {
            MPI_Recv(recv_buffer, MAX_STR_LEN, MPI_BYTE, MPI_ANY_SOURCE,
                     MPI_ANY_TAG, MPI_COMM_WORLD, &status);
            fprintf(filepteven, "%s \n", recv_buffer);
        }
    }

My question is: how do I know which send buffer has the even sums and which has the odd sums? In which order did they get sent?
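One way to answer the even-vs-odd question above is the tag approach winthan mentions: use one tag per kind of message, and inspect the MPI_Status after each wildcard receive. A minimal sketch (requires an MPI installation and mpirun to actually run; the tag values 100/101 and the message strings are arbitrary choices for illustration):

```c
#include <mpi.h>
#include <stdio.h>
#include <string.h>

#define TAG_EVEN 100  /* arbitrary tag marking an "even sum" message */
#define TAG_ODD  101  /* arbitrary tag marking an "odd sum" message  */

int main(int argc, char **argv)
{
    int np, me, i;
    char buf[64];
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);

    if (me != 0) {
        /* each worker sends one message of each kind to rank 0 */
        snprintf(buf, sizeof(buf), "even sum from rank %d", me);
        MPI_Send(buf, (int)strlen(buf) + 1, MPI_BYTE, 0, TAG_EVEN, MPI_COMM_WORLD);
        snprintf(buf, sizeof(buf), "odd sum from rank %d", me);
        MPI_Send(buf, (int)strlen(buf) + 1, MPI_BYTE, 0, TAG_ODD, MPI_COMM_WORLD);
    } else {
        /* rank 0 receives 2 messages per worker, in whatever order they
           arrive; status.MPI_SOURCE and status.MPI_TAG identify each one */
        for (i = 0; i < 2 * (np - 1); i++) {
            MPI_Recv(buf, sizeof(buf), MPI_BYTE, MPI_ANY_SOURCE, MPI_ANY_TAG,
                     MPI_COMM_WORLD, &status);
            if (status.MPI_TAG == TAG_EVEN)
                printf("EVEN from rank %d: %s\n", status.MPI_SOURCE, buf);
            else
                printf("ODD  from rank %d: %s\n", status.MPI_SOURCE, buf);
        }
    }
    MPI_Finalize();
    return 0;
}
```

With this pattern the arrival order no longer matters: each message carries its own classification (the tag) and its sender (the source), so rank 0 can write each one to the right file.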
thanks
winthan

On Tue, Dec 23, 2008 at 3:53 PM, Eugene Loh wrote:

Win Than Aung wrote:

thanks for your reply jeff, so i tried the following:

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int np, me, sbuf = -1, rbuf = -2, mbuf = 1000;
        int data[2];
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        if (np < 2) MPI_Abort(MPI_COMM_WORLD, -1);
        if (me == 1)
            MPI_Send(&sbuf, 1, MPI_INT, 0, 344, MPI_COMM_WORLD);
        if (me == 2)
            MPI_Send(&mbuf, 1, MPI_INT, 0, 344, MPI_COMM_WORLD);
        if (me == 0) {
            MPI_Recv(data, 2, MPI_INT, MPI_ANY_SOURCE, 344,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        }
        MPI_Finalize();
        return 0;
    }

It can successfully receive the one sent from processor 1 (me==1), but it fails to receive the one sent from processor 2 (me==2).

    mpirun -np 3 hello

There is only one receive, so it receives only one message. When you specify the element count for the receive, you're only specifying the size of the buffer into which the message will be received. Only after the message has been received can you inquire how big the message actually was.
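Eugene's point that the receive count is only a buffer capacity can be seen with MPI_Get_count, which reports how many elements actually arrived in a received message. A sketch (requires an MPI installation and at least 2 ranks; the tag 344 and value 42 are arbitrary):

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int np, me, count, data[2] = { -1, -1 };
    MPI_Status status;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &np);
    MPI_Comm_rank(MPI_COMM_WORLD, &me);
    if (np < 2) MPI_Abort(MPI_COMM_WORLD, -1);

    if (me == 1) {
        int one = 42;
        /* send a single int, even though the receiver's buffer holds two */
        MPI_Send(&one, 1, MPI_INT, 0, 344, MPI_COMM_WORLD);
    } else if (me == 0) {
        /* count=2 is the buffer capacity, not the expected message size */
        MPI_Recv(data, 2, MPI_INT, MPI_ANY_SOURCE, 344,
                 MPI_COMM_WORLD, &status);
        /* only now can we ask how many ints actually arrived */
        MPI_Get_count(&status, MPI_INT, &count);
        printf("received %d int(s), first value %d\n", count, data[0]);
    }
    MPI_Finalize();
    return 0;
}
```

Run with "mpirun -np 2 a.out": rank 0 reports that only 1 int arrived even though its buffer could hold 2.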
Here is an example:

% cat a.c

    #include <stdio.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        int np, me, peer, value;
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        value = me * me + 1;
        if (me == 0) {
            for (peer = 0; peer < np; peer++) {
                if (peer != 0)
                    MPI_Recv(&value, 1, MPI_INT, peer, 343,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                printf("peer %d had value %d\n", peer, value);
            }
        } else
            MPI_Send(&value, 1, MPI_INT, 0, 343, MPI_COMM_WORLD);
        MPI_Finalize();
        return 0;
    }

% mpirun -np 3 a.out
peer 0 had value 1
peer 1 had value 2
peer 2 had value 5
%

Alternatively,

    #include <stdio.h>
    #include <mpi.h>

    #define MAXNP 1024

    int main(int argc, char **argv)
    {
        int np, me, peer, value, values[MAXNP];
        MPI_Init(&argc, &argv);
        MPI_Comm_size(MPI_COMM_WORLD, &np);
        if (np > MAXNP) MPI_Abort(MPI_COMM_WORLD, -1);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        value = me * me + 1;
        MPI_Gather(&value, 1, MPI_INT, values, 1, MPI_INT, 0, MPI_COMM_WORLD);
        if (me == 0)
            for (peer = 0; peer < np; peer++)
                printf("peer %d had value %d\n", peer, values[peer]);
        MPI_Finalize();
        return 0;
    }

% mpirun -np 3 a.out
peer 0 had value 1
peer 1 had value 2
peer 2 had value 5
%

Which is better? Up to you. The collective routines (like MPI_Gather) do offer MPI implementors (like the people developing Open MPI) the opportunity to perform special optimizations (e.g., gather using a binary tree instead of having the root process perform so many receives).

--
Jeff Squyres
Cisco Systems
Re: [OMPI users] Relocating an Open MPI installation using OPAL_PREFIX
It's quite possible that we don't handle this situation properly. Won't you need two libdirs (one for the 32-bit OMPI executables, and one for the 64-bit MPI apps)?

On Dec 23, 2008, at 3:58 PM, Ethan Mallove wrote:

I think the problem is that I am doing a multi-lib build. I have 32-bit libraries in lib/, and 64-bit libraries in lib/64. I assume I do not see the issue for 32-bit tests, because all the dependencies are where Open MPI expects them to be. For the 64-bit case, I tried setting OPAL_LIBDIR to /opt/openmpi-relocated/lib/lib64, but no luck. Given the below configure arguments, what do my OPAL_* env vars need to be? (Also, could using --enable-orterun-prefix-by-default interfere with OPAL_PREFIX?)

$ ./configure CC=cc CXX=CC F77=f77 FC=f90 --with-openib --without-udapl --disable-openib-ibcm --enable-heterogeneous --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default --with-sge --enable-mpi-f90 --with-mpi-f90-size=small --disable-mpi-threads --disable-progress-threads --disable-debug CFLAGS="-m32 -xO5" CXXFLAGS="-m32 -xO5" FFLAGS="-m32 -xO5" FCFLAGS="-m32 -xO5" --prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include --without-mx --with-tm=/ws/ompi-tools/orte/torque/current/shared-install32 --with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include LDFLAGS=-R/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib"

$ ./configure CC=cc CXX=CC F77=f77 FC=f90 --with-openib --without-udapl --disable-openib-ibcm --enable-heterogeneous --enable-cxx-exceptions --enable-shared --enable-orterun-prefix-by-default --with-sge --enable-mpi-f90 --with-mpi-f90-size=small --disable-mpi-threads --disable-progress-threads --disable-debug CFLAGS="-m64 -xO5" CXXFLAGS="-m64 -xO5" FFLAGS="-m64 -xO5" FCFLAGS="-m64 -xO5" --prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib/lib64 --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include/64 --without-mx --with-tm=/ws/ompi-tools/orte/torque/current/shared-install64 --with-contrib-vt-flags="--prefix=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install --mandir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/man --libdir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib/lib64 --includedir=/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/include/64 LDFLAGS=-R/workspace/em162155/hpc/mtt-scratch/burl-ct-v20z-12/ompi-tarball-testing/installs/DGQx/install/lib" --disable-binaries

-Ethan

On Dec 22, 2008, at 12:42 PM, Ethan Mallove wrote:

Can anyone get OPAL_PREFIX to work on Linux? A simple test is to see if the following works for any mpicc/mpirun:

$ mv /tmp/foo
$ set OPAL_PREFIX /tmp/foo
$ mpicc ...
$ mpirun ...

If you are able to get the above to run successfully, I'm interested in your config.log file.

Thanks,
Ethan

On Thu, Dec/18/2008 11:03:25AM, Ethan Mallove wrote:

Hello,

The below FAQ lists instructions on how to use a relocated Open MPI installation:

http://www.open-mpi.org/faq/?category=building#installdirs

On Solaris, OPAL_PREFIX and friends (documented in the FAQ) work for me with both MPI (hello_c) and non-MPI (hostname) programs. On Linux, I can only get the non-MPI case to work. Here are the environment variables I am setting:

$ cat setenv_opal_prefix.csh
set opal_prefix = "/opt/openmpi-relocated"
setenv OPAL_PREFIX $opal_prefix
setenv OPAL_BINDIR $opal_prefix/bin
setenv OPAL_SBINDIR $opal_prefix/sbin
setenv OPAL_DATAROOTDIR $opal_prefix/share
setenv OPAL_SYSCONFDIR $opal_prefix/etc
setenv OPAL_SHAREDSTATEDIR $opal_prefix/com
setenv OPAL_LOCALSTATEDIR $opal_prefix/var
setenv O
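For the 64-bit half of a multi-lib relocation like the one discussed above, the environment would look roughly like the following. This is a sketch in bash rather than the csh used above, and the /opt/openmpi-relocated path plus the lib/lib64 layout are assumptions taken from the earlier messages, not a verified recipe:

```shell
# point Open MPI's runtime at the relocated tree (hypothetical path)
export OPAL_PREFIX=/opt/openmpi-relocated
# for a multi-lib build, the 64-bit runtime needs the 64-bit libdir
export OPAL_LIBDIR=$OPAL_PREFIX/lib/lib64
# make the relocated binaries and shared libraries findable
export PATH=$OPAL_PREFIX/bin:$PATH
export LD_LIBRARY_PATH=$OPAL_LIBDIR:${LD_LIBRARY_PATH:-}
echo "$OPAL_LIBDIR"   # prints /opt/openmpi-relocated/lib/lib64
```

Whether OPAL_LIBDIR alone is enough for the 64-bit case is exactly the open question in this thread; the sketch only shows where the knobs are.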