[OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions
Hi.

We've run into an IO issue with 1.4.1 and earlier versions. We're able to reproduce the issue in around 120 lines of code to help. I'd like to find out whether there's something we're simply doing incorrectly with the build or if it's in fact a known bug. I've included the following in order:

1. Configure options used on all versions tested
2. Successful run on 1.4.3
3. Failed run on 1.3.1
4. Failed run on 1.4.1
5. Source code of test
6. ompi_info

We're running this on a single node with 2 processes. An additional thing to note is that we can load the 1.4.2 or 1.4.3 environment and successfully run the 1.4.1 or 1.3.1 executable.

Thanks.

Steve

1. ./configure --prefix=/share/apps/openmpi/1.4.1/intel-12 --with-tm=/opt/torque --enable-debug --with-openib --with-wrapper-cflags="-shared-intel" --with-wrapper-cxxflags="-shared-intel" --with-wrapper-fflags="-shared-intel" --with-wrapper-fcflags="-shared-intel"

2. [smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.3 10
iotest running on mpi_size: 2
writing 10 ints to file iotest.dat...
rank 0 writing: 0 to 4
rank 1 writing: 5 to 9
reading 10 ints from file iotest.dat...
just read: 0 0
just read: 1 1
just read: 2 2
just read: 3 3
just read: 4 4
just read: 5 5
just read: 6 6
just read: 7 7
just read: 8 8
just read: 9 9
File looks good.

3. [smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.3.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 0 writing: 0 to 49
rank 1 writing: 50 to 99
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.3.1: iotest.cpp:105: int main(int, char**): Assertion `ibuf == i' failed.
[compute-1-1:18731] *** Process received signal ***
[compute-1-1:18731] Signal: Aborted (6)
[compute-1-1:18731] Signal code: (-6)
[compute-1-1:18731] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:18731] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:18731] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:18731] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:18731] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.3.1(main+0x3db) [0x408e7f]
[compute-1-1:18731] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:18731] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.3.1(__gxx_personality_v0+0x139) [0x408989]
[compute-1-1:18731] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 18731 on node compute-1-1.local exited on signal 6 (Aborted).
--

4. [smjones@compute-1-1 ~]$ mpiexec codes/cti/tests/iotest/iotest.openmpi-1.4.1 100
iotest running on mpi_size: 2
writing 100 ints to file iotest.dat...
rank 1 writing: 50 to 99
rank 0 writing: 0 to 49
reading 100 ints from file iotest.dat...
just read: 0 50
iotest.openmpi-1.4.1: iotest.cpp:105: int __unixcall main(int, char **): Assertion `ibuf == i' failed.
[compute-1-1:19057] *** Process received signal ***
[compute-1-1:19057] Signal: Aborted (6)
[compute-1-1:19057] Signal code: (-6)
[compute-1-1:19057] [ 0] /lib64/libpthread.so.0 [0x357800e7c0]
[compute-1-1:19057] [ 1] /lib64/libc.so.6(gsignal+0x35) [0x3577830265]
[compute-1-1:19057] [ 2] /lib64/libc.so.6(abort+0x110) [0x3577831d10]
[compute-1-1:19057] [ 3] /lib64/libc.so.6(__assert_fail+0xf6) [0x35778296e6]
[compute-1-1:19057] [ 4] codes/cti/tests/iotest/iotest.openmpi-1.4.1(main+0x472) [0x401ab2]
[compute-1-1:19057] [ 5] /lib64/libc.so.6(__libc_start_main+0xf4) [0x357781d994]
[compute-1-1:19057] [ 6] codes/cti/tests/iotest/iotest.openmpi-1.4.1(__gxx_personality_v0+0x41) [0x401589]
[compute-1-1:19057] *** End of error message ***
--
mpiexec noticed that process rank 0 with PID 19057 on node compute-1-1.local exited on signal 6 (Aborted).
--

5. [smjones@frontend iotest]$ cat iotest.cpp
#include
#include
#include
#include

using std::cout;
using std::cerr;
using std::endl;

// iotest
// This simple test reproduces a problem with writing in MPI_Type_indexed in openmpi.
//

int main(int argc,char * argv[]) {

  MPI_Init(&argc,&argv);

  int mpi_size;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);

  int mpi_rank;
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

  if (mpi_rank == 0)
    cout << "iotest running on mpi_size: " << mpi_size << endl;

  if (argc != 2) {
    if (mpi_rank == 0)
      cout << "\n\nUsage: \n\nmpirun -np X iotest \n\n" << endl;
    MPI_Finalize();
    return(-1);
  }

  // how many ints to write...

  int n = atoi(argv[1]);
  if (mpi_rank == 0)
    cout << "writing " << n << " ints to file iotest.dat..." << endl;

  // everybody figure out their local offset and size...

  int my_disp = mpi_rank*n/mpi_size;
  int my_n = (mpi_rank+1)*n/mpi_size - my_
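The listing above is cut off in the archive. Condensed from the fuller copy quoted in Rob's reply below, the write path the test exercises looks roughly like the sketch that follows; it is a minimal reconstruction for readers, not the exact iotest.cpp that was posted (the header names and the default value of n are assumptions).

// write-path sketch, condensed from the iotest.cpp listing quoted later in
// this thread; a minimal sketch, not the exact file that was posted.
#include <mpi.h>
#include <cassert>
#include <cstdlib>

int main(int argc, char *argv[]) {
  MPI_Init(&argc, &argv);

  int mpi_size, mpi_rank;
  MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
  MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);

  // total ints to write (default is an assumption for the sketch)
  const int n = (argc > 1) ? atoi(argv[1]) : 100;

  // each rank owns a contiguous slice [my_disp, my_disp + my_n) of the file
  int my_disp = mpi_rank * n / mpi_size;
  int my_n    = (mpi_rank + 1) * n / mpi_size - my_disp;

  MPI_File fh;
  int ierr = MPI_File_open(MPI_COMM_WORLD, "iotest.dat",
                           MPI_MODE_WRONLY | MPI_MODE_CREATE,
                           MPI_INFO_NULL, &fh);
  assert(ierr == MPI_SUCCESS);

  // one-block indexed type describing this rank's slice (displacement in ints)
  MPI_Datatype int_type;
  MPI_Type_indexed(1, &my_n, &my_disp, MPI_INT, &int_type);
  MPI_Type_commit(&int_type);

  // fill a buffer with increasing values starting at our offset
  int *buf = new int[my_n];
  for (int i = 0; i < my_n; ++i)
    buf[i] = my_disp + i;

  // view: each rank only "sees" its own slice of the file
  MPI_File_set_view(fh, 0, MPI_INT, int_type, "native", MPI_INFO_NULL);

  // collective write of this rank's slice
  MPI_Status status;
  MPI_File_write_all(fh, buf, my_n, MPI_INT, &status);

  // trim the file to the expected size and close
  MPI_File_set_size(fh, (MPI_Offset)n * sizeof(int));
  MPI_File_close(&fh);

  delete[] buf;
  MPI_Type_free(&int_type);
  MPI_Finalize();
  return 0;
}

Run with, e.g., mpiexec -np 2 ./iotest 100; rank 0's slice is ints 0..49 and rank 1's is 50..99, matching the output shown above.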
Re: [OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions
- Original Message -
> On Mon, Sep 12, 2011 at 07:44:25PM -0700, Steve Jones wrote:
> > Hi.
> >
> > We've run into an IO issue with 1.4.1 and earlier versions. We're
> > able to reproduce the issue in around 120 lines of code to help.
>
> Hi Steve Jones
>
> I'm the ROMIO maintainer, and always looking for ways to improve our
> test coverage.
>
> While it looks like this workload has been fixed in recent versions of
> code, I'd like to include your test case to help us catch any
> regressions we might introduce down the line. I'd change it to be
> straight c and have rank 0 read back the file with MPI_File_read.
>
> ==rob

Hi Rob,

Thanks for the quick reply. I've copied Frank Ham, our lead developer, as he
wrote the test case below. The test was helpful for us to determine which
version worked and also to compare against other MPI variants. It's good if
the test case helps you out as well.

Thanks.

Steve

> > [smjones@frontend iotest]$ cat iotest.cpp
> > #include
> > #include
> > #include
> > #include
> >
> > using std::cout;
> > using std::cerr;
> > using std::endl;
> >
> > // iotest
> > // This simple test reproduces a problem with writing in
> > // MPI_Type_indexed in openmpi.
> > //
> >
> > int main(int argc,char * argv[]) {
> >
> >   MPI_Init(&argc,&argv);
> >
> >   int mpi_size;
> >   MPI_Comm_size(MPI_COMM_WORLD, &mpi_size);
> >
> >   int mpi_rank;
> >   MPI_Comm_rank(MPI_COMM_WORLD, &mpi_rank);
> >
> >   if (mpi_rank == 0)
> >     cout << "iotest running on mpi_size: " << mpi_size << endl;
> >
> >   if (argc != 2) {
> >     if (mpi_rank == 0)
> >       cout << "\n\nUsage: \n\nmpirun -np X iotest \n\n" << endl;
> >     MPI_Finalize();
> >     return(-1);
> >   }
> >
> >   // how many ints to write...
> >
> >   int n = atoi(argv[1]);
> >   if (mpi_rank == 0)
> >     cout << "writing " << n << " ints to file iotest.dat..." << endl;
> >
> >   // everybody figure out their local offset and size...
> >
> >   int my_disp = mpi_rank*n/mpi_size;
> >   int my_n = (mpi_rank+1)*n/mpi_size - my_disp;
> >
> >   cout << "rank " << mpi_rank << " writing: " << my_disp << " to "
> >        << my_disp+my_n-1 << endl;
> >
> >   MPI_File fh;
> >   int ierr = MPI_File_open(MPI_COMM_WORLD,"iotest.dat",
> >                            MPI_MODE_WRONLY | MPI_MODE_CREATE,
> >                            MPI_INFO_NULL,&fh);
> >   assert(ierr == 0);
> >
> >   // build the type...
> >
> >   MPI_Datatype int_type;
> >   MPI_Type_indexed(1,&my_n,&my_disp,MPI_INT,&int_type);
> >   MPI_Type_commit(&int_type);
> >
> >   // fill a buffer of ints with increasing values, starting with our
> >   // offset...
> >
> >   int * buf = new int[my_n];
> >   for (int i = 0; i < my_n; ++i)
> >     buf[i] = my_disp + i;
> >
> >   // set our view into the file...
> >
> >   MPI_Offset offset = 0;
> >   MPI_File_set_view(fh, offset, MPI_INT, int_type, "native",
> >                     MPI_INFO_NULL);
> >
> >   // and write...
> >
> >   MPI_Status status;
> >   MPI_File_write_all(fh, buf, my_n, MPI_INT, &status);
> >
> >   // trim the file to the current size and close...
> >
> >   offset += n*sizeof(int);
> >   MPI_File_set_size(fh,offset);
> >   MPI_File_close(&fh);
> >
> >   // cleanup...
> >
> >   delete[] buf;
> >   MPI_Type_free(&int_type);
> >
> >   // ---
> >
> >   // now let rank 0 read the file using standard io and check for
> >   // correctness...
> >
> >   if (mpi_rank == 0) {
> >
> >     if (mpi_rank == 0)
> >       cout << "reading " << n << " ints from file iotest.dat..." << endl;
> >
> >     FILE * fp = fopen("iotest.dat","rb");
> >     for (int i = 0; i < n; ++i) {
> >       // just read one at a time - ouch!
> >       int ibuf;
> >       fread(&ibuf,sizeof(int),1,fp);
> >       cout << "just read: " << i << " " << ibuf << endl;
> &
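For reference, a rough sketch of the read-back Rob suggests: rank 0 re-opens iotest.dat on its own and checks it with MPI_File_read instead of fopen/fread. The helper name (check_file) and the error handling are assumptions, not part of the posted test; only the MPI C API is used, so the same code drops into a straight-C rewrite.

// read-back sketch along the lines Rob suggests; check_file is a
// hypothetical helper, not part of the posted test.
#include <mpi.h>
#include <cstdio>

// call after the collective write/close, with n = total ints written
int check_file(int mpi_rank, int n) {
  if (mpi_rank != 0)
    return 0;

  MPI_File fh;
  if (MPI_File_open(MPI_COMM_SELF, "iotest.dat", MPI_MODE_RDONLY,
                    MPI_INFO_NULL, &fh) != MPI_SUCCESS) {
    std::fprintf(stderr, "could not open iotest.dat for read-back\n");
    return 1;
  }

  int errors = 0;
  for (int i = 0; i < n; ++i) {
    int ibuf;
    MPI_Status status;
    MPI_File_read(fh, &ibuf, 1, MPI_INT, &status);  // one int at a time
    if (ibuf != i) {
      std::printf("mismatch at %d: got %d\n", i, ibuf);
      ++errors;
    }
  }
  MPI_File_close(&fh);

  if (errors == 0)
    std::printf("File looks good.\n");
  return errors;
}

Opening with MPI_COMM_SELF keeps the read-back on rank 0 only, so the other ranks do not need to participate in the open/close.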
Re: [OMPI users] IO issue with OpenMPI 1.4.1 and earlier versions
- Original Message -
> On Sep 12, 2011, at 10:44 PM, Steve Jones wrote:
>
> > We've run into an IO issue with 1.4.1 and earlier versions. We're
> > able to reproduce the issue in around 120 lines of code to help. I'd
> > like to find out whether there's something we're simply doing
> > incorrectly with the build or if it's in fact a known bug. I've
> > included the following in order:
> >
> > 1. Configure options used on all versions tested
> > 2. Successful run on 1.4.3
> > 3. Failed run on 1.3.1
> > 4. Failed run on 1.4.1
>
> It looks like https://svn.open-mpi.org/trac/ompi/changeset/22888 fixed
> a problem with OMPI's ROMIO that was included in 1.4.2. This could
> well be the issue.

Hi Jeff,

It looks like this was the issue. Thanks for pointing me towards it and the
information on ABI compatibility. I must not have been following closely, as
I was under the impression that we needed to rebuild for each new version of
MPI introduced.

Talk soon.

Steve

> Note, however, that MPI-IO-written files are not guaranteed to be
> readable outside of MPI-IO. What happens if you read the file back via
> MPI-IO?
>
> > An additional thing to note is that we can load the 1.4.2 or 1.4.3
> > environment and successfully run the 1.4.1 or 1.3.1 executable.
>
> Open MPI's ABI guarantees started at 1.3.2, meaning that any MPI
> application executable compiled with 1.3.2 or later should be able to
> run with an OMPI environment 1.3.2 all the way through the end of the
> 1.4.x series.
>
> Hence, it is consistent that your 1.4.1 executable works properly when
> run in a 1.4.3 environment if the ROMIO fix was deployed in 1.4.2.
>
> NOTE: Your 1.3.1 executable *may* work with later OMPI environments,
> but it is not guaranteed (and I absolutely would not rely on it).
> Here's the text in the README about our ABI policy:
>
> -
> Application Binary Interface (ABI) Compatibility
>
> Open MPI provided forward application binary interface (ABI)
> compatibility for MPI applications starting with v1.3.2. Prior to
> that version, no ABI guarantees were provided.
>
> NOTE: Prior to v1.3.2, subtle and strange failures are almost
> guaranteed to occur if applications were compiled and linked
> against shared libraries from one version of Open MPI and then
> run with another. The Open MPI team strongly discourages making
> any ABI assumptions before v1.3.2.
>
> Starting with v1.3.2, Open MPI provides forward ABI compatibility --
> with respect to the MPI API only -- in all versions of a given feature
> release series and its corresponding super stable series. For
> example, on a single platform, an MPI application linked against Open
> MPI v1.3.2 shared libraries can be updated to point to the shared
> libraries in any successive v1.3.x or v1.4 release and still work
> properly (e.g., via the LD_LIBRARY_PATH environment variable or other
> operating system mechanism).
>
> Note that in v1.4.4, a fix was applied to the "large" size of the "use
> mpi" F90 MPI bindings module: two of MPI_SCATTERV's parameters had the
> wrong type and were corrected. Note that this fix *only* applies if
> Open MPI was configured with a Fortran 90 compiler and the
> --with-mpi-f90-size=large configure option.
>
> However, in order to preserve ABI with all releases since v1.3.2, the
> old/incorrect MPI_SCATTERV interface was preserved and a new/corrected
> interface was added (note that Fortran 90 has function overloading,
> similar to C++; hence, both the old and new interface can be accessed
> via "call MPI_Scatterv(...)").
> Applications that use the old/incorrect MPI_SCATTERV binding will
> continue to compile/link just like they did with releases since
> v1.3.2. However, application developers are ***STRONGLY*** encouraged
> to fix their applications to use the correct bindings for the
> following reasons:
>
> - The parameter type mismatch may cause application crashes or
>   silent data corruption.
> - An annoying message (which cannot be disabled) is sent to stdout
>   warning the user that they are using an incorrect interface.
> - The old/incorrect interface will be removed in Open MPI v1.7
>   (i.e., applications that use the old/incorrect binding will not
>   compile with Open MPI v1.7).
>
> Open MPI reserves the right to break ABI compatibility at new feature
> release series. For example, the same MPI application from above
> (linked against Open MPI v1.3.2 shared libraries) will *not* work with
> Open MPI v1.5 shared libraries.
> -
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
> ___
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users
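For reference, a rough sketch of the MPI-IO read-back Jeff asks about: re-open iotest.dat through MPI-IO, set the same indexed view that was used for the write, and have every rank verify its own slice with a collective read. The helper name and the idea of reusing the write-phase offset, count, and datatype are assumptions, not code from the thread.

// collective read-back sketch; check_with_view is a hypothetical helper,
// and reusing the write-phase my_disp/my_n/int_type is an assumption about
// how it would be wired into the test.
#include <mpi.h>
#include <cstdio>

int check_with_view(int my_disp, int my_n, MPI_Datatype int_type) {
  MPI_File fh;
  MPI_File_open(MPI_COMM_WORLD, "iotest.dat", MPI_MODE_RDONLY,
                MPI_INFO_NULL, &fh);

  // same view as the write: each rank sees only its own slice
  MPI_File_set_view(fh, 0, MPI_INT, int_type, "native", MPI_INFO_NULL);

  int *rbuf = new int[my_n];
  MPI_Status status;
  MPI_File_read_all(fh, rbuf, my_n, MPI_INT, &status);
  MPI_File_close(&fh);

  int errors = 0;
  for (int i = 0; i < my_n; ++i)
    if (rbuf[i] != my_disp + i) {
      std::printf("slice mismatch at %d: got %d\n", my_disp + i, rbuf[i]);
      ++errors;
    }
  delete[] rbuf;
  return errors;
}

If this collective read sees the expected values while the fopen/fread check fails, that would point at the "readable outside of MPI-IO" caveat Jeff mentions rather than at the write path itself.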
Re: [OMPI users] Trouble with OpenMPI and Intel 10.1 compilers
- "Ray Muno" wrote: > Gus Correa wrote: > > Hi Ray and list > > > > I have Intel ifort 10.1.017 on a Rocks 4.3 cluster. > > The OpenMPI compiler wrappers (i.e. "opal_wrapper") work fine, > > and find the shared libraries (Intel or other) without a problem. > > > > My guess is that this is not an OpenMPI problem, but an Intel > compiler > > environment glitch. > > I wonder if your .profile/.tcshrc/.bashrc files initialize the Intel > > > compiler environment properly. > > I.e., "source /share/apps/intel/fce/10.1.018/bin/ifortvars.csh" or > > similar, to get the right > > Intel environment variables inserted on > > PATH, LD_LIBRARY_PATH, MANPATH. and INTEL_LICENSE_FILE. > > > > Not doing this caused trouble for me in the past. > > Double or inconsistent assignment of LD_LIBRARY_PATH and PATH > > (say on the ifortvars.csh and on the user login files) also caused > > conflicts. > > > > I am not sure if this needs to be done before you configure and > install > > OpenMPI, > > but doing it after you build OpenMPI may still be OK. > > > > I hope this helps, > > Gus Correa > > > > That does help. I confirmed that what I added needs to be in the > environment (LD_LIBRARY_PATH). Must have missed that in the docs. I > have now added the appropriate variables to our modules environment. > > Seems strange that OpenMPI built without these being set at all. I > could > also compile test codes with the compilers, just not with mpicc and > mpif90. > Are you adding -i_dynamic to base flags, or something different? Steve