Re: [OMPI users] MPI inside MPI (still)

2014-12-14 Thread George Bosilca
Alex, the code looks good and is 100% conformant to the MPI standard. I would, however, change the way you create the sub-communicators in the parent: you do a lot of unnecessary operations, since you can achieve exactly the same outcome (one communicator per node) either by duplicating MPI_COMM_SELF or by doing an MPI_Comm_split …
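
For reference, a minimal sketch of the two alternatives George mentions (the program skeleton and variable names are illustrative, not taken from the original posts):

    program per_rank_comm
      use mpi
      implicit none
      integer :: ierror, rank, singleton

      call MPI_Init(ierror)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

      ! Option 1: duplicate MPI_COMM_SELF -- a communicator that contains
      ! only the calling process.
      call MPI_Comm_dup(MPI_COMM_SELF, singleton, ierror)
      call MPI_Comm_free(singleton, ierror)

      ! Option 2: split MPI_COMM_WORLD with a distinct color per rank,
      ! which yields the same outcome in a single collective call.
      call MPI_Comm_split(MPI_COMM_WORLD, rank, 0, singleton, ierror)
      call MPI_Comm_free(singleton, ierror)

      call MPI_Finalize(ierror)
    end program per_rank_comm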

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi Gilles, OK, I patched the file; without valgrind it exploded at MPI_File_close: *** Error in `/pmi/cmpbib/compilation_BIB_gcc-4.5.1_64bit/COMPILE_AUTO/GIREF/bin/Test.NormesEtProjectionChamp.dev': free(): invalid next size (normal): 0x04b6c950 *** === Backtrace: = /lib64 …

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, here is a patch for the v1.8 series; it fixes a one-byte overflow. Valgrind should stop complaining, and, assuming this is the root cause of the memory corruption, that could also fix your program. That being said, shared_fp_fname is limited to 255 characters (this is hard-coded), so even if …
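
As a stop-gap on the application side, one could at least detect when the full path exceeds that limit before calling MPI_File_open; a minimal sketch (the path shown is a placeholder, not the actual one from this thread):

    program check_path_len
      implicit none
      character(len=1024) :: full_path

      ! Placeholder path -- substitute the absolute name actually passed
      ! to MPI_File_open.
      full_path = '/some/long/installation/prefix/bin/testfile'

      if (len_trim(full_path) > 255) then
         print *, 'warning: path is ', len_trim(full_path), &
                  ' characters, longer than the 255-character shared_fp_fname limit'
      end if
    end program check_path_len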

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
On 12/14/2014 09:55 PM, Gilles Gouaillardet wrote: Eric, I checked the source code (v1.8) and the limit for shared_fp_fname is 256 (hard-coded). Oh my god! Is it that simple? By the way, my filename itself is shorter than 256 characters, but the whole path is: echo "/pmi/cmpbib/compilation_BIB_gcc-4.5.…

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, I checked the source code (v1.8) and the limit for shared_fp_fname is 256 (hard-coded). I am now checking whether the overflow is correctly detected (that could explain the one-byte overflow reported by valgrind). Cheers, Gilles On 2014/12/15 11:52, Eric Chamberland wrote: > Hi again, …

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi again, some new hints that might help: 1- With valgrind: if I run the same test case, same data, but moved to a shorter path+filename, then *valgrind* does *not* complain!! 2- Without valgrind: *sometimes*, the test case with the long path+filename passes without "segfaulting"! 3- It …

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi Gilles, On 12/14/2014 09:20 PM, Gilles Gouaillardet wrote: Eric, can you make your test case (source + input file + howto) available so I can try to reproduce and fix this? I would like to, but the complete app is big (and not public), sits on top of PETSc with MKL, and is in C++... :-( I can …

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Gilles Gouaillardet
Eric, can you make your test case (source + input file + howto) available so I can try to reproduce and fix this? Based on the stack trace, I assume this is a complete end-user application. Have you tried/been able to reproduce the same kind of crash with a trimmed test program? BTW, what kind …

Re: [OMPI users] OpenMPI 1.8.4rc3, 1.6.5 and 1.6.3: segmentation violation in mca_io_romio_dist_MPI_File_close

2014-12-14 Thread Eric Chamberland
Hi, I finally (thanks for fixing oversubscribing) tested 1.8.4rc3 against my problem with collective MPI I/O. The problem is still there. In this two-process example, process rank 1 dies with a segfault while process rank 0 waits indefinitely... Running with valgrind, I found these errors, which …
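
Since a trimmed test program was requested, here is a minimal sketch of the collective I/O pattern under discussion (a collective write followed by MPI_File_close, run on two ranks); the file name is a placeholder and would have to be replaced by a sufficiently long absolute path to exercise the reported behaviour:

    program io_close_test
      use mpi
      implicit none
      integer :: ierror, rank, fh
      integer :: buf(4)
      integer(kind=MPI_OFFSET_KIND) :: disp
      character(len=512) :: fname

      call MPI_Init(ierror)
      call MPI_Comm_rank(MPI_COMM_WORLD, rank, ierror)

      ! Placeholder name: substitute a long absolute path (> 255 characters)
      ! to exercise the shared_fp_fname limit discussed in this thread.
      fname = '/tmp/ompi_io_close_test.dat'

      buf  = rank
      disp = int(rank, MPI_OFFSET_KIND) * 4 * 4   ! 4 integers of 4 bytes per rank (assumed sizes)

      call MPI_File_open(MPI_COMM_WORLD, trim(fname), &
                         MPI_MODE_WRONLY + MPI_MODE_CREATE, MPI_INFO_NULL, fh, ierror)
      call MPI_File_write_at_all(fh, disp, buf, 4, MPI_INTEGER, MPI_STATUS_IGNORE, ierror)
      call MPI_File_close(fh, ierror)

      call MPI_Finalize(ierror)
    end program io_close_test

Run with, e.g., mpirun -np 2 ./io_close_test to mirror the two-process scenario described above.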

Re: [OMPI users] MPI inside MPI (still)

2014-12-14 Thread Alex A. Schmidt
Hi, Sorry, guys. I don't think the newbie here can follow any discussion beyond basic MPI... Anyway, if I add the pair call MPI_COMM_GET_PARENT(mpi_comm_parent,ierror) / call MPI_COMM_DISCONNECT(mpi_comm_parent,ierror) on the spawnee side, I get the proper response in the spawning processes. Please …
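
For context, a self-contained sketch of the spawnee side Alex describes (program and variable names are illustrative):

    program spawnee
      use mpi
      implicit none
      integer :: ierror, mpi_comm_parent

      call MPI_Init(ierror)

      ! Retrieve the intercommunicator to the parent that spawned this program.
      call MPI_Comm_get_parent(mpi_comm_parent, ierror)

      if (mpi_comm_parent /= MPI_COMM_NULL) then
         ! ... do the child's work, possibly communicating with the parent ...

         ! MPI_Comm_disconnect is collective over the intercommunicator, so the
         ! parent's matching disconnect can only complete once the child calls it too.
         call MPI_Comm_disconnect(mpi_comm_parent, ierror)
      end if

      call MPI_Finalize(ierror)
    end program spawnee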