Eric, here is a patch for the v1.8 series; it fixes a one-byte overflow.
Valgrind should stop complaining, and assuming this is the root cause of the memory corruption, that could also fix your program.

That being said, shared_fp_fname is limited to 255 characters (this is hard-coded), so even if the name only gets truncated to 255 characters (instead of 256), the behavior could still be somewhat unpredictable.

/* from ADIOI_Shfp_fname: if the real file is /tmp/thakur/testfile, the
   shared-file-pointer file will be /tmp/thakur/.testfile.shfp.xxxx */

FWIW, xxxx is a random number that takes between 1 and 10 characters.

Could you please give this patch a try and let us know the results?

Cheers,

Gilles

On 2014/12/15 11:52, Eric Chamberland wrote:
> Hi again,
>
> Some new hints that might help:
>
> 1- With valgrind: if I run the same test case, same data, but moved to
>    a shorter path+filename, then *valgrind* does *not* complain!
> 2- Without valgrind: *sometimes*, the test case with the long
>    path+filename passes without segfaulting!
> 3- It seems to happen at the fourth file I try to open, using the
>    procedure described below.
>
> Also, I was wondering about this: in this 2-process test case (running
> on the same node), I:
>
> 1- open the file collectively (it resides on the same SSD drive on my
>    computer)
> 2- MPI_File_read_at_all a long int and 3 chars (11 bytes)
> 3- stop (because I detect I am not reading my MPIIO file format)
> 4- close the file
>
> A guess (FWIW): can process rank 0, for example, close the file too
> quickly, which destroys the string reserved for the filename that is
> used by process rank 1, which could be using shared memory on the same
> node?
>
> Thanks,
>
> Eric
>
> On 12/14/2014 02:06 PM, Eric Chamberland wrote:
>> Hi,
>>
>> I finally (thanks for fixing oversubscribing) tested with 1.8.4rc3 for
>> my problem with collective MPI I/O.
>>
>> The problem is still there. In this 2-process example, process rank 1
>> dies with a segfault while process rank 0 waits indefinitely...
>>
>> Running with valgrind, I found these errors, which may give some hints:
>>
>> *************************************************
>> Rank 1:
>> *************************************************
>> On process rank 1, without valgrind, it ends with either a segmentation
>> violation, memory corruption, or an invalid free.
>>
>> But running with valgrind, it tells:
>>
>> ==16715== Invalid write of size 2
>> ==16715== at 0x4C2E793: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16715== by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
>> ==16715== by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
>> ==16715== by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==16715== by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==16715== by 0x1F5F0F27: opal_progress (opal_progress.c:207)
>> ==16715== by 0x1ACB40B3: opal_condition_wait (condition.h:93)
>> ==16715== by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
>> ==16715== by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
>> ==16715== by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
>> ==16715== by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
>> ==16715== by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
>> ==16715== by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16715== by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
>> ==16715== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16715== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16715== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16715== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16715== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16715== Address 0x32ef3e50 is 0 bytes after a block of size 256 alloc'd
>> ==16715== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16715== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16715== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16715== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16715== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16715== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16715== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16715== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ...
>> ...
>> ==16715== Invalid write of size 1
>> ==16715== at 0x4C2E7BB: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16715== by 0x1F60AA91: opal_convertor_unpack (opal_convertor.c:321)
>> ==16715== by 0x25AA8DD3: mca_pml_ob1_recv_frag_callback_match (pml_ob1_recvfrag.c:225)
>> ==16715== by 0x2544110C: mca_btl_vader_check_fboxes (btl_vader_fbox.h:220)
>> ==16715== by 0x25443577: mca_btl_vader_component_progress (btl_vader_component.c:695)
>> ==16715== by 0x1F5F0F27: opal_progress (opal_progress.c:207)
>> ==16715== by 0x1ACB40B3: opal_condition_wait (condition.h:93)
>> ==16715== by 0x1ACB4201: ompi_request_wait_completion (request.h:381)
>> ==16715== by 0x1ACB4305: ompi_request_default_wait (req_wait.c:39)
>> ==16715== by 0x26BA2FFB: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:254)
>> ==16715== by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
>> ==16715== by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
>> ==16715== by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16715== by 0x2FE1CC48: ADIOI_Shfp_fname (shfp_fname.c:67)
>> ==16715== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16715== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16715== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16715== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16715== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16715== Address 0x32ef3e60 is 16 bytes after a block of size 256 alloc'd
>> ==16715== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16715== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16715== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16715== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16715== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16715== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16715== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16715== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16715== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16715== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16715== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16715== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16715== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16715==
>>
>> *************************************************
>> Rank 0:
>> *************************************************
>>
>> ==16714== Invalid read of size 1
>> ==16714== at 0x4C2CA74: __strrchr_sse42 (vg_replace_strmem.c:194)
>> ==16714== by 0x2FE1CAB7: ADIOI_Shfp_fname (shfp_fname.c:51)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714== Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
>> ==16714== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16714== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16714== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714==
>> ...
>> ==16714== Invalid read of size 1
>> ==16714== at 0x4C2D034: strlen (vg_replace_strmem.c:412)
>> ==16714== by 0x2FE1CB81: ADIOI_Shfp_fname (shfp_fname.c:61)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714== Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
>> ==16714== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16714== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16714== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ...
>> ==16714== Invalid read of size 2
>> ==16714== at 0x4C2E79E: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16714== by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
>> ==16714== by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
>> ==16714== by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
>> ==16714== by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
>> ==16714== by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
>> ==16714== by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
>> ==16714== by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
>> ==16714== by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
>> ==16714== by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
>> ==16714== by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16714== by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714== Address 0x219377d0 is 0 bytes after a block of size 256 alloc'd
>> ==16714== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16714== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16714== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ...
>> ==16714== Invalid read of size 2
>> ==16714== at 0x4C2E790: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16714== by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
>> ==16714== by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
>> ==16714== by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
>> ==16714== by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
>> ==16714== by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
>> ==16714== by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
>> ==16714== by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
>> ==16714== by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
>> ==16714== by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
>> ==16714== by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16714== by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714== Address 0x219377d2 is 2 bytes after a block of size 256 alloc'd
>> ==16714== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16714== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16714== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ...
>> ==16714== Invalid read of size 1
>> ==16714== at 0x4C2E7B8: memcpy@@GLIBC_2.14 (vg_replace_strmem.c:915)
>> ==16714== by 0x2543FADC: vader_prepare_src (btl_vader_module.c:590)
>> ==16714== by 0x25AB17AA: mca_bml_base_prepare_src (bml.h:341)
>> ==16714== by 0x25AB4207: mca_pml_ob1_send_request_start_prepare (pml_ob1_sendreq.c:620)
>> ==16714== by 0x25AA3519: mca_pml_ob1_send_request_start_btl (pml_ob1_sendreq.h:397)
>> ==16714== by 0x25AA3766: mca_pml_ob1_send_request_start_seq (pml_ob1_sendreq.h:460)
>> ==16714== by 0x25AA41E1: mca_pml_ob1_isend (pml_ob1_isend.c:171)
>> ==16714== by 0x26BA2AF5: ompi_coll_tuned_bcast_intra_generic (coll_tuned_bcast.c:112)
>> ==16714== by 0x26BA36F7: ompi_coll_tuned_bcast_intra_binomial (coll_tuned_bcast.c:385)
>> ==16714== by 0x26B94289: ompi_coll_tuned_bcast_intra_dec_fixed (coll_tuned_decision_fixed.c:258)
>> ==16714== by 0x1ACD55F2: PMPI_Bcast (pbcast.c:110)
>> ==16714== by 0x2FE1CBE5: ADIOI_Shfp_fname (shfp_fname.c:63)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ==16714== Address 0x219377e0 is 16 bytes after a block of size 256 alloc'd
>> ==16714== at 0x4C2C5A4: malloc (vg_replace_malloc.c:296)
>> ==16714== by 0x2FE1C78E: ADIOI_Malloc_fn (malloc.c:50)
>> ==16714== by 0x2FE1C951: ADIOI_Shfp_fname (shfp_fname.c:25)
>> ==16714== by 0x2FDEB493: mca_io_romio_dist_MPI_File_open (open.c:177)
>> ==16714== by 0x2FDE3B0D: mca_io_romio_file_open (io_romio_file_open.c:40)
>> ==16714== by 0x1AD52344: module_init (io_base_file_select.c:455)
>> ==16714== by 0x1AD51DFA: mca_io_base_file_select (io_base_file_select.c:238)
>> ==16714== by 0x1ACA582F: ompi_file_open (file.c:130)
>> ==16714== by 0x1AD30DA3: PMPI_File_open (pfile_open.c:94)
>> ==16714== by 0x13F9B36F: PAIO::ouvreFichierMPIIO(PAGroupeProcessus&, std::string const&, int, ompi_file_t*&, bool) (PAIO.cc:290)
>> ==16714== by 0xCA44252: GISLectureEcriture<double>::litGISMPI(std::string, GroupeInfoSur<double>&, std::string&) (GISLectureEcriture.icc:411)
>> ==16714== by 0xCA23F0D: Champ::importeParallele(std::string const&) (Champ.cc:951)
>> ==16714== by 0x4D0DEE: main (Test.NormesEtProjectionChamp.cc:789)
>> ...
>>
>> I should point out that with MPICH 3.1.3, I can't reproduce the same
>> bad behavior.
>>
>> Also, the segfault is not always there: running the same code with
>> other inputs gave me trouble-free results, with or without valgrind.
>> I noticed the problem appears more frequently with longer "paths".
>>
>> Please help!
>>
>> Thanks,
>>
>> Eric
>>
>> ompi_info -all: http://www.giref.ulaval.ca/~ericc/ompi_bug/ompi_info.all.184rc3.txt.gz
>> config.log: http://www.giref.ulaval.ca/~ericc/ompi_bug/config.184rc3.log.gz
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25983.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/12/25986.php
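To make the naming scheme from the comment quoted above concrete, here is a small standalone sketch: plain snprintf stands in for ROMIO's ADIOI_* string helpers, the random suffix value is made up, and the 256-byte buffer mirrors the hard-coded size of shared_fp_fname, which is why a very long path+filename ends up truncated. The patch below only adjusts the length passed to the copy so that the truncation stays inside that buffer.

/* Standalone sketch of the shared-file-pointer naming scheme quoted
 * above: /tmp/thakur/testfile -> /tmp/thakur/.testfile.shfp.xxxx.
 * Plain snprintf stands in for ROMIO's ADIOI_* string helpers; the
 * 256-byte buffer matches the hard-coded size of shared_fp_fname,
 * so very long paths are silently truncated. */
#include <stdio.h>
#include <string.h>

static void shfp_name(const char *filename, int rnd, char *out, size_t outlen)
{
    const char *slash = strrchr(filename, '/');
    if (slash == NULL) {
        /* no directory part: just prepend the dot */
        snprintf(out, outlen, ".%s.shfp.%d", filename, rnd);
    } else {
        /* keep the directory, prepend '.' to the base name */
        snprintf(out, outlen, "%.*s/.%s.shfp.%d",
                 (int)(slash - filename), filename, slash + 1, rnd);
    }
}

int main(void)
{
    char shfp[256];   /* same hard-coded size as fd->shared_fp_fname */
    shfp_name("/tmp/thakur/testfile", 4321, shfp, sizeof(shfp));
    printf("%s\n", shfp);   /* prints /tmp/thakur/.testfile.shfp.4321 */
    return 0;
}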
diff --git a/ompi/mca/io/romio/romio/adio/common/shfp_fname.c b/ompi/mca/io/romio/romio/adio/common/shfp_fname.c
index 024ced5..98ec101 100644
--- a/ompi/mca/io/romio/romio/adio/common/shfp_fname.c
+++ b/ompi/mca/io/romio/romio/adio/common/shfp_fname.c
@@ -2,6 +2,8 @@
 /*
  *
  *   Copyright (C) 1997 University of Chicago.
+ *   Copyright (c) 2014 Research Organization for Information Science
+ *                      and Technology (RIST). All rights reserved.
  *   See COPYRIGHT notice in top-level directory.
  */
 
@@ -51,7 +53,7 @@ void ADIOI_Shfp_fname(ADIO_File fd, int rank)
 	slash = strrchr(fd->shared_fp_fname, '/');
 #endif
 	ADIOI_Strncpy(slash + 1, ".", 2);
-	len = 256 - (slash+2 - fd->shared_fp_fname);
+	len = 255 - (slash+2 - fd->shared_fp_fname);
 	ADIOI_Strncpy(slash + 2, ptr + 1, len);
     }
 
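Finally, for anyone who wants to try the patch against the scenario Eric describes, the sequence boils down to roughly the following untested sketch; the path is a placeholder and MPI_BYTE stands in for whatever datatypes the real test uses.

/* Untested sketch of the reported sequence: collective open, one
 * MPI_File_read_at_all of an 11-byte header, then close.  The path is
 * a placeholder; the point is only that a long path+filename goes
 * through MPI_File_open on 2 processes. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_File fh;
    char header[11];   /* a long int (8 bytes) + 3 chars, as described */
    const char *fname = "/some/very/long/path/and/filename"; /* placeholder */

    MPI_Init(&argc, &argv);

    /* 1- open the file collectively */
    MPI_File_open(MPI_COMM_WORLD, (char *)fname, MPI_MODE_RDONLY,
                  MPI_INFO_NULL, &fh);

    /* 2- read 11 bytes at offset 0 with the collective call */
    MPI_File_read_at_all(fh, 0, header, 11, MPI_BYTE, MPI_STATUS_IGNORE);

    /* 3- the format check would go here; 4- close the file */
    MPI_File_close(&fh);

    MPI_Finalize();
    return 0;
}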