Sorry for the long delay. Unfortunately, I can no longer reproduce the Valgrind errors I reported earlier, with either the debug build or the normally compiled build of OMPI 1.8.7. I don't know what happened; probably some change to our cluster infrastructure that I am not aware of and cannot track down. Sorry for having wasted your collective time on this. If this error arises again, I will try to get a proper Valgrind report with --enable-debug and report it here.

Michael

> On 30 Jul 2015, at 22:10 , Nathan Hjelm <hje...@lanl.gov> wrote:
>
> I agree with Ralph. Please run again with --enable-debug. That will give
> more information (line number) on where the error is occurring.
>
> Looking at the function in question the only place I see that could be
> causing this warning is the call to strlen. Some implementations of
> strlen operate on larger chunks (4 or 8 bytes). This will make
> valgrind unhappy but does not make the implementation invalid as no read
> will cross a page boundary (so no SEGV). One example of such a strlen
> implementation is the one used by icc which uses vector operations on
> 8-byte chunks of the string.
>
> -Nathan
>
> On Wed, Jul 29, 2015 at 07:58:09AM -0700, Ralph Castain wrote:
>> If you have the time, it would be helpful. You might also configure
>> --enable-debug.
>> Meantime, I can take another gander to see how it could happen - looking
>> at the code, it sure seems impossible, but maybe there is some strange
>> path that would break it.
>>
>> On Jul 29, 2015, at 6:29 AM, Schlottke-Lakemper, Michael
>> <m.schlottke-lakem...@aia.rwth-aachen.de> wrote:
>> If it is helpful, I can try to compile OpenMPI with debug information
>> and get more details on the reported error. However, it would be good if
>> someone could tell me the necessary compile flags (on top of -O0 -g) and
>> it would take me probably 1-2 weeks to do it.
>> Michael
>>
>> -------- Original message --------
>> From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
>> Date: 29/07/2015 14:17 (GMT+01:00)
>> To: Open MPI Users <us...@open-mpi.org>
>> Subject: Re: [OMPI users] Invalid read of size 4 (Valgrind error) with
>> OpenMPI 1.8.7
>>
>> Thomas,
>> can you please elaborate ?
>> I checked the code of opal_os_dirpath_create and could not find where
>> such a thing can happen
>> Thanks,
>> Gilles
>> On Wednesday, July 29, 2015, Thomas Jahns <ja...@dkrz.de> wrote:
>>
>> Hello,
>>
>> On 07/28/15 17:34, Schlottke-Lakemper, Michael wrote:
>>
>> That's what I suspected. Thank you for your confirmation.
>>
>> you are mistaken, the allocation is 51 bytes long, i.e. valid bytes
>> are at offsets 0 to 50. But since the read of 4 bytes starts at offset
>> 48, the bytes at offsets 48, 49, 50 and 51 get read, the last of which
>> is illegal. It probably does no harm at the moment in practice,
>> because virtually all allocators always add some padding to the next
>> multiple of some power of 2. But still this means the program is
>> incorrect in terms of any programming language definition involved
>> (might be C, C++ or Fortran).
>>
>> Regards, Thomas
>>
>> On 25 Jul 2015, at 16:10 , Ralph Castain <r...@open-mpi.org> wrote:
>>
>> Looks to me like a false positive - we do malloc some space, and do
>> access different parts of it. However, it looks like we are inside the
>> space at all times.
>>
>> I'd suppress it
>>
>> On Jul 23, 2015, at 12:47 AM, Schlottke-Lakemper, Michael
>> <m.schlottke-lakem...@aia.rwth-aachen.de> wrote:
>>
>> Hi folks,
>>
>> recently we've been getting a Valgrind error in PMPI_Init for our suite
>> of regression tests:
>>
>> ==5922== Invalid read of size 4
>> ==5922==    at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>> ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>> ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>> ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>> ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>> ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>> ==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
>> ==5922==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>> ==5922==    by 0x61CC572: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>> ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>> ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>> ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>> ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>> ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>> ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>> ==5922==
>>
>> What is weird is that it seems to depend on the pbs/torque session
>> we're in: sometimes the error does not occur at all and all tests run
>> fine (this is in fact the only Valgrind error we're having at the
>> moment). Other times every single test we're running has this error.
>>
>> Has anyone seen this or might be able to offer an explanation? If it is
>> a false-positive, I'd be happy to suppress it :)
>>
>> Thanks a lot in advance
>>
>> Michael
>>
>> P.S.: This error is not covered/suppressed by the default ompi
>> suppression file in $PREFIX/share/openmpi.
>>
>> --
>> Michael Schlottke-Lakemper
>>
>> SimLab Highly Scalable Fluids & Solids Engineering
>> Jülich Aachen Research Alliance (JARA-HPC)
>> RWTH Aachen University
>> Wüllnerstrasse 5a
>> 52062 Aachen
>> Germany
>>
>> Phone: +49 (241) 80 95188
>> Fax: +49 (241) 80 92257
>> Mail: m.schlottke-lakem...@aia.rwth-aachen.de
>> Web: http://www.jara.org/jara-hpc
>>
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post:
>> http://www.open-mpi.org/community/lists/users/2015/07/27303.php
>>
>> --
>> Thomas Jahns
>> HD(CP)^2
>> Application Software Department
>>
>> Deutsches Klimarechenzentrum GmbH
>> Bundesstrasse 45a • D-20146 Hamburg • Germany
>>
>> Phone: +49 40 460094-151
>> Fax: +49 40 460094-270
>> Email: Thomas Jahns <ja...@dkrz.de>
>> URL: www.dkrz.de
>>
>> Managing Director: Prof. Dr. Thomas Ludwig
>> Registered office: Hamburg
>> Amtsgericht Hamburg HRB 39784
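
For anyone finding this thread later: the arithmetic in Thomas's message and the page-boundary argument in Nathan's can be checked with a few lines of C. This is only a sketch, not code from Open MPI. The helper names bytes_past_end and crosses_page are made up for illustration, and the 4096-byte page size used below is an assumption about the machine that produced the report.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not part of Open MPI or Valgrind):
 * number of bytes of a read covering [offset, offset + width)
 * that lie beyond a heap block of block_size bytes. */
size_t bytes_past_end(size_t block_size, size_t offset, size_t width)
{
    size_t end = offset + width; /* one past the last byte read */
    return end > block_size ? end - block_size : 0;
}

/* Hypothetical helper: nonzero if a read of width bytes (width >= 1)
 * starting at addr touches more than one page of page_size bytes. */
int crosses_page(uintptr_t addr, size_t width, size_t page_size)
{
    uintptr_t first_page = addr / page_size;
    uintptr_t last_page = (addr + width - 1) / page_size;
    return first_page != last_page;
}
```

With the numbers from the report, bytes_past_end(51, 48, 4) evaluates to 1: only offset 51 lies outside the 51-byte block, which is the byte Valgrind complains about. Assuming 4096-byte pages, crosses_page(0x710f670, 4, 4096) evaluates to 0, so this particular read stays within one page, which is consistent with the program not crashing even though the read is formally out of bounds.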