Sorry for the long delay.

Unfortunately, I am no longer able to reproduce the Valgrind errors I reported 
earlier with either the debug version or the normally-compiled version of OMPI 
1.8.7. I don’t know what happened - probably some change to our cluster 
infrastructure that I am not aware of and that I am not able to track down. 
Sorry for having wasted your collective time on this; if this error should 
arise again, I will try to get a proper Valgrind report with --enable-debug and 
report it here.

Michael

> On 30 Jul 2015, at 22:10 , Nathan Hjelm <hje...@lanl.gov> wrote:
> 
> 
> I agree with Ralph. Please run again with --enable-debug. That will give
> more information (line number) on where the error is occurring.
> 
> Looking at the function in question the only place I see that could be
> causing this warning is the call to strlen. Some implementations of
> strlen operate on larger chunks (4 or 8 bytes). This will make
> valgrind unhappy but does not make the implementation invalid as no read
> will cross a page boundary (so no SEGV). One example of such a strlen
> implementation is the one used by icc which uses vector operations on
> 8-byte chunks of the string.
> 
> -Nathan
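
To make the page-boundary argument above concrete, here is a minimal
arithmetic sketch. It is not icc's strlen and not Open MPI code: the 8-byte
chunk size comes from Nathan's example, while the 4096-byte page size and all
function names are assumptions chosen for illustration. It only computes
addresses, it never performs an out-of-bounds read itself.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define CHUNK 8u     /* bytes per load, as in the icc example above */
#define PAGE  4096u  /* assumed page size; the real value is platform-specific */

/* Start address of the aligned CHUNK-byte word that contains addr. */
static uintptr_t chunk_start(uintptr_t addr)
{
    return addr & ~(uintptr_t)(CHUNK - 1);
}

/* Index of the page that contains addr. */
static uintptr_t page_of(uintptr_t addr)
{
    return addr / PAGE;
}

/* Returns 1 if the aligned chunk holding addr lies within a single page. */
static int chunk_stays_in_page(uintptr_t addr)
{
    uintptr_t first = chunk_start(addr);
    uintptr_t last  = first + CHUNK - 1;
    return page_of(first) == page_of(last);
}

/* Number of bytes an aligned chunk load touches beyond the terminating
 * NUL located at nul_addr. These are the bytes Valgrind may flag. */
static size_t overread_bytes(uintptr_t nul_addr)
{
    return (size_t)(chunk_start(nul_addr) + CHUNK - 1 - nul_addr);
}
```

Because PAGE is a multiple of CHUNK, chunk_stays_in_page() holds for every
address, which is why such a load cannot fault even though it may touch up to
CHUNK - 1 bytes that Valgrind considers out of bounds.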
> 
> On Wed, Jul 29, 2015 at 07:58:09AM -0700, Ralph Castain wrote:
>>   If you have the time, it would be helpful. You might also configure
>>   with --enable-debug.
>>   Meantime, I can take another gander to see how it could happen - looking
>>   at the code, it sure seems impossible, but maybe there is some strange
>>   path that would break it.
>> 
>>     On Jul 29, 2015, at 6:29 AM, Schlottke-Lakemper, Michael
>>     <m.schlottke-lakem...@aia.rwth-aachen.de> wrote:
>>     If it is helpful, I can try to compile OpenMPI with debug information
>>     and get more details on the reported error. However, it would be good if
>>     someone could tell me the necessary compile flags (on top of -O0 -g) and
>>     it would take me probably 1-2 weeks to do it.
>>     Michael
>> 
>>     -------- Original message --------
>>     From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
>>     Date: 29/07/2015 14:17 (GMT+01:00)
>>     To: Open MPI Users <us...@open-mpi.org>
>>     Subject: Re: [OMPI users] Invalid read of size 4 (Valgrind error) with
>>     OpenMPI 1.8.7
>> 
>>     Thomas,
>>     can you please elaborate?
>>     I checked the code of opal_os_dirpath_create and could not find where
>>     such a thing can happen.
>>     Thanks,
>>     Gilles
>>     On Wednesday, July 29, 2015, Thomas Jahns <ja...@dkrz.de> wrote:
>> 
>>       Hello,
>> 
>>       On 07/28/15 17:34, Schlottke-Lakemper, Michael wrote:
>> 
>>         That's what I suspected. Thank you for your confirmation.
>> 
>>       you are mistaken, the allocation is 51 bytes long, i.e. valid bytes
>>       are at offsets 0 to 50. But since the read of 4 bytes starts at offset
>>       48, the bytes at offsets 48, 49, 50 and 51 get read, the last of which
>>       is illegal. It probably does no harm at the moment in practice,
>>       because virtually all allocators always add some padding to the next
>>       multiple of some power of 2. But still this means the program is
>>       incorrect in terms of any programming language definition involved
>>       (might be C, C++ or Fortran).
>> 
>>       Regards, Thomas
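
The offset arithmetic in Thomas's explanation can be checked with a small
sketch. The function name and signature are hypothetical and chosen only for
illustration; the numbers 51, 48 and 4 are taken from the Valgrind report
quoted further down.

```c
#include <assert.h>
#include <stddef.h>

/* Number of bytes that a read of read_size bytes starting at offset
 * touches beyond the end of a block of block_size bytes.
 * Valid offsets in the block are 0 .. block_size - 1. */
static size_t bytes_past_end(size_t block_size, size_t offset, size_t read_size)
{
    size_t end = offset + read_size;  /* one past the last byte read */
    return end > block_size ? end - block_size : 0;
}
```

With the values from the report, bytes_past_end(51, 48, 4) evaluates to 1:
offsets 48, 49 and 50 are inside the allocation, and offset 51 is the one
byte outside it. A read of the same size starting at offset 47 would stay
entirely within the block.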
>> 
>>           On 25 Jul 2015, at 16:10 , Ralph Castain <r...@open-mpi.org
>>           <mailto:r...@open-mpi.org>> wrote:
>> 
>>           Looks to me like a false positive - we do malloc some space, and
>>           do access different parts of it. However, it looks like we are
>>           inside the space at all times.
>> 
>>           I'd suppress it
>> 
>>             On Jul 23, 2015, at 12:47 AM, Schlottke-Lakemper, Michael
>>             <m.schlottke-lakem...@aia.rwth-aachen.de
>>             <mailto:m.schlottke-lakem...@aia.rwth-aachen.de>> wrote:
>> 
>>             Hi folks,
>> 
>>             recently we've been getting a Valgrind error in PMPI_Init for
>>             our suite of regression tests:
>> 
>>             ==5922== Invalid read of size 4
>>             ==5922==    at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>>             ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>>             ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>>             ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>>             ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>>             ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>>             ==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
>>             ==5922==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>>             ==5922==    by 0x61CC572: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>>             ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>>             ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>>             ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>>             ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>>             ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>>             ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>>             ==5922==
>> 
>>             What is weird is that it seems to depend on the pbs/torque
>>             session we're in: sometimes the error does not occur at all and
>>             all tests run fine (this is in fact the only Valgrind error
>>             we're having at the moment). Other times every single test
>>             we're running has this error.
>> 
>>             Has anyone seen this or might be able to offer an explanation?
>>             If it is a false positive, I'd be happy to suppress it :)
>> 
>>             Thanks a lot in advance
>> 
>>             Michael
>> 
>>             P.S.: This error is not covered/suppressed by the default ompi
>>             suppression file in $PREFIX/share/openmpi.
>> 
>>             --
>>             Michael Schlottke-Lakemper
>> 
>>             SimLab Highly Scalable Fluids & Solids Engineering
>>             Jülich Aachen Research Alliance (JARA-HPC)
>>             RWTH Aachen University
>>             Wüllnerstrasse 5a
>>             52062 Aachen
>>             Germany
>> 
>>             Phone: +49 (241) 80 95188
>>             Fax: +49 (241) 80 92257
>>             Mail: m.schlottke-lakem...@aia.rwth-aachen.de
>>             <mailto:m.schlottke-lakem...@aia.rwth-aachen.de>
>>             Web: http://www.jara.org/jara-hpc
>> 
>>             _______________________________________________
>>             users mailing list
>>             us...@open-mpi.org <mailto:us...@open-mpi.org>
>>             Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>             Link to this post:
>>             http://www.open-mpi.org/community/lists/users/2015/07/27303.php
>> 
>>           _______________________________________________
>>           users mailing list
>>           us...@open-mpi.org <mailto:us...@open-mpi.org>
>>           Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>           Link to this post:
>>           http://www.open-mpi.org/community/lists/users/2015/07/27328.php
>> 
>>         _______________________________________________
>>         users mailing list
>>         us...@open-mpi.org
>>         Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>         Link to this post:
>>         http://www.open-mpi.org/community/lists/users/2015/07/27348.php
>> 
>>       --
>>       Thomas Jahns
>>       HD(CP)^2
>>       Abteilung Anwendungssoftware
>> 
>>       Deutsches Klimarechenzentrum GmbH
>>       Bundesstrasse 45a  o  D-20146 Hamburg  o  Germany
>> 
>>       Phone:  +49 40 460094-151
>>       Fax:    +49 40 460094-270
>>       Email:  Thomas Jahns <ja...@dkrz.de>
>>       URL:    www.dkrz.de
>> 
>>       Geschäftsführer: Prof. Dr. Thomas Ludwig
>>       Sitz der Gesellschaft: Hamburg
>>       Amtsgericht Hamburg HRB 39784
>> 
>>     _______________________________________________
>>     users mailing list
>>     us...@open-mpi.org
>>     Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>>     Link to this post:
>>     http://www.open-mpi.org/community/lists/users/2015/07/27359.php
> 
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: 
>> http://www.open-mpi.org/community/lists/users/2015/07/27360.php
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: 
> http://www.open-mpi.org/community/lists/users/2015/07/27362.php
