I agree with Ralph. Please rebuild with --enable-debug and run again. That will
give more information (line numbers) on where the error is occurring.

Looking at the function in question, the only place I see that could be
causing this warning is the call to strlen. Some implementations of
strlen operate on larger chunks (4 or 8 bytes) at a time. This will make
valgrind unhappy, but it does not make the implementation invalid: the
chunked reads are aligned, so no read will cross a page boundary (and
therefore no SEGV). One example of such a strlen implementation is the one
used by icc, which uses vector operations on 8-byte chunks of the string.
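A minimal sketch of the word-at-a-time idea, assuming 8-byte words. This is not icc's or glibc's code and not Open MPI code; `chunked_strlen` and `word_has_zero` are hypothetical names. It walks byte by byte until the pointer is 8-byte aligned, then loads whole aligned words and tests each one for a zero byte, so the final load can touch up to 7 bytes past the terminator. That is exactly the kind of access memcheck reports, even though an aligned 8-byte load cannot straddle a page. Reading past the end of an object is undefined in ISO C, so the demo below should only be run on padded buffers.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Classic bit trick: nonzero iff at least one byte of v is 0x00. */
static int word_has_zero(uint64_t v)
{
    return ((v - 0x0101010101010101ULL) & ~v & 0x8080808080808080ULL) != 0;
}

/* Hypothetical word-at-a-time strlen, for illustration only. */
static size_t chunked_strlen(const char *s)
{
    assert(s != NULL);
    const char *p = s;

    /* Step 1: examine single bytes until p is 8-byte aligned. */
    while (((uintptr_t)p & 7u) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    /* Step 2: load whole aligned 8-byte words. The word holding the
     * terminator may also contain up to 7 bytes that lie after it.
     * An aligned 8-byte load never crosses a page boundary, but a tool
     * like memcheck still flags bytes beyond the malloc'd block. */
    for (;;) {
        uint64_t w;
        memcpy(&w, p, sizeof w);
        if (word_has_zero(w))
            break;
        p += sizeof w;
    }

    /* Step 3: the terminator is somewhere in this word; find it. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}
```

To exercise it without a real over-read, put the string in a zero-filled `uint64_t` array viewed through a `char *` (for example `uint64_t storage[8] = {0};`), so every aligned load stays inside the array; the result should then match `strlen`.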

-Nathan

On Wed, Jul 29, 2015 at 07:58:09AM -0700, Ralph Castain wrote:
>    If you have the time, it would be helpful. You might also configure
>    with --enable-debug.
>    Meantime, I can take another gander to see how it could happen - looking
>    at the code, it sure seems impossible, but maybe there is some strange
>    path that would break it.
> 
>      On Jul 29, 2015, at 6:29 AM, Schlottke-Lakemper, Michael
>      <m.schlottke-lakem...@aia.rwth-aachen.de> wrote:
>      If it is helpful, I can try to compile OpenMPI with debug information
>      and get more details on the reported error. However, it would be good if
>      someone could tell me the necessary compile flags (on top of -O0 -g), and
>      it would probably take me 1-2 weeks to do it.
>      Michael 
> 
>      -------- Original message --------
>      From: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
>      Date: 29/07/2015 14:17 (GMT+01:00)
>      To: Open MPI Users <us...@open-mpi.org>
>      Subject: Re: [OMPI users] Invalid read of size 4 (Valgrind error) with
>      OpenMPI 1.8.7
> 
>      Thomas,
>      can you please elaborate?
>      I checked the code of opal_os_dirpath_create and could not find where
>      such a thing can happen.
>      Thanks,
>      Gilles
>      On Wednesday, July 29, 2015, Thomas Jahns <ja...@dkrz.de> wrote:
> 
>        Hello,
> 
>        On 07/28/15 17:34, Schlottke-Lakemper, Michael wrote:
> 
>          That's what I suspected. Thank you for your confirmation.
> 
>        You are mistaken: the allocation is 51 bytes long, i.e. valid bytes
>        are at offsets 0 to 50. But since the read of 4 bytes starts at offset
>        48, the bytes at offsets 48, 49, 50 and 51 get read, the last of which
>        is illegal. In practice it probably does no harm at the moment,
>        because virtually all allocators add some padding up to the next
>        multiple of some power of 2. But this still means the program is
>        incorrect in terms of any programming language definition involved
>        (might be C, C++ or Fortran).
> 
>        Regards, Thomas
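Thomas's arithmetic, and the page-boundary argument from the top of the thread, can be checked with a few small helpers. These are hypothetical illustration functions, not Open MPI or Valgrind code; `round_up` is a simplified model of allocator padding, and the 16-byte granularity and 4096-byte page size used in the note are assumptions typical of x86-64 Linux, not values reported in the thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* True if reading `width` bytes starting at `offset` stays inside a block
 * of `size` bytes (valid offsets are 0 .. size-1). */
static bool read_in_bounds(size_t size, size_t offset, size_t width)
{
    return offset <= size && width <= size - offset;
}

/* Round n up to a multiple of `align`, which must be a power of two
 * (a simplified model of how allocators pad a request). */
static size_t round_up(size_t n, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (n + align - 1) & ~(align - 1);
}

/* True if a `width`-byte read at `addr` lies entirely within one page. */
static bool stays_in_page(uintptr_t addr, size_t width, size_t page_size)
{
    return (addr % page_size) + width <= page_size;
}
```

With the numbers from the Valgrind report, `read_in_bounds(51, 48, 4)` is false (bytes 48 to 51 are needed, but only 0 to 50 exist), while `read_in_bounds(round_up(51, 16), 48, 4)` is true, which is why the over-read is usually harmless in practice. An 8-byte-aligned address such as 4088 gives `stays_in_page(4088, 8, 4096)` true, whereas an unaligned 4092 would spill into the next page.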
> 
>            On 25 Jul 2015, at 16:10 , Ralph Castain <r...@open-mpi.org
>            <mailto:r...@open-mpi.org>> wrote:
> 
>            Looks to me like a false positive: we do malloc some space, and do
>            access different parts of it. However, it looks like we are inside
>            the space at all times.
> 
>            I'd suppress it.
> 
>              On Jul 23, 2015, at 12:47 AM, Schlottke-Lakemper, Michael
>              <m.schlottke-lakem...@aia.rwth-aachen.de
>              <mailto:m.schlottke-lakem...@aia.rwth-aachen.de>> wrote:
> 
>              Hi folks,
> 
>              Recently we've been getting a Valgrind error in PMPI_Init for
>              our suite of regression tests:
> 
>              ==5922== Invalid read of size 4
>              ==5922==    at 0x61CC5C0: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>              ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>              ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>              ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>              ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>              ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>              ==5922==  Address 0x710f670 is 48 bytes inside a block of size 51 alloc'd
>              ==5922==    at 0x4C29110: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
>              ==5922==    by 0x61CC572: opal_os_dirpath_create (in /aia/opt/openmpi-1.8.7/lib64/libopen-pal.so.6.2.2)
>              ==5922==    by 0x5F207E5: orte_session_dir (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x5F34F04: orte_ess_base_app_setup (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x7E96679: rte_init (in /aia/opt/openmpi-1.8.7/lib64/openmpi/mca_ess_env.so)
>              ==5922==    by 0x5F12A77: orte_init (in /aia/opt/openmpi-1.8.7/lib64/libopen-rte.so.7.0.6)
>              ==5922==    by 0x509883C: ompi_mpi_init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>              ==5922==    by 0x50B843A: PMPI_Init (in /aia/opt/openmpi-1.8.7/lib64/libmpi.so.1.6.2)
>              ==5922==    by 0xEBA79C: ZFS::run() (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>              ==5922==    by 0x4DC243: main (in /aia/r018/scratch/mic/.zfstester/.zacc_cron/zacc_cron_r9063/zfs_gnu_production)
>              ==5922==
> 
>              What is weird is that it seems to depend on the pbs/torque
>              session we're in: sometimes the error does not occur at all and
>              all tests run fine (this is in fact the only Valgrind error we're
>              having at the moment). Other times every single test we're
>              running has this error.
> 
>              Has anyone seen this, or might someone be able to offer an
>              explanation? If it is a false positive, I'd be happy to
>              suppress it :)
> 
>              Thanks a lot in advance
> 
>              Michael
> 
>              P.S.: This error is not covered/suppressed by the default ompi
>              suppression file in $PREFIX/share/openmpi.
> 
>              --
>              Michael Schlottke-Lakemper
> 
>              SimLab Highly Scalable Fluids & Solids Engineering
>              Jülich Aachen Research Alliance (JARA-HPC)
>              RWTH Aachen University
>              Wüllnerstrasse 5a
>              52062 Aachen
>              Germany
> 
>              Phone: +49 (241) 80 95188
>              Fax: +49 (241) 80 92257
>              Mail: m.schlottke-lakem...@aia.rwth-aachen.de
>              <mailto:m.schlottke-lakem...@aia.rwth-aachen.de>
>              Web: http://www.jara.org/jara-hpc
> 
>              _______________________________________________
>              users mailing list
>              us...@open-mpi.org <mailto:us...@open-mpi.org>
>              Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>              Link to this post:
>              http://www.open-mpi.org/community/lists/users/2015/07/27303.php
> 
>        --
>        Thomas Jahns
>        HD(CP)^2
>        Abteilung Anwendungssoftware
> 
>        Deutsches Klimarechenzentrum GmbH
>        Bundesstrasse 45a  o  D-20146 Hamburg  o  Germany
> 
>        Phone:  +49 40 460094-151
>        Fax:    +49 40 460094-270
>        Email:  Thomas Jahns <ja...@dkrz.de>
>        URL:    www.dkrz.de
> 
>        Geschäftsführer: Prof. Dr. Thomas Ludwig
>        Sitz der Gesellschaft: Hamburg
>        Amtsgericht Hamburg HRB 39784
> 