Hello Gilles,

First of all, I am extremely grateful that you replied on a weekend, and only a
few hours after I posted my email. I am not sure there is much point in my
posting more log files, since you rightly point out that MPI does not appear to
be the source of the problem, but I have enclosed the valgrind logs you
requested anyway. I have also downloaded the MPICH packages you suggested and
will install them shortly.
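For reference, this is roughly how I plan to do the MPICH comparison test (the
version number and install prefix below are just placeholders for whatever I
end up using):

    # build and install MPICH under its own prefix so it cannot clash with Open MPI
    cd mpich-3.2
    ./configure --prefix=$HOME/mpich-install
    make && make install
    # rebuild COSMO with MPICH's wrappers and rerun the same case
    export PATH=$HOME/mpich-install/bin:$PATH
    which mpif90      # should now point at the MPICH wrapper
    mpiexec -n 4 cosmo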


Before I do that, though, I think I have a clue about the source of my problem
(the "double free or corruption" abort), and I would really appreciate your
advice.

As I mentioned before, COSMO has been compiled with mpif90 for the parallel
(MPI) runs and with gfortran for the sequential runs. However, it depends on a
lot of external third-party software such as zlib, libcurl, hdf5, netcdf and
netcdf-fortran. When I looked at the config.log of those packages, all of them
had been compiled with gfortran and gcc (and in some cases g++), with the
--enable-shared option. So my question is: could that be the source of the
"mismatch"?

In other words, I would have to recompile all of those packages with mpif90 and
mpicc and then run the test again; at the very least there should be no mixing
of gcc/gfortran-compiled code with mpif90-compiled code. Comments on the plan
sketched below would be very welcome.
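Concretely, the rebuild I have in mind would look something like this (a rough
sketch only; the install prefix and version numbers are placeholders, and the
same pattern would be repeated for zlib and libcurl before hdf5):

    # point every configure run at the Open MPI wrappers
    export CC=mpicc CXX=mpicxx FC=mpif90 F77=mpif90
    cd hdf5-1.8.18
    ./configure --prefix=/opt/cosmo-stack --enable-shared --enable-fortran
    make && make install
    cd ../netcdf-c-4.4.1
    CPPFLAGS=-I/opt/cosmo-stack/include LDFLAGS=-L/opt/cosmo-stack/lib \
        ./configure --prefix=/opt/cosmo-stack --enable-shared
    make && make install
    cd ../netcdf-fortran-4.4.4
    CPPFLAGS=-I/opt/cosmo-stack/include LDFLAGS=-L/opt/cosmo-stack/lib \
        ./configure --prefix=/opt/cosmo-stack --enable-shared
    make && make install

That way everything COSMO links against would be built through the same
mpif90/mpicc toolchain.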


Best regards,
Ashwin.

>Ashwin,

>did you try to run your app with an MPICH-based library (mvapich,
>IntelMPI or even stock mpich)?
>or did you try with Open MPI v1.10?
>the stacktrace does not indicate the double free occurs in MPI...

>it seems you ran valgrind against a shell and not your binary.
>assuming your mpirun command is
>mpirun lmparbin_all
>i suggest you try again with
>mpirun --tag-output valgrind lmparbin_all
>that will generate one valgrind log per task, but these are prefixed
>so it should be easier to figure out what is going wrong

>Cheers,

>Gilles


On Sun, Jun 18, 2017 at 8:11 AM, ashwin .D <winas...@gmail.com> wrote:

> There is a sequential version of the same program COSMO (no reference to
> MPI) that I can run without any problems. Of course it takes a lot longer
> to complete. Now I also ran valgrind (not sure whether that is useful or
> not) and I have enclosed the logs.
>
> On Sat, Jun 17, 2017 at 7:20 PM, ashwin .D <winas...@gmail.com> wrote:
>
>> Hello Gilles,
>>                    I am enclosing all the information you requested.
>>
>> 1)  as an attachment I enclose the log file
>> 2) I did rebuild OpenMPI 2.1.1 with the --enable-debug option and I
>> reinstalled it in /usr/local/lib.
>> I ran all the examples in the examples directory. All passed except
>> oshmem_strided_puts where I got this message
>>
>> [[48654,1],0][pshmem_iput.c:70:pshmem_short_iput] Target PE #1 is not in
>> valid range
>> --------------------------------------------------------------------------
>> SHMEM_ABORT was invoked on rank 0 (pid 13409, host=a-Vostro-3800) with
>> errorcode -1.
>> --------------------------------------------------------------------------
>>
>>
>> 3) I deleted all old OpenMPI versions under /usr/local/lib.
>> 4) I am using the COSMO weather model (http://www.cosmo-model.org/) to
>> run simulations.
>> The support staff claim they have seen no errors with a similar setup.
>> They use
>>
>> 1) gfortran 4.8.5
>> 2) OpenMPI 1.10.1
>>
>> The only difference is I use OpenMPI 2.1.1.
>>
>> 5) I also tried mpirun --mca btl tcp,self -np 4 cosmo and got the same
>> error as in the mpi_logs file.
>>
>> 6) Regarding compiler and linking options on Ubuntu 16.04
>>
>> mpif90 --showme:compile and mpif90 --showme:link give me the options for
>> compiling and linking. Here are the linking options from my makefile:
>>
>> -pthread -lmpi_usempi -lmpi_mpifh -lmpi
>>
>> 7) I have a 64 bit OS.
>>
>> I think I have responded to all of your questions; if I have missed any,
>> please let me know and I will respond ASAP. The only thing I have not done
>> is clean up /usr/local/include; I saw some old OpenMPI files there. If
>> those need to be deleted, I will do so after I hear from you.
>>
>> Best regards,
>> Ashwin.
>>
>>
>

