Hello Gilles,

First of all, I am extremely grateful for this reply from you on a weekend, and only a few hours after I posted my email. I am not sure there is much point in my posting more log files, since you rightly point out that MPI is not the source of the problem, but I have enclosed the valgrind log files you requested anyway. I have also downloaded the MPICH packages as you suggested and will install them shortly.

Before I do that, though, I think I have a clue about the source of my problem (double free or corruption), and I would really appreciate your advice. As I mentioned before, COSMO has been compiled with mpif90 for the parallel (shared-memory) runs and with gfortran for the sequential runs. However, it depends on a lot of external third-party software such as zlib, libcurl, HDF5, netCDF and netCDF-Fortran. When I looked at the config.log of those packages, all of them had been compiled with gfortran and gcc (and in some cases g++) with the --enable-shared option. So my question is: could that be a source of the "mismatch"? In other words, should I recompile all of those packages with mpif90 and mpicc and then try another test, so that at the very least there is no mixing of gcc/gfortran-compiled code with mpif90-compiled code? (A rough sketch of the rebuild I have in mind follows the quoted messages at the bottom of this mail.) Comments?

Best regards,
Ashwin.

> Ashwin,
>
> did you try to run your app with an MPICH-based library (mvapich,
> IntelMPI or even stock mpich) ?
> or did you try with Open MPI v1.10 ?
>
> the stacktrace does not indicate the double free occurs in MPI...
>
> it seems you ran valgrind vs a shell and not your binary.
> assuming your mpirun command is
> mpirun lmparbin_all
> i suggest you try again with
> mpirun --tag-output valgrind lmparbin_all
> that will generate one valgrind log per task, but these are prefixed
> so it should be easier to figure out what is going wrong
>
> Cheers,
>
> Gilles

On Sun, Jun 18, 2017 at 8:11 AM, ashwin .D <winas...@gmail.com> wrote:
> There is a sequential version of the same program COSMO (no reference to
> MPI) that I can run without any problems. Of course it takes a lot longer
> to complete. Now I also ran valgrind (not sure whether that is useful or
> not) and I have enclosed the logs.
>
> On Sat, Jun 17, 2017 at 7:20 PM, ashwin .D <winas...@gmail.com> wrote:
>
>> Hello Gilles,
>> I am enclosing all the information you requested.
>>
>> 1) As an attachment I enclose the log file.
>>
>> 2) I did rebuild OpenMPI 2.1.1 with the --enable-debug feature and I
>> reinstalled it in /usr/lib/local.
>> I ran all the examples in the examples directory. All passed except
>> oshmem_strided_puts, where I got this message:
>>
>> [[48654,1],0][pshmem_iput.c:70:pshmem_short_iput] Target PE #1 is not in
>> valid range
>> --------------------------------------------------------------------------
>> SHMEM_ABORT was invoked on rank 0 (pid 13409, host=a-Vostro-3800) with
>> errorcode -1.
>> --------------------------------------------------------------------------
>>
>> 3) I deleted all old OpenMPI versions under /usr/local/lib.
>>
>> 4) I am using the COSMO weather model - http://www.cosmo-model.org/ - to
>> run simulations.
>> The support staff claim they have seen no errors with a similar setup.
>> They use
>>
>> 1) gfortran 4.8.5
>> 2) OpenMPI 1.10.1
>>
>> The only difference is that I use OpenMPI 2.1.1.
>>
>> 5) I did try this option as well: mpirun --mca btl tcp,self -np 4 cosmo,
>> and I got the same error as in the mpi_logs file.
>>
>> 6) Regarding compiler and linking options on Ubuntu 16.04:
>> mpif90 --showme:compile and mpif90 --showme:link give me the options for
>> compiling and linking.
>> Here are the options from my makefile:
>> -pthread -lmpi_usempi -lmpi_mpifh -lmpi for linking
>>
>> 7) I have a 64-bit OS.
>>
>> Well, I think I have responded to all of your questions. In case I have
>> not, please let me know and I will respond ASAP. The only thing I have
>> not done is look at /usr/local/include. I saw some old OpenMPI files
>> there. If those need to be deleted I will do so after I hear from you.
>>
>> Best regards,
>> Ashwin.
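
Here is the rough sketch I mentioned above of how I would rebuild the I/O stack with the MPI wrappers. The version numbers and the $HOME/opt prefix are just placeholders for my setup, and I have not run this yet; the idea is simply that every library in the chain gets built with the same wrappers that build COSMO:

    # hypothetical install prefix for the rebuilt stack
    PREFIX=$HOME/opt/cosmo-deps

    # HDF5: build the C and Fortran libraries with the Open MPI wrappers
    cd hdf5-x.y.z
    ./configure CC=mpicc FC=mpif90 --enable-shared --enable-fortran --prefix=$PREFIX
    make && make install

    # netCDF-C: point it at the HDF5 built above
    cd ../netcdf-c-x.y.z
    ./configure CC=mpicc CPPFLAGS=-I$PREFIX/include LDFLAGS=-L$PREFIX/lib \
                --enable-shared --prefix=$PREFIX
    make && make install

    # netCDF-Fortran: same prefix, built with mpif90
    cd ../netcdf-fortran-x.y.z
    ./configure CC=mpicc FC=mpif90 CPPFLAGS=-I$PREFIX/include LDFLAGS=-L$PREFIX/lib \
                --enable-shared --prefix=$PREFIX
    make && make install

After that I would point COSMO's makefile at $PREFIX and rerun the failing case. (zlib and libcurl are plain C with no Fortran interface, so I assume they can stay as they are, but I can rebuild them the same way if you think it matters.)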
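
One more thing: regarding the old Open MPI files I mentioned under /usr/local/include, before attempting the rebuild above I will double-check that the wrappers, the runtime and the executable all resolve to the 2.1.1 installation and not to some leftover of an older version. Something along these lines (lmparbin_all is the binary name from your example; mine may live in a different directory):

    which mpicc mpif90 mpirun             # confirm they all come from the 2.1.1 install
    mpif90 --showme:link                  # see which library paths and MPI libraries the wrapper adds at link time
    mpirun --version
    ldd ./lmparbin_all | grep -i mpi      # confirm the executable picks up the intended libmpi at run time
    ls /usr/local/include | grep -i mpi   # list any stale Open MPI headers left behind

If any of those point at the old installation I will clean it up first and report back.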