Elio, you should ask this question in the forum of the simulation program you are using. These failures most likely have nothing to do with MPI (or, at least, with Open MPI), so this is the wrong place for them.
Here is a suggestion: does your program run without MPI at all (i.e., in a stand-alone mode, or perhaps with a different parallel programming model such as OpenMP)? If so, try running it in that mode to see whether it behaves any better. Even if it does not, the stack trace will be more insightful: with Open MPI's process launcher mixed into your code's stack, the source of the crash can be harder to figure out.

HTH,
Durga

1% of the executables have 99% of CPU privilege! Userspace code! Unite!! Occupy the kernel!!!

On Sat, Apr 23, 2016 at 4:10 PM, Elio Physics <elio-phys...@live.com> wrote:

Well, I changed the compiler from mpif90 to mpiifort with the corresponding flags -i8 -g and recompiled. I am not getting the segmentation fault any more; the program runs but later stops with no errors (except that the Fermi energy was not found!) and leaves behind some strange empty files with names like fortDgcQe3, fortechvF2, fortMaN6a1, fortnxoYy1, fortvR5F8q. I still feel something is wrong. Does anybody know what these files are?

Regards

------------------------------
From: users <users-boun...@open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org>
Sent: Saturday, April 23, 2016 1:38 PM
To: Open MPI Users
Subject: Re: [OMPI users] MPIRUN SEGMENTATION FAULT

I don't see any way this could be compilation related - I suspect there is simply some error in the program (e.g., forgetting to initialize some memory region).

On Apr 23, 2016, at 8:03 AM, Elio Physics <elio-phys...@live.com> wrote:

Hello Andy,

The program is not mine; I got it from a group upon request. It might not be program related: I run other codes, such as Quantum ESPRESSO, and they work perfectly fine, although it was the cluster people who compiled them. Since I compiled the program I am having problems with myself, I am thinking that it might be "compilation" related. This is why I wanted some experts' opinion on this.

------------------------------
From: users <users-boun...@open-mpi.org> on behalf of Andy Riebs <andy.ri...@hpe.com>
Sent: Saturday, April 23, 2016 12:49 PM
To: us...@open-mpi.org
Subject: Re: [OMPI users] MPIRUN SEGMENTATION FAULT

The challenge for the MPI experts here (of which I am NOT one!) is that the problem appears to be in your program; MPI is simply reporting that your program failed. If you got the program from someone else, you will need to solicit their help. If you wrote it, well, it is never a bad time to learn to use gdb!

Best regards
Andy

On 04/23/2016 10:41 AM, Elio Physics wrote:

I am not really an expert with gdb. What is the core file, and how do I use gdb? I get three files as output when the executable runs: one is the actual output, which stops, and the other two are error files (from which I learned about the segmentation fault).

Thanks

------------------------------
From: users <users-boun...@open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org>
Sent: Saturday, April 23, 2016 11:39 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPIRUN SEGMENTATION FAULT

valgrind isn't going to help here - there are multiple reasons why your application could be segfaulting. Take a look at the core file with gdb and find out where it is failing.
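To make the gdb advice concrete, here is a minimal, hypothetical C/MPI sketch of the kind of uninitialized-memory bug Ralph describes, with the core-file workflow in its comments. The file name and the bug are invented for illustration and are not taken from the SPRKKR code:

/* segv_demo.c - deliberately dereferences an uninitialized pointer,
 * the class of application bug suspected in this thread.
 *
 * Build:   mpicc -g -O0 segv_demo.c -o segv_demo
 * Run:     ulimit -c unlimited          (allow a core file to be written)
 *          mpirun -np 2 ./segv_demo     (mpirun reports "exited on signal 11")
 * Inspect: gdb ./segv_demo core         (core file name varies by system)
 *          (gdb) bt                     (backtrace points at the faulting line)
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    double *work = NULL;   /* never allocated... */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0)
        work[100] = 1.0;   /* ...so rank 0 writes through NULL and gets SIGSEGV */

    printf("rank %d finished\n", rank);
    MPI_Finalize();
    return 0;
}

Whether a core file is produced at all, and what it is called, depends on the shell's ulimit and the system's core-dump settings; if none appears, attaching gdb to a running rank, or running the program in a stand-alone mode under gdb as Durga suggests, are other common ways to get a backtrace.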
On Apr 22, 2016, at 10:20 PM, Elio Physics <elio-phys...@live.com> wrote:

One more thing I forgot to mention in my previous e-mail. In the output file I get the following message:

2 total processes killed (some possibly by mpirun during cleanup)

Thanks

------------------------------
From: users <users-boun...@open-mpi.org> on behalf of Elio Physics <elio-phys...@live.com>
Sent: Saturday, April 23, 2016 3:07 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPIRUN SEGMENTATION FAULT

I have used valgrind and this is what I got:

valgrind mpirun ~/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp > scf-51551.jlborges.fisica.ufmg.br.out
==8135== Memcheck, a memory error detector
==8135== Copyright (C) 2002-2012, and GNU GPL'd, by Julian Seward et al.
==8135== Using Valgrind-3.8.1 and LibVEX; rerun with -h for copyright info
==8135== Command: mpirun /home/emoujaes/Elie/SPRKKR/bin/kkrscf6.3MPI Fe_SCF.inp
==8135==
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 8147 on node jlborges.fisica.ufmg.br exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
==8135==
==8135== HEAP SUMMARY:
==8135==     in use at exit: 485,683 bytes in 1,899 blocks
==8135==   total heap usage: 7,723 allocs, 5,824 frees, 12,185,660 bytes allocated
==8135==
==8135== LEAK SUMMARY:
==8135==    definitely lost: 34,944 bytes in 34 blocks
==8135==    indirectly lost: 26,613 bytes in 58 blocks
==8135==      possibly lost: 0 bytes in 0 blocks
==8135==    still reachable: 424,126 bytes in 1,807 blocks
==8135==         suppressed: 0 bytes in 0 blocks
==8135== Rerun with --leak-check=full to see details of leaked memory
==8135==
==8135== For counts of detected and suppressed errors, rerun with: -v
==8135== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 6 from 6)

What is that supposed to mean?

Regards

------------------------------
From: users <users-boun...@open-mpi.org> on behalf of Ralph Castain <r...@open-mpi.org>
Sent: Saturday, April 23, 2016 1:36:50 AM
To: Open MPI Users
Subject: Re: [OMPI users] MPIRUN SEGMENTATION FAULT

All I can say is that your program segfault'd during execution - you might want to look at the core file using a debugger like gdb to see why it failed.

On Apr 22, 2016, at 8:32 PM, Elio Physics <elio-phys...@live.com> wrote:

Dear all,

I have successfully compiled a code and the executable was produced. However, when I started running the executable with mpirun, the code stopped with the following error:

"mpirun noticed that process rank 0 with PID 570 on node compute-1-9.local exited on signal 11 (Segmentation fault)."

What is that error due to, and how can I solve it?
I will post the make.inc compilation file:

BUILD_TYPE ?=
#BUILD_TYPE := debug

VERSION = 6.3

ifeq ($(BUILD_TYPE), debug)
  VERSION := $(VERSION)$(BUILD_TYPE)
endif

BIN = ~/Elie/SPRKKR/bin
#BIN = ~/bin
#BIN = /tmp/$(USER)

LIB = -L/opt/intel/mkl/lib/intel64/libmkl_blas95_ilp64 \
      -L/opt/intel/mkl/lib/intel64/libmkl_lapack95_ilp64 \
      -L/opt/intel/mkl/lib/intel64 -lmkl_scalapack_ilp64 -lmkl_intel_ilp64 \
      -lmkl_core -lmkl_sequential -lmkl_blacs_intelmpi_ilp64 -lpthread -lm -ldl

# Include mpif.h
INCLUDE = -I/opt/intel/mkl/include/intel64/ilp64 -I/opt/intel/mkl/lib/include

# FFLAGS
FFLAGS = -O2

FC   = mpif90 -c $(FFLAGS) $(INCLUDE)
LINK = mpif90 $(FFLAGS) $(INCLUDE)

MPI = MPI

Thanks in advance

Elio
University of Rondonia, Brazil
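Since the advice in this thread is that Open MPI itself is unlikely to be at fault, one cheap cross-check (not part of the original exchange; the file name below is made up for illustration) is to build and launch a trivial MPI program with the same toolchain and mpirun:

/* mpi_sanity.c - minimal check that the MPI installation and mpirun work.
 *
 * Build: mpicc mpi_sanity.c -o mpi_sanity
 * Run:   mpirun -np 2 ./mpi_sanity
 */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, len;
    char node[MPI_MAX_PROCESSOR_NAME];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);
    MPI_Get_processor_name(node, &len);

    printf("rank %d of %d running on %s\n", rank, size, node);

    MPI_Finalize();
    return 0;
}

If this runs cleanly on the same nodes while the application still dies with signal 11, that supports the thread's conclusion that the segfault comes from the application or the way it was built, not from Open MPI.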