Hi Mohammadali,
"Signal number 11 SEGV" is the Unix/Linux signal for a memory
violation (a.k.a. segmentation violation or segmentation fault).
It normally happens when the program tries to read or write
a memory area that it never allocated, has already freed,
or that belongs to another process.
That is most likely a programming error in the FEM code,
probably not an MPI error, and probably not a PETSc error either.
The "errorcode 59" seems to be the PETSc error code
issued when it receives a signal (in this case a
segmentation fault signal, I guess) from the operating
system (Linux, probably).
Apparently PETSc simply prints the error message and
calls MPI_Abort, and the program stops.
This is what the petscerror.h include file has about error code 59:
#define PETSC_ERR_SIG 59 /* signal received */
One suggestion is to compile the code with debugging flags (-g)
and attach a debugger to it. That is not an easy task if your program
has many processes/ranks and your debugger is the default
Linux gdb, but it is not impossible either.
Depending on the computer you have, you may have a parallel debugger,
such as TotalView or DDT, which is more user-friendly.
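If you only have gdb, a common trick is to make each process print its PID and wait until you attach. The C sketch below is one way to do that, assuming you can add a call to the source (for instance right after MPI_Init). The function name wait_for_debugger, the environment variable DEMO_WAIT_FOR_GDB and the time limit are all made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: if DEMO_WAIT_FOR_GDB is set, print the host and
   PID and wait until a debugger sets 'attached' to 1 or until
   max_seconds have passed. Returns the number of seconds waited. */
int wait_for_debugger(int max_seconds)
{
    volatile int attached = 0; /* from gdb: set var attached = 1 */
    char host[256];
    int waited = 0;

    if (getenv("DEMO_WAIT_FOR_GDB") == NULL)
        return 0; /* normal runs are not delayed */

    if (gethostname(host, sizeof host) != 0)
        strcpy(host, "unknown-host");
    host[sizeof host - 1] = '\0';

    fprintf(stderr, "PID %ld on %s waiting for debugger\n",
            (long)getpid(), host);

    while (!attached && waited < max_seconds) {
        sleep(1);
        waited++;
    }
    return waited;
}
```

With the variable set, log in to the node that printed the message, run "gdb -p <PID>", select the wait_for_debugger frame, do "set var attached = 1" and then "continue". The process then runs under gdb until the segmentation fault, where gdb stops and shows the offending line.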
You could also compile it with the flag -traceback
(or -fbacktrace; the syntax depends on the compiler, so check the
compiler man page).
This will at least tell you the location in the program where the
segmentation fault happened (in the STDERR file of your job).
I hope this helps.
Gus Correa
PS - The zip attachment with your "myjob.sh" script
was removed from the email.
Many email servers strip zip attachments for safety.
Files with the ".sh" suffix are generally removed as well.
You could compress it with gzip or bzip2 instead.
On 11/15/2016 02:40 PM, Beheshti, Mohammadali wrote:
Hi,
I am running simulations in a software which uses ompi to solve an FEM
problem. From time to time I receive the error “
MPI_ABORT was invoked on rank 0 in communicator compute with errorcode
59” in the output file while for the larger simulations (with larger FEM
mesh) I almost always get this error. I don’t have any idea what is the
cause of this error. The error file contains a PETSC error: ”caught
signal number 11 SEGV”. I am running my jobs on a HPC system which has
Open MPI version 2.0.0. I am also using a bash file (myjob.sh) which is
attached. The ompi_info --all command and ifconfig command outputs
are also attached. I appreciate any help in this regard.
Thanks
Ali
**************************
Mohammadali Beheshti
Post-Doctoral Fellow
Department of Medicine (Cardiology)
Toronto General Research Institute
University Health Network
Tel: 416-340-4800 ext. 6837
**************************
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users