On Thursday 11 March 2010, Matthew MacManes wrote:
> Can anybody tell me if this is an error associated with openmpi, versus an
> issue with the program I am running (MRBAYES,
> https://sourceforge.net/projects/mrbayes/)
>
> We are trying to run a large simulated dataset using 1,000,000 bases
> divided up into 1000 genes, 5 taxa.. An error is occurring, but we are not
> sure why. We are using the MPI version of MRBAYES v3.2-cvs on a linux
> 16core 24GB RAM machine. It does not appear as if the program runs out of
> memory (max memory usage is 13gb). Maybe this is an OpenMPI problem and
> not related to MrBayes...
>
> See snippet of error message below. Can anybody give me any hints about the
> source of the problem?
>
> I am using OPENMPI version 1.4.1.
>
> ...
> Defining charset called gene997
> Defining charset called gene998
> Defining charset called gene999
> Defining charset called gene1000
> Defining partition called Genes
> [macmanes:02546] *** Process received signal ***
> [macmanes:02546] Signal: Segmentation fault (11)
> [macmanes:02546] Signal code: Address not mapped (1)
> [macmanes:02546] Failing at address: (nil)
> [macmanes:02546] [ 0] /lib/libpthread.so.0 [0x7ffd0f322190]
> [macmanes:02546] *** End of error message ***
> --------------------------------------------------------------------------
> mpirun noticed that process rank 13 with PID 2546 on node macmanes exited
> on signal 11 (Segmentation fault).

One of the ranks got a "Segmentation fault". This would typically indicate a problem with the app, not with MPI. Maybe you ran out of stack space (check with ulimit -s)? Have you tried a different/lower number of ranks?

/Peter
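
To make the stack-space check above concrete: a minimal sketch, assuming a Linux machine with Python available, that reads the same limit "ulimit -s" reports. The helper names describe_limit and stack_limits are made up for illustration and are not part of MrBayes or Open MPI. It only reports the limit in the environment it is run from; it does not show that the stack is what caused the crash.

```python
import resource


def describe_limit(value):
    """Format an rlimit value given in bytes; RLIM_INFINITY means no limit."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    # getrlimit reports bytes, while "ulimit -s" in bash reports KiB.
    return f"{value // 1024} KiB"


def stack_limits():
    """Return the (soft, hard) stack size limits of this process as strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_STACK)
    return describe_limit(soft), describe_limit(hard)


if __name__ == "__main__":
    soft, hard = stack_limits()
    print(f"stack soft limit: {soft}")
    print(f"stack hard limit: {hard}")
```

If the soft limit turns out to be small, raising it in the shell that launches mpirun (for example "ulimit -s unlimited", where the hard limit allows it) and rerunning is a cheap way to test the stack-space theory.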