Signal 15 is SIGTERM on Linux, which most likely means that some external 
entity killed the job.

The OMPI error message you describe is also typical of that kind of scenario: 
a process exiting without calling MPI_Finalize could mean either that it 
called exit() itself or that some external process killed it.
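
To illustrate the difference between those two cases, here is a minimal 
Python sketch. It is not Open MPI code, and the helper name describe_exit is 
made up for this example. It assumes a POSIX system and shows how a parent 
process sees a child that was killed by SIGTERM versus one that called exit() 
on its own:

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Classify a child's exit status (hypothetical helper for illustration)."""
    if returncode < 0:
        # On POSIX, subprocess reports death-by-signal as the negated signal number.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Child 1: sleeps until some external entity (here, the parent) terminates it.
sleeper = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
sleeper.send_signal(signal.SIGTERM)
killed = describe_exit(sleeper.wait())

# Child 2: calls exit() on its own with a nonzero status.
quitter = subprocess.run([sys.executable, "-c", "import sys; sys.exit(3)"])
exited = describe_exit(quitter.returncode)

print("SIGTERM is signal", int(signal.SIGTERM))
print("child 1:", killed)
print("child 2:", exited)
```

In both cases the child ends without running any cleanup code, which is 
analogous to an MPI process ending without calling MPI_Finalize.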


On Aug 3, 2011, at 7:24 AM, BasitAli Khan wrote:

> I am trying to run a rather heavy WRF simulation with spectral nudging, but 
> the simulation crashes after 1.8 minutes of integration.
> The simulation has two domains, with d01 = 601x601 and d02 = 721x721, and 
> 51 vertical levels. I tried this simulation on two different systems, but 
> the result was more or less the same. For example:
> 
> On our Blue Gene/P with SUSE Linux Enterprise Server 10 ppc and the XLF 
> compiler, I tried to run WRF on 2048 shared memory nodes (1 compute node = 
> 4 cores, 32 bit, 850 MHz). For the parallel run I used mpixlc, mpixlcxx and 
> mpixlf90. I got the following error message in the wrf.err file:
> 
> <Aug 01 19:50:21.244540> BE_MPI (ERROR): The error message in the job
> record is as follows:
> <Aug 01 19:50:21.244657> BE_MPI (ERROR):   "killed with signal 15"
> 
> I also tried to run the same simulation on our Linux cluster (Linux Red Hat 
> Enterprise 5.4m x86_64 and Intel compiler) with 8, 16 and 64 nodes (1 
> compute node = 8 cores). For the parallel run I used 
> mpi/openmpi/1.4.2-intel-11. I got the following error message in the error 
> log after a couple of minutes of integration:
> 
> "mpirun has exited due to process rank 45 with PID 19540 on
> node ci118 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here)."
> 
> I tried many things, but nothing seems to work. However, if I reduce the 
> grid points below 200, the simulation runs fine. It appears that OpenMP 
> probably has a problem with a large number of grid points, but I have no idea 
> how to fix it. I would greatly appreciate it if you could suggest a solution.
> 
> Best regards, 
> ---
> Basit A. Khan, Ph.D.
> Postdoctoral Fellow
> Division of Physical Sciences & Engineering
> Office# 3204, Level 3, Building 1,
> King Abdullah University of Science & Technology
> 4700 King Abdullah Blvd, Box 2753, Thuwal 23955-6900,
> Kingdom of Saudi Arabia.
> 
> Office: +966(0)2 808 0276,  Mobile: +966(0)5 9538 7592
> E-mail: basitali.k...@kaust.edu.sa
> Skype name: basit.a.khan 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

