Signal 15 is usually SIGTERM on Linux, meaning that some external entity probably killed the job.
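If you want to convince yourself of the signal number on your own system, here is a minimal sketch in Python (not part of WRF or Open MPI). It starts a throwaway child process, terminates it from the outside, and decodes the resulting exit status. The helper name describe_exit is hypothetical, and the "negative return code means killed by that signal" convention assumes a POSIX system such as Linux.

```python
import signal
import subprocess
import sys


def describe_exit(returncode):
    """Return a readable description of a subprocess return code.

    Hypothetical helper for illustration only.
    """
    if returncode < 0:
        # On POSIX, subprocess reports "killed by signal N" as -N.
        return "killed by " + signal.Signals(-returncode).name
    return "exited with status %d" % returncode


# Start a child that would otherwise sleep for a minute,
# then kill it externally, much as a scheduler or operator might.
child = subprocess.Popen([sys.executable, "-c", "import time; time.sleep(60)"])
child.terminate()  # sends SIGTERM on POSIX
child.wait()

result = describe_exit(child.returncode)
print(result)  # on Linux: killed by SIGTERM
```

The child never gets a chance to clean up, which is the same situation as an MPI rank that is killed before it can reach MPI_Finalize.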
The OMPI error message you describe is also typical for that kind of scenario -- i.e., a process exiting without calling MPI_Finalize could mean that it called exit() or that some external process killed it.

On Aug 3, 2011, at 7:24 AM, BasitAli Khan wrote:

> I am trying to run a rather heavy wrf simulation with spectral nudging but
> the simulation crashes after 1.8 minutes of integration.
> The simulation has two domains with d01 = 601x601 and d02 = 721x721 and
> 51 vertical levels. I tried this simulation on two different systems but
> result was more or less same. For example
>
> On our Bluegene/P with SUSE Linux Enterprise Server 10 ppc and XLF compiler
> I tried to run wrf on 2048 shared memory nodes (1 compute node = 4 cores , 32
> bit, 850 Mhz). For the parallel run I used mpixlc, mpixlcxx and mpixlf90. I
> got the following error message in the wrf.err file
>
> <Aug 01 19:50:21.244540> BE_MPI (ERROR): The error message in the job
> record is as follows:
> <Aug 01 19:50:21.244657> BE_MPI (ERROR): "killed with signal 15"
>
> I also tried to run the same simulation on our linux cluster (Linux Red Hat
> Enterprise 5.4m x86_64 and Intel compiler) with 8, 16 and 64 nodes (1
> compute node=8 cores). For the parallel run I am used
> mpi/openmpi/1.4.2-intel-11. I got the following error message in the error
> log after couple of minutes of integration.
>
> "mpirun has exited due to process rank 45 with PID 19540 on
> node ci118 exiting without calling "finalize". This may
> have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here)."
>
> I tried many things but nothing seems to be working. However, if I reduce
> grid points below 200, the simulation goes fine. It appears that probably
> OpenMP has problem with large number of grid points but I have no idea how to
> fix it. I will greatly appreciate if you could suggest some solution.
>
> Best regards,
> ---
> Basit A. Khan, Ph.D.
> Postdoctoral Fellow
> Division of Physical Sciences & Engineering
> Office# 3204, Level 3, Building 1,
> King Abdullah University of Science & Technology
> 4700 King Abdullah Blvd, Box 2753, Thuwal 23955 –6900,
> Kingdom of Saudi Arabia.
>
> Office: +966(0)2 808 0276, Mobile: +966(0)5 9538 7592
> E-mail: basitali.k...@kaust.edu.sa
> Skype name: basit.a.khan
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/