Hello Gus, Jody,

The system has enough memory. I set the stack size to unlimited with the command *ulimit -s unlimited* before running WRF, but the problem still occurred.

Thanks
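To test the stack-size hypothesis directly, one option is to have every process that `mpirun` starts report its own stack limit. A limit raised in the interactive shell may not reach processes started through the launcher, particularly on remote nodes. The following is a minimal sketch that assumes a POSIX shell with `ulimit -s` on each node. The function `report_stack_limit` and the file name `report_stack.sh` are hypothetical:

```shell
#!/bin/sh
# report_stack.sh (hypothetical name): print this host's soft stack limit.
# `ulimit -s` reports the limit in kilobytes, or the word "unlimited".
report_stack_limit() {
    limit=$(ulimit -s)
    if [ "$limit" = "unlimited" ]; then
        echo "$(uname -n): stack limit is unlimited"
    else
        echo "$(uname -n): stack limit is ${limit} KB (not unlimited)"
    fi
}

report_stack_limit
```

Saved as `report_stack.sh`, it can be launched like any other program, for example `mpirun -np 4 sh report_stack.sh`. This should print one line per rank. A rank that reports a finite limit did not inherit the unlimited setting.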
> Hi Ahsan, Jody,
>
> Just a guess that this may be a stack size problem.
> Did you try to run WRF with unlimited stack size?
> Also, does your machine have enough memory to run WRF?
>
> I hope this helps,
> Gus Correa
>
>
> jody wrote:
> > Hi,
> > At first glance I would say this is not an Open MPI problem,
> > but a WRF problem (though I must admit I have no knowledge
> > whatsoever of WRF).
> >
> > Have you tried running a single instance of wrf.exe?
> > Have you tried to run a simple application (like a "hello world")
> > on your nodes?
> >
> > Jody
> >
> >
> > On Tue, Feb 22, 2011 at 7:37 AM, Ahsan Ali <ahsansha...@gmail.com> wrote:
> >> Hello,
> >> I am stuck on a problem running the Weather Research
> >> and Forecasting Model (WRF V3.2.1). I get the following error when
> >> running with mpirun. Any help would be highly appreciated.
> >>
> >> [pmdtest@pmd02 em_real]$ mpirun -np 4 wrf.exe
> >> starting wrf task 0 of 4
> >> starting wrf task 1 of 4
> >> starting wrf task 3 of 4
> >> starting wrf task 2 of 4
> >> --------------------------------------------------------------------------
> >> mpirun noticed that process rank 3 with PID 6044 on node pmd02.pakmet.com
> >> exited on signal 11 (Segmentation fault).
> >>
> >> --
> >> Syed Ahsan Ali Bokhari
> >> Electronic Engineer (EE)
> >> Research & Development Division
> >> Pakistan Meteorological Department H-8/4, Islamabad.
> >> Phone # off +92518358714
> >> Cell # +923155145014
>
> Dear Jody,
>
> WRF runs fine in serial mode (i.e. as a single process). I am also running
> another application, HRM, with Open MPI; there is no issue with it, and it
> runs on a cluster of many nodes. The WRF manual says the following about
> MPI runs:
>
> *If you have run the model on multiple processors using MPI, you should
> have a number of rsl.out.* and rsl.error.* files. Type "tail rsl.out.0000"
> to see if you get "SUCCESS COMPLETE WRF".
> This is a good indication that the model has run successfully.*
>
> *Take a look at either rsl.out.0000 file or other standard out file. This
> file logs the times taken to compute for one model time step, and to write
> one history and restart output:*
>
> *Timing for main: time 2006-01-21_23:55:00 on domain 2: 4.91110 elapsed seconds.*
> *Timing for main: time 2006-01-21_23:56:00 on domain 2: 4.73350 elapsed seconds.*
> *Timing for main: time 2006-01-21_23:57:00 on domain 2: 4.72360 elapsed seconds.*
> *Timing for main: time 2006-01-21_23:57:00 on domain 1: 19.55880 elapsed seconds.*
>
> *and*
>
> *Timing for Writing wrfout_d02_2006-01-22_00:00:00 for domain 2: 1.17970 elapsed seconds.*
> *Timing for main: time 2006-01-22_00:00:00 on domain 1: 27.66230 elapsed seconds.*
> *Timing for Writing wrfout_d01_2006-01-22_00:00:00 for domain 1: 0.60250 elapsed seconds.*
>
> *If the model did not run to completion, take a look at these standard
> output/error files too. If the model has become numerically unstable, it may
> have violated the CFL criterion (for numerical stability). Check whether
> this is true by typing the following:*
>
> *grep cfl rsl.error.* or grep cfl wrf.out*
>
> *you might see something like these:*
>
> *5 points exceeded cfl=2 in domain 1 at time 4.200000*
> *MAX AT i,j,k: 123 48 3 cfl,w,d(eta)= 4.165821*
> *21 points exceeded cfl=2 in domain 1 at time 4.200000*
> *MAX AT i,j,k: 123 49 4 cfl,w,d(eta)= 10.66290*
>
> But when I check the rsl.out.* and rsl.error.* files, there is no indication
> that any error occurred; it seems that the application just didn't start.
>
> [pmdtest@pmd02 em_real]$ tail rsl.out.0000
> WRF NUMBER OF TILES FROM OMP_GET_MAX_THREADS = 8
> WRF TILE 1 IS 1 IE 360 JS 1 JE 25
> WRF TILE 2 IS 1 IE 360 JS 26 JE 50
> WRF TILE 3 IS 1 IE 360 JS 51 JE 74
> WRF TILE 4 IS 1 IE 360 JS 75 JE 98
> WRF TILE 5 IS 1 IE 360 JS 99 JE 122
> WRF TILE 6 IS 1 IE 360 JS 123 JE 146
> WRF TILE 7 IS 1 IE 360 JS 147 JE 170
> WRF TILE 8 IS 1 IE 360 JS 171 JE 195
> WRF NUMBER OF TILES = 8
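The manual's two checks can be applied to every rank at once. This matters here because `mpirun` reported that rank 3, not rank 0, died, so rsl.out.0003 and rsl.error.0003 may say more than rsl.out.0000. The following is a minimal sketch that assumes all rsl.* files sit in one run directory. `check_rsl_files` is a hypothetical helper, and it only looks for the two markers the manual mentions ("SUCCESS COMPLETE WRF" and "cfl"):

```shell
#!/bin/sh
# Hypothetical helper: summarize every rank's WRF log files in a run directory,
# following the manual's advice ("tail rsl.out.0000", "grep cfl rsl.error.*").
check_rsl_files() {
    local dir f n
    dir=${1:-.}
    for f in "$dir"/rsl.out.*; do
        # An unmatched glob stays literal, so this detects "no files at all".
        [ -e "$f" ] || { echo "no rsl.out.* files in $dir"; return 1; }
        if grep -q "SUCCESS COMPLETE WRF" "$f"; then
            echo "$(basename "$f"): SUCCESS COMPLETE WRF"
        else
            # Show the last line this rank wrote before it stopped.
            echo "$(basename "$f"): incomplete, last line: $(tail -n 1 "$f")"
        fi
    done
    for f in "$dir"/rsl.error.*; do
        [ -e "$f" ] || continue
        # grep -c prints the number of matching lines (0 if none).
        n=$(grep -c cfl "$f")
        if [ "$n" -gt 0 ]; then
            echo "$(basename "$f"): $n line(s) mention cfl"
        fi
    done
    return 0
}
```

For example, running `check_rsl_files .` from the em_real directory prints one line per rsl.out.* file and flags any rsl.error.* file that mentions cfl. A rank whose last line is still the tile setup shown above never reached its first time step.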