I am not familiar with attaching a debugger to the processes. The other things you asked about are as follows:

Is this the first time you've run it (with Open MPI? with any MPI?)
*No. We have been running this and other models, but this problem has arisen now.*

How many processes is the job using? Are you oversubscribing your processors?
*I have tried running on the cluster with 184 cores as well as on 8 cores of the same server.*

What version of Open MPI are you using?
*openmpi 1.4.2*

Have you tested all network connections?
*Yes.*

It might help us to know the size of cluster you are running and what type of network?
*The cluster has 32 nodes (Dell PowerEdge blade servers), and connectivity is Gigabit Ethernet and InfiniBand.*

On Tue, Apr 24, 2012 at 3:02 PM, TERRY DONTJE <terry.don...@oracle.com> wrote:
> To determine if an MPI process is waiting for a message, do what Rayson
> suggested: attach a debugger to the processes and see if any of them are
> stuck in MPI, either internally in an MPI_Recv or MPI_Wait call or looping
> on an MPI_Test call.
>
> Other things to consider:
> Is this the first time you've run it (with Open MPI? with any MPI?)?
> How many processes is the job using? Are you oversubscribing your
> processors?
> What version of Open MPI are you using?
> Have you tested all network connections?
> It might help us to know the size of cluster you are running and what
> type of network?
>
> --td
>
> On 4/24/2012 2:42 AM, Syed Ahsan Ali wrote:
>
> Dear Rayson,
>
> That is a numerical model written by the national weather service of a
> country. The model logs show every detail of the simulation progress.
> I have checked on the remote nodes as well: the application binary is
> running, but the logs show no progress; it is just waiting at one point.
> The input data is correct; everything is fine. How can I check if the MPI
> task is waiting for a message?
> Ahsan
>
> On Tue, Apr 24, 2012 at 11:03 AM, Rayson Ho <raysonlo...@gmail.com> wrote:
>
>> Seems like there's a bug in the application. Did you or someone else
>> write it, or did you get it from an ISV?
>>
>> You can log onto one of the nodes, attach a debugger, and see if the
>> MPI task is waiting for a message (looping in one of the MPI receive
>> functions)...
>>
>> Rayson
>>
>> =================================
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>>
>>
>> On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali <ahsansha...@gmail.com>
>> wrote:
>> > Dear All,
>> >
>> > I am having a problem running an application on a Dell cluster. The
>> > model starts well, but no further progress is shown; it just gets stuck.
>> > I have checked the systems, and there is no apparent hardware error.
>> > Other Open MPI applications are running well on the same cluster. I have
>> > tried running the application on cores of the same server as well, but
>> > the problem is the same. The application just doesn't move further. The
>> > same application is also running well on a backup cluster. Please help.
>> >
>> >
>> > Thanks and Best Regards
>> >
>> > Ahsan
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> --
> Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle *- Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
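
Terry's and Rayson's suggestion (attach a debugger to each rank and check whether it sits in MPI_Recv or MPI_Wait, or keeps coming back to MPI_Test) can be scripted for one node. The following is a minimal, hypothetical sketch and not an Open MPI tool. It assumes that gdb and pgrep are installed on the compute node, that you run it as the same user as the job (so gdb is allowed to attach), and that the model's processes can be found by their executable name. The helper names (gdb_command, find_mpi_frames, pids_for) are made up for this example.

```python
import os
import re
import subprocess
import sys

# Assumed symbol forms: MPI receive/wait/test entry points as they may appear in a
# gdb backtrace, including PMPI_ aliases, variants such as MPI_Waitall, and
# lowercase Fortran symbols such as mpi_recv_ (matched case-insensitively).
BLOCKING_MPI = re.compile(r"\bp?mpi_(?:recv|wait|test)\w*", re.IGNORECASE)


def gdb_command(pid: int) -> list[str]:
    """gdb invocation that attaches to `pid`, prints every thread's stack, then detaches."""
    return ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]


def find_mpi_frames(backtrace: str) -> list[str]:
    """Distinct MPI receive/wait/test symbols found in a backtrace, in order of appearance."""
    seen: list[str] = []
    for name in BLOCKING_MPI.findall(backtrace):
        if name not in seen:
            seen.append(name)
    return seen


def pids_for(program: str) -> list[int]:
    """PIDs whose command line contains `program`, excluding this script itself."""
    result = subprocess.run(["pgrep", "-f", program], capture_output=True, text=True)
    return [int(p) for p in result.stdout.split() if int(p) != os.getpid()]


def main(argv: list[str]) -> int:
    if len(argv) != 2:
        print(f"usage: {argv[0]} <executable-name>", file=sys.stderr)
        return 2
    for pid in pids_for(argv[1]):
        # One snapshot per process; mpirun may also match and will show no MPI frames.
        trace = subprocess.run(gdb_command(pid), capture_output=True, text=True).stdout
        frames = find_mpi_frames(trace)
        status = "in " + ", ".join(frames) if frames else "no MPI recv/wait/test frame"
        print(f"pid {pid}: {status}")
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

Run it on each node as, for example, `python3 check_ranks.py <executable-name>`. A single snapshot cannot tell a rank blocked inside MPI_Recv or MPI_Wait apart from one spinning on MPI_Test, so take a few snapshots. A rank that keeps showing the same MPI call while its log stays silent is likely waiting for a message that never arrives, and the ranks that are *not* in such a call are the ones to look at next.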