I am not familiar with attaching a debugger to processes. My answers to
your other questions are as follows:

  Is this the first time you've run it (with Open MPI? with any MPI?)
  *No. We have been running this and other models, but this problem has
  only arisen now.*
  How many processes is the job using?  Are you oversubscribing your
  processors?
  *I have tried running on the cluster, which has 184 cores, as well as on
  8 cores of the same server.*
  What version of Open MPI are you using?
  *Open MPI 1.4.2*
  Have you tested all network connections?
  *Yes.*
  It might help us to know the size of cluster you are running and what
  type of network?
  *The cluster has 32 nodes (Dell PowerEdge blade servers), and
  connectivity is Gigabit Ethernet and InfiniBand.*
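
For anyone unfamiliar with attaching a debugger, one possible way to follow
the suggestion quoted below is to run gdb in batch mode against each process
of the hung job and look for MPI calls in the backtraces. The following is
only a minimal sketch, assuming gdb is installed on the node and that you are
allowed to attach to the processes (normally this means being the same user or
root). The helper names gdb_backtrace, find_mpi_frames and looks_stuck_in_mpi
are hypothetical and not part of Open MPI or gdb.

```python
import re
import subprocess
import sys

# MPI calls that commonly block or spin while waiting for a message.
# Prefix matching also catches variants such as MPI_Waitall or MPI_Testany.
WAITING_CALLS = ("MPI_Recv", "MPI_Wait", "MPI_Test")


def gdb_backtrace(pid, timeout=60):
    """Attach gdb to a running process in batch mode and return the
    backtraces of all its threads as text (assumes gdb is on PATH)."""
    cmd = ["gdb", "-p", str(pid), "-batch", "-ex", "thread apply all bt"]
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.stdout


def find_mpi_frames(backtrace):
    """Return the MPI_* / PMPI_* function names found in a gdb backtrace."""
    # Frames look like "#2  0x00007f... in PMPI_Recv () from libmpi.so"
    # or, without an address, "#0  MPI_Recv (...) at file.c:12".
    pattern = r"^#\d+\s+(?:0x[0-9a-fA-F]+ in )?(P?MPI_\w+)"
    return re.findall(pattern, backtrace, flags=re.MULTILINE)


def looks_stuck_in_mpi(backtrace):
    """True if any frame is one of the receive/wait/test style calls."""
    for name in find_mpi_frames(backtrace):
        # Treat the profiling-interface name PMPI_X the same as MPI_X.
        if name.startswith("PMPI_"):
            name = name[1:]
        if name.startswith(WAITING_CALLS):
            return True
    return False


if __name__ == "__main__":
    # Usage (hypothetical script name): python check_mpi_hang.py PID [PID ...]
    for pid in sys.argv[1:]:
        frames = find_mpi_frames(gdb_backtrace(pid))
        print(pid, frames if frames else "no MPI frames found")
```

The process IDs of the model on a node can be found with ps or top. If several
ranks sit in a receive or wait call and the backtraces do not change over
repeated runs, that would suggest the job is waiting for a message that never
arrives.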


On Tue, Apr 24, 2012 at 3:02 PM, TERRY DONTJE <terry.don...@oracle.com> wrote:

> To determine whether an MPI process is waiting for a message, do what Rayson
> suggested: attach a debugger to the processes and see if any of them are
> stuck in MPI, either blocked inside an MPI_Recv or MPI_Wait call or looping
> on an MPI_Test call.
>
> Other things to consider:
>   Is this the first time you've run it (with Open MPI? with any MPI?)?
>   How many processes is the job using?  Are you oversubscribing your
> processors?
>   What version of Open MPI are you using?
>   Have you tested all network connections?
>   It might help us to know the size of cluster you are running and what
> type of network?
>
> --td
>
> On 4/24/2012 2:42 AM, Syed Ahsan Ali wrote:
>
> Dear Rayson,
>
> That is a numerical model written by the national weather service of a
> country. The model's logs show every detail of the simulation's progress.
> I have checked the remote nodes as well: the application binary is running,
> but the logs show no progress; it is just waiting at one point. The input
> data is correct and everything else looks fine. How can I check whether the
> MPI task is waiting for a message?
> Ahsan
>
> On Tue, Apr 24, 2012 at 11:03 AM, Rayson Ho <raysonlo...@gmail.com> wrote:
>
>> Seems like there's a bug in the application. Did you or someone else
>> write it, or did you get it from an ISV??
>>
>> You can log onto one of the nodes, attach a debugger, and see if the
>> MPI task is waiting for a message (looping in one of the MPI receive
>> functions)...
>>
>> Rayson
>>
>> =================================
>> Open Grid Scheduler / Grid Engine
>> http://gridscheduler.sourceforge.net/
>>
>> Scalable Grid Engine Support Program
>> http://www.scalablelogic.com/
>>
>>
>> On Tue, Apr 24, 2012 at 12:49 AM, Syed Ahsan Ali <ahsansha...@gmail.com>
>> wrote:
>> > Dear All,
>> >
>> > I am having a problem running an application on a Dell cluster. The model
>> > starts well but shows no further progress; it just gets stuck. I have
>> > checked the systems and there is no apparent hardware error. Other Open MPI
>> > applications are running well on the same cluster. I have also tried
>> > running the application on the cores of a single server, but the problem
>> > is the same: the application just doesn't move any further. The same
>> > application also runs well on a backup cluster. Please help.
>> >
>> >
>> > Thanks and Best Regards
>> >
>> > Ahsan
>> >
>> > _______________________________________________
>> > users mailing list
>> > us...@open-mpi.org
>> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>>
>>
>>
>> --
>> ==================================================
>> Open Grid Scheduler - The Official Open Source Grid Engine
>> http://gridscheduler.sourceforge.net/
>
> --
>   Terry D. Dontje | Principal Software Engineer
> Developer Tools Engineering | +1.781.442.2631
> Oracle *- Performance Technologies*
> 95 Network Drive, Burlington, MA 01803
> Email terry.don...@oracle.com
