Hi,

I found the reason why the program is killed by the operating system when the problem size is large: it consumes more memory, which leads to more memory swapping. This also degrades the program's performance. But I cannot determine which function of the worker process causes the problem. I have used try-catch in my code, but no exception was thrown.

I found that:

-------------------------------------------------------------------
When the processes running on your server attempt to allocate more memory than your system has available, the kernel begins to swap memory pages to and from the disk. This is done in order to free up sufficient physical memory to meet the RAM allocation requirements of the requestor.
-------------------------------------------------------------------

I am not sure whether it is really caused by CPLEX (an optimization model solver), by other routines, or perhaps by other dynamic memory allocation done in the background by the CPLEX API library.

Any help is really appreciated.

Jack

From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Wed, 13 Apr 2011 10:34:38 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI monitor each process behavior
On Apr 13, 2011, at 10:19 AM, Jack Bryan wrote:

Hi,

I am using mpirun (Open MPI) 1.3.4, but I have these: orte-clean, orted, orte-iof, orte-ps, orterun. Can they do the same thing?

Unfortunately, no.

If I use them, will they use a lot of memory on each worker node and print a lot of output to log files?

No, but they won't help. orte-top would be run only on the head node (i.e., where you are logged in) and would generate output to your screen. But you don't have it in that release, so the point is moot.

I'm afraid there isn't much else you can do. You might talk to your sysadmin and see what tools are available on your cluster for this purpose. Perhaps a nice parallel debugger is available?

Any help is really appreciated.

Thanks
Jack

From: r...@open-mpi.org
List-Post: users@lists.open-mpi.org
Date: Wed, 13 Apr 2011 08:09:17 -0600
To: us...@open-mpi.org
Subject: Re: [OMPI users] OMPI monitor each process behavior

What version are you using? If you are using 1.5.x, there is an "orte-top" command that will do what you ask. It queries the daemons to get the info.

On Apr 12, 2011, at 9:55 PM, Jack Bryan wrote:

Hi all,

I need to monitor the memory usage of each parallel process on a Linux Open MPI cluster. But the top and ps commands cannot help here, because they only show information for the head node. I need to follow the behavior of each process on each cluster node. I cannot use ssh to access each node. The program takes 8 hours to finish.

Any help is really appreciated.
Jack

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
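With Open MPI 1.3.4 (no orte-top) and no ssh access to the nodes, one possible workaround, sketched here under stated assumptions rather than taken from the thread, is to have each worker report its own memory use. mpirun forwards the stdout and stderr of remote processes back to the terminal where it was launched, so lines tagged with the host name and PID reach the head node. Bracketing suspect calls (for example the CPLEX solve) with such reports can also help narrow down which function grows memory. If the process is being killed by the Linux OOM killer, it receives SIGKILL, which no try-catch block can intercept. The code assumes Linux (/proc/self/status, and ru_maxrss reported in kilobytes); the function names current_rss_kb, peak_rss_kb, and log_memory are hypothetical.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/resource.h>

/* Current resident set size in kB, parsed from the "VmRSS:" line of
 * /proc/self/status (Linux-specific). Returns -1 if it cannot be read. */
long current_rss_kb(void)
{
    FILE *f = fopen("/proc/self/status", "r");
    if (f == NULL)
        return -1;

    char line[256];
    long rss = -1;
    while (fgets(line, sizeof line, f) != NULL) {
        if (strncmp(line, "VmRSS:", 6) == 0) {
            /* strtol skips the whitespace before the number. */
            rss = strtol(line + 6, NULL, 10);
            break;
        }
    }
    fclose(f);
    return rss;
}

/* Peak resident set size so far. On Linux ru_maxrss is in kB
 * (other systems, e.g. macOS, report bytes). Returns -1 on error. */
long peak_rss_kb(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_maxrss;
}

/* Print one line tagged with host name and PID, so output forwarded
 * by mpirun from many nodes can be told apart. The label is free text
 * chosen by the caller, e.g. "before solve". */
void log_memory(const char *label)
{
    char host[256] = "unknown";
    gethostname(host, sizeof host);
    host[sizeof host - 1] = '\0';   /* gethostname may not terminate on truncation */

    fprintf(stderr, "[%s pid %ld] %s: rss=%ld kB peak=%ld kB\n",
            host, (long)getpid(), label, current_rss_kb(), peak_rss_kb());
    fflush(stderr);                 /* push the line out before a possible kill */
}
```

A worker might, for instance, call log_memory("before solve") and log_memory("after solve") around each solver call; an rss value that keeps rising from one iteration to the next points at the routine whose allocations are not released. Calling log_memory inside tight inner loops would flood the output, so a few coarse checkpoints per iteration are probably enough.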