Have you folks used a debugger such as TotalView or padb to look at these
stalls?
I ask because we discovered a long time ago that MPI collectives can "hang" in
the scenario you describe. It is caused by one rank falling behind, and then
never catching up due to resource allocations - i.e., on
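If a parallel debugger isn't handy, one low-tech check is to time each
rank's arrival at a barrier placed just before the collective; a laggard
shows up as a large spread in the wait times. A minimal sketch (the
barrier stands in for your real collective; names are placeholders):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* ... per-rank work would go here ... */

    double t0 = MPI_Wtime();
    MPI_Barrier(MPI_COMM_WORLD);        /* stand-in for the real collective */
    double waited = MPI_Wtime() - t0;   /* smallest wait => that rank arrived last */

    double min_wait, max_wait;
    MPI_Reduce(&waited, &min_wait, 1, MPI_DOUBLE, MPI_MIN, 0, MPI_COMM_WORLD);
    MPI_Reduce(&waited, &max_wait, 1, MPI_DOUBLE, MPI_MAX, 0, MPI_COMM_WORLD);
    if (rank == 0)
        printf("barrier wait: min %.3f s, max %.3f s\n", min_wait, max_wait);

    MPI_Finalize();
    return 0;
}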
Hi,
I found the reason why the program is killed by the operating system when
the problem size is large.
It consumes more memory, which leads to more swapping.
This also degrades the program's performance.
But I cannot determine which function of the worker process causes the
problem.
I
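One way to narrow that down is to bracket the suspect calls with
getrusage() so each worker reports how much its resident set grew. A
minimal sketch, assuming Linux (where ru_maxrss is reported in kB);
suspect_function() is a placeholder for your own routine:

#include <mpi.h>
#include <stdio.h>
#include <sys/resource.h>

static long maxrss_kb(void)
{
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_maxrss;   /* peak resident set size, kB on Linux */
}

int main(int argc, char **argv)
{
    int rank;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    long before = maxrss_kb();
    /* suspect_function();   <- the routine you want to measure */
    long after = maxrss_kb();

    printf("rank %d: peak RSS grew %ld kB (now %ld kB)\n",
           rank, after - before, after);

    MPI_Finalize();
    return 0;
}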
Sudheer,
Locks in MPI aren't mutexes; they mark the beginning and end of a
passive-mode communication epoch. All MPI operations within an epoch
logically occur concurrently and must be non-conflicting. So what
you've written below is incorrect: the get is not guaranteed to complete
unt
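To make the epoch rules concrete, here is a minimal sketch of a
passive-target access epoch; the key point is that the buffer filled by
MPI_Get may only be read after MPI_Win_unlock returns:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    int value = rank;      /* each rank exposes its own rank number */
    int fetched = -1;
    MPI_Win win;
    MPI_Win_create(&value, sizeof(int), sizeof(int),
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    int target = (rank + 1) % nprocs;
    MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
    MPI_Get(&fetched, 1, MPI_INT, target, 0, 1, MPI_INT, win);
    /* reading "fetched" HERE would be erroneous - the epoch is still open */
    MPI_Win_unlock(target, win);   /* epoch ends; the get is now complete */

    printf("rank %d fetched %d from rank %d\n", rank, fetched, target);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}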
On Wed, Apr 13, 2011 at 2:49 PM, Barrett, Brian W wrote:
> This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock.
> Some might call what I'm about to describe erroneous. I wrote the
> one-sided code in Open MPI and may be among those people.
>
> In both implementations, one-sid
This is mostly an issue of how MPICH2 and Open MPI implement lock/unlock.
Some might call what I'm about to describe erroneous. I wrote the
one-sided code in Open MPI and may be among those people.
In both implementations, one-sided communication is not necessarily truly
asynchronous. That is, t
Hello,
I am trying to better understand the semantics of passive synchronization in
one-sided communication calls. Doesn't MPI_Win_unlock()
block to ensure that all the preceding RMA calls in that epoch have been
synced?
In that case, why is an undefined value returned when trying to load from a
Hello All,
I have been enjoying using Transparent CR in Open MPI for my research!
I have a few questions regarding the working of ompi-restart:
1. Is there a fixed mapping of processes to resources when ompi-restart is
done?
2. Is there a way for the user to control it? If I am correct, ompi-restart
d
On Apr 13, 2011, at 10:19 AM, Jack Bryan wrote:
> Hi, I am using
>
> mpirun (Open MPI) 1.3.4
>
> But, I have these,
>
> orte-clean orted orte-iof orte-ps orterun
>
> Can they do the same thing?
Unfortunately, no
>
> If I use them, will they use a lot of memory on each wo
On Apr 13, 2011, at 10:29 AM, Jack Bryan wrote:
> Hi ,
>
> If I cannot ssh to a worker node, does it mean that my program cannot work
> correctly?
No, that's not true. People thought you were on a cluster using ssh as the
launcher. From prior notes, you were using Torque, so not being allowed
Hi,
I do not have qrsh
I have qrerun qrls qrttoppm qrun
Can they do the same thing?
thanks
> From: re...@staff.uni-marburg.de
> Date: Wed, 13 Apr 2011 16:28:14 +0200
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] OMPI monitor each process behavior
>
> On 13.04.2011 at 05:55, Jack Bryan wr
Hi ,
If I cannot ssh to a worker node, does it mean that my program cannot work
correctly?
I can run it with 32 nodes * 4 cores/node parallel processes. But for larger
runs, 128 nodes * 1 CPU/node, it is killed by signal 9.
Is this the reason?
thanks
> Date: Wed, 13 Apr 2011 05:59:1
Hi, I am using
mpirun (Open MPI) 1.3.4
But, I have these,
orte-clean orted orte-iof orte-ps orterun
Can they do the same thing?
If I use them, will they use a lot of memory on each worker node and print out
a lot of things to some log files?
Any help is really appreciated.
T
The 16 cores refer to the x3755-m2s. We have a mix of 3550s and 3755s in
the cluster.
It could be memory, but I think not. The jobs are well within memory
capacity, and the memory is mainly static. If we were out of memory, these
jobs would be the first candidates. Larger jobs run on the 3755s
w
Inline
-Original Message-
> From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org]
On Behalf Of Stergiou, Jonathan C CIV NSWCCD West Bethesda,6640
> Sent: 13 April 2011 16:52
> To: Open MPI Users
> Subject: Re: [OMPI users] Over committing?
>
> Martin,
>
> We have seen simil
On 13.04.2011 at 17:09, Rushton Martin wrote:
> Version 1.3.2
>
> Consider a job that will run with 28 processes. The user submits it
> with:
>
> $ qsub -l nodes=4:ppn=7 ...
>
> which reserves 7 cores on (in this case) each of x3550x014, x3550x015,
> x3550x016 and x3550x020. Torque generates a
Martin,
We have seen similar behavior when using certain codes. CodeA can run at ppn=8
with no problem, but CodeB will run much more slowly (or hang) with ppn=8;
instead we use ppn=7 for CodeB.
This becomes complicated when we run CodeA and CodeB together (coupled
simulations). It requires
I'm afraid I can't comment on how OMPI was configured, "as supplied by
the suppliers"! The users experiencing these problems use the Intel
bindings, loaded via the modules command. We are running CentOS 5.3.
Martin Rushton
HPC System Manager, Weapons Technologies
Tel: 01959 514777, Mobile: 0793
Afraid I have no idea - we regularly run on Torque machines with the nodes
fully populated. While most runs are only for a few hours, some runs go for
days.
How was OMPI configured? What OS version?
On Apr 13, 2011, at 9:09 AM, Rushton Martin wrote:
> Version 1.3.2
>
> Consider a job that w
Version 1.3.2
Consider a job that will run with 28 processes. The user submits it
with:
$ qsub -l nodes=4:ppn=7 ...
which reserves 7 cores on (in this case) each of x3550x014, x3550x015,
x3550x016 and x3550x020. Torque generates a file (PBS_NODEFILE) which lists
each node 7 times.
The mpirun co
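For illustration (host names taken from the example above), the generated
PBS_NODEFILE would contain 28 lines, seven per host:

x3550x014
x3550x014
... (seven lines per node, 28 in total)
x3550x020

and an Open MPI built with Torque support reads that allocation itself, so
a plain "mpirun ./app" (app being a placeholder) starts one process per
listed slot.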
On Apr 13, 2011, at 8:13 AM, Rushton Martin wrote:
> The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
> Jobs are submitted by Torque/MOAB. When run with up to np=8 there is
> good performance. Attempting to run with more processors brings
> problems, specifically if any one
On 13.04.2011 at 05:55, Jack Bryan wrote:
> I need to monitor the memory usage of each parallel process on a linux Open
> MPI cluster.
>
> But top and ps cannot help here because they only show the head node's
> information.
>
> I need to follow the behavior of each process on each clus
The bulk of our compute nodes are 8 cores (twin 4-core IBM x3550-m2).
Jobs are submitted by Torque/MOAB. When run with up to np=8 there is
good performance. Attempting to run with more processors brings
problems, specifically if any one node of a group of nodes has all 8
cores in use the job hang
What version are you using? If you are using 1.5.x, there is an "orte-top"
command that will do what you ask. It queries the daemons to get the info.
On Apr 12, 2011, at 9:55 PM, Jack Bryan wrote:
> Hi , All:
>
> I need to monitor the memory usage of each parallel process on a linux Open
> M
amosl...@gmail.com wrote:
Hi,
I am embarrassed! I submitted a note to the users on setting
up openmpi-1.4.3 using SUSE-11.3 under Linux and received several
replies. I wanted to transfer them but they disappeared for no
apparent reason. I hope that those who sent me messages wil
On 4/12/2011 8:55 PM, Jack Bryan wrote:
I need to monitor the memory usage of each parallel process on a linux
Open MPI cluster.
But top and ps cannot help here because they only show the head node's
information.
I need to follow the behavior of each process on each cluster node.
Did you
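A workaround that needs no extra tools: have each rank report its own host
name and resident set size, e.g. by reading /proc/self/status on Linux. A
minimal, Linux-specific sketch (names are placeholders):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, len;
    char host[MPI_MAX_PROCESSOR_NAME];
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Get_processor_name(host, &len);

    long vmrss = -1;
    char line[256];
    FILE *f = fopen("/proc/self/status", "r");  /* Linux-specific */
    if (f) {
        while (fgets(line, sizeof line, f))
            if (sscanf(line, "VmRSS: %ld", &vmrss) == 1)
                break;                          /* resident set size, kB */
        fclose(f);
    }

    printf("rank %d on %s: VmRSS = %ld kB\n", rank, host, vmrss);

    MPI_Finalize();
    return 0;
}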
All,
It looks like the issue is solved. Our sysadmin had been working on the issue
too - he noticed a lot of "junk" in my /etc/ld.so.conf.d/ directory. After
"cleaning" it out (I think he ended up wiping everything out, then rebooting
the machine, then re-configuring specific items as needed)
Hi Rainer,
When executing "mpirun blacs_hello_example.exe" (reference:
http://www.netlib.org/blacs/BLACS/Examples.html#HELLO), I am now getting
the following error...
<<
C:\blacs_examples>mpirun blacs_hello_example.exe
forrtl: severe (157): Program Exception - access violation
Image P
Hi Rainer,
Thanks for acknowledgment.
> You may want to port/compile BLACS from netlib yourself; see here:
> http://icl.cs.utk.edu/lapack-for-windows/VisualStudio_install.html
With that I am seeing compilation errors as reported in...
http://icl.cs.utk.edu/lapack-forum/viewtopic.php?f=12&t=2354