[OMPI users] Can not run a parallel job on all the nodes in the cluster

2012-03-27 Thread Hameed Alzahrani
Hi, when I run any parallel job, I get the answer only from the submitting node. Even when I tried to benchmark the cluster using LINPACK, it looks like the job is running only on the submitting node. Is there a way to make Open MPI send the job equally to all the nodes depending on the number of

Re: [OMPI users] oMPI hang with IB question

2012-03-27 Thread Jeffrey Squyres
Dylan -- Sorry for the delay in replying. As an offhand guess, does the problem go away if you run with: --mca mpi_leave_pinned 0 ? On Mar 20, 2012, at 3:35 PM, Dylan Nelson wrote: > Hello, > > I've been having trouble for a while now running some OpenMPI+IB jobs on > multiple tasks. The p

Re: [OMPI users] Data distribution on different machines

2012-03-27 Thread Jeffrey Squyres
You might want to take an MPI tutorial or two; there are a few good ones available on the net. My favorites are the basic and intermediate level MPI tutorials at NCSA. On Mar 25, 2012, at 1:13 PM, Rohan Deshpande wrote: > Hi, > > I want to distribute the data on different machines using open mp

Re: [OMPI users] Problem with MPI_Barrier (Inter-communicator)

2012-03-27 Thread Rodrigo Oliveira
Hi Edgar. Thanks for the response. I just did not understand why the Barrier works before I remove one of the client processes. I tried it with 1 server and 3 clients and it worked properly. After I removed 1 of the clients, it stopped working. So, the removal is affecting the functionality of Barr

Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread John Hearns
Grzegorz, sometimes when a parallel application quits there are processes left running on the compute nodes. You can usually find these by running 'pgrep -P 1' and excluding any processes owned by root. These 'orphan' processes use up memory - so if you are having problems with applications quittin
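The check described above (processes whose parent is PID 1 and which are not owned by root) could be scripted roughly as follows. This is only a sketch, not something from the thread: the function names `find_orphans` and `list_orphans_on_this_node` are made up for illustration, and it assumes a `ps` that accepts `-eo pid=,ppid=,user=,comm=`. It also assumes orphaned processes are re-parented to PID 1, which may not hold on systems that use a per-user subreaper.

```python
import subprocess


def find_orphans(ps_output, skip_user="root"):
    """Return (pid, user, command) tuples for processes whose parent is
    PID 1 and whose owner is not skip_user.

    ps_output is expected to hold one process per line in the form
    "PID PPID USER COMMAND" (hypothetical helper, for illustration only).
    """
    orphans = []
    for line in ps_output.strip().splitlines():
        fields = line.split(None, 3)
        if len(fields) < 4:
            continue  # malformed or empty line
        pid, ppid, user, comm = fields
        if not pid.isdigit():
            continue  # header line, if any
        if ppid == "1" and user != skip_user:
            orphans.append((int(pid), user, comm.strip()))
    return orphans


def list_orphans_on_this_node():
    """Run ps on the local node and filter its output (assumes ps exists)."""
    result = subprocess.run(
        ["ps", "-eo", "pid=,ppid=,user=,comm="],
        capture_output=True,
        text=True,
        check=True,
    )
    return find_orphans(result.stdout)
```

Calling `list_orphans_on_this_node()` on each compute node after a job ends would list candidates to inspect. Legitimate per-user daemons also have PID 1 as parent, so the result is a list to review by hand, not a kill list.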

Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread Grzegorz Maj
John, thank you for your reply. I checked the system logs and there are no signs of the OOM killer. What do you mean by cleaning 'orphan' processes? Should I check if there are any processes left after each job execution? I have always assumed that when mpirun terminates, everything is cleaned

Re: [OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread John Hearns
Have you checked the system logs on the machines where this is running? Is it perhaps that the processes use lots of memory and the Out Of Memory (OOM) killer is killing them? Also check all nodes for left-over 'orphan' processes which are still running after a job finishes - these should be killed

[OMPI users] MPI daemon died unexpectedly

2012-03-27 Thread Grzegorz Maj
Hi, I have an MPI application using ScaLAPACK routines. I'm running it on OpenMPI 1.4.3. I'm using mpirun to launch fewer than 100 processes. I've been using it quite extensively for almost two years and it almost always works fine. However, once every 3-4 months I get the following error during the execu