Hi,
When I run any parallel job I get the output just from the submitting node. Even when I tried to benchmark the cluster with LINPACK, it looks like the job only runs on the submitting node. Is there a way to make Open MPI spread the job equally across all the nodes depending on the number of
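Do I need a hostfile for this? Something like the following (node names and slot counts are just made up here):

  # hosts file, one line per node
  node01 slots=8
  node02 slots=8
  node03 slots=8

  mpirun -np 24 --hostfile hosts ./xhpl

Is that the right way to get the ranks onto all the nodes?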
Dylan --
Sorry for the delay in replying.
On an offhand guess, does the problem go away if you run with:
--mca mpi_leave_pinned 0
?
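I.e., something along the lines of (substitute your real command line; the process count and executable below are just placeholders):

  mpirun --mca mpi_leave_pinned 0 -np 16 ./your_app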
On Mar 20, 2012, at 3:35 PM, Dylan Nelson wrote:
> Hello,
>
> I've been having trouble for a while now running some OpenMPI+IB jobs on
> multiple tasks. The p
You might want to take an MPI tutorial or two; there are a few good ones
available on the net.
My favorites are the basic and intermediate level MPI tutorials at NCSA.
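As a very rough starting point, scattering an array from one rank to all the others usually looks something like the sketch below (untested; adjust the chunk size and datatype to your own data):

/* minimal sketch: rank 0 scatters equal chunks of an array to every rank.
 * build:  mpicc scatter_example.c -o scatter_example
 * run:    mpirun -np 4 ./scatter_example
 */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

enum { CHUNK = 4 };                       /* elements each rank receives */

int main(int argc, char **argv)
{
    int rank, size, i;
    int *data = NULL;
    int local[CHUNK];

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {                      /* only the root holds the full array */
        data = malloc(CHUNK * size * sizeof(int));
        for (i = 0; i < CHUNK * size; i++)
            data[i] = i;
    }

    /* every rank (including the root) receives its own CHUNK-sized piece */
    MPI_Scatter(data, CHUNK, MPI_INT, local, CHUNK, MPI_INT, 0, MPI_COMM_WORLD);
    printf("rank %d of %d got %d..%d\n", rank, size, local[0], local[CHUNK - 1]);

    if (rank == 0) free(data);
    MPI_Finalize();
    return 0;
}

The tutorials walk through exactly this kind of pattern (MPI_Scatter / MPI_Gather and friends) in much more detail.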
On Mar 25, 2012, at 1:13 PM, Rohan Deshpande wrote:
> Hi,
>
> I want to distribute the data on different machines using open mp
Hi Edgar,
Thanks for the response. I just don't understand why the Barrier works only
before I remove one of the client processes.
I tried it with 1 server and 3 clients and it worked properly. After I
removed one of the clients, it stopped working. So the removal is affecting
the functionality of Barrier
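From what I have read, MPI_Barrier is collective over every process in the
communicator, so my guess is that the remaining processes just keep waiting
for the client that was removed. Does the departing client need to call
something like

  MPI_Comm_disconnect(&comm);   /* 'comm' being whatever communicator the Barrier runs over */

before it exits, and do the remaining processes then need to rebuild a
communicator that no longer contains it before calling MPI_Barrier again?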
Grzegorz, sometimes when a parallel application quits there are
processes left running on the compute nodes. You can usually find
these by running 'pgrep -P 1' and excluding any processes owned by
root.
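For example, something like

  pgrep -l -P 1 -u <your_username>

should list candidates by PID and name (pgrep options vary slightly between
versions); any stray MPI ranks in that list can then be killed by hand.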
These 'orphan' processes use up memory - so if you are having problems
with applications quitting
John, thank you for your reply.
I checked the system logs and there are no signs of the OOM killer.
What do you mean by cleaning up 'orphan' processes? Should I check whether
there are any processes left after each job execution? I have always
assumed that when mpirun terminates, everything is cleaned
Have you checked the system logs on the machines where this is running?
Is it perhaps that the processes use lots of memory and the Out Of
Memory (OOM) killer is killing them?
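(On most Linux systems something like

  dmesg | grep -i 'out of memory'

or a search for 'oom' in /var/log/messages or /var/log/syslog, depending on
the distribution, will show whether it fired.)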
Also check all nodes for left-over 'orphan' processes which are still
running after a job finishes - these should be killed
Hi,
I have an MPI application that uses ScaLAPACK routines. I'm running it on
OpenMPI 1.4.3, using mpirun to launch fewer than 100 processes. I've been
using it quite extensively for almost two years and it almost always
works fine. However, once every 3-4 months I get the following error
during the execu