Hello,
we are seeing a large difference in performance for some applications
depending on what MPI is being used.
Attached are performance numbers and oprofile output (first 30 lines)
from one out of 14 nodes from one application run using OpenMPI,
IntelMPI and Scali MPI respectively.
Do you know if the application uses any collective operations?
Thanks
Pasha
Torgny Faxen wrote:
> Hello,
> we are seeing a large difference in performance for some applications
> depending on what MPI is being used.
> Attached are performance numbers and oprofile output (first 30 lines)
> from one out of 14 nodes from one application run using OpenMPI,
> IntelMPI and Scali MPI respectively.
Pasha,
no collectives are being used.
A simple grep in the code reveals the following MPI functions being used:
MPI_Init
MPI_wtime
MPI_COMM_RANK
MPI_COMM_SIZE
MPI_BUFFER_ATTACH
MPI_BSEND
MPI_PACK
MPI_UNPACK
MPI_PROBE
MPI_GET_COUNT
MPI_RECV
MPI_IPROBE
MPI_FINALIZE
where MPI_IPROBE is the clear winner.
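For illustration only, the communication pattern these calls imply (buffered
sends, probed and received on the other side) might look roughly like the
following C sketch. This is not the application's code, just a minimal,
hypothetical example of the MPI_Bsend / MPI_Iprobe / MPI_Recv style:

    /* minimal sketch of a Bsend + Iprobe/Get_count + Recv pattern */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* buffered sends need an attached buffer (cf. MPI_BUFFER_ATTACH) */
        int bufsize = 1024 * 1024 + MPI_BSEND_OVERHEAD;
        void *attach = malloc(bufsize);
        MPI_Buffer_attach(attach, bufsize);

        double payload[128] = {0};
        if (rank != 0) {
            /* workers push one message each to rank 0 */
            MPI_Bsend(payload, 128, MPI_DOUBLE, 0, 42, MPI_COMM_WORLD);
        } else {
            int received = 0;
            while (received < size - 1) {
                int flag, count;
                MPI_Status st;
                /* poll for any incoming message; an Iprobe-heavy code
                   spends a lot of time spinning right here */
                MPI_Iprobe(MPI_ANY_SOURCE, MPI_ANY_TAG, MPI_COMM_WORLD,
                           &flag, &st);
                if (!flag) continue;
                MPI_Get_count(&st, MPI_DOUBLE, &count);
                MPI_Recv(payload, count, MPI_DOUBLE, st.MPI_SOURCE,
                         st.MPI_TAG, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                received++;
            }
        }

        MPI_Buffer_detach(&attach, &bufsize);
        free(attach);
        MPI_Finalize();
        return 0;
    }

If the profile is dominated by MPI_IPROBE, much of that time is likely the
polling loop above rather than actual data movement.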
Could you send us the mpirun cmd line? I wonder if you are missing some
options that could help. Also, you might:
(a) upgrade to 1.3.3 - it looks like you are using some kind of pre-release
version
(b) add -mca mpi_show_mca_params env,file - this will cause rank=0 to output
what mca params it sees
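For example (generic launch line, the executable name is just a placeholder):

    mpirun -np 16 -mca mpi_show_mca_params env,file ./your_app

Rank 0 should then print the MCA parameters it picked up from the
environment and from parameter files at startup.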
On Mon, Aug 3, 2009 at 1:41 PM, Ralph Castain wrote:
> The only thing that changes is the required connectivity. It sounds to me
> like you may have a firewall issue here, where cloud3 is blocking
> connectivity from cloud6, but cloud6 is allowing connectivity from cloud3.
>
> Is there a firewall involved?
Torgny,
We have one known issue in the openib btl that is related to MPI_IPROBE:
https://svn.open-mpi.org/trac/ompi/ticket/1362
In theory it could be the root cause of the performance degradation, but
to me the performance difference sounds too big for that.
* Do you know what the typical message size is for this application?
We've found that on certain applications binding to processors can make up
to a 2x difference. ScaliMPI automatically binds processes by socket, so if
you are not running a one-process-per-CPU job, each process will land on
a different socket.
OMPI defaults to not binding at all. You may want to try one of the binding options.
If the above doesn't improve anything, the next question is: do you know
what the sizes of the messages are? For very small messages I believe
Scali shows 2x better performance than Intel and OMPI (I think this
is due to a fastpath optimization).
I remember that mvapich was faster than Scali
A comment on the below: I meant the 2x performance figure was for
shared-memory communications.
--td
Date: Wed, 05 Aug 2009 09:55:42 -0400
From: Terry Dontje
Subject: Re: [OMPI users] Performance difference on OpenMPI, IntelMPI and ScaliMPI
Ralph,
I am running through a locally provided wrapper but it translates to:
/software/mpi/openmpi/1.3b2/i101017/bin/mpirun -np 144 -npernode 8 -mca
mpi_show_mca_params env,file /nobackup/rossby11/faxen/RCO_scobi/src_161.openmpi/rco2.24pe
a) Upgrade: this will take some time; it will have to wait.
Okay, one problem is fairly clear. As Terry indicated, you have to tell us
to bind or else you lose a lot of performance. Set -mca opal_paffinity_alone
1 on your cmd line and it should make a significant difference.
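Applied to the command line quoted earlier in this thread, that would look
something like:

    /software/mpi/openmpi/1.3b2/i101017/bin/mpirun -np 144 -npernode 8 \
        -mca opal_paffinity_alone 1 \
        -mca mpi_show_mca_params env,file \
        /nobackup/rossby11/faxen/RCO_scobi/src_161.openmpi/rco2.24pe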
On Wed, Aug 5, 2009 at 8:10 AM, Torgny Faxen wrote:
> Ralph,
> I am running through a locally provided wrapper but it translates to: [...]
Ralph,
I can't get "opal_paffinity_alone" to work (see below). However, there
is an "mpi_paffinity_alone" parameter that I tried, without any improvement.
However, setting:
-mca btl_openib_eager_limit 65536
gave a 15% improvement, so OpenMPI is now down to 326 seconds (from the
previous 376 seconds). Still a lot more than ScaliMPI with 214 seconds.
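For reference, the default value of that parameter (and the rest of the
openib BTL settings) can be checked with ompi_info, e.g.:

    ompi_info --param btl openib | grep eager_limit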
Hi guys,
I'm trying to run an example program, mpi-ring, on a rocks cluster.
When launched via sge with 8 processors (we have 8 procs per node),
the program works fine, but with any more processors and the program
fails.
I'm using open-mpi 1.3.2. Included below, at the end of this post, is the output of
> However, setting:
> -mca btl_openib_eager_limit 65536
> gave a 15% improvement so OpenMPI is now down to 326 (from previous
> 376 seconds). Still a lot more than ScaliMPI with 214 seconds.
Can you please run ibv_devinfo on one of the compute nodes? It would be
interesting to know what kind of IB hardware you have.
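For example, on one of the compute nodes:

    ibv_devinfo

The hca_id, fw_ver and port state lines are usually the interesting part.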
I assume it is working with np=8 because the 8 processes are getting
launched on the same node as mpirun and therefore there is no call to
qrsh to start up any remote processes. When you go beyond 8, mpirun
calls qrsh to start up processes on some of the remote nodes.
I would suggest first trying to substitute hostname for mpirun in your job
script, to verify that qrsh can start processes on the remote nodes.
Hi Rolf,
Thanks for answering!
Eli
Here is qstat -t when I launch it with 8 processors. It looks to me
like it is actually using compute node 8. The mpirun job was submitted
on the head node 'nimbus'.
Tried swapping out hostname for mpirun in the job script. For both 8
and 16 processors