Thank you all very much for the replies.
I would like a reference for what Tim Prince & Andreas have said.
Tim said that OpenMPI has had effective shared memory message passing. Does
that have anything to do with the --enable-MPI-threads switch used when installing
OpenMPI?
regards,
AA
For MPI_Comm_split, all processes in the input communicator (oldcomm
or MPI_COMM_WORLD in your case) must call the operation since it is
collective over the input communicator. In your program rank 0 is not
calling the operation, so MPI_Comm_split is waiting for it to
participate.
If you want rank
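A minimal sketch (not from the thread) of what "collective over the input
communicator" looks like in practice, with every rank of MPI_COMM_WORLD, rank 0
included, making the call:

PROGRAM split_all_ranks
  USE mpi
  IMPLICIT NONE
  INTEGER :: ierr, rank, color, newcomm

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  ! Every rank makes this call; the color only decides which of the
  ! new communicators the rank ends up in.
  color = MOD(rank, 2)
  CALL MPI_COMM_SPLIT(MPI_COMM_WORLD, color, rank, newcomm, ierr)

  CALL MPI_COMM_FREE(newcomm, ierr)
  CALL MPI_FINALIZE(ierr)
END PROGRAM split_all_ranks

(A rank that should not belong to any of the new communicators still calls
MPI_Comm_split, but passes MPI_UNDEFINED as its color and gets MPI_COMM_NULL
back.)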
On Dec 12, 2011, at 9:45 AM, Josh Hursey wrote:
> For MPI_Comm_split, all processes in the input communicator (oldcomm
> or MPI_COMM_WORLD in your case) must call the operation since it is
> collective over the input communicator. In your program rank 0 is not
> calling the operation, so MPI_Comm_split is waiting for it to participate.
I've got a strange problem with Fortran 90 and an MPI_BCAST call in a large
application. I've isolated the problem in these short program samples.
With Fortran we can pass subarrays in procedure calls. For example, passing a
subarray to the "change" procedure:
MODULE mymod
IMPLICIT NONE
CONTAINS
S
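A minimal, self-contained sketch of the kind of sample being described; the
array shape, the broadcast count and the body of "change" are assumptions, only
the names mymod and change come from the excerpt:

MODULE mymod
IMPLICIT NONE
CONTAINS
  SUBROUTINE change(t)            ! assumed-shape dummy, so the subarray arrives as such
    INTEGER, INTENT(INOUT) :: t(:)
    t = t + 1
  END SUBROUTINE change
END MODULE mymod

PROGRAM testbcast
  USE mymod
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER :: tab(4,4), ierr, rank

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)
  tab = rank

  CALL change(tab(2,:))           ! subarray to a Fortran procedure: fine
  ! Subarray to MPI_BCAST: this strided section is where the trouble shows up.
  CALL MPI_BCAST(tab(2,:), 4, MPI_INTEGER, 0, MPI_COMM_WORLD, ierr)

  CALL MPI_FINALIZE(ierr)
END PROGRAM testbcast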
On Sat, Dec 10, 2011 at 3:21 PM, amjad ali wrote:
> (2) The latest MPI implementations are intelligent enough to use an
> efficient mechanism when executing MPI-based codes on shared memory
> (multicore) machines. (Please tell me of any reference I can quote for this fact.)
Not an academic paper
I think this is a *great* topic for discussion, so let me throw some
fuel to the fire: the mechanism described in the blog (that makes
perfect sense) is fine for (N)UMA shared memory architectures. But
will it work for asymmetric architectures such as the Cell BE or
discrete GPUs where the data bet
On Dec 12, 2011, at 9:45 AM, Josh Hursey wrote:
For MPI_Comm_split, all processes in the input communicator (oldcomm
or MPI_COMM_WORLD in your case) must call the operation since it is
collective over the input communicator. In your program rank 0 is not
calling the operation, so MPI_Comm_split is waiting for it to participate.
Hi Patrick
I think tab(i,:) is not contiguous in memory, but has a stride of nbcpus.
Since the MPI type you are passing is just the barebones MPI_INTEGER,
MPI_BCAST expects the four integers to be contiguous in memory, I guess.
The MPI calls don't have any idea of the Fortran90 memory layout,
and
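A hedged sketch of the two usual ways around this, assuming tab is an INTEGER
array of shape (nbcpus,4) as the stride suggests; none of this code is from the
thread:

SUBROUTINE bcast_row(tab, nbcpus, i, comm)
  IMPLICIT NONE
  INCLUDE 'mpif.h'
  INTEGER, INTENT(IN)    :: nbcpus, i, comm
  INTEGER, INTENT(INOUT) :: tab(nbcpus,4)
  INTEGER :: buf(4), rowtype, ierr

  ! Option 1: stage the row through a contiguous buffer.
  buf = tab(i,:)
  CALL MPI_BCAST(buf, 4, MPI_INTEGER, 0, comm, ierr)
  tab(i,:) = buf

  ! Option 2: describe the stride to MPI with a derived datatype and
  ! broadcast the row in place (either option alone is enough).
  CALL MPI_TYPE_VECTOR(4, 1, nbcpus, MPI_INTEGER, rowtype, ierr)
  CALL MPI_TYPE_COMMIT(rowtype, ierr)
  CALL MPI_BCAST(tab(i,1), 1, rowtype, 0, comm, ierr)
  CALL MPI_TYPE_FREE(rowtype, ierr)
END SUBROUTINE bcast_row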
On Dec 12, 2011, at 8:42 AM, amjad ali wrote:
> Thank you all very much for the replies.
>
> I would like a reference for what Tim Prince & Andreas have said.
>
> Tim said that OpenMPI has had effective shared memory message passing. Does
> that have anything to do with the --enable-MPI-threads switch used when installing
> OpenMPI?
What FORTRAN compiler are you using? This should not really be an issue
with the MPI implementation, but with the FORTRAN compiler. This is legitimate
usage in FORTRAN 90 and the compiler should deal with it. I do similar
things using ifort; it creates temporary arrays when necessary and it
all works
Hello,
We are running a cluster that has a good number of older nodes with
Mellanox IB HCAs that have the "mthca" device name ("ib_mthca" kernel
module).
These adapters are all at firmware level 4.8.917.
The Open MPI in use is 1.5.3, kernel 2.6.39, x86-64. Jobs are
launched/managed using Slu
The interface to MPI_Bcast does not specify an assumed-shape-array dummy
first argument. Consequently, as David points out, the compiler makes a
contiguous temporary copy of the array section to pass to the routine. If
using ifort, try the "-check arg_temp_created" compiler option to verify
creation
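A minimal sketch of how one might see this happen with ifort; the program and
routine names are made up:

! Pass a strided section to a routine with an explicit-shape dummy,
! so the compiler has to make a contiguous temporary copy.
PROGRAM temp_demo
  IMPLICIT NONE
  INTEGER :: tab(4,4)
  tab = 0
  CALL takes_contiguous(tab(2,:))   ! non-contiguous actual argument
CONTAINS
  SUBROUTINE takes_contiguous(buf)
    INTEGER :: buf(4)               ! explicit-shape dummy, like MPI_Bcast's
    buf = buf + 1
  END SUBROUTINE takes_contiguous
END PROGRAM temp_demo

Compiling with "ifort -check arg_temp_created temp_demo.f90" and running the
result should produce a run-time warning that an array temporary was created
for the call.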
I have multiple GPUs on a node in my cluster and am trying to run some
benchmarks on the system. However, since my department is in research
and has a job system set up, I can only take one GPU offline to test
until I am sure I know what I am doing. My problem is trying to set up
the mpirun
Hi Erin,
uhm, I don't think this is related to MPI as MPI is completely
orthogonal to GPU programming. MPI doesn't even know about GPUs. Just
select the GPU like you weren't using MPI at all.
HTH
-Andreas
On 14:44 Mon 12 Dec , Erin Rasmussen wrote:
>
> I have multiple GPUs on a node in my
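For what it's worth, a sketch of "select the GPU as you would without MPI",
assuming CUDA Fortran (PGI/NVIDIA compiler) and ranks numbered consecutively on
each node; the program name and the rank-to-device mapping are only
illustrative:

PROGRAM pick_gpu
  USE mpi
  USE cudafor
  IMPLICIT NONE
  INTEGER :: ierr, rank, ndev, istat

  CALL MPI_INIT(ierr)
  CALL MPI_COMM_RANK(MPI_COMM_WORLD, rank, ierr)

  istat = cudaGetDeviceCount(ndev)        ! GPUs visible on this node
  istat = cudaSetDevice(MOD(rank, ndev))  ! an ordinary CUDA call; MPI plays no part

  ! ... run this rank's GPU work here ...

  CALL MPI_FINALIZE(ierr)
END PROGRAM pick_gpu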
I'm working with a cluster that has both CPUs and GPUs, and I'm trying
to run the High Performance Linpack benchmark on it. Before I can do a
full system run, I have to figure out how to get the benchmark to run on
both GPUs and CPUs at the same time. I have HPL working fine with
openmpi using
On 14:53 Mon 12 Dec , Erin Rasmussen wrote:
> I have HPL working fine with openmpi using multiple nodes, but now
> I'm trying to use it on our system with multiple nodes with CPUs and
> GPUs.
So this is an inquiry related to hpl, not Open MPI. Anyways, which
version of HPL are you using and ho
This is a benchmarking project for the next couple of months because
we're building our GPU system at the moment. Right now I'm using HPL 2.0
from the site http://netlib.org/benchmark/hpl on our CPU-only cluster. I
have the CUDA version from Nvidia downloaded for when our system is
running, though.
On 15:10 Mon 12 Dec , Erin Rasmussen wrote:
> I just need to have a pretty good idea of how to run the
> benchmark with GPUs because we would like to get into the Top 500.
I may be wrong but AFAIK getting into the Top 500 requires meticulous
configuration of all sorts of parameters. If you w
I'm connected with the forums at Nvidia now. Thanks!
On 12/12/2011 03:20 PM, Andreas Schäfer wrote:
On 15:10 Mon 12 Dec , Erin Rasmussen wrote:
I just need to have a pretty good idea of how to run the
benchmark with GPUs because we would like to get into the Top 500.
I may be wrong but AFAIK getting into the Top 500 requires meticulous
configuration of all sorts of parameters.