Hello,
I have a simple question for the shared memory (sm) module developers of
Open MPI.
In the current implementation, is there any advantage to having a shared
cache among communicating processes?
For example, let's say we have P1 and P2 placed on the same CPU on 2
different physical cores with
At the moment, I believe the answer is the main memory route. We have
a project just starting here (LANL) to implement the cache-level
exchange, but it won't be ready for release for awhile.
On Jun 25, 2009, at 2:39 AM, Simone Pellegrini wrote:
Hello,
I have a simple question for the shared memory (sm) module developers of
Open MPI.
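(A quick way to probe this question empirically is to time a small ping-pong between two ranks pinned to cores on the same socket: if the transfer stays in a shared cache, latency should drop noticeably compared with ranks on different sockets. The following is only an illustrative sketch -- the 64 KiB message size, the iteration count, and the assumption that ranks 0 and 1 share a socket are choices of this example, not anything stated in the thread.)

/* pingpong.c -- rough on-node ping-pong latency between ranks 0 and 1.
 * Illustrative sketch only; run with at least 2 ranks, e.g.
 *   mpirun --mca btl self,sm -np 2 ./pingpong                           */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int size  = 64 * 1024;   /* 64 KiB: may fit in a shared cache level */
    const int iters = 1000;
    int rank, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0) {
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
            MPI_Send(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
        }
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("avg round trip for %d bytes: %g us\n",
               size, (t1 - t0) * 1e6 / iters);

    free(buf);
    MPI_Finalize();
    return 0;
}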
Ralph Castain wrote:
At the moment, I believe the answer is the main memory route. We have
a project just starting here (LANL) to implement the cache-level
exchange, but it won't be ready for release for a while.
Interesting. Actually I am a PhD student and my topic is the optimization of
MPI applications.
FWIW: there's also work going on to use direct process-to-process
copies (vs. using shared memory bounce buffers). Various MPI
implementations have had this technology for a while (e.g., QLogic's
PSM-based MPI); the Open-MX guys are publishing the knem open source
kernel module for this purpose.
Doesn't that still pull the message off-socket? I thought it went
through the kernel for that method, which means moving it to main
memory.
On Jun 25, 2009, at 6:49 AM, Jeff Squyres wrote:
FWIW: there's also work going on to use direct process-to-process
copies (vs. using shared memory bounce buffers).
On Jun 25, 2009, at 9:12 AM, Ralph Castain wrote:
Doesn't that still pull the message off-socket? I thought it went
through the kernel for that method, which means moving it to main
memory.
It may or may not.
Sorry -- let me clarify: I was just pointing out other on-node/memory-
based work.
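(For context on the difference: a shared-memory bounce buffer costs two copies per message -- the sender copies into the shared segment and the receiver copies out of it -- while a kernel-assisted single copy such as knem moves the data once, which matters most for large messages. A rough way to see where the copy cost goes is a large-message test between two on-node ranks; the sketch below is illustrative only, with the 8 MB message size and iteration count chosen arbitrarily.)

/* bw.c -- rough large-message bandwidth between two on-node ranks.
 * Illustrative sketch only; 8 MB messages and 50 iterations are arbitrary. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int size  = 8 * 1024 * 1024;   /* large enough that copy cost dominates */
    const int iters = 50;
    int rank, i;
    char *buf;
    double t0, t1;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    buf = malloc(size);

    MPI_Barrier(MPI_COMM_WORLD);
    t0 = MPI_Wtime();
    for (i = 0; i < iters; i++) {
        if (rank == 0)
            MPI_Send(buf, size, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, size, MPI_CHAR, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }
    t1 = MPI_Wtime();

    if (rank == 0)
        printf("~%.0f MB/s one way\n",
               (double)size * iters / (t1 - t0) / 1e6);

    free(buf);
    MPI_Finalize();
    return 0;
}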
When using OpenMPI and nwchem standalone (mpirun --byslot --mca btl
self,sm,tcp --mca btl_base_verbose 30 --mca btl_tcp_if_exclude lo,eth1
$NWCHEM h2o.nw >& h2o.nwo.$$), the job runs fine.
When running the same job via the PBSPro scheduler I get errors. The PBS
script is called nwrun and is run
Is it correct to assume that, when configuring openmpi v1.3.2, if one
leaves out the
--with-openib=/dir
from the ./configure command line, InfiniBand support will NOT be built
into openmpi v1.3.2? Then, if an Ethernet network is present that connects
all the nodes, openmpi will use
It's not reproducible, but I sometimes see messages like
[node01:29645] MX BTL delete procs
when running 1.3.1 with Open-MX and the MX BTL. Looking at the code, it's a
dummy routine, but I didn't get as far as figuring out why it's
(sometimes) called and what its significance is. Can someone explain?
As a follow-up, the problem was with host name resolution. The error was
introduced with a change to the Rocks environment, which broke reverse
lookups for host names.
--
Ray Muno
On Jun 25, 2009, at 1:02 PM, Dave Love wrote:
Also, Brice Goglin, the Open-MX author, had a couple of questions
concerning multi-rail MX while I'm on:
1. Does the MX MTL work with multi-rail?
I believe the answer is yes as long as all NICs are in the same fabric
(they usually are).
2. "Yo
Dear All,
I have been encountering a fatal error of type "error polling LP CQ with status RETRY
EXCEEDED ERROR status number 12" whenever I try to run a simple MPI code (see
below) that performs an AlltoAll call.
We are running the OpenMPI 1.3.2 stack on top of the OFED 1.4.1 stack. Our
cluster is comp
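(The code referred to above did not make it into this digest. A minimal MPI_Alltoall reproducer of the kind described might look like the sketch below; the per-peer count of 1024 ints is an arbitrary choice for illustration, not the reporter's actual values.)

/* alltoall.c -- minimal MPI_Alltoall test of the kind described above.
 * Sketch only; the original reporter's code is not shown in this digest. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    const int count = 1024;   /* ints exchanged with each peer (arbitrary) */
    int rank, nprocs, i;
    int *sendbuf, *recvbuf;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    sendbuf = malloc((size_t)count * nprocs * sizeof(int));
    recvbuf = malloc((size_t)count * nprocs * sizeof(int));
    for (i = 0; i < count * nprocs; i++)
        sendbuf[i] = rank;

    MPI_Alltoall(sendbuf, count, MPI_INT, recvbuf, count, MPI_INT,
                 MPI_COMM_WORLD);

    if (rank == 0)
        printf("MPI_Alltoall completed on %d ranks\n", nprocs);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}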
While using the BLACS test programs, I've seen that with recent SVN checkouts
(including today's) the MPI_Abort test left procs running. The last SVN I
have where it worked was 1.4a1r20936. By 1.4a1r21246 it fails.
Works O.K. in the standard 1.3.2 release.
A test program is below. GCC was used.
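(The test program itself is not reproduced in this digest. A minimal mpiabort.c in the spirit of what the script later in the thread compiles might look like the sketch below; which rank aborts, and the sleep on the remaining ranks, are assumptions made for illustration.)

/* mpiabort.c -- minimal MPI_Abort test: rank 0 aborts while the other ranks
 * are still busy inside the job; all processes should get cleaned up.
 * Sketch only; the original test program is not shown in this digest. */
#include <mpi.h>
#include <stdio.h>
#include <unistd.h>

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        printf("rank 0 calling MPI_Abort\n");
        fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    sleep(30);   /* the other ranks linger so orphaned procs are easy to spot */
    MPI_Finalize();
    return 0;
}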
Using what launch environment?
On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
While using the BLACS test programs, I've seen that with recent SVN
checkouts
(including today's) the MPI_Abort test left procs running. The last
SVN I
have where it worked was 1.4a1r20936. By 1.4a1r21246 it fails.
Something like:
#!/bin/ksh
set -x
PREFIX=$OPENMPI_GCC_SVN
export PATH=$OPENMPI_GCC_SVN/bin:$PATH
MCA="--mca btl tcp,self"
mpicc -g -O6 mpiabort.c
NPROCS=4
mpirun --prefix $PREFIX -x LD_LIBRARY_PATH $MCA -np $NPROCS -machinefile fred ./a.out
DM
On Thu, 25 Jun 2009, Ralph Castain wrote:
Using what launch environment?
Sorry - should have been more clear. Are you using rsh, qrsh (i.e.,
SGE), SLURM, Torque, ...?
On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:
Something like:
#!/bin/ksh
set -x
PREFIX=$OPENMPI_GCC_SVN
export PATH=$OPENMPI_GCC_SVN/bin:$PATH
MCA="--mca btl tcp,self"
mpicc -g -O6 mpiabort.c
NPROCS=4
Just local machine - direct from the command line with a script like
the one below. So, no launch mechanism.
Fails on SUSE Linux Enterprise Server 10 (x86_64) - SP2 and
Fedora release 10 (Cambridge), for example.
DM
On Thu, 25 Jun 2009, Ralph Castain wrote:
Sorry - should have been more clear.
On Jun 25, 2009, at 13:17 , Scott Atchley wrote:
On Jun 25, 2009, at 1:02 PM, Dave Love wrote:
Also, Brice Goglin, the Open-MX author, had a couple of questions
concerning multi-rail MX while I'm on:
1. Does the MX MTL work with multi-rail?
I believe the answer is yes as long as all NICs are in the same fabric
(they usually are).
Hi Jim, list
1) Your first question:
I opened a thread on this list two months or so ago about a similar
situation: when OpenMPI would use/not use libnuma.
I asked a question very similar to your question about IB support,
and how the configure script would provide it or not.
Jeff answered it, a
On Jun 25, 2009, at 12:53 PM, Jim Kress wrote:
Is it correct to assume that, when configuring openmpi v1.3.2, if one
leaves out the
--with-openib=/dir
from the ./configure command line, InfiniBand support will NOT be built
into openmpi v1.3.2? Then, if an Ethernet network is present that connects
all the nodes, openmpi will use
This thread diverged quite a bit into Open MPI configuration and build
issues -- did the original question get answered?
On Jun 24, 2009, at 8:18 PM, Jim Kress ORG wrote:
> Have you investigated Jeff's question on whether the code was
> compiled/linked with the same OpenMPI version (1.3.2)?