No clarification necessary. The standard is not a user guide. The semantics
are clear from what is defined. Users who don't like the interface can write a
library that does what they want.
Jeff
On Thursday, February 11, 2016, Nathan Hjelm wrote:
>
> I should also say that I think this is something that m
Indeed, I ran with MPICH. But I like Open MPI's choice better here, which is
why I said that I would explicitly set the pointer to null when size is
zero.
Jeff
On Thursday, February 11, 2016, Nathan Hjelm wrote:
>
> Jeff probably ran with MPICH. Open MPI's results are consistent with our choice
> of def
I should also say that I think this is something that may be worth
clarifying in the standard. Either semantic is fine with me but there is
no reason to change the behavior if it does not violate the standard.
-Nathan
On Thu, Feb 11, 2016 at 01:35:28PM -0700, Nathan Hjelm wrote:
>
> Jeff probab
Jeff probably ran with MPICH. Open MPI's results are consistent with our choice
of definition for size=0:
query: me=1, them=0, size=0, disp=1, base=0x0
query: me=1, them=1, size=4, disp=1, base=0x1097e30f8
query: me=1, them=2, size=4, disp=1, base=0x1097e30fc
query: me=1, them=3, size=4, disp=1, base=0x1
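For context, a minimal C sketch of the kind of test being discussed -- my own
reconstruction, not the actual attached program -- that prints one such query
line per rank. Run with all ranks on one node, e.g. mpirun -np 4 a.out:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int me, np, *base;
        MPI_Win win;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &me);
        MPI_Comm_size(MPI_COMM_WORLD, &np);

        /* rank 0 asks for a zero-size segment, everyone else for one int */
        MPI_Aint size = (me == 0) ? 0 : sizeof(int);
        MPI_Win_allocate_shared(size, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
                                &base, &win);

        /* ask where every rank's segment actually lives */
        for (int them = 0; them < np; them++) {
            MPI_Aint qsize;
            int qdisp;
            int *qbase;
            MPI_Win_shared_query(win, them, &qsize, &qdisp, &qbase);
            printf("query: me=%d, them=%d, size=%ld, disp=%d, base=%p\n",
                   me, them, (long)qsize, qdisp, (void *)qbase);
        }

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }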
You may be right semantically. But the sentence "the first address in the
memory segment of process i is consecutive with the last address in the memory
segment of process i - 1" is not easy to interpret correctly for a zero-size
segment.
There may be good reasons not to allocate the point
Thanks Jeff, that was an interesting result. The pointers here are well
defined, even for the zero-size segment.
However, I can't reproduce your output. I still get null pointers (output
below).
(I tried both 1.8.5 and 1.10.2 versions)
What could be the difference?
Peter
mpirun -np 4 a.out
See attached. Output below. Note that the base you get for ranks 0 and 1
is the same, so you need to use the fact that size=0 at rank=0 to know not
to dereference that pointer and expect to be writing into rank 0's memory,
since you will write into rank 1's.
I would probably add "if (size==0) ba
On Thu, Feb 11, 2016 at 8:46 AM, Nathan Hjelm wrote:
>
>
> On Thu, Feb 11, 2016 at 02:17:40PM +0000, Peter Wind wrote:
> >I would add that the present situation is bound to give problems for some
> >users.
> >It is natural to divide an array into segments, each process treating its
> >
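Jeff's truncated suggestion above presumably amounts to something like the
fragment below -- my reconstruction, not his exact code. Note that MPI-3.1
also allows passing MPI_PROC_NULL as the rank to MPI_Win_shared_query, which
returns the base, size, and disp_unit of the lowest rank that allocated a
nonzero-size segment:

    /* after MPI_Win_allocate_shared: a zero-size rank may get NULL back,
       so fetch a usable base pointer explicitly */
    if (size == 0) {
        MPI_Aint qsize;
        int qdisp;
        MPI_Win_shared_query(win, MPI_PROC_NULL, &qsize, &qdisp, &base);
        /* base now points at the start of the first nonzero segment */
    }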
On Thu, Feb 11, 2016 at 02:17:40PM +0000, Peter Wind wrote:
>I would add that the present situation is bound to give problems for some
>users.
>It is natural to divide an array into segments, each process treating its
>own segment, but needing to read adjacent segments too.
>MPI_
I would add that the present situation is bound to give problems for some
users.
It is natural to divide an array into segments, each process treating its own
segment, but needing to read adjacent segments too.
MPI_Win_allocate_shared seems to be designed for this.
This will work fine as long a
Yes, that is what I meant.
Enclosed is a C example.
The point is that the code would logically make sense for task 0, but since it
asks for a segment of size=0, it only gets a null pointer, which cannot be used
to access the shared parts.
Peter
- Original Message -
> I think Peter
I think Peter's point is that if
- the window uses contiguous memory
*and*
- all tasks know how much memory was allocated by all other tasks in the
window
then it could/should be possible to get rid of MPI_Win_shared_query.
That is likely true if no task allocates zero bytes.
Now, if a task alloca
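Under those two assumptions, the address arithmetic would look roughly like
this hypothetical helper (the sizes[] array holding every rank's allocation in
bytes is my own invention for illustration); note it breaks down exactly in
the zero-size case being discussed:

    #include <mpi.h>

    /* hypothetical: derive rank r's base pointer from rank me's, assuming
       the default contiguous layout and globally known per-rank sizes[] */
    static char *segment_base(char *my_base, const MPI_Aint *sizes,
                              int me, int r)
    {
        char *seg = my_base;
        for (int i = me; i > r; i--) seg -= sizes[i - 1]; /* walk down  */
        for (int i = me; i < r; i++) seg += sizes[i];     /* or walk up */
        return seg;
    }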
On Wed, Feb 10, 2016 at 8:44 AM, Peter Wind wrote:
> I agree that in practice the best approach is to use
> Win_shared_query.
>
> Still I am confused by this part in the documentation:
> "The allocated memory is contiguous across process ranks unless the info
> key *alloc_shared_noncontig*
I agree that in practice the best approach is to use Win_shared_query.
Still I am confused by this part in the documentation:
"The allocated memory is contiguous across process ranks unless the info key
alloc_shared_noncontig is specified. Contiguous across process ranks means that
the f
I don't know about bulletproof, but Win_shared_query is the *only* valid
way to get the addresses of memory in other processes associated with a
window.
The default for Win_allocate_shared is contiguous memory, but it can and
likely will be mapped differently into each process, in which case only
Peter,
The bulletproof way is to use MPI_Win_shared_query after
MPI_Win_allocate_shared.
I do not know if the current behavior is a bug or a feature...
Cheers,
Gilles
On Wednesday, February 10, 2016, Peter Wind wrote:
> Hi,
>
> Under fortran, MPI_Win_allocate_shared is called with a window size o
Sorry for that, here is the attachment!
Peter
- Original Message -
> Peter --
>
> Somewhere along the way, your attachment got lost. Could you re-send?
>
> Thanks.
>
>
> > On Feb 10, 2016, at 5:56 AM, Peter Wind wrote:
> >
> > Hi,
> >
> > Under fortran, MPI_Win_allocate_shared is
- Original Message -
> Peter --
>
> Somewhere along the way, your attachment got lost. Could you re-send?
>
> Thanks.
>
>
> > On Feb 10, 2016, at 5:56 AM, Peter Wind wrote:
> >
> > Hi,
> >
> > Under fortran, MPI_Win_allocate_shared is called with a window size of zero
> > for some
Peter --
Somewhere along the way, your attachment got lost. Could you re-send?
Thanks.
> On Feb 10, 2016, at 5:56 AM, Peter Wind wrote:
>
> Hi,
>
> Under fortran, MPI_Win_allocate_shared is called with a window size of zero
> for some processes.
> The output pointer is then not valid for t
Hi,
Under Fortran, MPI_Win_allocate_shared is called with a window size of zero for
some processes.
The output pointer is then not valid for these processes (null pointer).
Did I misunderstand this? Shouldn't the pointers be contiguous, so that
for a zero-sized window, the pointer should po
Peter,
a patch is available at
https://github.com/ggouaillardet/ompi-release/commit/0b62eabcae403b95274ce55973a7ce29483d0c98.patch
it is now under review
Cheers,
Gilles
On 2/2/2016 11:22 PM, Gilles Gouaillardet wrote:
Thanks Peter,
this is just a workaround for a bug we just identified, t
Thanks Peter,
this is just a workaround for a bug we just identified; the fix will come
soon.
Cheers,
Gilles
On Tuesday, February 2, 2016, Peter Wind wrote:
> That worked!
>
> i.e. with the change you proposed the code gives the right result.
>
> That was efficient work, thank you Gilles :)
>
That worked!
i.e. with the change you proposed, the code gives the right result.
That was efficient work, thank you Gilles :)
Best wishes,
Peter
- Original Message -
> Thanks Peter,
> that is quite unexpected ...
> let's try another workaround, can you replace
> integer
Thanks Peter,
that is quite unexpected ...
let's try another workaround, can you replace
integer:: comm_group
with
integer:: comm_group, comm_tmp
and
call MPI_COMM_SPLIT(comm, irank*2/num_procs, irank, comm_group, ierr)
with
call MPI_COMM_SPLIT(comm, irank*2/num
Thanks Gilles,
I get the following output (I guess it is not what you wanted?).
Peter
$ mpirun --mca osc pt2pt -np 4 a.out
--
A requested component was not found, or was unable to be opened. This
means that this compon
Peter,
at first glance, your test program looks correct.
can you please try to run
mpirun --mca osc pt2pt -np 4 ...
I might have identified a bug with the sm osc component.
Cheers,
Gilles
On Tuesday, February 2, 2016, Peter Wind wrote:
> Enclosed is a short (< 100 lines) fortran code examp
Enclosed is a short (< 100 lines) Fortran code example that uses shared memory.
It seems to me it behaves wrongly if Open MPI is used.
Compiled with SGI MPT, it gives the right result.
To trigger the failure, the code must be run on a single node.
It creates two groups of 2 processes each. Within each group mem
Cristian,
one more thing...
two containers on the same host cannot communicate with the sm btl.
you might want to mpirun with --mca btl tcp,self on one physical machine
without containers,
in order to assess the performance degradation due to using the tcp btl,
without any containerization effect.
C
Dear Cristian,
according to your configuration:
a) - 8 Linux containers on the same machine configured with 2 cores
b) - 8 physical machines
c) - 1 physical machine
a) and c) have exactly the same physical computational resources
despite the fact that a) is being virtualized and the
Hello Cristian,
TAU is still under active development and the developers respond fairly
fast to emails. The latest version, 2.24.1, came out just two months
ago. Check out https://www.cs.uoregon.edu/research/tau/home.php for more
information.
If you are running into issues getting the lates
Hi Cristian, list
I haven't been following the shared memory details of OMPI lately,
but my recollection from some time ago is that in the 1.8 series the
default (and recommended) shared memory transport btl switched from
"sm" to "vader", which is the latest and greatest.
In this case, I guess the
Thank you for your answer, Harald.
Actually I was already using TAU before, but it seems that it is not
maintained any more and there are problems when instrumenting
applications with version 1.8.5 of Open MPI.
I was using Open MPI 1.6.5 before to test the execution of HPC
application on
Cristian,
one explanation could be that the benchmark is memory-bound, so running
on more sockets means higher memory bandwidth, which means better performance.
Another explanation is that on one node you are running one OpenMP
thread per MPI task, and on 8 nodes you are running 8 OpenMP threads
Cristian,
you might observe super-speedup here because on 8 nodes you have 8
times the cache you have in only 1 node. You can also validate that by
checking for cache miss activity using the tools that I mentioned in my
other email.
Best regards.
On 22/07/15 09:42, Cristian RUIZ wrote:
Sorry, I've just discovered that I was using the wrong command to run on
8 machines. I have to get rid of the "-np 8"
So, I corrected the command and I used:
mpirun --machinefile machine_mpi_bug.txt --mca btl self,sm,tcp
--allow-run-as-root mg.C.8
And got these results
8 cores:
Mop/s total
Dear Cristian,
as you probably know, class C is one of the large problem classes for the NAS
benchmarks. That likely means that the application spends much
more time on the actual computation than on communication. This
could explain why you see so little difference between the two
Hello,
I'm running OpenMPI 1.8.5 on a cluster with the following characteristics:
Each node is equipped with two Intel Xeon E5-2630v3 processors (with 8
cores each), 128 GB of RAM and a 10 Gigabit Ethernet adapter.
When I run the NAS benchmarks using 8 cores in the same machine, I'm
gettin
On 05/23/2012 03:05 PM, Jeff Squyres wrote:
On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote:
If process A sends a message to process B and the eager protocol is used then I
assume that the message is written into a shared memory area and picked up by
the receiver when the receive operati
On May 23, 2012, at 7:05 AM, Jeff Squyres wrote:
> On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote:
>
>>> If process A sends a message to process B and the eager protocol is used
>>> then I assume that the message is written into a shared memory area and
>>> picked up by the receiver when
On May 23, 2012, at 6:05 AM, Simone Pellegrini wrote:
>> If process A sends a message to process B and the eager protocol is used
>> then I assume that the message is written into a shared memory area and
>> picked up by the receiver when the receive operation is posted.
Open MPI has a few dif
I think I found the answer to my question on Jeff Squyres blog:
http://blogs.cisco.com/performance/shared-memory-as-an-mpi-transport-part-2/
However, now I have a new question: how do I know if my machine uses the
copyin/copyout mechanism or the direct mapping?
Assuming that I am running on Op
Dear all,
I would like to have confirmation of my assumptions about how
Open MPI implements the rendezvous protocol for shared memory.
If process A sends a message to process B and the eager protocol is used
then I assume that the message is written into a shared memory area and
picked
Hi,
I am trying to implement the following collectives in MPI shared
memory: Alltoall, Broadcast, and Reduce, with zero-copy
optimizations. So for Reduce, my compiler allocates all the send
buffers in shared memory (mmap anonymous), and allocates only one
receive buffer, again in shared memory. Then all the
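A note on "mmap anonymous": MAP_ANONYMOUS mappings are only shared across
fork(), so separate MPI ranks would typically map a named POSIX object
instead. A minimal sketch, with error handling and creator/attacher
synchronization omitted, and the segment name made up (link with -lrt on
older glibc):

    #include <sys/mman.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* one rank creates the named segment, the others attach to it */
    static void *map_shared(int creator, size_t len)
    {
        int fd = shm_open("/reduce_buf",
                          creator ? O_CREAT | O_RDWR : O_RDWR, 0600);
        if (creator)
            ftruncate(fd, len);
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
        close(fd);
        return p;
    }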
All the shared memory code is in the "sm" BTL (byte transfer layer) component:
ompi/mca/btl/sm. All the TCP MPI code is in the "tcp" BTL component:
ompi/mca/btl/tcp. Think of "ob1" as the MPI engine that is the bottom of
MPI_SEND, MPI_RECV, and friends. It takes a message to be sent, determin
Thanks a lot Jeff.
PIN is a dynamic binary instrumentation tool from Intel. It runs on top of
the binary on the MPI node. When it's given function calls to instrument, it
will insert trappings before/after that function call in the binary of the
program you are instrumenting and you can insert your
On Nov 22, 2011, at 1:09 AM, Shamik Ganguly wrote:
> I want to trace when the MPI library prevents an MPI_Send from going to the
> socket and makes it access shared memory because the target node is on the
> same chip (CMP). I want to use PIN to trace this. Can you please give me some
> pointe
Hi,
I want to trace when the MPI library prevents an MPI_Send from going to
the socket and makes it access shared memory because the target node is on
the same chip (CMP). I want to use PIN to trace this. Can you please give
me some pointers about which functions make this decision so that
I'm afraid this isn't correct. You definitely don't want the session directory
in /dev/shm as this will almost always cause problems.
We look through a progression of envars to find where to put the session directory:
1. the MCA param orte_tmpdir_base
2. the envar OMPI_PREFIX_ENV
3. the envar TMP
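So the explicit way to relocate the session directory is the first item in
that list, e.g. (the path here is just an example):

    mpirun --mca orte_tmpdir_base /local/scratch -np 4 ./a.out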
Since /tmp is mounted across a network and /dev/shm is (always) local,
/dev/shm seems to be the right place for shared memory transactions.
If you create temporary files using mktemp, are they created in
/dev/shm or /tmp?
On Thu, Nov 3, 2011 at 11:50 AM, Bogdan Costescu wrote:
> On Thu, Nov 3,
On Thu, Nov 3, 2011 at 15:54, Blosch, Edwin L wrote:
> - /dev/shm is 12 GB and has 755 permissions
> ...
> % ls -l output:
>
> drwxr-xr-x 2 root root 40 Oct 28 09:14 shm
This is your problem: it should be something like drwxrwxrwt. It might
depend on the distribution, e.g. the followi
On Nov 3, 2011, at 8:54 AM, Blosch, Edwin L wrote:
> Can anyone guess what the problem is here? I was under the impression that
> OpenMPI (1.4.4) would look for /tmp and would create its shared-memory
> backing file there, i.e. if you don’t set orte_tmpdir_base to anything.
That is correct
>
Can anyone guess what the problem is here? I was under the impression that
OpenMPI (1.4.4) would look for /tmp and would create its shared-memory backing
file there, i.e. if you don't set orte_tmpdir_base to anything.
Well, there IS a /tmp and yet it appears that OpenMPI has chosen to use
/dev
On 3/30/2011 10:08 AM, Eugene Loh wrote:
Michele Marena wrote:
I've launched my app with mpiP both when two processes are on
different node and when two processes are on the same node.
The process 0 is the manager (gathers the results only), processes 1
and 2 are workers (compute).
This is the
Michele Marena wrote:
I've launched my app with mpiP both when two processes are
on different node and when two processes are on the same node.
The process 0 is the manager (gathers the results only),
processes 1 and 2 are workers (compute).
This is the case processes 1 and 2 a
Hi Jeff,
I thank you for your help,
I've launched my app with mpiP both when two processes are on different node
and when two processes are on the same node.
The process 0 is the manager (gathers the results only), processes 1 and 2
are workers (compute).
This is the case processes 1 and 2 are o
How many messages are you sending, and how large are they? I.e., if your
message passing is tiny, then the network transport may not be the bottleneck
here.
On Mar 28, 2011, at 9:41 AM, Michele Marena wrote:
> I run ompi_info --param btl sm and this is the output
>
> MCA btl
I run ompi_info --param btl sm and this is the output
MCA btl: parameter "btl_base_debug" (current value: "0")
         If btl_base_debug is 1 standard debug is output,
         if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: )
The fact that this exactly matches the time you measured with shared memory is
suspicious. My guess is that you aren't actually using shared memory at all.
Does your "ompi_info" output show shared memory as being available? Jeff or
others may be able to give you some params that would let you ch
What happens with 2 processes on the same node with tcp?
With --mca btl self,tcp my app runs in 23s.
2011/3/28 Jeff Squyres (jsquyres)
> Ah, I didn't catch before that there were more variables than just tcp vs.
> shmem.
>
> What happens with 2 processes on the same node with tcp?
>
> Eg, when b
On 3/28/2011 3:29 AM, Michele Marena wrote:
Each node has two processors (no dual-core).
which seems to imply that the 2 processors share memory space and a
single memory bus, and the question is not about what I originally guessed.
--
Tim Prince
On 3/28/2011 3:44 AM, Jeff Squyres (jsquyres) wrote:
Ah, I didn't catch before that there were more variables than just tcp
vs. shmem.
What happens with 2 processes on the same node with tcp?
Eg, when both procs are on the same node, are you thrashing caches or
memory?
In fact, I made the gues
Ah, I didn't catch before that there were more variables than just tcp vs.
shmem.
What happens with 2 processes on the same node with tcp?
Eg, when both procs are on the same node, are you thrashing caches or memory?
Sent from my phone. No type good.
On Mar 28, 2011, at 6:27 AM, "Michele Mar
Each node has two processors (no dual-core).
2011/3/28 Michele Marena
> However, I thank you Tim, Ralph and Jeff.
> My sequential application runs in 24s (wall clock time).
> My parallel application runs in 13s with two processes on different nodes.
> With shared memory, when two processes are o
However, I thank you Tim, Ralph and Jeff.
My sequential application runs in 24s (wall clock time).
My parallel application runs in 13s with two processes on different nodes.
With shared memory, when two processes are on the same node, my app runs in
23s.
I don't understand why.
2011/3/28 Jeff Squyr
If your program runs faster across 3 processes, 2 of which are local to each
other, with --mca btl tcp,self compared to --mca btl tcp,sm,self, then
something is very, very strange.
Tim cites all kinds of things that can cause slowdowns, but it's still very,
very odd that simply enabling using t
On Mar 27, 2011, at 7:37 AM, Tim Prince wrote:
> On 3/27/2011 2:26 AM, Michele Marena wrote:
>> Hi,
>> My application performs well without shared memory, but with
>> shared memory I get worse performance than without it.
>> Am I making a mistake? Am I overlooking something?
On 3/27/2011 2:26 AM, Michele Marena wrote:
Hi,
My application performs well without shared memory, but with
shared memory I get worse performance than without it.
Am I making a mistake? Am I overlooking something?
I know Open MPI uses the /tmp directory to allocate shared memory
This is my machinefile
node-1-16 slots=2
node-1-17 slots=2
node-1-18 slots=2
node-1-19 slots=2
node-1-20 slots=2
node-1-21 slots=2
node-1-22 slots=2
node-1-23 slots=2
Each cluster node has 2 processors. I launch my application with 3
processes, one on node-1-16 (manager) and two on node-1-17(worke
Hi,
My application performs well without shared memory, but with
shared memory I get worse performance than without it.
Am I making a mistake? Am I overlooking something?
I know Open MPI uses the /tmp directory to allocate shared memory and it is in
the local filesystem.
I thank yo
Yes, it works fine without shared memory. Thank you, Ralph. I will check
the code for logical mistakes; otherwise I will choose the option you
suggested.
2011/3/26 Ralph Castain
> Your other option is to simply not use shared memory. TCP contains loopback
> support, so you can run with just
>
> -
Your other option is to simply not use shared memory. TCP contains loopback
support, so you can run with just
-mca btl self,tcp
and shared memory won't be used. It will run a tad slower that way, but at
least your app will complete.
On Mar 26, 2011, at 2:30 PM, Reuti wrote:
> Am 26.03.2011 u
Am 26.03.2011 um 21:16 schrieb Michele Marena:
> No, I can't. I'm not an administrator of the cluster and I'm not the owner.
You can recompile your private version of Open MPI and install it in
$HOME/local/openmpi-1.4.3 or the like and set paths accordingly during compilation
of your source and exe
No, I can't. I'm not an administrator of the cluster and I'm not the owner.
2011/3/26 Ralph Castain
> Can you update to a more recent version? That version is several years
> out-of-date - we don't even really support it any more.
>
>
> On Mar 26, 2011, at 1:04 PM, Michele Marena wrote:
>
> Yes,
Can you update to a more recent version? That version is several years
out-of-date - we don't even really support it any more.
On Mar 26, 2011, at 1:04 PM, Michele Marena wrote:
> Yes, the syntax is wrong in the email, but I write it correctly when I launch
> mpirun. When some communicating pr
Yes, the syntax is wrong in the email, but I write it correctly when I
launch mpirun. When some communicating processes are on the same node the
application doesn't terminate; otherwise the application terminates and its
results are correct. My Open MPI version is 1.2.7.
2011/3/26 Ralph Castain
>
>
On Mar 26, 2011, at 11:34 AM, Michele Marena wrote:
> Hi,
> I've a problem with shared memory. When my application runs using pure
> message passing (one process per node), it terminates and returns correct
> results. When 2 processes share a node and use shared memory to exchange
> messages
Hi,
I've a problem with shared memory. When my application runs using pure
message passing (one process per node), it terminates and returns correct
results. When 2 processes share a node and use shared memory to exchange
messages, my application doesn't terminate. At the shell I write "mpirun -nolocal
Currently we run a code on a cluster with distributed memory, and this code
needs a lot of memory. Part of the data stored in memory is the same for
each process, but it is organized as one array - we can split it if
necessary. So far no magic occurred for us. What do we need to do to make
the magi
Open MPI will use shared memory to communicate between peers on the same node -
but that's hidden beneath the covers; it's not exposed via the MPI API. You
just MPI-send and magic occurs and the receiver gets the message.
Sent from my PDA. No type good.
On Oct 4, 2010, at 11:13 AM, "Andrei Fo
Does OMPI have shared memory capabilities (as it is mentioned in MPI-2)?
How can I use them?
Andrei
On Sat, Sep 25, 2010 at 23:19, Andrei Fokau wrote:
> Here are some more details about our problem. We use a dozen 4-processor
> nodes with 8 GB memory on each node. The code we run needs about
Here are some more details about our problem. We use a dozen 4-processor
nodes with 8 GB memory on each node. The code we run needs about 3 GB per
processor, so we can load only 2 processors out of 4. The vast majority of
those 3 GB is the same for each processor and is accessed continuously
dur
I think the 'middle ground' approach can be simplified even further if
the data file is on a shared device (e.g. an NFS/Samba mount) that can be
mounted at the same location in the file system tree on all nodes. I
have never tried it, though, and mmap()'ing a non-POSIX-compliant file
system such as Sam
It seems to me there are two extremes.
One is that you replicate the data for each process. This has the
disadvantage of consuming lots of memory "unnecessarily."
Another extreme is that shared data is distributed over all processes.
This has the disadvantage of making at least some of the
The data are read from a file and processed before calculations begin, so I
think that mapping will not work in our case.
Global Arrays look promising indeed. As I said, we need to put just a part
of the data in the shared section. John, do you (or maybe other users) have
experience working wit
Am 24.09.2010 um 13:26 schrieb John Hearns:
> On 24 September 2010 08:46, Andrei Fokau wrote:
>> We use a C-program which consumes a lot of memory per process (up to a few
>> GB), 99% of the data being the same for each process. So for us it would be
>> quite reasonable to put that part of the data in
On 24 September 2010 08:46, Andrei Fokau wrote:
> We use a C-program which consumes a lot of memory per process (up to a few
> GB), 99% of the data being the same for each process. So for us it would be
> quite reasonable to put that part of the data in shared memory.
http://www.emsl.pnl.gov/docs/glo
Is the data coming from a read-only file? In that case, a better way
might be to memory map that file in the root process and share the map
pointer in all the slave threads. This, like shared memory, will work
only for processes within a node, of course.
On Fri, Sep 24, 2010 at 3:46 AM, Andrei Fo
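A minimal sketch of that suggestion (error handling omitted; note that each
process can simply map the file itself rather than sharing a pointer, since
MAP_SHARED read-only mappings of the same file share physical pages via the
page cache):

    #include <sys/mman.h>
    #include <sys/stat.h>
    #include <fcntl.h>
    #include <unistd.h>

    /* map a read-only data file; all processes mapping it on a node
       share the same physical pages */
    static const void *map_input(const char *path, size_t *len)
    {
        int fd = open(path, O_RDONLY);
        struct stat st;
        fstat(fd, &st);
        *len = (size_t)st.st_size;
        const void *p = mmap(NULL, *len, PROT_READ, MAP_SHARED, fd, 0);
        close(fd);
        return p;
    }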
We use a C-program which consumes a lot of memory per process (up to a few
GB), 99% of the data being the same for each process. So for us it would be
quite reasonable to put that part of the data in shared memory.
In the source code, the memory is allocated via the malloc() function. What
would it requir
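Today the standard answer would be an MPI-3 shared window per node (MPI-3
postdates this exchange; see the MPI_Win_allocate_shared discussion earlier on
this page). A hedged sketch, where nelems (the size of the common array) is
assumed to be defined:

    MPI_Comm node;
    MPI_Win win;
    double *data;
    int disp, nrank;
    MPI_Aint bytes;

    /* group the ranks that can share memory */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node);
    MPI_Comm_rank(node, &nrank);

    /* rank 0 of each node allocates the common array once */
    bytes = (nrank == 0) ? nelems * sizeof(double) : 0;
    MPI_Win_allocate_shared(bytes, sizeof(double), MPI_INFO_NULL,
                            node, &data, &win);
    if (nrank != 0) /* everyone else points at rank 0's segment */
        MPI_Win_shared_query(win, 0, &bytes, &disp, &data);
    /* rank 0 fills data, then MPI_Barrier(node) before others read */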
Thanks, that explains it :)
On Tue, Jan 19, 2010 at 15:01, Ralph Castain wrote:
> Shared memory doesn't extend between comm_spawn'd parent/child processes in
> Open MPI. Perhaps someday it will, but not yet.
>
>
> On Jan 19, 2010, at 1:14 PM, Nicolas Bock wrote:
>
> Hello list,
>
> I think I und
Shared memory doesn't extend between comm_spawn'd parent/child processes in
Open MPI. Perhaps someday it will, but not yet.
On Jan 19, 2010, at 1:14 PM, Nicolas Bock wrote:
> Hello list,
>
> I think I understand better now what's happening, although I still don't know
> why. I have attached t
Hello list,
I think I understand better now what's happening, although I still don't
know why. I have attached two small C codes that demonstrate the problem.
The code in main.c uses MPI_Comm_spawn() to start the code in the second
source, child.c. I can force the issue by running the main.c code
Dunno. Do lower np values succeed? If so, at what value of np does
the job no longer start?
Perhaps it's having a hard time creating the shared-memory backing file
in /tmp. I think this is a 64-Mbyte file. If this is the case, try
reducing the size of the shared area per this FAQ item:
ht
Sorry, I forgot to give more details on what versions I am using:
OpenMPI 1.4
Ubuntu 9.10, kernel 2.6.31-16-generic #53-Ubuntu
gcc (Ubuntu 4.4.1-4ubuntu8) 4.4.1
On Fri, Jan 15, 2010 at 15:47, Nicolas Bock wrote:
> Hello list,
>
> I am running a job on a machine with four quad-core AMD Opterons. This machine ha
Hello list,
I am running a job on a machine with four quad-core AMD Opterons. This machine has 16 cores,
which I can verify by looking at /proc/cpuinfo. However, when I run a job
with
mpirun -np 16 -mca btl self,sm job
I get this error:
--
A
On Jun 25, 2009, at 9:12 AM, Ralph Castain wrote:
Doesn't that still pull the message off-socket? I thought it went
through the kernel for that method, which means moving it to main
memory.
It may or may not.
Sorry -- let me clarify: I was just pointing out other on-node/memory-based
work
Doesn't that still pull the message off-socket? I thought it went
through the kernel for that method, which means moving it to main
memory.
On Jun 25, 2009, at 6:49 AM, Jeff Squyres wrote:
FWIW: there's also work going on to use direct process-to-process
copies (vs. using shared memory bo
FWIW: there's also work going on to use direct process-to-process
copies (vs. using shared memory bounce buffers). Various MPI
implementations have had this technology for a while (e.g., QLogic's
PSM-based MPI); the Open-MX guys are publishing the knem open source
kernel module for this pu
Ralph Castain wrote:
At the moment, I believe the answer is the main memory route. We have
a project just starting here (LANL) to implement the cache-level
exchange, but it won't be ready for release for awhile.
Interesting; actually, I am a PhD student and my topic is the optimization of
MPI applic
At the moment, I believe the answer is the main memory route. We have
a project just starting here (LANL) to implement the cache-level
exchange, but it won't be ready for release for awhile.
On Jun 25, 2009, at 2:39 AM, Simone Pellegrini wrote:
Hello,
I have a simple question for the share