Anna,
The monitoring PML tracks all activity on the PML, but might choose to only
expose the messages the user is interested in (its own traffic) and hide the
rest. This is easy in OMPI because all internal
messages are generated using negative tags (which are not allowed f
I'm not sure if I correctly understand the compiler complaint here, but I
think it is complaining about a non-optional dummy argument being
omitted from the call. In this case, I assume the issue is raised in the
mpif Fortran interface (not the f08 interface), due to the fact that the
error is not
Hi Eli,
I agree with you, keep the checks enabled, and users that want them off can
do it via our MCA parameters (command line or
${HOME}/.openmpi/mca-params.conf).
I don't think it is ever effective to try to save a few branches in MPI
functions that usually cost over a microsecond, and lose all
can be seen as contiguous
> (even if described otherwise)"? In what way could it be described
> otherwise, but still be seen as contiguous?
>
> Thanks,
> Pascal Boeschoten
>
> On Tue, 23 Apr 2024 at 16:05, George Bosilca wrote:
>
>> zero copy does not work with n
zero copy does not work with non-contiguous datatypes (it would require
both processes to know the memory layout used by the peer). As long as the
memory layout described by the type can be seen as contiguous (even if
described otherwise), it should work just fine.
George.
On Tue, Apr 23, 2024
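As an illustration, a minimal sketch (hypothetical code, not from the thread) of a
datatype that is described as a vector yet maps to a contiguous layout, versus one
that really has holes:

#include <mpi.h>

/* Illustrative only: a vector whose stride equals its block length has no
 * holes, so the 16 doubles it describes form one contiguous region and an
 * implementation may still use zero copy for it. */
void build_types(MPI_Datatype *contig_like, MPI_Datatype *noncontig)
{
    /* 4 blocks of 4 doubles, stride 4 doubles: no gaps -> contiguous */
    MPI_Type_vector(4, 4, 4, MPI_DOUBLE, contig_like);
    MPI_Type_commit(contig_like);

    /* stride 8 leaves a 4-double hole after each block -> non-contiguous,
     * so zero copy cannot apply */
    MPI_Type_vector(4, 4, 8, MPI_DOUBLE, noncontig);
    MPI_Type_commit(noncontig);
}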
All the examples work for me using ULFM ge87f595 compiled with
minimalistic options:
'--prefix=XXX --enable-picky --enable-debug --disable-heterogeneous
--enable-contrib-no-build=vt --enable-mpirun-prefix-by-default
--enable-mpi-ext=ftmpi --with-ft=mpi --with-pmi'.
I run using ipoib, so I selec
MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component
>>> v5.0.1)
>>> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.1)
>>> MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component
>>> v5.0.1)
>>>
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>> MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
>> v5.0.1)
>> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.1)
>> MCA par
OMPI seems unable to create a communication medium between your processes.
There are a few known issues on OSX, please read
https://github.com/open-mpi/ompi/issues/12273 for more info.
Can you provide the header of the ompi_info output? What I'm interested in
is the part about `Configure command li
> MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD
>
> with errorcode 79.
>
>
> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>
> You may or may not see output from other processes, depending on
>
> exactly when Open MPI kill
does one simply have to always use
> positive numbers? Why I would prefer Abort is because it seems safer.
>
> BR Alex
>
>
> --
> *From:* George Bosilca
> *Sent:* Tuesday, July 18, 2023 18:47
> *To:* Open MPI Users
> *Cc:* Alexander Stadi
Alex,
How are your values "random" if you provide correct values? Even for
negative values you could use MIN to pick one value and return it. What is
the problem with `MPI_Abort`? It does seem to do what you want.
George.
On Tue, Jul 18, 2023 at 4:38 AM Alexander Stadik via users <
users@li
I can't replicate this on my setting, but I am not using the tar archive
from the OMPI website (I use the git tag). Can you do `ls -l
opal/datatype/.lib` in your build directory?
George.
On Wed, Jul 12, 2023 at 7:14 AM Elad Cohen via users <
users@lists.open-mpi.org> wrote:
> Hi Jeff, thanks f
layer, so I should look to that community / code for quantifying how large
> those buffers get inside my application?
>
> Thanks again, and apologies for what is surely a woeful misuse of the
> correct terminology here on some of this stuff.
>
> - Brian
>
>
> On Mon, Ap
Brian,
OMPI does not have an official mechanism to report how much memory OMPI
allocates. But, there is hope:
1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG).
You could enable it and then provide your own flavor of memory tracking in
opal/util/malloc.c
2. You can use a
Edgar is right, UCX_TLS has some role in the selection. You can see the
current selection by running `ucx_info -c`. In my case, UCX_TLS is set to
`all` somehow, and I had either a not-connected IB device or a GPU.
However, I did not set UCX_TLS manually, and I can't see it anywhere in my
system con
The ucx PML should work just fine even in a single-node scenario. As Jeff
indicated, you need to move the MCA param `--mca pml ucx` before your
command.
George.
On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users <
users@lists.open-mpi.org> wrote:
> If this run was on a single node, t
Assuming a correct implementation, the described communication pattern
should work seamlessly.
Would it be possible to either share a reproducer or provide the execution
stack, by attaching a debugger to the deadlocked application, to see the
state of the different processes? I wonder if all processe
This error seems to be initiated from the PMIX regex framework. Not sure
exactly which one is used, but a good starting point is in one of the files
in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex
function in the different components, one of them is raising the error.
George.
mizations on the Neighbor collectives
> at some point?
>
> regards
> Michael
>
> On Wed, Jun 8, 2022 at 1:29 PM George Bosilca wrote:
>
>> Michael,
>>
>> As far as I know none of the implementations of the
>> neighborhood collectives in OMPI are arc
Michael,
As far as I know none of the implementations of the
neighborhood collectives in OMPI are architecture-aware. The only 2
components that provide support for neighborhood collectives are basic (for
the blocking version) and libnbc (for the non-blocking versions).
George.
On Wed, Jun 8,
That is weird, but maybe it is not a deadlock, but very slow progress. In
the child, can you print fdmax and i in the do_child frame?
George.
On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users <
users@lists.open-mpi.org> wrote:
> Jeff, thanks.
> from 1:
>
> (lldb) process attach --pid 9
40.dylib`opal_fd_read + 52
>
> frame #2: 0x00010784b418
> mca_odls_default.so`odls_default_fork_local_proc
> + 284
>
> frame #3: 0x0001002c7914
> libopen-rte.40.dylib`orte_odls_base_spawn_proc
> + 968
>
> frame #4: 0x0001003d96dc
> libeven
I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run
both MPI and non-MPI apps without any issues.
Try running `lldb mpirun -- -np 1 hostname` and, once it deadlocks, do a
CTRL+C to get back to the debugger and then `backtrace` to see where it is
waiting.
George.
On Wed, Ma
dylib
>
> "_opal_atomic_wmb", referenced from:
>
> import-atom in libopen-pal.dylib
>
> ld: symbol(s) not found for architecture x86_64
>
> make[2]: *** [opal_wrapper] Error 1
>
> make[1]: *** [all-recursive] Error 1
>
> make: *** [all-recursive] E
1. I am not aware of any outstanding OMPI issues with the M1 chip that
would prevent OMPI from compiling and running efficiently in an M1-based
setup, assuming the compilation chain is working properly.
2. M1 supports x86 code via Rosetta, an app provided by Apple to ensure a
smooth transition fro
Vladimir,
A while back the best cluster monitoring tool was Ganglia (
http://ganglia.sourceforge.net/), but it has not been maintained for
several years. There are quite a few alternatives out there, I found
nightingale (https://github.com/didi/nightingale) to be simple to install
and use.
Good l
Sajid,
`--bind-to-core` should have generated the same warning on OSX. Not sure
why this is happening, but I think the real bug here is the lack of warning
when using the deprecated argument.
Btw, the current master does not even accept 'bind-to-core', instead it
complains about 'unrecognized opt
OMPI cannot support process binding on OSX because, as the message
indicates, there is no OS API for process binding (at least not exposed to
the user-land applications).
George.
On Thu, Mar 17, 2022 at 3:25 PM Sajid Ali via users <
users@lists.open-mpi.org> wrote:
> Hi OpenMPI-developers,
>
I see similar issues on platforms with multiple IP addresses, if some of
them are not fully connected. In general, specifying which interface OMPI
can use (with --mca btl_tcp_if_include x.y.z.t/s) solves the problem.
George.
On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via users
There are two ways the MPI_Allreduce returns MPI_ERR_TRUNCATE:
1. it is propagated from one of the underlying point-to-point
communications, which means that at least one of the participants has an
input buffer with a larger size. I know you said the size is fixed, but it
only matters if all proces
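A hypothetical sketch of case 1 (the buffer sizes and names below are made up,
not from the thread): one rank passing a larger count to MPI_Allreduce than its
peers, which surfaces as MPI_ERR_TRUNCATE in the underlying point-to-point traffic.

#include <mpi.h>

/* Illustrative (incorrect) usage: the counts do not match across ranks. */
void mismatched_allreduce(MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    double in[8] = {0}, out[8];
    /* Rank 0 contributes 8 elements, the others only 4: rank 0's messages
     * are larger than the receive buffers posted by its peers, which can
     * be reported as MPI_ERR_TRUNCATE inside the collective. */
    int count = (rank == 0) ? 8 : 4;
    MPI_Allreduce(in, out, count, MPI_DOUBLE, MPI_SUM, comm);
}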
via users <
users@lists.open-mpi.org> wrote:
>
>
> On Mon, Feb 14, 2022 at 9:01 PM George Bosilca
> wrote:
>
>> On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users <
>> users@lists.open-mpi.org> wrote:
>>
>>> 1. Where can I use this commun
On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users <
users@lists.open-mpi.org> wrote:
> I've been successful at using MPI_Dist_graph_create_adjacent to create a
> new communicator with graph topology, and using it with
> MPI_Neighbor_alltoallv. But I have a few questions:
>
> 1. Where can I u
l like OSU INAM can get info from network
> fabric and even
>
> switches related to a particular MPI job ...
>
>
> There should be more info gathered in the background
>
>
> ------
> *From:* George Bosilca
> *Sent:* Friday, February
Collecting data during execution is possible in OMPI either with an
external tool, such as mpiP, or the internal infrastructure, SPC. Take a
look at ./examples/spc_example.c or ./test/spc/spc_test.c to see how to use
this.
George.
On Fri, Feb 11, 2022 at 9:43 AM Bertini, Denis Dr. via users <
#include
>int MPI_Type_create_resized(MPI_Datatype oldtype, MPI_Aint lb,
> MPI_Aint extent, MPI_Datatype *newtype)
>
>
> Jonas
> On 16-12-2021 22:39, George Bosilca wrote:
>
> You are confusing the size and extent of the datatype. The size (aka the
>
You are confusing the size and extent of the datatype. The size (aka the
physical number of bytes described by the memory layout) would be
m*nloc*sizeof(type), while the extent will be related to where you expect
the second element of the same type to start. If you do resize, you will
incorporate t
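A small sketch (hypothetical layout, assuming a row-major m x nloc matrix of
doubles, not the poster's code) of the difference between size and extent, and
of resizing the extent:

#include <mpi.h>
#include <stdio.h>

/* Illustrative: one column of a row-major m x nloc matrix of doubles. */
void size_vs_extent(int m, int nloc)
{
    MPI_Datatype col, col_resized;
    MPI_Aint lb, extent;
    int size;

    MPI_Type_vector(m, 1, nloc, MPI_DOUBLE, &col);   /* m blocks, stride nloc */
    MPI_Type_commit(&col);

    MPI_Type_size(col, &size);               /* m * sizeof(double) bytes      */
    MPI_Type_get_extent(col, &lb, &extent);  /* spans ((m-1)*nloc+1) doubles  */
    printf("size=%d extent=%ld\n", size, (long)extent);

    /* Only the extent changes: the "next" column now starts one double later,
     * which allows sending several consecutive columns with count > 1. */
    MPI_Type_create_resized(col, 0, sizeof(double), &col_resized);
    MPI_Type_commit(&col_resized);

    MPI_Type_free(&col);
    MPI_Type_free(&col_resized);
}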
The error message is self-explanatory: the application calls MPI_Recv with
an invalid TAG. The MPI standard defines a valid tag as a non-negative integer
between 0 and the value of the MPI_TAG_UB attribute on MPI_COMM_WORLD. At
this point it seems plausible this is an application issue.
Check that the
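For reference, a minimal way to query the largest valid tag at run time
(illustrative sketch):

#include <mpi.h>
#include <stdio.h>

/* Valid tags are in [0, value of the MPI_TAG_UB attribute]. */
void print_tag_upper_bound(void)
{
    int *tag_ub, flag;
    MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &tag_ub, &flag);
    if (flag)
        printf("valid tags: 0 .. %d\n", *tag_ub);
}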
You need to enable the monitoring PML in order to get access to the
pml_monitoring_messages_count MPI_T. For this you need to know what PML you
are currently using and add monitoring to the pml MCA variable. As an
example if you use ob1 you should add the following to your mpirun command
"--mca pml
Hi Pierre,
MPI is allowed to pipeline the collective communications. This explains why
the MPI_Op takes the len of the buffers as an argument. Because your MPI_Op
ignores this length, it alters data outside the temporary buffer we use for
the segment. Other versions of the MPI_Allreduce implementat
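A minimal sketch (hypothetical operation, not the poster's code) of a
user-defined MPI_Op that honors the len argument and therefore stays correct
when the reduction is pipelined over segments:

#include <mpi.h>

/* Element-wise maximum over only the *len elements of the current segment. */
static void my_max(void *invec, void *inoutvec, int *len, MPI_Datatype *dtype)
{
    double *in = (double *)invec, *inout = (double *)inoutvec;
    for (int i = 0; i < *len; i++)        /* never assume the full buffer */
        if (in[i] > inout[i]) inout[i] = in[i];
}

/* Usage sketch:
 *   MPI_Op op;
 *   MPI_Op_create(my_max, 1, &op);       // 1 = commutative
 *   MPI_Allreduce(sbuf, rbuf, n, MPI_DOUBLE, op, MPI_COMM_WORLD);
 *   MPI_Op_free(&op);
 */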
Carl,
AVX support was introduced in 4.1 which explains why you did not have such
issues before. What is your configure command in these 2 cases? Please
create an issue on GitHub and attach your config.log.
George.
On Fri, Feb 5, 2021 at 2:44 PM Carl Ponder via users <
users@lists.open-mpi.o
gt; 4.- The hostfile.
> >
> > The duration of the delay is just a few seconds, about 3 ~ 4.
> >
> > Essentially, the first error message I get from a waiting process is
> "74: MPI_ERR_PROC_FAILED: Process Failure".
> >
> > Hope this information can he
Daniel,
There are no timeouts in OMPI with the exception of the initial connection
over TCP, where we use the socket timeout to prevent deadlocks. As you
already did quite a few communicator duplications and other collective
communications before you see the timeout, we need more info about this.
ns over TCP/IP and hence rule out any memory
> leak that could be triggered by your fast interconnect.
>
>
>
> In any case, a reproducer will greatly help us debugging this issue.
>
>
> Cheers,
>
>
> Gilles
>
>
>
> On 12/4/2020 7:20 AM, George Bosilca via
Patrick,
I'm afraid there is no simple way to check this. The main reason is that
OMPI uses handles for MPI objects, and these handles are not tracked by the
library; they are supposed to be provided by the user for each call. In
your case, as you already called MPI_Type_free on the datatype, yo
John,
There are many things in play in such an experiment. Plus, expecting linear
speedup even at the node level is certainly overly optimistic.
1. A single-core experiment has full memory bandwidth, so you will
asymptotically reach the max flops. Adding more cores will increase the
memory pressu
Diego,
I see nothing wrong with the way you create the datatype. In fact, this is
the perfect example of how to almost do it right in FORTRAN. The almost is
because your code is highly dependent on the -r8 compiler option (otherwise
the REAL in your type will not match the MPI_DOUBLE_PRECISION you
An application that relies on MPI eager buffers for correctness or
performance is an incorrect application, if only because MPI
implementations without support for eager sends are legitimate. Moreover,
these applications also miss the point on performance. Among the overheads
I am not onl
Definitely not! You should never rely on the eager size to fix a complex
communication pattern. The rule of thumb should be: is my application
working correctly if the MPI library forces a zero-byte eager size? As suggested
above, the most suitable approach is to define a communication scheme that
Biplab,
The eager size is a constant for each BTL, and it represents the part of the
entire message that is sent eagerly, along with the matching information. So,
if the question is how much memory is needed to store all the
eager messages then the answer will depend on the communication pattern of
your ap
On Mon, Mar 16, 2020 at 6:15 PM Konstantinos Konstantinidis via users <
users@lists.open-mpi.org> wrote:
> Hi, I have some questions regarding technical details of MPI collective
> communication methods and broadcast:
>
>- I want to understand when the number of receivers in a MPI_Bcast can
>
Martyn,
I don't know exactly what your code is doing, but based on your inquiry I
assume you are using MPI_BSEND multiple times and you run out of local
buffers.
The MPI standard does not mandate a wait until buffer space becomes
available, because that can lead to deadlocks (communication patter
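For illustration (the message count and size below are assumptions, not from the
thread), the attached buffer has to cover all outstanding buffered sends plus the
per-message overhead:

#include <mpi.h>
#include <stdlib.h>

/* Attach a buffer large enough for nmsg outstanding MPI_Bsend calls of
 * `count` doubles each; MPI_BSEND_OVERHEAD is added per message. */
void attach_bsend_buffer(int nmsg, int count)
{
    int one_msg, total;
    MPI_Pack_size(count, MPI_DOUBLE, MPI_COMM_WORLD, &one_msg);

    total = nmsg * (one_msg + MPI_BSEND_OVERHEAD);
    MPI_Buffer_attach(malloc(total), total);
    /* If more than nmsg sends are outstanding, MPI_Bsend errors out rather
     * than waiting for buffer space to become available. */
}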
:
> Hi,
>
> George Bosilca writes:
>
> > If I'm not mistaken, hcoll is playing with the opal_progress in a way
> > that conflicts with the blessed usage of progress in OMPI and prevents
> > other components from advancing and timely completing requests. The
&
If I'm not mistaken, hcoll is playing with the opal_progress in a way that
conflicts with the blessed usage of progress in OMPI and prevents other
components from advancing and timely completing requests. The impact is
minimal for sequential applications using only blocking calls, but is
jeopardizi
According to the error message you are using MPICH not Open MPI.
George.
On Tue, Jan 14, 2020 at 5:53 PM SOPORTE MODEMAT via users <
users@lists.open-mpi.org> wrote:
> Hello everyone.
>
>
>
> I would like somebody help me to figure out how can I make that the
> openmpi use the infiniband inte
wrote:
> Hi George, thank you very much for your answer. Can you please explain me
> a little more about "If you need to guarantee progress you might either
> have your own thread calling MPI functions (such as MPI_Test)". Regards
>
> Martín
>
> --------
Martin,
The MPI standard does not mandate progress outside MPI calls, thus
implementations are free to provide, or not, asynchronous progress. Calling
MPI_Test provides the MPI implementation with an opportunity to progress
its internal communication queues. However, an implementation could try a
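A small sketch of the kind of polling loop this implies (the application work
named in the comment is hypothetical):

#include <mpi.h>

/* Poll a pending request while doing other work, giving the MPI library
 * regular opportunities to progress its internal queues. */
void wait_with_progress(MPI_Request *req)
{
    int done = 0;
    while (!done) {
        MPI_Test(req, &done, MPI_STATUS_IGNORE);
        /* if (!done) do_some_local_work();   // hypothetical application work */
    }
}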
rn 1;
>>
>> }
>>
>> }
>>
>>
>>
>> for (int i = 0; i < num_threads; i++) {
>>
>> if(pthread_join(threads[i], NULL)) {
>>
>> fprintf(stderr, "Error joining threadn");
>>
&g
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote:
> On Wed, Nov 27, 2019 at 3:16 PM George Bosilca
> wrote:
>
>> Short and portable answer: you need to sync before the Isend or you will
>> send garbage data.
>>
> Ideally, I want to formulate my code into a
Short and portable answer: you need to sync before the Isend or you will
send garbage data.
Assuming you are willing to go for a less portable solution, you can get the
OMPI streams and add your kernels inside, so that the sequential order will
guarantee correctness of your isend. We have 2 hidden
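The portable ordering looks roughly like this sketch (assumes a CUDA-aware
build; the stream, buffer, and kernel names are illustrative, not from the thread):

#include <mpi.h>
#include <cuda_runtime.h>

/* Make sure the kernel producing d_buf has completed before handing the
 * device buffer to MPI_Isend. */
void send_after_kernel(double *d_buf, int count, int dst,
                       cudaStream_t stream, MPI_Comm comm, MPI_Request *req)
{
    /* produce_data<<<grid, block, 0, stream>>>(d_buf, count);  // app kernel */
    cudaStreamSynchronize(stream);       /* sync before the Isend */
    MPI_Isend(d_buf, count, MPI_DOUBLE, dst, 0, comm, req);
}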
Charles,
Having implemented some of the underlying collective algorithms, I am
puzzled by the need to force the sync to 1 to have things flowing. I would
definitely appreciate a reproducer so that I can identify (and hopefully
fix) the underlying problem.
Thanks,
George.
On Tue, Oct 29, 2019
Charles,
There is a known issue with calling collectives in a tight loop, due to
lack of control flow at the network level. It results in a significant
slow-down that might appear as a deadlock to users. The workaround
is to enable the sync collective module, which will insert a fake barrier
To completely disable UCX you need to disable the UCX MTL and not only the
BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1".
As you have a gdb session on the processes, you can try to break on some of
the memory allocation functions (malloc, realloc, calloc).
George.
Leo,
In a UMA system having the displacement and/or recvcounts arrays on managed
GPU memory should work, but it will incur overheads for at least 2 reasons:
1. the MPI API arguments are checked for correctness (here recvcounts)
2. the collective algorithm part that executes on the CPU uses the
dis
There is an ongoing discussion about this on issue #4067 (
https://github.com/open-mpi/ompi/issues/4067). Also, the mailing list
contains a few examples of how to tweak the collective algorithms to your
needs.
George.
On Thu, Jun 6, 2019 at 7:42 PM hash join via users
wrote:
> Hi all,
>
>
> I w
Depending on the alignment of the different types there might be small
holes in the low-level headers we exchange between processes. It should not
be a concern for users.
valgrind should not stop on the first detected issue except
if --exit-on-first-error has been provided (the default value should
Jon,
The configure AC_HEADER_STDC macro is considered obsolete [1] as most of
the OSes are STDC compliant nowadays. To have it failing on a recent
version of OSX is therefore something unexpected. Moreover, many of the
OMPI developers work on OSX Mojave with the default compiler but with the
same
If I add a loop to make sure I account for all receives on the master, and
correctly set the tags, a basic application based on your scheme seems to
work as intended. Can you post a reproducer for your issue instead?
Thanks,
George.
On Thu, Mar 28, 2019 at 6:52 AM carlos aguni wrote:
> Hi Gil
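A sketch of such a loop (hypothetical; the counts and the tag are illustrative,
not taken from the original code):

#include <mpi.h>

/* Master collects exactly one message from each of nworkers ranks,
 * matching a specific tag instead of MPI_ANY_TAG. */
void master_collect(int nworkers, int tag, MPI_Comm comm)
{
    double value;
    MPI_Status status;
    for (int i = 0; i < nworkers; i++) {
        MPI_Recv(&value, 1, MPI_DOUBLE, MPI_ANY_SOURCE, tag, comm, &status);
        /* status.MPI_SOURCE identifies the worker this value came from */
    }
}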
I was not able to reproduce the issue with openib on the 4.0, but instead I
randomly segfault in MPI finalize during the grdma cleanup.
I could however reproduce the TCP timeout part with both 4.0 and master, on
a pretty sane cluster (only 3 interfaces, lo, eth0 and virbr0). With no
surprise, the
inconvenience,
George.
On Wed, Feb 20, 2019 at 2:27 PM George Reeke
wrote:
> On Wed, 2019-02-20 at 13:21 -0500, George Bosilca wrote:
>
> > To obtain representative samples of the MPI community, we have
> > prepared a survey
> >
> >
>
International MPI Survey,
George Bosilca (UT/ICL)
Geoffroy Vallee (ORNL)
Emmanuel Jeannot (Inria)
Atsushi Hori (RIKEN)
Takahiro Ogura (RIKEN)
[1] https://github.com/bosilca/MPIsurvey/
[2] https://bosilca.github.io/MPIsurvey/
I think the return of ascontiguous will be reused by python before the data
is really transferred by the isend. The input buffer for the isend
operation should remain constant for the entire duration of the isend+wait window.
George
On Fri, Feb 1, 2019, 12:27 Konstantinos Konstantinidis Hi, consider a
I'm trying to replicate using the same compiler (icc 2019) on my OSX over
TCP and shared memory with no luck so far. So either the segfault is
something specific to OmniPath or to the memcpy implementation used on
Skylake. I tried to use the trace you sent, more specifically the
opal_datatype_cop
able
> to trace the call stack.
>
> Which OpenMPI 3.x version do you suggest ? A nightly snapshot ? Cloning
> the git repo ?
>
> Thanks
>
> Patrick
>
> George Bosilca wrote:
>
> Few days ago we have pushed a fix in master for a strikingly similar
> issue. The p
A few days ago we pushed a fix in master for a strikingly similar issue.
The patch will eventually make it into the 4.0 and 3.1 releases, but not the 2.x
series. The best path forward will be to migrate to a more recent OMPI
version.
George.
On Tue, Sep 18, 2018 at 3:50 AM Patrick Begou <
patrick.be..
You will need to create a special variable that holds 2 entries, one for
the max operation (with whatever type you need) and an int for the rank of
the process. MPI_MAXLOC is described on the OMPI man page [1] and you can
find an example of how to use it on the MPI Forum [2].
George.
[1] https:/
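A short sketch of that approach using the predefined MPI_DOUBLE_INT pairing
(illustrative code, not from the thread):

#include <mpi.h>

/* Find the global maximum and the rank that owns it in one reduction. */
void global_max_with_rank(double local_val, MPI_Comm comm,
                          double *max_val, int *max_rank)
{
    struct { double value; int rank; } in, out;   /* matches MPI_DOUBLE_INT */

    in.value = local_val;
    MPI_Comm_rank(comm, &in.rank);

    MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, comm);

    *max_val  = out.value;
    *max_rank = out.rank;
}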
Yes, this is the behavior defined by the MPI standard. More precisely,
section 8.1.2 of the MPI 3.1 standard clearly states that the predefined
attributes only exist for MPI_COMM_WORLD.
George.
On Sun, Jul 8, 2018 at 1:55 AM Weiqun Zhang wrote:
> Hi,
>
> It appears that MPI_Comm_get_attr f
We had a similar issue few months back. After investigation it turned out
to be related to NUMA balancing [1] being enabled by default on recent
releases of Linux-based OSes.
In our case turning off NUMA balancing fixed most of the performance
inconsistencies we had. You can check its status in /proc
Shared memory communication is important for multi-core platforms,
especially when you have multiple processes per node. But this is only part
of your issue here.
You haven't specified how your processes will be mapped on your resources.
As a result, ranks 0 and 1 will be on the same node, so you ar
(which
level of threading) ? Can you send us the opal_config.h file please.
Thanks,
George.
On Sun, Apr 8, 2018 at 8:30 PM, George Bosilca wrote:
> Right, it has nothing to do with the tag. The sequence number is an
> internal counter that help OMPI to deliver the messages in the MPI re
nd we will get back to you for further debugging.
George.
On Sun, Apr 8, 2018 at 6:00 PM, Noam Bernstein
wrote:
> On Apr 8, 2018, at 3:58 PM, George Bosilca wrote:
>
> Noam,
>
> Thanks for your output, it highlights an unusual outcome. It shows that a
> process (29662) ha
push
the same sequence number twice ...
More digging is required.
George.
On Fri, Apr 6, 2018 at 2:42 PM, Noam Bernstein
wrote:
>
> On Apr 6, 2018, at 1:41 PM, George Bosilca wrote:
>
> Noam,
>
> According to your stack trace the correct way to call the mca_pml_ob1_
wrote:
> On Apr 5, 2018, at 4:11 PM, George Bosilca wrote:
>
> I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm,
> 1)". This allows the debugger to make a call our function, and output
> internal information about the library status.
>
>
>
Yes, you can do this by adding --enable-debug to OMPI configure (and make
sure you don't have the configure flag --with-platform=optimize).
George.
On Thu, Apr 5, 2018 at 4:20 PM, Noam Bernstein
wrote:
>
> On Apr 5, 2018, at 4:11 PM, George Bosilca wrote:
>
> I attac
I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm,
1)". This allows the debugger to make a call our function, and output
internal information about the library status.
George.
On Thu, Apr 5, 2018 at 4:03 PM, Noam Bernstein
wrote:
> On Apr 5, 2018, at 3:
Noam,
The OB1 PML provides a mechanism to dump all pending communications in a
particular communicator. To do this I usually call mca_pml_ob1_dump(comm,
1), with comm being the MPI_Comm and 1 being the verbose mode. I have no
idea how you can find the pointer to the communicator out of your code, but
i
We can always build complicated solutions, but in some cases sane and
simple solutions exist. Let me clear up some of the misinformation in this
thread.
The MPI standard is clear about what type of conversion is allowed and how it
should be done (for more info read Chapter 4): no type conversion is
allowe
What are the settings of the firewall on your 2 nodes ?
George.
On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell wrote:
> When I try to run an MPI program on a network with a shared file system
> and connected by ethernet, I get the error message "tcp_peer_send_blocking:
> send() to socket
Hi Yvan,
You mention a test. Can you make it available either on the mailing list, a
github issue or privately ?
Thanks,
George.
On Sat, Jan 6, 2018 at 7:43 PM, wrote:
>
> Hello,
>
> I obtain false positives with OpenMPI when memcheck is enabled, using
> OpenMPI 3.0.0
>
> This is simil
he total time and the transmission time it took for
> the send-receive function to complete (the only difference is that I
> subtract the deserialization time from both counters since I don't want
> this counted in order to have a valid comparison with the previous
> implementation). It
dWorker::execShuffle, possible via an
MPI_Allgatherv toward the master process in MPI_COMM_WORLD (in this case
you can convert the "long long" into a double to facilitate the collective).
George.
>
> I know that this discussion is getting long but if you have some free time
o, do you suggest anything else or I am
> trapped in using the MPI_Bcast() as shown in Option 1?
>
> On Mon, Nov 6, 2017 at 8:58 AM, George Bosilca
> wrote:
>
>> On Sun, Nov 5, 2017 at 10:23 PM, Konstantinos Konstantinidis <
>> kostas1...@gmail.com> wrote:
>>
ating all communications
in a single temporal location, you spread them out across time by imposing
your own communication logic. This basically translates a set of blocking
collectives (bcast is a perfect target) into a pipelined mix. Instead of
describing such a scheme here, I suggest you read the algorithmic
de
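As a rough illustration only (the segment size, buffers, and communicator array
below are assumptions, not from the thread), such a pipelined mix can be built
from segmented non-blocking broadcasts:

#include <mpi.h>
#include <stdlib.h>

/* Instead of issuing one large blocking MPI_Bcast per group back to back,
 * split each broadcast into segments and keep one segment per group in
 * flight, so the broadcasts on the different communicators overlap. */
void pipelined_group_bcasts(double **bufs, int count, int ngroups,
                            MPI_Comm *comms, int root)
{
    const int seg = 4096;                          /* elements per segment */
    MPI_Request *reqs = malloc(ngroups * sizeof(MPI_Request));

    for (int off = 0; off < count; off += seg) {
        int len = (count - off < seg) ? (count - off) : seg;
        for (int g = 0; g < ngroups; g++)          /* one segment per group */
            MPI_Ibcast(bufs[g] + off, len, MPI_DOUBLE, root, comms[g], &reqs[g]);
        MPI_Waitall(ngroups, reqs, MPI_STATUSES_IGNORE);
    }
    free(reqs);
}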
It really depends on what you are trying to achieve. If the question is
rhetorical: "can I write a code that does broadcasts in parallel on
independent groups of processes?" then the answer is yes, this is
certainly possible. If, however, you add a hint of practicality to your
question "can I write an
John,
To disable shared memory (sm or vader in Open MPI depending on the
version), you have to remove it from the list of approved underlying
network devices. As you specifically want TCP support, I would add "--mca
btl tcp,self" to my mpirun command line. However, by default Open MPI tries
to avo
John,
On the ULFM mailing list you pointed out, we converged toward a hardware
issue. Resources associated with the dead process were not correctly freed,
and follow-up processes on the same setup would inherit issues related to
these lingering messages. However, keep in mind that the setup was
di
On Thu, Sep 28, 2017 at 12:45 AM, Fab Tillier via users <
users@lists.open-mpi.org> wrote:
> Hi Llelan,
>
> Llelan D. wrote on Wed, 27 Sep 2017 at 19:06:23
>
> > On 09/27/2017 3:04 PM, Jeffrey A Cummings wrote:
> >> The MS-MPI developers disagree with your statement below and claim to
> >> be acti
All your processes send their data to a single destination at the same time.
Clearly you are reaching the capacity of your network and your data
transfers will be bound by this. This is a physical constraint that you can
only overcome by adding network capacity to your cluster.
At the software level
%rank was set to MPI_PROC_NULL. I was just suggesting that
> he change that to "IF(MPI_COMM_NULL .NE. MASTER_COMM)" -- i.e., he
> shouldn't make any assumptions about the value of MPI_PROC_NULL, etc.
>
>
>
> > On Aug 2, 2017, at 12:54 PM, George Bosilca wrote:
> >
Diego,
Setting the color to MPI_COMM_NULL is not good, as it results in some
random value (and not MPI_UNDEFINED, which does not generate a
communicator). Change the color to MPI_UNDEFINED and your application
should work just fine (in the sense that all processes not in the master
communicator wi
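A minimal sketch of that fix (the is_master predicate is illustrative, not from
the original code):

#include <mpi.h>

/* Processes that should not be part of the master communicator pass
 * MPI_UNDEFINED as the color and get MPI_COMM_NULL back. */
void split_master(MPI_Comm comm, int is_master, MPI_Comm *master_comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    int color = is_master ? 0 : MPI_UNDEFINED;
    MPI_Comm_split(comm, color, rank, master_comm);

    if (*master_comm == MPI_COMM_NULL) {
        /* this process is not part of the master communicator */
    }
}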