Re: [OMPI users] MPI Tool Information Interface (MPI_T), details on collective communication

2025-04-23 Thread 'George Bosilca' via Open MPI users
Anna, The monitoring PML tracks all activity on the PML but might choose to expose only the traffic the user is interested in, i.e., its own messages, and hide the rest of the traffic. This is easy in OMPI because all internal messages are generated using negative tags (which are not allowed f

Re: [OMPI users] Disable PMPI bindings?

2025-02-17 Thread 'George Bosilca' via Open MPI users
I'm not sure if I correctly understand the compiler complaint here, but I think it is complaining about a non-optional dummy argument being omitted from the call. In this case, I assume the issue is raised in the mpif Fortran interface (not the f08 interface), due to the fact that the error is not

Re: [OMPI users] On compiling out parameter checking

2024-07-18 Thread George Bosilca via users
Hi Eli, I agree with you, keep the checks enabled, and users that want them off can do it via our MCA parameters (command line or ${HOME}/.openmpi/mca-params.conf). I don't think it is ever effective to try to save a few branches in MPI functions that usually cost over a microsecond, and lose all

Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-29 Thread George Bosilca via users
can be seen as contiguous > (even if described otherwise)"? In what way could it be described > otherwise, but still be seen as contiguous? > > Thanks, > Pascal Boeschoten > > On Tue, 23 Apr 2024 at 16:05, George Bosilca wrote: > >> zero copy does not work with n

Re: [OMPI users] Not getting zero-copy with custom datatype

2024-04-23 Thread George Bosilca via users
zero copy does not work with non-contiguous datatypes (it would require both processes to know the memory layout used by the peer). As long as the memory layout described by the type can be seen as contiguous (even if described otherwise), it should work just fine. George. On Tue, Apr 23, 2024
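
A sketch of the "described otherwise but effectively contiguous" case mentioned above (counts illustrative): a vector whose stride equals its block length leaves no gaps, so the layout resolves to plain contiguous memory.

    #include <mpi.h>
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        /* stride == blocklen: no holes, so this describes 32 contiguous
           doubles even though it is declared as a strided vector */
        MPI_Datatype t;
        MPI_Type_vector(4, 8, 8, MPI_DOUBLE, &t);
        MPI_Type_commit(&t);
        /* ... use t in send/recv: eligible for the contiguous fast path ... */
        MPI_Type_free(&t);
        MPI_Finalize();
        return 0;
    }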

Re: [OMPI users] ULFM only works on a single node???

2024-03-24 Thread George Bosilca via users
All the examples work for me using ULFM ge87f595 compiled with minimalistic options: '--prefix=XXX --enable-picky --enable-debug --disable-heterogeneous --enable-contrib-no-build=vt --enable-mpirun-prefix-by-default --enable-mpi-ext=ftmpi --with-ft=mpi --with-pmi'. I run using ipoib, so I selec

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component >>> v5.0.1) >>> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v5.0.1) >>> MCA hook: comm_method (MCA v2.1.0, API v1.0.0, Component >>> v5.0.1) >>>

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v5.0.1) >> MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component >> v5.0.1) >> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v5.0.1) >> MCA par

Re: [OMPI users] Homebrew-installed OpenMPI 5.0.1 can't run a simple test program

2024-02-05 Thread George Bosilca via users
OMPI seems unable to create a communication medium between your processes. There are a few known issues on OSX, please read https://github.com/open-mpi/ompi/issues/12273 for more info. Can you provide the header of the ompi_info command? What I'm interested in is the part about `Configure command li

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
; > MPI_ABORT was invoked on rank 3 in communicator MPI_COMM_WORLD > > with errorcode 79. > > > NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes. > > You may or may not see output from other processes, depending on > > exactly when Open MPI kill

Re: [OMPI users] [EXT] Re: Error handling

2023-07-19 Thread George Bosilca via users
does one simply have to always use > positive numbers? Why I would prefer Abort is because it seems safer. > > BR Alex > > > -- > *Von:* George Bosilca > *Gesendet:* Dienstag, 18. Juli 2023 18:47 > *An:* Open MPI Users > *Cc:* Alexander Stadi

Re: [OMPI users] Error handling

2023-07-18 Thread George Bosilca via users
Alex, How are your values "random" if you provide correct values? Even for negative values you could use MIN to pick one value and return it. What is the problem with `MPI_Abort`? It does seem to do what you want. George. On Tue, Jul 18, 2023 at 4:38 AM Alexander Stadik via users < users@li
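
A sketch of the MIN-based agreement suggested here (local_error_code is hypothetical, standing in for whatever positive code each rank computed):

    #include <mpi.h>

    /* hypothetical: every rank holds a positive candidate error code */
    extern int local_error_code(void);

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int my_err = local_error_code(), agreed;
        /* MIN deterministically selects one of the reported codes */
        MPI_Allreduce(&my_err, &agreed, 1, MPI_INT, MPI_MIN, MPI_COMM_WORLD);
        MPI_Abort(MPI_COMM_WORLD, agreed);  /* every rank aborts with the same code */
        return 0;
    }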

Re: [OMPI users] OMPI compilation error in Making all datatypes

2023-07-12 Thread George Bosilca via users
I can't replicate this on my setting, but I am not using the tar archive from the OMPI website (I use the git tag). Can you do `ls -l opal/datatype/.lib` in your build directory. George. On Wed, Jul 12, 2023 at 7:14 AM Elad Cohen via users < users@lists.open-mpi.org> wrote: > Hi Jeff, thanks f

Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?

2023-04-17 Thread George Bosilca via users
layer, so I should look to that community / code for quantifying how large > those buffers get inside my application? > > Thanks again, and apologies for what is surely a woeful misuse of the > correct terminology here on some of this stuff. > > - Brian > > > On Mon, Ap

Re: [OMPI users] Q: Getting MPI-level memory use from OpenMPI?

2023-04-17 Thread George Bosilca via users
Brian, OMPI does not have an official mechanism to report how much memory OMPI allocates. But, there is hope: 1. We have a mechanism to help debug memory issues (OPAL_ENABLE_MEM_DEBUG). You could enable it and then provide your own flavor of memory tracking in opal/util/malloc.c 2. You can use a

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
Edgar is right, UCX_TLS has some role in the selection. You can see the current selection by running `ucx_info -c`. In my case, UCX_TLS is set to `all` somehow, and I had either a not-connected IB device or a GPU. However, I did not set UCX_TLS manually, and I can't see it anywhere in my system con

Re: [OMPI users] What is the best choice of pml and btl for intranode communication

2023-03-06 Thread George Bosilca via users
ucx PML should work just fine even on a single node scenario. As Jeff indicated you need to move the MCA param `--mca pml ucx` before your command. George. On Mon, Mar 6, 2023 at 9:48 AM Jeff Squyres (jsquyres) via users < users@lists.open-mpi.org> wrote: > If this run was on a single node, t

Re: [OMPI users] Subcommunicator communications do not complete intermittently

2022-09-11 Thread George Bosilca via users
Assuming a correct implementation, the described communication pattern should work seamlessly. Would it be possible to either share a reproducer or provide the execution stack by attaching a debugger to the deadlocked application, to see the state of the different processes? I wonder if all processe

Re: [OMPI users] OpenMPI and names of the nodes in a cluster

2022-06-16 Thread George Bosilca via users
This error seems to be initiated from the PMIX regex framework. Not sure exactly which one is used, but a good starting point is in one of the files in 3rd-party/openpmix/src/mca/preg/. Look for the generate_node_regex function in the different components, one of them is raising the error. George.

Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread George Bosilca via users
mizations on the Neighbor collectives > at some point? > > regards > Michael > > On Wed, Jun 8, 2022 at 1:29 PM George Bosilca wrote: > >> Michael, >> >> As far as I know none of the implementations of the >> neighborhood collectives in OMPI are arc

Re: [OMPI users] Quality and details of implementation for Neighborhood collective operations

2022-06-08 Thread George Bosilca via users
Michael, As far as I know none of the implementations of the neighborhood collectives in OMPI are architecture-aware. The only 2 components that provide support for neighborhood collectives are basic (for the blocking version) and libnbc (for the non-blocking versions). George. On Wed, Jun 8,
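
For context, a minimal sketch of the calls under discussion, on an illustrative 1-D ring (the blocking call maps to the basic component, the MPI_Ineighbor_* variants to libnbc):

    #include <mpi.h>
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, size;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        int left = (rank + size - 1) % size, right = (rank + 1) % size;
        int srcs[2] = {left, right}, dsts[2] = {left, right};
        MPI_Comm ring;
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD, 2, srcs, MPI_UNWEIGHTED,
                                       2, dsts, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0, &ring);
        int sendval = rank, recvvals[2];  /* one slot per incoming neighbor */
        MPI_Neighbor_allgather(&sendval, 1, MPI_INT, recvvals, 1, MPI_INT, ring);
        MPI_Comm_free(&ring);
        MPI_Finalize();
        return 0;
    }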

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-05 Thread George Bosilca via users
That is weird, but maybe it is not a deadlock, but very slow progress. In the child, can you print fdmax and i in the frame do_child? George. On Thu, May 5, 2022 at 11:50 AM Scott Sayres via users < users@lists.open-mpi.org> wrote: > Jeff, thanks. > from 1: > > (lldb) process attach --pid 9

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
40.dylib`opal_fd_read + 52 > > frame #2: 0x00010784b418 > mca_odls_default.so`odls_default_fork_local_proc > + 284 > > frame #3: 0x0001002c7914 > libopen-rte.40.dylib`orte_odls_base_spawn_proc > + 968 > > frame #4: 0x0001003d96dc > libeven

Re: [OMPI users] mpirun hangs on m1 mac w openmpi-4.1.3

2022-05-04 Thread George Bosilca via users
I compiled a fresh copy of the 4.1.3 branch on my M1 laptop, and I can run both MPI and non-MPI apps without any issues. Try running `lldb mpirun -- -np 1 hostname` and once it deadlocks, do a CTRL+C to get back on the debugger and then `backtrace` to see where it is waiting. George. On Wed, Ma

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-22 Thread George Bosilca via users
dylib > > "_opal_atomic_wmb", referenced from: > > import-atom in libopen-pal.dylib > > ld: symbol(s) not found for architecture x86_64 > > make[2]: *** [opal_wrapper] Error 1 > > make[1]: *** [all-recursive] Error 1 > > make: *** [all-recursive] E

Re: [OMPI users] help with M1 chip macOS openMPI installation

2022-04-21 Thread George Bosilca via users
1. I am not aware of any outstanding OMPI issues with the M1 chip that would prevent OMPI from compiling and running efficiently in an M1-based setup, assuming the compilation chain is working properly. 2. M1 supports x86 code via Rosetta, an app provided by Apple to ensure a smooth transition fro

Re: [OMPI users] Monitoring an openmpi cluster.

2022-04-08 Thread George Bosilca via users
Vladimir, A while back the best cluster monitoring tool was Ganglia ( http://ganglia.sourceforge.net/), but it has not been maintained for several years. There are quite a few alternatives out there, I found nightingale (https://github.com/didi/nightingale) to be simple to install and use. Good l

Re: [OMPI users] Regarding process binding on OS X with oversubscription

2022-03-17 Thread George Bosilca via users
Sajid, `--bind-to-core` should have generated the same warning on OSX. Not sure why this is happening, but I think the real bug here is the lack of warning when using the deprecated argument. Btw, the current master does not even accept 'bind-to-core', instead it complains about 'unrecognized opt

Re: [OMPI users] Regarding process binding on OS X with oversubscription

2022-03-17 Thread George Bosilca via users
OMPI cannot support process binding on OSX because, as the message indicates, there is no OS API for process binding (at least not exposed to the user-land applications). George. On Thu, Mar 17, 2022 at 3:25 PM Sajid Ali via users < users@lists.open-mpi.org> wrote: > Hi OpenMPI-developers, >

Re: [OMPI users] MPI_Intercomm_create error

2022-03-16 Thread George Bosilca via users
I see similar issues on platforms with multiple IP addresses, if some of them are not fully connected. In general, specifying which interface OMPI can use (with --mca btl_tcp_if_include x.y.z.t/s) solves the problem. George. On Wed, Mar 16, 2022 at 5:11 PM Mccall, Kurt E. (MSFC-EV41) via users

Re: [OMPI users] Call to MPI_Allreduce() returning value 15

2022-03-09 Thread George Bosilca via users
There are two ways the MPI_Allreduce returns MPI_ERR_TRUNCATE: 1. it is propagated from one of the underlying point-to-point communications, which means that at least one of the participants has an input buffer with a larger size. I know you said the size is fixed, but it only matters if all proces
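
A two-rank sketch of case 1, where the receiver posts a smaller buffer than what was sent (counts illustrative):

    #include <mpi.h>
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        /* return errors instead of aborting so the code is observable */
        MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
        double buf[8] = {0};
        if (rank == 0) {
            MPI_Send(buf, 8, MPI_DOUBLE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            /* posted count (4) < sent count (8) -> MPI_ERR_TRUNCATE */
            int err = MPI_Recv(buf, 4, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD,
                               MPI_STATUS_IGNORE);
            (void)err;  /* err == MPI_ERR_TRUNCATE here */
        }
        MPI_Finalize();
        return 0;
    }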

Re: [OMPI users] Where can a graph communicator be used?

2022-02-15 Thread George Bosilca via users
via users < users@lists.open-mpi.org> wrote: > > > On Mon, Feb 14, 2022 at 9:01 PM George Bosilca > wrote: > >> On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users < >> users@lists.open-mpi.org> wrote: >> >>> 1. Where can I use this commun

Re: [OMPI users] Where can a graph communicator be used?

2022-02-14 Thread George Bosilca via users
On Mon, Feb 14, 2022 at 6:33 PM Neil Carlson via users < users@lists.open-mpi.org> wrote: > I've been successful at using MPI_Dist_graph_create_adjacent to create a > new communicator with graph topology, and using it with > MPI_Neighbor_alltoallv. But I have a few questions: > > 1. Where can I u

Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-11 Thread George Bosilca via users
l like OSU INAM can get info from network > fabric and even > > switches related to a particular MPI job ... > > > There should be more info gathered in the background > > > ------ > *From:* George Bosilca > *Sent:* Friday, February

Re: [OMPI users] Using OSU benchmarks for checking Infiniband network

2022-02-11 Thread George Bosilca via users
Collecting data during execution is possible in OMPI either with an external tool, such as mpiP, or the internal infrastructure, SPC. Take a look at ./examples/spc_example.c or ./test/spc/spc_test.c to see how to use this. George. On Fri, Feb 11, 2022 at 9:43 AM Bertini, Denis Dr. via users <

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread George Bosilca via users
#include >int MPI_Type_create_resized(MPI_Datatype oldtype, MPI_Aint lb, > MPI_Aint extent, MPI_Datatype *newtype) > > > Jonas > On 16-12-2021 22:39, George Bosilca wrote: > > You are confusing the size and extent of the datatype. The size (aka the >

Re: [OMPI users] unexpected behavior when combining MPI_Gather and MPI_Type_vector

2021-12-16 Thread George Bosilca via users
You are confusing the size and extent of the datatype. The size (aka the physical number of bytes described by the memory layout) would be m*nloc*sizeof(type), while the extent will be related to where you expect the second element of the same type to start. If you do resize, you will incorporate t
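
For illustration, a sketch of the resize being discussed, using a column of an m x n row-major array (values illustrative, not from the thread):

    #include <mpi.h>
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int m = 4, n = 6;  /* illustrative dimensions */
        MPI_Datatype col, col_resized;
        /* one column of an m x n row-major array: m blocks of 1, stride n */
        MPI_Type_vector(m, 1, n, MPI_DOUBLE, &col);
        /* size = m*sizeof(double); default extent spans (m-1)*n+1 doubles.
           Resize the extent to one double so that element i+1 of this type
           starts one column to the right of element i. */
        MPI_Type_create_resized(col, 0, sizeof(double), &col_resized);
        MPI_Type_commit(&col_resized);
        /* ... use col_resized, e.g. as the recvtype of MPI_Gather ... */
        MPI_Type_free(&col);
        MPI_Type_free(&col_resized);
        MPI_Finalize();
        return 0;
    }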

Re: [OMPI users] MPI_ERR_TAG: invalid tag

2021-09-19 Thread George Bosilca via users
The error message is self-explanatory: the application calls MPI_Recv with an invalid TAG. The MPI standard defines a valid tag as an integer between 0 and the value of the MPI_TAG_UB attribute on MPI_COMM_WORLD. At this point it seems plausible this is an application issue. Check that the
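
For reference, a minimal sketch of querying the bound:

    #include <mpi.h>
    #include <stdio.h>
    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        void *val; int flag;
        /* the attribute value is a pointer to the int upper bound */
        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &val, &flag);
        if (flag)
            printf("valid tags: 0 .. %d\n", *(int *)val);
        MPI_Finalize();
        return 0;
    }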

Re: [OMPI users] Question about MPI_T

2021-08-17 Thread George Bosilca via users
You need to enable the monitoring PML in order to get access to the pml_monitoring_messages_count MPI_T. For this you need to know what PML you are currently using and add monitoring to the pml MCA variable. As an example if you use ob1 you should add the following to your mpirun command "--mca pml
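
Once the variable is exposed, it can be read through the standard MPI_T interface. A sketch that looks the pvar up by name; the unsigned long buffer type and the communicator binding are assumptions to verify against the MPI_T_pvar_get_info output, and error checking is omitted:

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv) {
        int provided, n, idx = -1, continuous = 1;
        MPI_Init(&argc, &argv);
        MPI_T_init_thread(MPI_THREAD_SINGLE, &provided);
        MPI_T_pvar_get_num(&n);
        for (int i = 0; i < n && idx < 0; i++) {  /* locate the pvar by name */
            char name[256]; int nlen = sizeof(name);
            int verb, vclass, bind, ro, cont, atomic;
            MPI_Datatype dt; MPI_T_enum et;
            MPI_T_pvar_get_info(i, name, &nlen, &verb, &vclass, &dt, &et,
                                NULL, NULL, &bind, &ro, &cont, &atomic);
            if (!strcmp(name, "pml_monitoring_messages_count")) {
                idx = i; continuous = cont;
            }
        }
        if (idx >= 0) {
            MPI_T_pvar_session sess;
            MPI_T_pvar_handle h;
            MPI_Comm comm = MPI_COMM_WORLD;  /* assuming a comm-bound pvar */
            int count;
            MPI_T_pvar_session_create(&sess);
            MPI_T_pvar_handle_alloc(sess, idx, &comm, &h, &count);
            if (!continuous) MPI_T_pvar_start(sess, h);
            /* ... the communication to be measured goes here ... */
            unsigned long *vals = malloc(count * sizeof(unsigned long));
            MPI_T_pvar_read(sess, h, vals);  /* check the real datatype first */
            for (int i = 0; i < count; i++) printf("entry %d: %lu\n", i, vals[i]);
            free(vals);
            MPI_T_pvar_handle_free(sess, &h);
            MPI_T_pvar_session_free(&sess);
        }
        MPI_T_finalize();
        MPI_Finalize();
        return 0;
    }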

Re: [OMPI users] Allreduce with Op

2021-03-13 Thread George Bosilca via users
Hi Pierre, MPI is allowed to pipeline the collective communications. This explains why the MPI_Op takes the len of the buffers as an argument. Because your MPI_Op ignores this length it alters data outside the temporary buffer we use for the segment. Other versions of the MPI_Allreduce implementat
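
For reference, a sketch of a user op that honors the len argument, so it stays correct when the library invokes it once per pipeline segment:

    #include <mpi.h>

    /* correct: touches exactly *len elements, whatever segment is passed in */
    static void my_sum(void *in, void *inout, int *len, MPI_Datatype *dtype) {
        double *a = (double *)in, *b = (double *)inout;
        for (int i = 0; i < *len; i++) b[i] += a[i];
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        MPI_Op op;
        MPI_Op_create(my_sum, 1 /* commutative */, &op);
        double in[1024], out[1024];
        for (int i = 0; i < 1024; i++) in[i] = 1.0;
        MPI_Allreduce(in, out, 1024, MPI_DOUBLE, op, MPI_COMM_WORLD);
        MPI_Op_free(&op);
        MPI_Finalize();
        return 0;
    }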

Re: [OMPI users] AVX errors building OpenMPI 4.1.0

2021-02-05 Thread George Bosilca via users
Carl, AVX support was introduced in 4.1 which explains why you did not have such issues before. What is your configure command in these 2 cases ? Please create an issue on github and attach your config.log. George. On Fri, Feb 5, 2021 at 2:44 PM Carl Ponder via users < users@lists.open-mpi.o

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-11 Thread George Bosilca via users
gt; 4.- The hostfile. > > > > The duration of the delay is just a few seconds, about 3 ~ 4. > > > > Essentially, the first error message I get from a waiting process is > "74: MPI_ERR_PROC_FAILED: Process Failure". > > > > Hope this information can he

Re: [OMPI users] Timeout in MPI_Bcast/MPI_Barrier?

2021-01-08 Thread George Bosilca via users
Daniel, There are no timeouts in OMPI with the exception of the initial connection over TCP, where we use the socket timeout to prevent deadlocks. As you already did quite a few communicator duplications and other collective communications before you see the timeout, we need more info about this.

Re: [OMPI users] MPI_type_free question

2020-12-04 Thread George Bosilca via users
ns over TCP/IP and hence rule out any memory > leak that could be triggered by your fast interconnect. > > > > In any case, a reproducer will greatly help us debugging this issue. > > > Cheers, > > > Gilles > > > > On 12/4/2020 7:20 AM, George Bosilca via

Re: [OMPI users] MPI_type_free question

2020-12-03 Thread George Bosilca via users
Patrick, I'm afraid there is no simple way to check this. The main reason being that OMPI use handles for MPI objects, and these handles are not tracked by the library, they are supposed to be provided by the user for each call. In your case, as you already called MPI_Type_free on the datatype, yo

Re: [OMPI users] Vader - Where to Look for Shared Memory Use

2020-07-22 Thread George Bosilca via users
John, There are many things in play in such an experiment. Plus, expecting linear speedup even at the node level is certainly overly optimistic. 1. A single core experiment has full memory bandwidth, so you will asymptotically reach the max flops. Adding more cores will increase the memory pressu

Re: [OMPI users] Error with MPI_GET_ADDRESS and MPI_TYPE_CREATE_RESIZED?

2020-05-17 Thread George Bosilca via users
Diego, I see nothing wrong with the way you create the datatype. In fact this is the perfect example on how to almost do it right in FORTRAN. The almost is because your code is highly dependent on the -r8 compiler option (otherwise the REAL in your type will not match the MPI_DOUBLE_PRECISION you

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-26 Thread George Bosilca via users
An application that relies on MPI eager buffers for correctness or performance is an incorrect application, if only because MPI implementations without eager support are perfectly legitimate. Moreover, these applications also miss the point on performance. Among the overheads I am not onl

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-25 Thread George Bosilca via users
Definitively not! You should never rely on the eager size to fix a complex communication pattern. The rule of thumb should be: is my application working correctly if the MPI library forces a zero-byte eager size? As suggested above, the most suitable approach is to define a communication scheme that

Re: [OMPI users] Regarding eager limit relationship to send message size

2020-03-24 Thread George Bosilca via users
Biplab, The eager limit is a constant for each BTL, and it represents the data that is sent eagerly, along with the matching information, out of the entire message. So, if the question is how much memory is needed to store all the eager messages, then the answer will depend on the communication pattern of your ap
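
The per-BTL eager value is itself an MCA parameter; for the TCP BTL, for example (parameter name worth double-checking against your own ompi_info output):

    ompi_info --param btl tcp --level 9 | grep eager_limit
    mpirun --mca btl_tcp_eager_limit 65536 -np 4 ./app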

Re: [OMPI users] Limits of communicator size and number of parallel broadcast transmissions

2020-03-17 Thread George Bosilca via users
On Mon, Mar 16, 2020 at 6:15 PM Konstantinos Konstantinidis via users < users@lists.open-mpi.org> wrote: > Hi, I have some questions regarding technical details of MPI collective > communication methods and broadcast: > >- I want to understand when the number of receivers in a MPI_Bcast can >

Re: [OMPI users] Fault in not recycling bsend buffer ?

2020-03-17 Thread George Bosilca via users
Martyn, I don't know exactly what your code is doing, but based on your inquiry I assume you are using MPI_BSEND multiple times and you run out of local buffers. The MPI standard does not mandate a wait until buffer space becomes available, because that can lead to deadlocks (communication patter
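
A sketch of the explicit buffer management involved; detaching blocks until pending buffered messages drain, which is the portable way to recycle the space (sizes illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int nmsgs = 16, count = 1024;
        /* each buffered send needs its payload plus MPI_BSEND_OVERHEAD */
        int bufsize = nmsgs * (count * (int)sizeof(double) + MPI_BSEND_OVERHEAD);
        void *buf = malloc(bufsize);
        MPI_Buffer_attach(buf, bufsize);
        /* ... up to nmsgs MPI_Bsend(payload, count, MPI_DOUBLE, ...) calls ... */
        /* detach waits until all buffered messages have been transmitted */
        MPI_Buffer_detach(&buf, &bufsize);
        free(buf);
        MPI_Finalize();
        return 0;
    }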

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-04 Thread George Bosilca via users
: > Hi, > > George Bosilca writes: > > > If I'm not mistaken, hcoll is playing with the opal_progress in a way > > that conflicts with the blessed usage of progress in OMPI and prevents > > other components from advancing and timely completing requests. The &

Re: [OMPI users] Trouble with Mellanox's hcoll component and MPI_THREAD_MULTIPLE support?

2020-02-03 Thread George Bosilca via users
If I'm not mistaken, hcoll is playing with the opal_progress in a way that conflicts with the blessed usage of progress in OMPI and prevents other components from advancing and timely completing requests. The impact is minimal for sequential applications using only blocking calls, but is jeopardizi

Re: [OMPI users] HELP: openmpi is not using the specified infiniband interface !!

2020-01-14 Thread George Bosilca via users
According to the error message you are using MPICH not Open MPI. George. On Tue, Jan 14, 2020 at 5:53 PM SOPORTE MODEMAT via users < users@lists.open-mpi.org> wrote: > Hello everyone. > > > > I would like somebody help me to figure out how can I make that the > openmpi use the infiniband inte

Re: [OMPI users] Non-blocking send issue

2020-01-02 Thread George Bosilca via users
wrote: > Hi George, thank you very much for your answer. Can you please explain me > a little more about "If you need to guarantee progress you might either > have your own thread calling MPI functions (such as MPI_Test)". Regards > > Martín > > --------

Re: [OMPI users] Non-blocking send issue

2019-12-31 Thread George Bosilca via users
Martin, The MPI standard does not mandate progress outside MPI calls, thus implementations are free to provide, or not, asynchronous progress. Calling MPI_Test provides the MPI implementation with an opportunity to progress its internal communication queues. However, an implementation could try a

Re: [OMPI users] CUDA mpi question

2019-11-28 Thread George Bosilca via users
rn 1; >> >> } >> >> } >> >> >> >> for (int i = 0; i < num_threads; i++) { >> >> if(pthread_join(threads[i], NULL)) { >> >> fprintf(stderr, "Error joining threadn"); >> &g

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
On Wed, Nov 27, 2019 at 5:02 PM Zhang, Junchao wrote: > On Wed, Nov 27, 2019 at 3:16 PM George Bosilca > wrote: > >> Short and portable answer: you need to sync before the Isend or you will >> send garbage data. >> > Ideally, I want to formulate my code into a

Re: [OMPI users] CUDA mpi question

2019-11-27 Thread George Bosilca via users
Short and portable answer: you need to sync before the Isend or you will send garbage data. Assuming you are willing to go for a less portable solution you can get the OMPI streams and add your kernels inside, so that the sequential order will guarantee correctness of your isend. We have 2 hidden
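
A sketch of the portable pattern (the async memset stands in for a real kernel launch; a CUDA-aware transport is assumed):

    #include <mpi.h>
    #include <cuda_runtime.h>

    void send_from_gpu(double *dev_buf, int n, int dest, MPI_Comm comm,
                       cudaStream_t stream, MPI_Request *req) {
        /* produce the data on the GPU, asynchronously on 'stream' */
        cudaMemsetAsync(dev_buf, 0, n * sizeof(double), stream);
        /* portable: ensure the data is complete before handing it to MPI */
        cudaStreamSynchronize(stream);
        MPI_Isend(dev_buf, n, MPI_DOUBLE, dest, 0, comm, req);
    }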

Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles, Having implemented some of the underlying collective algorithms, I am puzzled by the need to force the sync to 1 to have things flowing. I would definitely appreciate a reproducer so that I can identify (and hopefully) fix the underlying problem. Thanks, George. On Tue, Oct 29, 2019

Re: [OMPI users] Program hangs when MPI_Bcast is called rapidly

2019-10-29 Thread George Bosilca via users
Charles, There is a known issue with calling collectives in a tight loop, due to lack of flow control at the network level. It results in a significant slow-down that might appear as a deadlock to users. The way to work around this is to enable the sync collective module, which will insert a fake barrier
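
As a sketch of the workaround, assuming current parameter names (verify with ompi_info --param coll sync --level 9):

    mpirun --mca coll_sync_barrier_before 100 -np 64 ./app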

Re: [OMPI users] growing memory use from MPI application

2019-06-19 Thread George Bosilca via users
To completely disable UCX you need to disable the UCX MTL and not only the BTL. I would use "--mca pml ob1 --mca btl ^ucx --mca btl_openib_allow_ib 1". As you have a gdb session on the processes you can try to break on some of the memory allocation functions (malloc, realloc, calloc). George.

Re: [OMPI users] Can displs in Scatterv/Gatherv/etc be a GPU array for CUDA-aware MPI?

2019-06-11 Thread George Bosilca via users
Leo, In a UMA system having the displacement and/or recvcounts arrays on managed GPU memory should work, but it will incur overheads for at least 2 reasons: 1. the MPI API arguments are checked for correctness (here recvcounts) 2. the collective algorithm part that executes on the CPU uses the dis

Re: [OMPI users] Open questions on MPI_Allreduce background implementation

2019-06-08 Thread George Bosilca via users
There is an ongoing discussion about this on issue #4067 ( https://github.com/open-mpi/ompi/issues/4067). Also the mailing list contains a few examples on how to tweak the collective algorithms to your needs. George. On Thu, Jun 6, 2019 at 7:42 PM hash join via users wrote: > Hi all, > > > I w

Re: [OMPI users] OMPI 4.0.1 valgrind error on simple MPI_Send()

2019-04-30 Thread George Bosilca via users
Depending on the alignment of the different types there might be small holes in the low-level headers we exchange between processes. It should not be a concern for users. valgrind should not stop on the first detected issue unless --exit-on-first-error has been provided (the default value should

Re: [OMPI users] 3.0.4, 4.0.1 build failure on OSX Mojave with LLVM

2019-04-24 Thread George Bosilca via users
Jon, The configure AC_HEADER_STDC macro is considered obsolete [1] as most of the OSes are STDC compliant nowadays. To have it failing on a recent version of OSX, is therefore something unexpected. Moreover, many of the OMPI developers work on OSX Mojave with the default compiler but with the same

Re: [OMPI users] Possible buffer overflow on Recv rank

2019-03-28 Thread George Bosilca
If I add a loop to make sure I account for all receives on the master, and correctly set the tags, a basic application based on your scheme seems to work as intended. Can you post a reproducer for your issue instead? Thanks, George. On Thu, Mar 28, 2019 at 6:52 AM carlos aguni wrote: > Hi Gil

Re: [OMPI users] OpenMPI v4.0.0 signal 11 (Segmentation fault)

2019-02-20 Thread George Bosilca
I was not able to reproduce the issue with openib on the 4.0, but instead I randomly segfault in MPI finalize during the grdma cleanup. I could however reproduce the TCP timeout part with both 4.0 and master, on a pretty sane cluster (only 3 interfaces, lo, eth0 and virbr0). With no surprise, the

Re: [OMPI users] [Request for Cooperation] -- MPI International Survey

2019-02-20 Thread George Bosilca
inconvenience, George. On Wed, Feb 20, 2019 at 2:27 PM George Reeke wrote: > On Wed, 2019-02-20 at 13:21 -0500, George Bosilca wrote: > > > To obtain representative samples of the MPI community, we have > > prepared a survey > > > > >

[OMPI users] [Request for Cooperation] -- MPI International Survey

2019-02-20 Thread George Bosilca
International MPI Survey, George Bosilca (UT/ICL) Geoffroy Vallee (ORNL) Emmanuel Jeannot (Inria) Atsushi Hori (RIKEN) Takahiro Ogura (RIKEN) [1] https://github.com/bosilca/MPIsurvey/ [2] https://bosilca.github.io/MPIsurvey/

Re: [OMPI users] Received values is different than sent after Isend() in MPI4py

2019-02-01 Thread George Bosilca
I think the return of ascontiguous will be reused by Python before the data is really transferred by the isend. The input buffer for the isend operation should be const for the entire duration of the isend+wait window. George On Fri, Feb 1, 2019, 12:27 Konstantinos Konstantinidis Hi, consider a

Re: [OMPI users] MPI_Reduce_Scatter Segmentation Fault with Intel 2019 Update 1 Compilers on OPA-1

2018-12-04 Thread George Bosilca
I'm trying to replicate using the same compiler (icc 2019) on my OSX over TCP and shared memory with no luck so far. So either the segfault is something specific to OmniPath or to the memcpy implementation used on Skylake. I tried to use the trace you sent, more specifically the opal_datatype_cop

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-09-19 Thread George Bosilca
able > to trace the call stack. > > Which OpenMPI 3.x version do you suggest ? A nightly snapshot ? Cloning > the git repo ? > > Thanks > > Patrick > > George Bosilca wrote: > > Few days ago we have pushed a fix in master for a strikingly similar > issue. The p

Re: [OMPI users] [version 2.1.5] invalid memory reference

2018-09-18 Thread George Bosilca
Few days ago we have pushed a fix in master for a strikingly similar issue. The patch will eventually make it in the 4.0 and 3.1 but not on the 2.x series. The best path forward will be to migrate to a more recent OMPI version. George. On Tue, Sep 18, 2018 at 3:50 AM Patrick Begou < patrick.be..

Re: [OMPI users] know which CPU has the maximum value

2018-08-10 Thread George Bosilca
You will need to create a special variable that holds 2 entries, one for the max operation (with whatever type you need) and an int for the rank of the process. The MAXLOC is described on the OMPI man page [1] and you can find an example on how to use it on the MPI Forum [2]. George. [1] https:/
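
A sketch of the value/rank pair with MPI_MAXLOC and the predefined MPI_DOUBLE_INT type:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        struct { double value; int rank; } in, out;  /* matches MPI_DOUBLE_INT */
        in.value = (double)(rank * rank);  /* illustrative local value */
        in.rank = rank;
        MPI_Allreduce(&in, &out, 1, MPI_DOUBLE_INT, MPI_MAXLOC, MPI_COMM_WORLD);
        if (rank == 0)
            printf("max %g found on rank %d\n", out.value, out.rank);
        MPI_Finalize();
        return 0;
    }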

Re: [OMPI users] MPI_Comm_get_attr fails for sub-communicators created by MPI_Comm_split

2018-07-08 Thread George Bosilca
Yes, this is the behavior defined by the MPI standard. More precisely, section 8.1.2 of the MPI 3.1 standard clearly states that the predefined attributes only exists for MPI_COMM_WORLD. George. On Sun, Jul 8, 2018 at 1:55 AM Weiqun Zhang wrote: > Hi, > > It appears that MPI_Comm_get_attr f
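
A sketch showing the behavior described (the flag comes back false on the split communicator):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank, flag; void *val;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm sub;
        MPI_Comm_split(MPI_COMM_WORLD, rank % 2, rank, &sub);
        MPI_Comm_get_attr(MPI_COMM_WORLD, MPI_TAG_UB, &val, &flag);
        printf("world: flag=%d\n", flag);  /* 1: attribute present */
        MPI_Comm_get_attr(sub, MPI_TAG_UB, &val, &flag);
        printf("sub:   flag=%d\n", flag);  /* 0: no predefined attributes */
        MPI_Comm_free(&sub);
        MPI_Finalize();
        return 0;
    }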

Re: [OMPI users] MPI Windows: performance of local memory access

2018-05-23 Thread George Bosilca
We had a similar issue few months back. After investigation it turned out to be related to NUMA balancing [1] being enabled by default on recent releases of Linux-based OSes. In our case turning off NUMA balancing fixed most of the performance incoherences we had. You can check its status in /proc

Re: [OMPI users] peformance abnormality with openib and tcp framework

2018-05-14 Thread George Bosilca
Shared memory communication is important for multi-core platforms, especially when you have multiple processes per node. But this is only part of your issue here. You haven't specified how your processes will be mapped on your resources. As a result rank 0 and 1 will be on the same node, so you ar

Re: [OMPI users] mpi send/recv pair hangin

2018-04-09 Thread George Bosilca
(which level of threading) ? Can you send us the opal_config.h file please. Thanks, George. On Sun, Apr 8, 2018 at 8:30 PM, George Bosilca wrote: > Right, it has nothing to do with the tag. The sequence number is an > internal counter that help OMPI to deliver the messages in the MPI re

Re: [OMPI users] mpi send/recv pair hangin

2018-04-08 Thread George Bosilca
nd we will get back to you for further debugging. George. On Sun, Apr 8, 2018 at 6:00 PM, Noam Bernstein wrote: > On Apr 8, 2018, at 3:58 PM, George Bosilca wrote: > > Noam, > > Thanks for your output, it highlight an usual outcome. It shows that a > process (29662) ha

Re: [OMPI users] mpi send/recv pair hangin

2018-04-08 Thread George Bosilca
push the same sequence number twice ... More digging is required. George. On Fri, Apr 6, 2018 at 2:42 PM, Noam Bernstein wrote: > > On Apr 6, 2018, at 1:41 PM, George Bosilca wrote: > > Noam, > > According to your stack trace the correct way to call the mca_pml_ob1_

Re: [OMPI users] mpi send/recv pair hangin

2018-04-06 Thread George Bosilca
wrote: > On Apr 5, 2018, at 4:11 PM, George Bosilca wrote: > > I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm, > 1)". This allows the debugger to make a call our function, and output > internal information about the library status. > > >

Re: [OMPI users] mpi send/recv pair hangin

2018-04-05 Thread George Bosilca
Yes, you can do this by adding --enable-debug to OMPI configure (and make sure you don't have the configure flag --with-platform=optimize). George. On Thu, Apr 5, 2018 at 4:20 PM, Noam Bernstein wrote: > > On Apr 5, 2018, at 4:11 PM, George Bosilca wrote: > > I attac

Re: [OMPI users] mpi send/recv pair hangin

2018-04-05 Thread George Bosilca
I attach with gdb on the processes and do a "call mca_pml_ob1_dump(comm, 1)". This allows the debugger to make a call our function, and output internal information about the library status. George. On Thu, Apr 5, 2018 at 4:03 PM, Noam Bernstein wrote: > On Apr 5, 2018, at 3:

Re: [OMPI users] mpi send/recv pair hangin

2018-04-05 Thread George Bosilca
Noam, The OB1 provide a mechanism to dump all pending communications in a particular communicator. To do this I usually call mca_pml_ob1_dump(comm, 1), with comm being the MPI_Comm and 1 being the verbose mode. I have no idea how you can find the pointer to the communicator out of your code, but i

Re: [OMPI users] running mpi program between my PC and an ARM-architektur raspberry

2018-04-04 Thread George Bosilca
We can always build complicated solutions, but in some cases sane and simple solutions exists. Let me clear some of the misinformation in this thread. The MPI standard is clear what type of conversion is allowed and how it should be done (for more info read Chapter 4): no type conversion is allowe

Re: [OMPI users] tcp_peer_send_blocking: send() to socket 9 failed: Broken pipe (32)

2018-02-09 Thread George Bosilca
What are the settings of the firewall on your 2 nodes ? George. On Fri, Feb 9, 2018 at 3:08 PM, William Mitchell wrote: > When I try to run an MPI program on a network with a shared file system > and connected by ethernet, I get the error message "tcp_peer_send_blocking: > send() to socket

Re: [OMPI users] False positives with OpenMPI and memchecker

2018-01-06 Thread George Bosilca
Hi Yvan, You mention a test. Can you make it available either on the mailing list, a github issue or privately ? Thanks, George. On Sat, Jan 6, 2018 at 7:43 PM, wrote: > > Hello, > > I obtain false positives with OpenMPI when memcheck is enabled, using > OpenMPI 3.0.0 > > This is simil

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-12-11 Thread George Bosilca
he total time and the transmission time it took for > the send-receive function to complete (the only difference is that I > subtract the deserialization time from both counters since I don't want > this counted in order to have a valid comparison with the previous > implementation). It

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread George Bosilca
dWorker::execShuffle, possible via an MPI_Allgatherv toward the master process in MPI_COMM_WORLD (in this case you can convert the "long long" into a double to facilitate the collective). George. > > I know that this discussion is getting long but if you have some free time

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-07 Thread George Bosilca
o, do you suggest anything else or I am > trapped in using the MPI_Bcast() as shown in Option 1? > > On Mon, Nov 6, 2017 at 8:58 AM, George Bosilca > wrote: > >> On Sun, Nov 5, 2017 at 10:23 PM, Konstantinos Konstantinidis < >> kostas1...@gmail.com> wrote: >>

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-11-06 Thread George Bosilca
ating all communications in a single temporal location you spread them out across time by imposing your own communication logic. This basically translate a set of blocking collective (bcast is a perfect target) into a pipelined mix. Instead of describing such a scheme here I suggest you read the algorithmic de

Re: [OMPI users] Parallel MPI broadcasts (parameterized)

2017-10-31 Thread George Bosilca
It really depends on what you are trying to achieve. If the question is rhetorical: "can I write a code that does parallel broadcasts on independent groups of processes?" then the answer is yes, this is certainly possible. If however you add a hint of practicality in your question "can I write an

Re: [OMPI users] Force TCP to go through network adapter for single-machine test.

2017-10-30 Thread George Bosilca
John, To disable shared memory (sm or vader in Open MPI depending on the version), you have to remove it from the list of approved underlying network devices. As you specifically want TCP support, I would add "--mca btl tcp,self" to my mpirun command line. However, by default Open MPI tries to avo

Re: [OMPI users] Open MPI internal error

2017-09-28 Thread George Bosilca
John, On the ULFM mailing list you pointed out, we converged toward a hardware issue. Resources associated with the dead process were not correctly freed, and follow-up processes on the same setup would inherit issues related to these lingering messages. However, keep in mind that the setup was di

Re: [OMPI users] OpenMPI v3.0 on Cygwin

2017-09-27 Thread George Bosilca
On Thu, Sep 28, 2017 at 12:45 AM, Fab Tillier via users < users@lists.open-mpi.org> wrote: > Hi Llelan, > > Llelan D. wrote on Wed, 27 Sep 2017 at 19:06:23 > > > On 09/27/2017 3:04 PM, Jeffrey A Cummings wrote: > >> The MS-MPI developers disagree with your statement below and claim to > >> be acti

Re: [OMPI users] Multi-threaded MPI communication

2017-09-21 Thread George Bosilca
All your processes send their data to a single destination, at the same time. Clearly you are reaching the capacity of your network and your data transfers will be bound by this. This is a physical constraint that you can only overcome by adding network capacity to your cluster. At the software level

Re: [OMPI users] Groups and Communicators

2017-08-02 Thread George Bosilca
%rank was set to MPI_PROC_NULL. I was just suggesting that > he change that to "IF(MPI_COMM_NULL .NE. MASTER_COMM)" -- i.e., he > shouldn't make any assumptions about the value of MPI_PROC_NULL, etc. > > > > > On Aug 2, 2017, at 12:54 PM, George Bosilca wrote: > >

Re: [OMPI users] Groups and Communicators

2017-08-02 Thread George Bosilca
Diego, Setting the color to MPI_COMM_NULL is not good, as it results in some random value (and not the MPI_UNDEFINED that does not generate a communicator). Change the color to MPI_UNDEFINED and your application should work just fine (in the sense that all processes not in the master communicator wi
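
The fix, sketched in C (the original thread was Fortran; the logic is identical, and the membership condition is illustrative):

    #include <mpi.h>

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        int is_master = (rank < 4);  /* illustrative condition */
        MPI_Comm master_comm;
        /* non-masters must pass MPI_UNDEFINED, not MPI_COMM_NULL, as the
           color; they then receive MPI_COMM_NULL as the result */
        MPI_Comm_split(MPI_COMM_WORLD, is_master ? 0 : MPI_UNDEFINED,
                       rank, &master_comm);
        if (master_comm != MPI_COMM_NULL) {
            /* ... master-only communication ... */
            MPI_Comm_free(&master_comm);
        }
        MPI_Finalize();
        return 0;
    }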
