plain C bindings, or other C++ abstractions such as Boost.MPI or
> Elemental, for example)
>
> Cheers,
>
> Gilles
>
> On Wed, May 13, 2020 at 1:00 PM Konstantinos Konstantinidis via users
> wrote:
> >
> > Hi,
> > I have a naive question. I have built Open
whether I need the C++ bindings (if still supported in
Open MPI), i.e., does mpicc need Open MPI to be configured with
"--enable-mpi-cxx" for MPI4py to work?
I won't be coding in C++ at all.
Thanks,
Konstantinos Konstantinidis
Hi, I have some questions regarding technical details of MPI collective
communication methods and broadcast:
- I want to understand when the number of receivers in an MPI_Bcast can
be a problem slowing down the broadcast. There are a few implementations of
MPI_Bcast. Consider that of a bin
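(As a rough yardstick, not something stated in the thread: a binomial-tree
broadcast reaches all N receivers in about ceil(log2(N+1)) communication
rounds, so for short messages the latency cost grows only logarithmically
with the number of receivers, while for long messages the per-link bandwidth
rather than the receiver count tends to dominate.)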
-- Forwarded message -
From: Konstantinos Konstantinidis
Date: Mon, Mar 18, 2019 at 9:21 PM
Subject: Re: [OMPI users] Received values is different than sent after
Isend() in MPI4py
To: George Bosilca
Even if this is an old post, for the sake of completeness, I fixed it by
Hi, consider a setup of one MPI sender process unicasting J messages to
each of N MPI receiver processes, i.e., the total number of transmissions
is J*N. Each transmission is a block of a matrix, which has been split both
horizontally and vertically. Each block has been stored as an element of a
2D
You should specify the MPI4py program file on the mpirun command line. Since you
don't have a hostfile yet, you don't need to specify it in the command.
For example, in order to run a program named "test.py" with 4 MPI processes
you can use:
If using Python 2:
mpirun -np 4 python2 test.py
If using Python 3:
mpirun -np 4 python3 test.py
Here are some instructions I have put together.
I am using Python 2 and Open MPI 2.1.2, so I changed the commands to work
for Python 3 and tested them.
Hope it helps.
Regards,
Kostas
On Sun, Jun 3, 2018 at 1:56 PM, Neil k8it wrote:
> thanks to all on this list for getting me this far.
> my
Consider matrices A: s x r and B: s x t. In the attached file, I am doing
matrix multiplication in a distributed manner with one master node and N
workers in order to compute C = A^T*B (an r x t matrix) based on some
algorithm.
For small matrices, e.g. if A and B are 10-by-10, I get the correct results
without any e
struction algorithm,
> then I suspect you’re just seeing natural floating-point precision loss
> inside one of the functions you’re calling there. Otherwise, if you made
> the second input by copying the output from the first, you just didn’t copy
> enough decimal places :-) .
>
> Ch
numpy are pickling the floating point values
> (vs. sending the exact bitmap of the floating point value), and some
> precision is being lost either in the pickling or the de-pickling. That's
> a guess, though.
>
>
>
> > On May 22, 2018, at 2:51 PM, Konstantinos Konstant
Assume a Python MPI program where a master node sends a pair of complex
matrices to each worker node and the worker node is supposed to compute
their product (conventional matrix product). The input matrices are
constructed at the master node according to some algorithm which there is
no need to e
l option
>
>
> Cheers,
>
>
> Gilles
>
>
> On 5/12/2018 6:58 AM, Konstantinos Konstantinidis wrote:
>
>> Yeap, exactly. The hostfile I have is of the form
>>
>> node1 slots=1
>> node2 slots=1
>> node3 slots=1
>>
>> where the above hos
What's the contents of your /etc/openmpi/openmpi-default-hostfile
> -- did you list some hostnames in there?
>
>
> > On May 11, 2018, at 4:43 AM, Konstantinos Konstantinidis <
> kostas1...@gmail.com> wrote:
> >
> > Hi,
> >
> > I have built O
Hi,
I have built Open MPI 2.1.2 multiple times on Ubuntu 16.04 and then I add
the line
orte_default_hostfile=/etc/openmpi/openmpi-default-hostfile
to the file
/etc/openmpi/openmpi-mca-params.conf
and I execute
sudo chown myUsername /etc/openmpi/openmpi-default-hostfile
For some reason this c
displs = (int*)malloc(comm_size * sizeof(int)); // allocate buffer
>>> to compute the displacements for each peer
>>> MPI_Allgather( &bytes_send_count, 1, MPI_LONG, recv_counts, 1, MPI_LONG,
>>> comm); // exchange the amount of sent data
>>> long total = 0;
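A minimal, self-contained sketch of the full pattern those fragments describe
(a hedged reconstruction, not the original attachment; it uses int counts
instead of long so they can be fed straight into MPI_Allgatherv, and the
per-rank payload is made up for illustration):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;
    int rank, comm_size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &comm_size);

    /* Each rank contributes a different number of bytes (here simply rank+1). */
    int bytes_send_count = rank + 1;
    unsigned char *send_buf = (unsigned char *) malloc(bytes_send_count);
    memset(send_buf, rank, bytes_send_count);

    /* 1. Exchange the per-rank byte counts. */
    int *recv_counts = (int *) malloc(comm_size * sizeof(int));
    MPI_Allgather(&bytes_send_count, 1, MPI_INT, recv_counts, 1, MPI_INT, comm);

    /* 2. Turn the counts into displacements (an exclusive prefix sum). */
    int *displs = (int *) malloc(comm_size * sizeof(int));
    int total = 0;
    for (int i = 0; i < comm_size; i++) {
        displs[i] = total;
        total += recv_counts[i];
    }

    /* 3. Gather the variable-sized contributions from every rank. */
    unsigned char *recv_buf = (unsigned char *) malloc(total);
    MPI_Allgatherv(send_buf, bytes_send_count, MPI_UNSIGNED_CHAR,
                   recv_buf, recv_counts, displs, MPI_UNSIGNED_CHAR, comm);

    free(send_buf); free(recv_counts); free(displs); free(recv_buf);
    MPI_Finalize();
    return 0;
}

Compiled with mpicc and run with a few processes, every rank ends up with all
ranks' bytes laid out contiguously in recv_buf at the computed displacements.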
forums such as https://stackoverflow.com
> are a better place for this.
>
> Cheers,
>
> Gilles
>
> On 11/30/2017 5:02 PM, Konstantinos Konstantinidis wrote:
>
>> Hi, I will use a small part of C++ code to demonstrate my problem during
>> shuffling. Assume that each sla
Hi, I will use a small part of C++ code to demonstrate my problem during
shuffling. Assume that each slave has to shuffle some unsigned char array
defined as "unsigned char* data" within some intracommunicator.
unsigned lineSize = 100;
unsigned long long no_keys = 10;
int bytes_send_count = (
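Since the snippet is cut off, here is a minimal hedged sketch of a
broadcast-based shuffle of such an unsigned char array within an
intracommunicator (the value of bytes_send_count and the equal block size on
every rank are assumptions):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    MPI_Comm comm = MPI_COMM_WORLD;   /* stand-in for the intracommunicator */
    int rank, size;
    MPI_Comm_rank(comm, &rank);
    MPI_Comm_size(comm, &size);

    unsigned lineSize = 100;
    unsigned long long no_keys = 10;
    int bytes_send_count = (int)(lineSize * no_keys);  /* assumed meaning */

    unsigned char *data = (unsigned char *) malloc(bytes_send_count);
    memset(data, rank, bytes_send_count);

    /* Every rank takes a turn as the broadcast root, so after the loop
       every rank has seen every other rank's block exactly once. */
    unsigned char *block = (unsigned char *) malloc(bytes_send_count);
    for (int root = 0; root < size; root++) {
        if (root == rank)
            memcpy(block, data, bytes_send_count);
        MPI_Bcast(block, bytes_send_count, MPI_UNSIGNED_CHAR, root, comm);
        /* ... consume block (the data owned by rank `root`) here ... */
    }

    free(data);
    free(block);
    MPI_Finalize();
    return 0;
}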
Hi, I have a communicator, say comm, and some shuffling of data takes
place within its nodes.
I have implemented the shuffling with broadcasts but now I am trying to
experiment with MPI_Allgather() and MPI_Allgatherv().
For demonstration purposes I am adding here a small part of the C++ code.
Y
> MPI_Allgatherv( &(endata.data), endata.size*sizeof(char),
> MPI_UNSIGNED_CHAR, recv_buf, recv_counts, displs, MPI_UNSIGNED_CHAR, comm);
>
> George.
>
>
>
> On Tue, Nov 7, 2017 at 4:23 AM, Konstantinos Konstantinidis <
> kostas1...@gmail.com> wrote:
>
>> OK, I st
you can assume that nodes
>>> are connected by a network, able to move data at a rate B in both
>>> directions (full duplex). Assuming the implementation of the bcast
>>> algorithm is not entirely moronic, the bcast can saturate the network with
>>> a single proc
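(Reading that model concretely, as a back-of-the-envelope estimate rather
than a statement about any particular implementation: a pipelined broadcast
of an m-byte message over full-duplex links of rate B puts roughly m bytes on
each link, so its bandwidth term is roughly m/B, with additional startup and
latency terms that depend on the algorithm and the number of receivers.)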
es
> will impose the real hard limit.
>
> That being said I have the impression you are trying to implement an
> MPI_Allgather(v) using a series of MPI_Bcast. Is that true ?
>
> George.
>
> PS: Few other constraints: the cost of creating the [q^(k-1)]*(q-1)
> communicators migh
we exhaust all groups. My hope is
that the speedup can be such that the total number of broadcasts, i.e.
[q^(k-1)]*(q-1)*k,
can be executed in time equivalent to only [q^(k-1)]*k broadcasts.
Cheers,
Kostas.
On Tue, Oct 31, 2017 at 10:42 PM, Konstantinos Konstantinidis <
kostas1...@gmail.com>
Assume that we have K=q*k nodes (slaves) where q,k are positive integers >=
2.
Based on the scheme that I am currently using I create [q^(k-1)]*(q-1)
groups (along with their communicators). Each group consists of k nodes and
within each group exactly k broadcasts take place (each node broadcasts
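To make the counting concrete with one hypothetical choice of parameters:
for q=3 and k=2 there are K = q*k = 6 slaves, [q^(k-1)]*(q-1) = 3*2 = 6
groups of k = 2 nodes each, and k = 2 broadcasts per group, i.e.
[q^(k-1)]*(q-1)*k = 12 broadcasts in total, which (per the hope stated above)
would ideally finish in the time of only [q^(k-1)]*k = 6 broadcasts.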
sense
> you
> >>> imply. If you broadcast message size under the eager limit, the root
> may
> >>> return before any non-root processes enter the function. Data transfer
> may
> >>> happen prior to processes entering the function. Only rendezvous forces
> On Fri, Oct 20, 2017 at 3:27 PM Konstantinos Konstantinidis <
> kostas1...@gmail.com> wrote:
>
>> Hi,
>>
>> I am running some tests on Amazon EC2 and they require a lot of
>> communication among m3.large instances.
>>
>> I would like to give y
Hi,
I am running some tests on Amazon EC2 and they require a lot of
communication among m3.large instances.
I would like to give you an idea of what kind of communication takes place.
There are 40 m3.large instances. Now, 28672 groups of 5 instances are
formed in a specific manner (let's skip the
> better performances.
> you also have the option to write your own rules (e.g. which algo should
> be used based on communicator and message sizes) if you are not happy with
> the default rules.
> (that would be with the coll_tuned_dynamic_rules_filename MCA option)
>
> note c
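For a quick experiment, forcing one of the tuned component's bcast algorithms
can look like the following (hedged: the exact parameter names and the
algorithm numbering should be checked with "ompi_info --param coll tuned" on
your installation):
mpirun --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 6 -np 40 ./my_test
where the numeric value selects one of the bcast algorithms that ompi_info
lists for the tuned component.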
I have implemented some algorithms in C++ that are greatly affected by the
shuffling time among nodes, which is done with some broadcast calls. Up to
now, I have been testing them by running something like
mpirun -mca btl ^openib -mca plm_rsh_no_tree_spawn 1 ./my_test
which I think makes MPI_Bcast wo