Re: [OMPI users] Python code inconsistency on complex multiplication in MPI (MPI4py)

Jeff Squyres (jsquyres) Tue, 22 May 2018 15:08:19 -0700

There are two issues:

1. You should be using MPI.C_COMPLEX, not MPI.COMPLEX.  MPI.COMPLEX is a 
Fortran datatype; MPI.C_COMPLEX is the C datatype (which is what NumPy is using 
behind the scenes).


2. Somehow the received B values are different between the two.

I derived this program from your two programs to show the difference:

    https://gist.github.com/jsquyres/2ed86736e475e9e9ccd08b66378ef968
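
To illustrate why a small difference in B matters so much here, the sketch
below recomputes the imaginary part of a*b in plain Python. The imaginary part
is a.real*b.imag + a.imag*b.real, which for these inputs is the difference of
two products of magnitude ~4e16, so it is very sensitive to b. The exact value
of b is an assumption (the print in the original post is truncated to 9
digits), and the perturbation of 1.0 is hypothetical, chosen only to show the
scale of the effect; it is not taken from the actual programs.

```python
# Values quoted in the original post. The digits of b beyond the 9 printed
# ones are unknown, so b is ASSUMED to be exactly +/-1398181150.
a = complex(28534314.10478439, 28534314.10478436)
b = complex(-1.39818115e9, 1.39818115e9)

# Hypothetical perturbation: the imaginary component of b is off by 1.0,
# which would be invisible in a 9-significant-digit print.
b_shifted = complex(b.real, b.imag + 1.0)


def imag_of_product(x, y):
    """Imaginary part of x*y, written out to expose the cancellation."""
    return x.real * y.imag + x.imag * y.real


exact_imag = imag_of_product(a, b)
shifted_imag = imag_of_product(a, b_shifted)

print("imag(a*b)         =", exact_imag)
print("imag(a*b_shifted) =", shifted_imag)
print("difference        =", shifted_imag - exact_imag)
```

With these assumed inputs, shifting one component of b by 1.0 moves the
imaginary part by roughly a.real (about 2.85e7), which is the same order of
magnitude as the discrepancy reported in the original post.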

I don't know offhand how mpi4py sends floating point values -- but I'm guessing 
that either mpi4py or numpy is pickling the floating point values (vs. sending 
the exact bit pattern of the floating point value), and some precision is being 
lost either in the pickling or the unpickling.  That's a guess, though.
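
One way to check the pickling guess without MPI is to round-trip a value
through pickle and compare the raw IEEE 754 bytes before and after. This is a
minimal sketch, assuming Python 3 and using the value of a quoted in the
original post; the helper name raw_bytes is made up for illustration. In
mpi4py, the lowercase send()/recv() methods pickle Python objects, while the
uppercase Send()/Recv() methods transfer the array buffer directly, so this
check only bears on the lowercase path.

```python
import pickle
import struct


def raw_bytes(z):
    """Return the 16 bytes of a complex number as two little-endian doubles."""
    return struct.pack("<dd", z.real, z.imag)


# Value of a as quoted in the original post.
a = complex(28534314.10478439, 28534314.10478436)

# Round-trip through every pickle protocol this interpreter supports and
# record whether the restored value is bit-for-bit identical.
results = {}
for protocol in range(pickle.HIGHEST_PROTOCOL + 1):
    restored = pickle.loads(pickle.dumps(a, protocol=protocol))
    results[protocol] = raw_bytes(restored) == raw_bytes(a)

for protocol, identical in sorted(results.items()):
    print("protocol", protocol, "bit-identical:", identical)
```

If every protocol reports True, pickling itself is unlikely to be where the
digits are lost, and the difference more plausibly arises before the send or
from the datatype passed to MPI.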



> On May 22, 2018, at 2:51 PM, Konstantinos Konstantinidis 
> <kostas1...@gmail.com> wrote:
> 
> Assume a Python MPI program where a master node sends a pair of complex 
> matrices to each worker node and the worker node is supposed to compute their 
> product (conventional matrix product). The input matrices are constructed at 
> the master node according to some algorithm which there is no need to 
> explain. Now imagine for simplicity that we have only 2 MPI processes, one 
> master and one worker. I have created two versions of this program for this 
> case. The first one constructs two complex numbers (1-by-1 matrices for 
> simplicity) and sends them to the worker to compute the product. This program 
> is like a skeleton for what I am trying to do with multiple workers. In the 
> second program, I have omitted the algorithm and have just hard-coded these 
> two complex numbers into the code. The programs are supposed to give the same 
> product shown here:
> 
> a = 28534314.10478439+28534314.10478436j
> 
> b = -1.39818115e+09+1.39818115e+09j
> 
> a*b = -7.97922802e+16+48j
> 
> This has been checked in Matlab. Instead, the first program does not work and 
> the worker gives a*b = -7.97922801e+16+28534416.j while the second program 
> works correctly. Please note that the data is transmitted correctly from the 
> master to the worker and the data structures are the same in both cases (see 
> the print() functions). 
> 
> The first (wrong) program is program1.py and the second (correct) is 
> program2.py
> 
> I am using MPI4py 3.0.0 along with Python 2.7.14 and the kernel of Open MPI 
> 2.1.2. I have been struggling with this problem for a whole day and still 
> cannot figure out what's going on. I have tried numerous initializations like 
> np.zeros(), np.zeros_like(), np.empty_like() as well as both np.array and 
> np.matrix and functions np.dot(), np.matmul() and the operator *. 
> 
> Finally, I think that the problem is always with the imaginary part of the 
> product based on other examples I tried. Any suggestions?
> <program1.py><program2.py>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://lists.open-mpi.org/mailman/listinfo/users


-- 
Jeff Squyres
jsquy...@cisco.com
