The beauty of floating point! Indeed, this is just a precision problem. Using 16 significant digits for the hard-coded values in program2.py produces the same result as program1.py.
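A minimal way to see this (a sketch, not either of the attached programs): multiply a by the two representations of b that the precision=20 output quoted below reveals, one carrying the full ~16 significant digits and one carrying only the 9 digits that were hard-coded.

    import numpy as np

    a = np.complex128(28534314.10478439 + 28534314.10478436j)

    # b as printed for the "first" case (full precision, per Ben's output below):
    b_full = np.complex128(-1.3981811475968072e+09 + 1.3981811485968091e+09j)

    # b as hard-coded in the "second" case (~9 significant digits):
    b_short = np.complex128(-1.39818115e+09 + 1.39818115e+09j)

    print(a * b_full)    # roughly -7.97922801e+16 + 2.85e+07j  (program1.py's result)
    print(a * b_short)   # roughly -7.97922802e+16 + 48j        (program2.py's / Matlab's result)

The difference comes entirely from the inputs; nothing needs to be lost in the MPI transfer to explain it.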
@Jorge Pretty good example for your Numerical Calculus teaching, for those kids that ask over and over again "What's the point of these nasty floating point lectures? Who cares?"

On Wed, 23 May 2018 at 02:52, Ben Menadue <ben.mena...@nci.org.au> wrote:

> Hi Jeff, Konstantinos,
>
> I think you might want MPI.C_DOUBLE_COMPLEX for your datatype, since np.complex128 is double-precision. But I think it’s either ignoring this and using the datatype of the object you’re sending, or MPI4py is handling the conversion in the backend somewhere. You could actually just drop the datatype specification and let MPI4py select the datatype for you, as you do on the receiver side.
>
> Modifying Jeff’s script to print out the product on the sender side as well, I see this:
>
> Sender computed (first):
> [[-7.97922801e+16+28534416.j]]
> Receiver computed (first):
> [[-7.97922801e+16+28534416.j]]
> Sender computed (second):
> [[-7.97922802e+16+48.j]]
> Receiver computed (second):
> [[-7.97922802e+16+48.j]]
>
> Even the real part of the result is slightly different between the two approaches (as is the case for your results). So the values are probably being sent correctly; it’s just that the values being sent are different. Adding np.set_printoptions(precision=20) to the program shows this:
>
> Sender sent (first):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.3981811475968072e+09+1.3981811485968091e+09j]]
> Sender sent (second):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.39818115e+09+1.39818115e+09j]]
>
> If the second value is what you expect from your construction algorithm, then I suspect you’re just seeing natural floating-point precision loss inside one of the functions you’re calling there. Otherwise, if you made the second input by copying the output from the first, you just didn’t copy enough decimal places :-).
>
> Cheers,
> Ben
>
> On 23 May 2018, at 8:38 am, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>
> Thanks Jeff. I ran your code and saw your point. Based on that, it seems that my comparison by just printing the values was misleading.
>
> I have two questions for you:
>
> 1. Can you please describe your setup, i.e. Python version, NumPy version, MPI4py version and Open MPI version? I'm asking since I am thinking of doing a fresh build and trying Python 3. What do you think?
>
> 2. When I try the following code (which manually computes the imaginary part of that same complex number) at any receiver:
>
> C_imag = np.dot(-28534314.10478436, 1.39818115e+09) + np.dot(28534314.10478439, 1.39818115e+09)
> print(C_imag)
>
> I see that the answer is 48, which is correct. Do you think that this fact points to MPI4py as the source of the precision loss problem, instead of NumPy?
>
> Honestly, I don't understand how they could have such serious bugs unresolved.
>
> On Tue, May 22, 2018 at 5:05 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
>> There are two issues:
>>
>> 1. You should be using MPI.C_COMPLEX, not MPI.COMPLEX. MPI.COMPLEX is a Fortran datatype; MPI.C_COMPLEX is the C datatype (which is what NumPy is using behind the scenes).
>>
>> 2. Somehow the received B values are different between the two.
>>
>> I derived this program from your two programs to show the difference:
>>
>> https://gist.github.com/jsquyres/2ed86736e475e9e9ccd08b66378ef968
>>
>> I don't know offhand how mpi4py sends floating point values -- but I'm guessing that either mpi4py or numpy is pickling the floating point values (vs. sending the exact bit pattern of the floating point value), and some precision is being lost either in the pickling or the de-pickling. That's a guess, though.
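For reference, a minimal sketch of the buffer-based path (this is not Jeff's gist): mpi4py's lowercase comm.send/comm.recv pickle arbitrary Python objects, while the uppercase comm.Send/comm.Recv transmit the raw bytes of a buffer such as a NumPy array, with the MPI datatype inferred from the dtype unless you pass one explicitly (e.g. Ben's MPI.C_DOUBLE_COMPLEX). Assuming two ranks and a 1-by-1 complex128 matrix:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        A = np.array([[28534314.10478439 + 28534314.10478436j]], dtype=np.complex128)
        # Uppercase Send uses the buffer protocol: the 16 bytes of the complex128
        # value go over the wire as-is. Passing [A, MPI.C_DOUBLE_COMPLEX] would
        # name the datatype explicitly; omitting it lets mpi4py infer it from A.dtype.
        comm.Send(A, dest=1, tag=11)
    else:
        A = np.empty((1, 1), dtype=np.complex128)
        comm.Recv(A, source=0, tag=11)
        np.set_printoptions(precision=20)   # Ben's trick: show all the digits
        print(A)

Run with something like mpiexec -n 2 python demo.py; the received value should match the sent one bit for bit, consistent with Ben's observation that the transfer is not where the digits go missing.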
>> > On May 22, 2018, at 2:51 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>> >
>> > Assume a Python MPI program where a master node sends a pair of complex matrices to each worker node, and the worker node is supposed to compute their product (conventional matrix product). The input matrices are constructed at the master node according to some algorithm which there is no need to explain. Now imagine for simplicity that we have only 2 MPI processes, one master and one worker. I have created two versions of this program for this case. The first one constructs two complex numbers (1-by-1 matrices for simplicity) and sends them to the worker to compute the product. This program is like a skeleton for what I am trying to do with multiple workers. In the second program, I have omitted the algorithm and have just hard-coded these two complex numbers into the code. The programs are supposed to give the same product shown here:
>> >
>> > a = 28534314.10478439+28534314.10478436j
>> > b = -1.39818115e+09+1.39818115e+09j
>> > a*b = -7.97922802e+16+48j
>> >
>> > This has been checked in Matlab. Instead, the first program does not work and the worker gives a*b = -7.97922801e+16+28534416.j, while the second program works correctly. Please note that the data is transmitted correctly from the master to the worker and the data structures are the same in both cases (see the print() functions).
>> >
>> > The first (wrong) program is program1.py and the second (correct) one is program2.py.
>> >
>> > I am using MPI4py 3.0.0 along with Python 2.7.14 and Open MPI 2.1.2. I have been struggling with this problem for a whole day and still cannot figure out what's going on. I have tried numerous initializations like np.zeros(), np.zeros_like() and np.empty_like(), both np.array and np.matrix, and the functions np.dot(), np.matmul() and the operator *.
>> >
>> > Finally, I think that the problem is always with the imaginary part of the product, based on other examples I tried. Any suggestions?
>> >
>> > <program1.py><program2.py>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com

--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa
Office Phone: +966 12 808-0459
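As a closing illustration of why it is "always the imaginary part": for these particular values the real part of a*b is the sum of two like-signed terms of about 4e16 (no cancellation), while the imaginary part is the difference of two ~4e16 terms that cancel down to about 48, so it has essentially no accurate digits to spare. A small sketch using the 9-significant-digit values from the thread:

    import numpy as np

    a = np.complex128(28534314.10478439 + 28534314.10478436j)
    b = np.complex128(-1.39818115e+09 + 1.39818115e+09j)

    # Real part: two terms of the same sign, ~-4e16 each -- no cancellation.
    print(a.real * b.real - a.imag * b.imag)   # roughly -7.979e+16, stable

    # Imaginary part: two ~4e16 terms that nearly cancel.
    t1 = a.real * b.imag
    t2 = a.imag * b.real
    print(t1, t2, t1 + t2)                     # sum is roughly 48, as in the thread

    # A change in the 10th significant digit of b moves t1 and t2 by ~4e7 each,
    # which is why the imaginary part of the product can swing from ~48 to
    # ~2.85e7 while the real part barely moves.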