Thanks Ben and Lisandro.

You are right: my comparison based on print() was misleading in terms of
precision, so I probably didn't copy enough decimal places :)
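
For anyone comparing values the same way: the default NumPy array print only
shows 8 digits of precision, so something like this (plain NumPy, no MPI)
is needed to see the full values:

    import numpy as np

    np.set_printoptions(precision=20)   # default array precision is 8 digits
    b = np.array([[-1.3981811475968072e+09 + 1.3981811485968091e+09j]])
    print(b)              # now shows essentially the full double-precision value
    print(repr(b[0, 0]))  # repr of the scalar also shows the full value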

I will also try skipping the datatype specification and letting MPI4py
select it, to see what's going on.
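
Concretely, something like this minimal sketch (rank 0 sends a 1-by-1
complex128 array to rank 1 without naming an MPI datatype, so MPI4py has to
pick it from the array's dtype; the value is just the one from this thread):

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    A = np.zeros((1, 1), dtype=np.complex128)
    if rank == 0:
        A[0, 0] = 28534314.10478439 + 28534314.10478436j
        comm.Send(A, dest=1, tag=0)    # no explicit MPI datatype given
    elif rank == 1:
        comm.Recv(A, source=0, tag=0)  # datatype inferred here as well
        print(repr(A[0, 0]))           # full-precision check on the receiver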

On Tue, May 22, 2018 at 6:50 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:

> Hi Jeff, Konstantinos,
>
> I think you might want MPI.C_DOUBLE_COMPLEX for your datatype, since
> np.complex128 is double precision. But I think it’s either ignoring this
> and using the datatype of the object you’re sending, or MPI4py is handling
> the conversion somewhere in the backend. You could actually just drop the
> datatype specification and let MPI4py select the datatype for you, as you
> do on the receiver side.
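>
> Something like this is what I have in mind (only a sketch, not your
> attached scripts):
>
>     from mpi4py import MPI
>     import numpy as np
>
>     comm = MPI.COMM_WORLD
>     A = np.zeros((1, 1), dtype=np.complex128)
>     if comm.Get_rank() == 0:
>         A[0, 0] = 28534314.10478439 + 28534314.10478436j
>         # np.complex128 is double precision, so the matching C datatype
>         # is MPI.C_DOUBLE_COMPLEX rather than MPI.COMPLEX
>         comm.Send([A, MPI.C_DOUBLE_COMPLEX], dest=1, tag=0)
>     elif comm.Get_rank() == 1:
>         comm.Recv(A, source=0, tag=0)   # datatype worked out from A itself
>         print(repr(A[0, 0]))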
>
> Modifying Jeff’s script to print out the product on the sender side as
> well, I see this:
>
> Sender computed (first):
> [[-7.97922801e+16+28534416.j]]
> Receiver computed (first):
> [[-7.97922801e+16+28534416.j]]
> Sender computed (second):
> [[-7.97922802e+16+48.j]]
> Receiver computed (second):
> [[-7.97922802e+16+48.j]]
>
> Even the real part of the result is slightly different between the two
> approaches (as is the case for your results). So the values are probably
> being sent correctly; it’s just that the values being sent are different.
> Adding np.set_printoptions(precision=20) to the program shows this:
>
> Sender sent (first):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.3981811475968072e+09+1.3981811485968091e+09j]]
> Sender sent (second):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.39818115e+09+1.39818115e+09j]]
>
> If the second value is what you expect from your construction algorithm,
> then I suspect you’re just seeing natural floating-point precision loss
> inside one of the functions you’re calling there. Otherwise, if you made
> the second input by copying the output from the first, you just didn’t copy
> enough decimal places :-) .
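>
> For example, with no MPI involved at all:
>
>     import numpy as np
>
>     a  = np.array([[28534314.10478439 + 28534314.10478436j]])
>     # the b that was actually sent (first case):
>     b1 = np.array([[-1.3981811475968072e+09 + 1.3981811485968091e+09j]])
>     # the hand-copied, rounded b (second case):
>     b2 = np.array([[-1.39818115e+09 + 1.39818115e+09j]])
>
>     print(np.dot(a, b1))   # imaginary part ~2.85e+07, as in the first result
>     print(np.dot(a, b2))   # imaginary part 48, as in the second result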
>
> Cheers,
> Ben
>
>
> On 23 May 2018, at 8:38 am, Konstantinos Konstantinidis <
> kostas1...@gmail.com> wrote:
>
> Thanks Jeff.
>
> I ran your code and saw your point. Based on that, it seems that my
> comparison by just printing the values was misleading.
>
> I have two questions for you:
>
> 1. Can you please describe your setup, i.e. Python version, NumPy version,
> MPI4py version and Open MPI version? I'm asking since I am thinking of
> doing a fresh build and trying Python 3. What do you think?
>
> 2. When I try the following code (which manually computes the imaginary
> part of that same complex number) at any receiver:
>
>     C_imag = (np.dot(-28534314.10478436, 1.39818115e+09)
>               + np.dot(28534314.10478439, 1.39818115e+09))
>     print(C_imag)
>
> I see that the answer is 48, which is correct. Do you think that this fact
> points to MPI4py as the source of the precision-loss problem, instead of
> NumPy?
>
> Honestly, I don't understand how they can leave such serious bugs unresolved.
>
> On Tue, May 22, 2018 at 5:05 PM, Jeff Squyres (jsquyres) <
> jsquy...@cisco.com> wrote:
>
>> There are two issues:
>>
>> 1. You should be using MPI.C_COMPLEX, not MPI.COMPLEX.  MPI.COMPLEX is a
>> Fortran datatype; MPI.C_COMPLEX is the C datatype (which is what NumPy is
>> using behind the scenes).
>>
>> 2. Somehow the received B values are different between the two.
>>
>> I derived this program from your two programs to show the difference:
>>
>>     https://gist.github.com/jsquyres/2ed86736e475e9e9ccd08b66378ef968
>>
>> I don't know offhand how mpi4py sends floating-point values -- but I'm
>> guessing that either mpi4py or numpy is pickling the floating-point values
>> (vs. sending the exact bit pattern of the floating-point value), and some
>> precision is being lost either in the pickling or the de-pickling.  That's
>> a guess, though.
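>>
>> As I understand it, mpi4py offers both styles; a rough sketch (not taken
>> from the attached programs):
>>
>>     from mpi4py import MPI
>>     import numpy as np
>>
>>     comm = MPI.COMM_WORLD
>>     rank = comm.Get_rank()
>>     A = np.zeros((1, 1), dtype=np.complex128)
>>
>>     if rank == 0:
>>         A[0, 0] = 28534314.10478439 + 28534314.10478436j
>>         comm.Send(A, dest=1, tag=0)   # uppercase: raw buffer goes on the wire
>>         comm.send(A, dest=1, tag=1)   # lowercase: the object is pickled
>>     elif rank == 1:
>>         comm.Recv(A, source=0, tag=0)
>>         B = comm.recv(source=0, tag=1)
>>         print(repr(A[0, 0]), repr(B[0, 0]))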
>>
>>
>>
>> > On May 22, 2018, at 2:51 PM, Konstantinos Konstantinidis <
>> kostas1...@gmail.com> wrote:
>> >
>> > Assume a Python MPI program where a master node sends a pair of complex
>> > matrices to each worker node, and the worker node is supposed to compute
>> > their product (conventional matrix product). The input matrices are
>> > constructed at the master node according to an algorithm that there is no
>> > need to explain here. Now imagine, for simplicity, that we have only 2 MPI
>> > processes, one master and one worker. I have created two versions of this
>> > program for this case. The first one constructs two complex numbers
>> > (1-by-1 matrices for simplicity) and sends them to the worker to compute
>> > the product. This program is like a skeleton for what I am trying to do
>> > with multiple workers. In the second program, I have omitted the algorithm
>> > and have just hard-coded these two complex numbers into the code. The
>> > programs are supposed to give the same product, shown here:
>> >
>> > a = 28534314.10478439+28534314.10478436j
>> >
>> > b = -1.39818115e+09+1.39818115e+09j
>> >
>> > a*b = -7.97922802e+16+48j
>> >
>> > This has been checked in Matlab. However, the first program does not
>> > work: the worker gives a*b = -7.97922801e+16+28534416.j, while the second
>> > program works correctly. Please note that the data is transmitted
>> > correctly from the master to the worker and the data structures are the
>> > same in both cases (see the print() calls).
>> >
>> > The first (wrong) program is program1.py and the second (correct) one is
>> > program2.py.
>> >
>> > I am using MPI4py 3.0.0 along with Python 2.7.14 and Open MPI 2.1.2. I
>> > have been struggling with this problem for a whole day and still cannot
>> > figure out what's going on. I have tried numerous initializations like
>> > np.zeros(), np.zeros_like() and np.empty_like(), as well as both np.array
>> > and np.matrix, and the functions np.dot(), np.matmul() and the operator *.
>> >
>> > Finally, based on other examples I tried, I think the problem is always
>> > with the imaginary part of the product. Any suggestions?
>> > <program1.py><program2.py>
>>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
>>
>>