The beauty of floating point! Indeed, this is just a precision problem. Using 16 significant digits for the hard-coded values in program2.py produces the same result as program1.py.
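A minimal way to see this (a sketch, not either of the attached programs): multiply a by the two representations of b that the precision=20 output quoted below reveals, one carrying the full ~16 significant digits and one carrying only the 9 digits that were hard-coded.

    import numpy as np

    a = np.complex128(28534314.10478439 + 28534314.10478436j)

    # b as printed for the "first" case (full precision, per Ben's output below):
    b_full = np.complex128(-1.3981811475968072e+09 + 1.3981811485968091e+09j)

    # b as hard-coded in the "second" case (~9 significant digits):
    b_short = np.complex128(-1.39818115e+09 + 1.39818115e+09j)

    print(a * b_full)    # roughly -7.97922801e+16 + 2.85e+07j  (program1.py's result)
    print(a * b_short)   # roughly -7.97922802e+16 + 48j        (program2.py's / Matlab's result)

The difference comes entirely from the inputs; nothing needs to be lost in the MPI transfer to explain it.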
@Jorge Pretty good example for your Numerical Calculus teaching, for those kids that ask over and over again "What's the point of these nasty floating point lectures? Who cares?"

On Wed, 23 May 2018 at 02:52, Ben Menadue <ben.mena...@nci.org.au> wrote:

> Hi Jeff, Konstantinos,
>
> I think you might want MPI.C_DOUBLE_COMPLEX for your datatype, since np.complex128 is double-precision. But I think it’s either ignoring this and using the datatype of the object you’re sending, or MPI4py is handling the conversion in the backend somewhere. You could actually just drop the datatype specification and let MPI4py select the datatype for you, as you do on the receiver side.
>
> Modifying Jeff’s script to print out the product on the sender side as well, I see this:
>
> Sender computed (first):
> [[-7.97922801e+16+28534416.j]]
> Receiver computed (first):
> [[-7.97922801e+16+28534416.j]]
> Sender computed (second):
> [[-7.97922802e+16+48.j]]
> Receiver computed (second):
> [[-7.97922802e+16+48.j]]
>
> Even the real part of the result is slightly different between the two approaches (as is the case for your results). So the values are probably being sent correctly; it’s just that the values being sent are different. Adding np.set_printoptions(precision=20) to the program shows this:
>
> Sender sent (first):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.3981811475968072e+09+1.3981811485968091e+09j]]
> Sender sent (second):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.39818115e+09+1.39818115e+09j]]
>
> If the second value is what you expect from your construction algorithm, then I suspect you’re just seeing natural floating-point precision loss inside one of the functions you’re calling there. Otherwise, if you made the second input by copying the output from the first, you just didn’t copy enough decimal places :-).
>
> Cheers,
> Ben
>
> On 23 May 2018, at 8:38 am, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>
> Thanks Jeff. I ran your code and saw your point. Based on that, it seems that my comparison by just printing the values was misleading.
>
> I have two questions for you:
>
> 1. Can you please describe your setup, i.e. Python version, NumPy version, MPI4py version and Open MPI version? I'm asking since I am thinking of doing a fresh build and trying Python 3. What do you think?
>
> 2. When I try the following code (which manually computes the imaginary part of that same complex number) at any receiver:
>
> C_imag = np.dot(-28534314.10478436, 1.39818115e+09) + np.dot(28534314.10478439, 1.39818115e+09)
> print(C_imag)
>
> I see that the answer is 48, which is correct. Do you think that this fact points to MPI4py as the source of the precision loss problem, instead of NumPy?
>
> Honestly, I don't understand how they could have such serious bugs unresolved.
>
> On Tue, May 22, 2018 at 5:05 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
>> There are two issues:
>>
>> 1. You should be using MPI.C_COMPLEX, not MPI.COMPLEX. MPI.COMPLEX is a Fortran datatype; MPI.C_COMPLEX is the C datatype (which is what NumPy is using behind the scenes).
>>
>> 2. Somehow the received B values are different between the two.
>>
>> I derived this program from your two programs to show the difference:
>>
>> https://gist.github.com/jsquyres/2ed86736e475e9e9ccd08b66378ef968
>>
>> I don't know offhand how mpi4py sends floating point values -- but I'm guessing that either mpi4py or numpy is pickling the floating point values (vs. sending the exact bit pattern of the floating point value), and some precision is being lost either in the pickling or the de-pickling. That's a guess, though.
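For reference, a minimal sketch of the buffer-based path (this is not Jeff's gist): mpi4py's lowercase comm.send/comm.recv pickle arbitrary Python objects, while the uppercase comm.Send/comm.Recv transmit the raw bytes of a buffer such as a NumPy array, with the MPI datatype inferred from the dtype unless you pass one explicitly (e.g. Ben's MPI.C_DOUBLE_COMPLEX). Assuming two ranks and a 1-by-1 complex128 matrix:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        A = np.array([[28534314.10478439 + 28534314.10478436j]], dtype=np.complex128)
        # Uppercase Send uses the buffer protocol: the 16 bytes of the complex128
        # value go over the wire as-is. Passing [A, MPI.C_DOUBLE_COMPLEX] would
        # name the datatype explicitly; omitting it lets mpi4py infer it from A.dtype.
        comm.Send(A, dest=1, tag=11)
    else:
        A = np.empty((1, 1), dtype=np.complex128)
        comm.Recv(A, source=0, tag=11)
        np.set_printoptions(precision=20)   # Ben's trick: show all the digits
        print(A)

Run with something like mpiexec -n 2 python demo.py; the received value should match the sent one bit for bit, consistent with Ben's observation that the transfer is not where the digits go missing.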
>> > On May 22, 2018, at 2:51 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>> >
>> > Assume a Python MPI program where a master node sends a pair of complex matrices to each worker node, and the worker node is supposed to compute their product (conventional matrix product). The input matrices are constructed at the master node according to some algorithm which there is no need to explain. Now imagine for simplicity that we have only 2 MPI processes, one master and one worker. I have created two versions of this program for this case. The first one constructs two complex numbers (1-by-1 matrices for simplicity) and sends them to the worker to compute the product. This program is like a skeleton for what I am trying to do with multiple workers. In the second program, I have omitted the algorithm and have just hard-coded these two complex numbers into the code. The programs are supposed to give the same product shown here:
>> >
>> > a = 28534314.10478439+28534314.10478436j
>> > b = -1.39818115e+09+1.39818115e+09j
>> > a*b = -7.97922802e+16+48j
>> >
>> > This has been checked in Matlab. Instead, the first program does not work and the worker gives a*b = -7.97922801e+16+28534416.j, while the second program works correctly. Please note that the data is transmitted correctly from the master to the worker and the data structures are the same in both cases (see the print() functions).
>> >
>> > The first (wrong) program is program1.py and the second (correct) one is program2.py.
>> >
>> > I am using MPI4py 3.0.0 along with Python 2.7.14 and Open MPI 2.1.2. I have been struggling with this problem for a whole day and still cannot figure out what's going on. I have tried numerous initializations like np.zeros(), np.zeros_like() and np.empty_like(), both np.array and np.matrix, and the functions np.dot(), np.matmul() and the operator *.
>> >
>> > Finally, I think that the problem is always with the imaginary part of the product, based on other examples I tried. Any suggestions?
>> >
>> > <program1.py><program2.py>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com

--
Lisandro Dalcin
============
Research Scientist
Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
Extreme Computing Research Center (ECRC)
King Abdullah University of Science and Technology (KAUST)
http://ecrc.kaust.edu.sa/
4700 King Abdullah University of Science and Technology
al-Khawarizmi Bldg (Bldg 1), Office # 0109
Thuwal 23955-6900, Kingdom of Saudi Arabia
http://www.kaust.edu.sa
Office Phone: +966 12 808-0459
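As a closing illustration of why it is "always the imaginary part": for these particular values the real part of a*b is the sum of two like-signed terms of about 4e16 (no cancellation), while the imaginary part is the difference of two ~4e16 terms that cancel down to about 48, so it has essentially no accurate digits to spare. A small sketch using the 9-significant-digit values from the thread:

    import numpy as np

    a = np.complex128(28534314.10478439 + 28534314.10478436j)
    b = np.complex128(-1.39818115e+09 + 1.39818115e+09j)

    # Real part: two terms of the same sign, ~-4e16 each -- no cancellation.
    print(a.real * b.real - a.imag * b.imag)   # roughly -7.979e+16, stable

    # Imaginary part: two ~4e16 terms that nearly cancel.
    t1 = a.real * b.imag
    t2 = a.imag * b.real
    print(t1, t2, t1 + t2)                     # sum is roughly 48, as in the thread

    # A change in the 10th significant digit of b moves t1 and t2 by ~4e7 each,
    # which is why the imaginary part of the product can swing from ~48 to
    # ~2.85e7 while the real part barely moves.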