Thanks Ben and Lisandro. You are right, my comparison based on print() was misleading in terms of precision, so I probably didn't copy enough decimal places :)
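
Just to illustrate what I mean, here is a small standalone numpy snippet (not taken from either of my programs) using the two B values Ben printed with precision=20 further down:

    import numpy as np

    # The two B values from Ben's precision=20 output quoted below.
    B_full  = np.array([[-1.3981811475968072e+09 + 1.3981811485968091e+09j]], dtype=np.complex128)
    B_short = np.array([[-1.39818115e+09 + 1.39818115e+09j]], dtype=np.complex128)

    # With the default print precision (8 digits) both arrays should
    # display as [[-1.39818115e+09+1.39818115e+09j]] and look identical.
    print(B_full)
    print(B_short)

    # With more digits the difference becomes visible.
    np.set_printoptions(precision=20)
    print(B_full)
    print(B_short)

    # They really are different numbers.
    print(B_full == B_short)   # expected: [[False]]
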
I will also try to skip datatype specification and let MPI4py select the datatype, and see what's going on.
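
Something along these lines is what I plan to test. This is only a sketch, not my actual program1.py/program2.py; the values are the ones from this thread, and (as far as I understand, and as Ben suggests below) MPI.C_DOUBLE_COMPLEX is the C-side type that matches np.complex128:

    from mpi4py import MPI
    import numpy as np

    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()

    if rank == 0:
        A = np.array([[28534314.10478439 + 28534314.10478436j]], dtype=np.complex128)
        # explicit datatype: C_DOUBLE_COMPLEX matches np.complex128
        comm.Send([A, MPI.C_DOUBLE_COMPLEX], dest=1, tag=11)
        # no datatype: mpi4py infers it from the buffer's dtype
        comm.Send(A, dest=1, tag=22)
    elif rank == 1:
        A1 = np.empty((1, 1), dtype=np.complex128)
        A2 = np.empty((1, 1), dtype=np.complex128)
        comm.Recv([A1, MPI.C_DOUBLE_COMPLEX], source=0, tag=11)
        comm.Recv(A2, source=0, tag=22)
        print(A1)
        print(A2)

I would run it with something like mpirun -np 2 python test.py and compare what arrives under the two tags.
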
On Tue, May 22, 2018 at 6:50 PM, Ben Menadue <ben.mena...@nci.org.au> wrote:
> Hi Jeff, Konstantinos,
>
> I think you might want MPI.C_DOUBLE_COMPLEX for your datatype, since np.complex128 is double-precision. But I think either it is ignoring this and using the datatype of the object you're sending, or MPI4py is handling the conversion in the backend somewhere. You could actually just drop the datatype specification and let MPI4py select the datatype for you, as you do on the receiver side.
>
> Modifying Jeff's script to print out the product on the sender side as well, I see this:
>
> Sender computed (first):
> [[-7.97922801e+16+28534416.j]]
> Receiver computed (first):
> [[-7.97922801e+16+28534416.j]]
> Sender computed (second):
> [[-7.97922802e+16+48.j]]
> Receiver computed (second):
> [[-7.97922802e+16+48.j]]
>
> Even the real part of the result is slightly different between the two approaches (as is the case for your results). So the values are probably being sent correctly; it's just that the values being sent are different. Adding np.set_printoptions(precision=20) to the program shows this:
>
> Sender sent (first):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.3981811475968072e+09+1.3981811485968091e+09j]]
> Sender sent (second):
> [[28534314.10478439+28534314.10478436j]]
> [[-1.39818115e+09+1.39818115e+09j]]
>
> If the second value is what you expect from your construction algorithm, then I suspect you're just seeing natural floating-point precision loss inside one of the functions you're calling there. Otherwise, if you made the second input by copying the output from the first, you just didn't copy enough decimal places :-).
>
> Cheers,
> Ben
>
> On 23 May 2018, at 8:38 am, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>
> Thanks Jeff.
>
> I ran your code and saw your point. Based on that, it seems that my comparison by just printing the values was misleading.
>
> I have two questions for you:
>
> 1. Can you please describe your setup, i.e. Python version, NumPy version, MPI4py version, and Open MPI version? I'm asking since I am thinking of doing a fresh build and trying Python 3. What do you think?
>
> 2. When I try the following code (which manually computes the imaginary part of that same complex number) at any receiver:
>
> C_imag = np.dot(-28534314.10478436, 1.39818115e+09) + np.dot(28534314.10478439, 1.39818115e+09)
> print(C_imag)
>
> I see that the answer is 48, which is correct. Do you think that this fact points to MPI4py as the source of the precision-loss problem, instead of numpy?
>
> Honestly, I don't understand how they have such serious bugs unresolved.
>
> On Tue, May 22, 2018 at 5:05 PM, Jeff Squyres (jsquyres) <jsquy...@cisco.com> wrote:
>
>> There are two issues:
>>
>> 1. You should be using MPI.C_COMPLEX, not MPI.COMPLEX. MPI.COMPLEX is a Fortran datatype; MPI.C_COMPLEX is the C datatype (which is what NumPy is using behind the scenes).
>>
>> 2. Somehow the received B values are different between the two.
>>
>> I derived this program from your two programs to show the difference:
>>
>> https://gist.github.com/jsquyres/2ed86736e475e9e9ccd08b66378ef968
>>
>> I don't know offhand how mpi4py sends floating point values -- but I'm guessing that either mpi4py or numpy is pickling the floating point values (vs. sending the exact bitmap of the floating point value), and some precision is being lost either in the pickling or the de-pickling. That's a guess, though.
>>
>> > On May 22, 2018, at 2:51 PM, Konstantinos Konstantinidis <kostas1...@gmail.com> wrote:
>> >
>> > Assume a Python MPI program where a master node sends a pair of complex matrices to each worker node and the worker node is supposed to compute their product (conventional matrix product). The input matrices are constructed at the master node according to some algorithm which there is no need to explain. Now imagine for simplicity that we have only 2 MPI processes, one master and one worker. I have created two versions of this program for this case. The first one constructs two complex numbers (1-by-1 matrices for simplicity) and sends them to the worker to compute the product. This program is like a skeleton for what I am trying to do with multiple workers. In the second program, I have omitted the algorithm and have just hard-coded these two complex numbers into the code. The programs are supposed to give the same product shown here:
>> >
>> > a = 28534314.10478439+28534314.10478436j
>> >
>> > b = -1.39818115e+09+1.39818115e+09j
>> >
>> > a*b = -7.97922802e+16+48j
>> >
>> > This has been checked in Matlab. Instead, the first program does not work and the worker gives a*b = -7.97922801e+16+28534416.j, while the second program works correctly. Please note that the data is transmitted correctly from the master to the worker and the data structures are the same in both cases (see the print() functions).
>> >
>> > The first (wrong) program is program1.py and the second (correct) is program2.py.
>> >
>> > I am using MPI4py 3.0.0 along with Python 2.7.14 and Open MPI 2.1.2. I have been struggling with this problem for a whole day and still cannot figure out what's going on. I have tried numerous initializations like np.zeros(), np.zeros_like(), np.empty_like(), as well as both np.array and np.matrix, and the functions np.dot(), np.matmul() and the operator *.
>> >
>> > Finally, I think that the problem is always with the imaginary part of the product, based on other examples I tried. Any suggestions?
>> >
>> > <program1.py><program2.py>
>>
>> --
>> Jeff Squyres
>> jsquy...@cisco.com
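
P.S. One more check that might be worth doing on the numbers Ben printed above (standalone numpy, no MPI involved): multiplying A by his two B values should reproduce the two different products, which would confirm that the difference comes from the inputs rather than from the transfer. A sketch:

    import numpy as np

    A = np.array([[28534314.10478439 + 28534314.10478436j]], dtype=np.complex128)

    # B as the first program actually sent it (Ben's precision=20 output)
    B_full = np.array([[-1.3981811475968072e+09 + 1.3981811485968091e+09j]], dtype=np.complex128)
    # B as hard-coded in the second program (only 8 decimal digits copied)
    B_short = np.array([[-1.39818115e+09 + 1.39818115e+09j]], dtype=np.complex128)

    # Expected (based on the outputs quoted above):
    # A.dot(B_full)  -> imaginary part around 2.85e+07, like the first program
    # A.dot(B_short) -> imaginary part around 48, like the second program
    print(np.dot(A, B_full))
    print(np.dot(A, B_short))
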