On Feb 4, 2008, at 11:56 AM, Number Cruncher wrote:
> George Bosilca wrote:
>> Now, the overlapping case is a real exception. Obviously, it happened
>> for at least two people (as per a mailing list search) in about 4
>> years, but without affecting the correctness of the application. Is
>> that a good enough reason to affect the overall performance of all
>> parallel applications using Open MPI? You can already guess my stance.
>
> Thanks for the reply. I agree with your pragmatic approach in general,
> and the lack of widespread problems certainly makes this low priority.
> However, there *is* a reason for the memmove/memcpy distinction,
> otherwise there'd only be a single API point in libc. And, as you
> state, that reason is performance. One day someone will write an
> optimized memcpy that *isn't* a simple forward copy. I'm old enough to
> remember the Z80 instructions LDDR and LDIR
> (http://www.sincuser.f9.co.uk/044/mcode.htm) for assembly-level memory
> copying. A memmove would have to choose between the two; memcpy could
> legitimately use either and would corrupt overlapping memory 50% of
> the time.
I did start with the Z80 too ... but now it looks like it was in the "ice age" :)
>> However, I can imagine a way to rewrite the last step of the Bruck
>> algorithm to avoid this problem, and without affecting the overall
>> performance.
>
> Totally agree. The vast majority of Open MPI stuff uses memcpy fine.
> It would just be a local bug fix. Can I volunteer?
Of course, feel free to join the fun. Here is what I had in mind. The final step in the Bruck algorithm can be completely discarded for the first half of the processes, if we compute the receive buffer smartly. For the other half, I guess we can do the copy one non-overlapping piece of data at a time, possibly without the need for an additional buffer.
Thanks, george.
> Regards,
> Simon
>
> Thanks, George.
>
> On Jan 30, 2008, at 9:41 AM, Number Cruncher wrote:
>
>> I'm getting many "Source and destination overlap in memcpy" errors
>> when running my application on an odd number of procs. I believe this
>> is because the Allgather collective is using Bruck's algorithm and
>> doing a shift on the buffer as a finalisation step
>> (coll_tuned_allgather.c):
>>
>>     tmprecv = (char*) rbuf;
>>     tmpsend = (char*) rbuf + (size - rank) * rcount * rext;
>>     err = ompi_ddt_copy_content_same_ddt(rdtype, rank * rcount,
>>                                          tmprecv, tmpsend);
>>
>> Unfortunately ompi_ddt_copy_content_same_ddt does a memcpy, instead
>> of the memmove which is needed here. For this buffer-left-shift, any
>> forward-copying memcpy should actually be OK as it won't overwrite
>> itself during the copy, but this violates the precondition of memcpy
>> and may break for some implementations.
>>
>> I think this issue was dismissed too lightly previously:
>> http://www.open-mpi.org/community/lists/users/2007/08/3873.php
>>
>> Thanks,
>> Simon

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users