On Feb 4, 2008, at 11:56 AM, Number Cruncher wrote:

George Bosilca wrote:

Now, the overlapping case is a real exception. Obviously, it happened
for at least two people (as per a mailing list search) in about 4 years,
but without affecting the correctness of the application. Is that a
good enough reason to affect the overall performance of all parallel
applications using Open MPI? You can already guess my stance.


Thanks for the reply. I agree with your pragmatic approach in general,
and the lack of widespread problems certainly makes this low priority.
However, there *is* a reason for the memmove/memcpy distinction,
otherwise there'd only be a single API point in libc. And, as you state,
that reason is performance. One day someone will write an optimized
memcpy that *isn't* a simple forward copy.

I'm old enough to remember the Z80 instructions LDDR and LDIR
(http://www.sincuser.f9.co.uk/044/mcode.htm) for assembly-level memory
copying. A memmove would have to choose between the two; memcpy could
legitimately use either and would corrupt overlapping memory 50% of the
time.
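To make the hazard concrete, here is a small C sketch, with hypothetical
copy_fwd/copy_bwd routines standing in for LDIR/LDDR-style
implementations, showing how the copy direction matters once the
regions overlap:

#include <stdio.h>

/* LDIR-style forward copy: safe for a left shift (dst below src),
 * because every source byte is read before anything overwrites it. */
static void copy_fwd(char *dst, const char *src, size_t n) {
    for (size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

/* LDDR-style backward copy: for the same left shift it overwrites
 * the low source bytes before reading them, corrupting the result. */
static void copy_bwd(char *dst, const char *src, size_t n) {
    for (size_t i = n; i-- > 0; )
        dst[i] = src[i];
}

int main(void) {
    char a[] = "abcdef", b[] = "abcdef";
    copy_fwd(a, a + 2, 4);    /* left shift by 2 -> "cdefef": correct   */
    copy_bwd(b, b + 2, 4);    /* left shift by 2 -> "efefef": corrupted */
    printf("forward: %s, backward: %s\n", a, b);
    return 0;
}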

I did start with the Z80 too ... but now it looks like it was in the "ice age" :)

However, I can imagine a way to rewrite the last step of the Bruck
algorithm to avoid this problem, without affecting the overall
performance.

Totally agree. The vast majority of Open MPI code uses memcpy correctly. It
would just be a local bug fix. Can I volunteer?

Of course, feel free to join the fun. Here is what I had in mind. The final step in the Bruck algorithm can be completely discarded for the first half of the processes, if we compute the receive buffer placement smartly. For the other half, I guess we can do the copy one non-overlapping piece of data at a time, possibly without the need for an additional buffer.
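For the contiguous case, one way to sketch that piecewise copy is to bound
each chunk by the distance between source and destination, so that no
individual memcpy ever sees overlapping regions. This is only a sketch
under that assumption; the real code would have to go through the datatype
engine, as the existing ompi_ddt_copy_content_same_ddt call does:

#include <string.h>

/* Sketch: perform the final left shift as a sequence of memcpy calls
 * whose source and destination never overlap. 'gap' is the distance
 * between the two regions; any chunk of at most 'gap' bytes is safe. */
static void shift_left_piecewise(char *dst, const char *src, size_t total)
{
    size_t gap = (size_t)(src - dst);   /* assumes src > dst */
    while (total > 0) {
        size_t n = (total < gap) ? total : gap;
        memcpy(dst, src, n);            /* regions disjoint since n <= gap */
        dst   += n;
        src   += n;
        total -= n;
    }
}

For the shift in question this would be called as
shift_left_piecewise(rbuf, rbuf + (size - rank) * rcount * rext,
rank * rcount * rext); note that when rank <= size/2 the regions don't
overlap at all and the loop degenerates into a single memcpy.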

  Thanks,
    george.



Regards,
Simon

 Thanks,
   George.

On Jan 30, 2008, at 9:41 AM, Number Cruncher wrote:

I'm getting many "Source and destination overlap in memcpy" errors
(Valgrind's overlap check) when running my application on an odd number
of procs.

I believe this is because the Allgather collective is using Bruck's
algorithm and doing a shift on the buffer as a finalisation step
(coll_tuned_allgather.c):

tmprecv = (char*) rbuf;                                 /* destination: start of receive buffer */
tmpsend = (char*) rbuf + (size - rank) * rcount * rext; /* source: the last 'rank' blocks       */

/* copies rank * rcount elements; internally this is a memcpy */
err = ompi_ddt_copy_content_same_ddt(rdtype, rank * rcount,
                                     tmprecv, tmpsend);

Unfortunately, ompi_ddt_copy_content_same_ddt does a memcpy instead of
the memmove which is needed here. For this buffer left-shift, any
forward-copying memcpy happens to be safe, since the destination trails
the source and no byte is overwritten before it has been read; but
overlapping arguments violate memcpy's precondition and may break with
some implementations.
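For contiguous datatypes the fix is conceptually a one-line change to
memmove, which explicitly permits overlap. A minimal sketch, with
block_bytes standing in for rcount * rext (a hypothetical helper, not
the actual datatype-engine call):

#include <string.h>

/* Sketch of the finalisation shift using memmove, which allows overlap.
 * Contiguous types only: block_bytes stands in for rcount * rext. */
static void bruck_final_shift(char *rbuf, int size, int rank,
                              size_t block_bytes)
{
    memmove(rbuf,                                        /* tmprecv        */
            rbuf + (size_t)(size - rank) * block_bytes,  /* tmpsend        */
            (size_t)rank * block_bytes);                 /* 'rank' blocks  */
}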

I think this issue was dismissed too lightly previously:
http://www.open-mpi.org/community/lists/users/2007/08/3873.php

Thanks,
Simon

