Hello,

I have been running some simple benchmarks and saw some strange behaviour. All tests were run on 4 nodes with 24 cores each (96 MPI processes in total).
When I run MPI_Allreduce(), the run time spikes (by about 10x) when the total amount of data being reduced goes from 4096 KB to 8192 KB. For example, when count is 2^21 (8192 KB of 4-byte ints), this single call:

    MPI_Allreduce(send_buf, recv_buf, count, MPI_INT, MPI_SUM, MPI_COMM_WORLD)

is slower than these two half-sized calls:

    MPI_Allreduce(send_buf, recv_buf, count/2, MPI_INT, MPI_SUM, MPI_COMM_WORLD)
    MPI_Allreduce(send_buf + count/2, recv_buf + count/2, count/2, MPI_INT, MPI_SUM, MPI_COMM_WORLD)

Just wondering if anyone knows what causes this behaviour.

Thanks!
Cooper

Cooper Burns
Senior Research Engineer
(608) 230-1551
convergecfd.com
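For anyone who wants to reproduce the comparison, here is a minimal sketch that generalizes the two half-sized calls above to an arbitrary number of chunks. The names chunk_t, chunk_bounds, allreduce_chunked and the USE_MPI guard are hypothetical (they are not part of MPI or Open MPI), and the sketch assumes int data reduced with MPI_SUM, as in the calls above. It is only meant to show the splitting arithmetic, not to explain the slowdown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper type: one contiguous piece of the buffer. */
typedef struct {
    size_t offset; /* index of the first element in this chunk */
    size_t len;    /* number of elements in this chunk */
} chunk_t;

/* Split `count` elements into `nchunks` near-equal chunks and return
 * the bounds of chunk `i`. The first (count % nchunks) chunks get one
 * extra element, so the chunks cover the buffer exactly once. */
static chunk_t chunk_bounds(size_t count, size_t nchunks, size_t i)
{
    assert(nchunks > 0 && i < nchunks);
    size_t base = count / nchunks;
    size_t rem = count % nchunks;
    chunk_t c;
    c.offset = i * base + (i < rem ? i : rem);
    c.len = base + (i < rem ? 1 : 0);
    return c;
}

#ifdef USE_MPI /* hypothetical guard: define it when building with mpicc */
#include <mpi.h>

/* Reduce the buffer in `nchunks` separate MPI_Allreduce calls.
 * With nchunks == 2 and an even count this matches the two
 * half-sized calls above. Assumes each chunk length fits in an int,
 * because MPI_Allreduce takes an int count. */
static int allreduce_chunked(int *send_buf, int *recv_buf, size_t count,
                             size_t nchunks, MPI_Comm comm)
{
    for (size_t i = 0; i < nchunks; i++) {
        chunk_t c = chunk_bounds(count, nchunks, i);
        int rc = MPI_Allreduce(send_buf + c.offset, recv_buf + c.offset,
                               (int)c.len, MPI_INT, MPI_SUM, comm);
        if (rc != MPI_SUCCESS)
            return rc;
    }
    return MPI_SUCCESS;
}
#endif
```

Timing allreduce_chunked(send_buf, recv_buf, count, 1, MPI_COMM_WORLD) against the same call with nchunks set to 2 (for example with MPI_Wtime() around each) should reproduce the two cases being compared.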
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users