Hi,

I am getting strange performance results for the allgatherv operation when I keep the number of procs and the data the same but vary the binding width. For example, here are two cases with about a 180x difference in performance.
Each machine has 4 sockets, each with 6 cores, totaling 24 cores per node (topology attached).

Case 1: 12 procs per node, each bound to 1 core, times 30 nodes --> 1929 ms
Case 2: 12 procs per node, each bound to 2 cores, times 30 nodes --> 357209 ms

Another set of variations for 2 procs per node and 4 procs per node is given below in the chart.

Is such variation expected with binding width? I am a bit puzzled and would appreciate any help to understand this.

[image: Inline image 1]

Thank you,
Saliya

--
Saliya Ekanayake
Ph.D. Candidate | Research Assistant
School of Informatics and Computing | Digital Science Center
Indiana University, Bloomington
Cell 812-391-4914
http://saliya.org