Simone,

I think most of the issues with the numbers you're getting come from the
internal protocols of Open MPI and from the way compilers "optimize" the memcpy
function. In fact, memcpy translates to different execution paths depending on
the size of the data. For large memory copies, MMX or SSExxx instructions are
used. For smaller copies, some compilers use the movsb instruction to implement
memcpy. This leads to a significantly smaller number of branches in the PAPI
reading, because the movsb __always__ counts as a single branch.

In a similar context we ended up hijacking the memcpy function so that we could
count the branches/misses/instructions it executes and then subtract them from
the numbers seen by the upper level. This gives a more consistent view of the
number of branches, as the compiler's choice of memcpy variant is kept outside
your counting.
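
To make the idea concrete, here is a minimal sketch in C of that kind of accounting. It is not the code we used: the names counted_memcpy, counted_memcpy_init, events_excluding_memcpy and stub_counter are made up for illustration, and the counter is read through a hypothetical function pointer. In a real measurement that hook would presumably wrap something like a PAPI_read() on an already running event set, and the wrapper would have to be interposed in front of the library's memcpy by whatever mechanism suits your build, which the sketch leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical hook: returns the current value of one hardware counter
 * (branches, misses, instructions, ...). Assumption: in a real setup this
 * would read a counter from a running PAPI event set. */
typedef long long (*counter_fn)(void);

static counter_fn read_counter = NULL;  /* installed by the harness */
static long long memcpy_events = 0;     /* events spent inside copies */

/* Stand-in counter that advances by one on every read. It exists only so
 * the sketch runs without any performance-counter library. */
static long long stub_ticks = 0;
long long stub_counter(void) { return ++stub_ticks; }

/* Install the counter hook and reset the accumulated total. */
void counted_memcpy_init(counter_fn fn)
{
    read_counter = fn;
    memcpy_events = 0;
}

/* Same contract as memcpy, but the events counted while the copy runs
 * are accumulated separately in memcpy_events. */
void *counted_memcpy(void *dst, const void *src, size_t n)
{
    assert(read_counter != NULL);
    long long before = read_counter();
    void *ret = memcpy(dst, src, n);
    memcpy_events += read_counter() - before;
    return ret;
}

/* Remove the memcpy share from a total measured by the upper level. */
long long events_excluding_memcpy(long long total)
{
    return total - memcpy_events;
}
```

The upper level measures its total as usual and then calls events_excluding_memcpy(total), so whichever memcpy variant the compiler picked no longer shows up in the reported number. Note that the two counter reads themselves add a small overhead to the upper-level total, which a careful measurement would also have to account for.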

  george.


On Apr 19, 2012, at 06:53 , Simone Pellegrini wrote:

> Enough with the context, this is what I am observing. At 16 MB there is a 
> clear increase in the number of instructions and branch instructions (and 
> this can be explained by my settings of eager_limit and max send size).
> 
> However something weird already happens at 32K where I clearly see an 
> increase in the number of branches and total instructions. The fact is that 
> there are almost 0 branch instructions until 32KB and starting from 32KB to 
> 16MB there is a linear increase. At 16MB there is another jump and then again 
> linear increase.
> It seems that there is another threshold driving this behavior. I tried to 
> set these other parameters for the SM BTL, btl_sm_fifo_size, 
> btl_sm_exclusivity but nothing changed. From my understanding of MPI, this 
> should be a kind of pipelining of the message, which is being transferred in 
> chunks (probably 32KB in size).
> 
> How can I override this behavior? Is there any parameter I can set?
> 
> 
> I also noticed that while this is happening for the MPI_Send, the MPI_Recv 
> operation behaves differently. For the receive routine there is no bump in 
> terms of branch and total instructions. The increase is linear starting from 
> 64 bytes. The increase of branch instructions slows down however after the 
> 16MB threshold. My idea about that is that probably the receive is busy 
> waiting for the message and therefore the number of branches grows 
> proportionally with the time spent for the message to arrive.
> 
> This is my hypothesis but you probably know better.
> Graphs are attached. Thanks in advance for your help.

