On Oct 30, 2012, at 09:57 , Jeff Squyres <jsquy...@cisco.com> wrote:

> On Oct 30, 2012, at 9:51 AM, Hodge, Gary C wrote:
> 
>> FYI, recently, I was tracking down the source of page faults in our 
>> application that has real-time requirements.  I found that disabling the sm 
>> component (--mca btl ^sm) eliminated many page faults I was seeing.  
> 
> Good point.  This is likely true; the shared memory component will definitely 
> cause more page faults.  Using huge pages may alleviate this (e.g., less TLB 
> usage), but we haven't studied it much.

This will depend on the communication pattern of the application and on the 
size of the messages. A rise in the number of page faults is not normal 
behavior, and it is unexpected in most common execution scenarios. We reuse 
the memory pages in the SM BTL, which keeps both the page faults and the TLB 
misses to a minimum.
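
A sketch of the idea (illustration only, not the actual SM BTL code): the
fragments come from a pool of buffers touched once at setup, and a LIFO hands
the most recently returned buffer back out first, so the same already-mapped
pages keep being recycled instead of new ones being faulted in.

    /* Illustration only: a LIFO free list of pre-touched fragments.
     * Names and sizes are made up; this is not the Open MPI SM BTL code. */
    #include <stdlib.h>
    #include <string.h>

    #define FRAG_SIZE 4096
    #define NUM_FRAGS 64

    typedef struct frag { struct frag *next; char data[FRAG_SIZE]; } frag_t;
    static frag_t *free_list = NULL;

    static void pool_init(void) {
        for (int i = 0; i < NUM_FRAGS; i++) {
            frag_t *f = malloc(sizeof(*f));
            memset(f, 0, sizeof(*f));   /* fault the pages in once, up front */
            f->next = free_list;
            free_list = f;
        }
    }

    static frag_t *frag_get(void) {     /* pop: hand out the hottest fragment */
        frag_t *f = free_list;
        if (f != NULL) free_list = f->next;
        return f;
    }

    static void frag_put(frag_t *f) {   /* push: return it to the top */
        f->next = free_list;
        free_list = f;
    }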

If the sharp increase in the number of page faults is indeed to be blamed on 
the SM BTL, this is more than worrisome, as it might indicate a wrong usage of 
the reserved memory pages (like a FIFO instead of a LIFO). Can you provide us 
with more precise information about this, please?
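
For instance, the per-process fault counters from getrusage(), printed per
rank once with the SM BTL enabled and once with --mca btl ^sm, would already
tell us a lot. A minimal sketch (function and label names are just examples):

    /* Sketch: report per-rank page-fault counts around the phase of interest,
     * so runs with and without the SM BTL can be compared. */
    #include <mpi.h>
    #include <stdio.h>
    #include <sys/resource.h>

    static void report_faults(const char *label) {
        struct rusage ru;
        int rank;
        getrusage(RUSAGE_SELF, &ru);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        printf("[rank %d] %s: minor faults=%ld, major faults=%ld\n",
               rank, label, ru.ru_minflt, ru.ru_majflt);
    }

    int main(int argc, char **argv) {
        MPI_Init(&argc, &argv);
        report_faults("before");
        /* ... the communication phase under investigation ... */
        report_faults("after");
        MPI_Finalize();
        return 0;
    }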

Thanks,
  george.


>> I now have much better deterministic performance in that I no longer see 
>> outlier measurements (jobs that usually take 3 ms would sometimes take 15 
>> ms).  
> 
> I'm not sure I grok that; are you benchmarking an entire *job* (i.e., a 
> single "mpirun") that varies between 3 and 15 milliseconds?  If so, I'd say 
> that both are pretty darn good, because mpirun invokes a lot of overhead for 
> launching and completing jobs.  Furthermore, benchmarking an entire job that 
> lasts significantly less than 1 second is probably not the most stable 
> measurement, regardless of page faults or not -- there's lots of other 
> distributed and OS effects that can cause a jump from 3 to 15 milliseconds. 
> 
>> I did not notice a performance penalty using a network stack.
> 
> Depends on the app.  Some MPI apps are latency bound; some are not.
> 
> Latency-bound applications will definitely benefit from faster point-to-point 
> performance.  Shared memory will definitely have the fastest point-to-point 
> latency compared to any network stack (i.e., hundreds of nanos vs. 1+ micro).
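
For what it is worth, a minimal ping-pong like the one below (a sketch only,
no error checking), run on two ranks of the same node once as-is and once with
--mca btl ^sm, is usually enough to see that latency gap directly, and it
times the communication itself rather than the whole mpirun:

    /* Sketch: minimal two-rank ping-pong to measure point-to-point latency. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv) {
        const int iters = 10000;
        char byte = 0;
        int rank;
        double t0, t1;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);
        t0 = MPI_Wtime();
        for (int i = 0; i < iters; i++) {
            if (rank == 0) {
                MPI_Send(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
                MPI_Recv(&byte, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
            } else if (rank == 1) {
                MPI_Recv(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                         MPI_STATUS_IGNORE);
                MPI_Send(&byte, 1, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
            }
        }
        t1 = MPI_Wtime();

        if (rank == 0)
            printf("one-way latency: %.3f us\n",
                   (t1 - t0) * 1e6 / (2.0 * iters));

        MPI_Finalize();
        return 0;
    }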
> 
> -- 
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to: 
> http://www.cisco.com/web/about/doing_business/legal/cri/
> 
> 
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users

