There is 8 GB of memory on each node, with 6 GB available; swap is off (the swap entry is commented out in /etc/fstab).

I cannot try the alternate mechanisms right now -- thanks for the info; I will try them when we move up to 1.6.1.
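(For reference when we do upgrade: my understanding is that on the 1.5/1.6 series the shared memory backing mechanism is selected through the shmem MCA framework rather than on the sm BTL itself, along the lines of

    mpirun --mca shmem sysv ...
    mpirun --mca shmem posix ...

with mmap as the default. We have not tried this yet, so treat the parameter name as my best reading rather than something verified here.)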
-----Original Message-----
From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf Of Jeff Squyres
Sent: Friday, November 02, 2012 9:32 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature

What's the memory usage on your nodes -- are you invoking swap, perchance?

Can you try one of the other shared memory mechanisms (sysv or posix)?  (I just described how in my previous email.)

On Nov 1, 2012, at 11:24 AM, Hodge, Gary C wrote:

> George,
>
> We move messages of 40K and 160K bytes from process to process on the same node.
> Our app calls mlockall(MCL_CURRENT | MCL_FUTURE) before MPI_Init.
> I measure page faults using getrusage and record when they increase. I observe increasing ru_minflt values and no increase in ru_majflt.
> The increases reported are 40, 80, or 120 at a time; our page size is 4K. The page reclaims/faults are checked after MPI receive processing, after our application processing, and after MPI send processing. Our application processing is not the source of the increasing reclaims/faults.
> I observe the disk I/O light flashing on the nodes when we report increasing reclaims/faults.
> When I turn off the SM BTL, the reclaims stop increasing and the disk I/O light does not blink.
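>
> In outline, the check is something like the following (a trimmed-down sketch rather than our production code; the helper name is made up and the actual MPI receive/processing/send calls are elided):
>
>     #include <stdio.h>
>     #include <sys/mman.h>
>     #include <sys/resource.h>
>     #include <mpi.h>
>
>     /* Print the change in page reclaims (ru_minflt) and hard page
>      * faults (ru_majflt) since the previous call, per getrusage(2). */
>     static void report_faults(const char *where)
>     {
>         static long prev_min, prev_maj;
>         struct rusage ru;
>
>         getrusage(RUSAGE_SELF, &ru);
>         if (ru.ru_minflt != prev_min || ru.ru_majflt != prev_maj) {
>             printf("%s: +%ld reclaims, +%ld faults\n", where,
>                    ru.ru_minflt - prev_min, ru.ru_majflt - prev_maj);
>             prev_min = ru.ru_minflt;
>             prev_maj = ru.ru_majflt;
>         }
>     }
>
>     int main(int argc, char **argv)
>     {
>         /* Lock current and future pages before MPI_Init; this may
>          * require raising the memlock limit (ulimit -l). */
>         mlockall(MCL_CURRENT | MCL_FUTURE);
>         MPI_Init(&argc, &argv);
>         report_faults("baseline");
>         /* per message: receive, process, send, checking after each */
>         report_faults("after MPI recv");
>         report_faults("after app processing");
>         report_faults("after MPI send");
>         MPI_Finalize();
>         return 0;
>     }
>
> With the SM BTL enabled we see increases of 40, 80, or 120 reclaims at a time; with --mca btl ^sm we do not.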
>
> -----Original Message-----
> From: George Bosilca [mailto:bosi...@icl.utk.edu]
> Sent: Thursday, November 01, 2012 12:25 AM
> To: Open MPI Users
> Cc: Hodge, Gary C
> Subject: Re: [OMPI users] EXTERNAL: Re: openmpi shared memory feature
>
> On Oct 30, 2012, at 09:57, Jeff Squyres <jsquy...@cisco.com> wrote:
>
>> On Oct 30, 2012, at 9:51 AM, Hodge, Gary C wrote:
>>
>>> FYI: recently I was tracking down the source of page faults in our application, which has real-time requirements. I found that disabling the sm component (--mca btl ^sm) eliminated many of the page faults I was seeing.
>>
>> Good point. This is likely true; the shared memory component will definitely cause more page faults. Using huge pages may alleviate this (e.g., less TLB usage), but we haven't studied it much.
>
> This will depend on the communication pattern of the application and on the size of the messages. A rise in the number of page faults is not normal behavior, and it is unexpected in most common execution scenarios. We reuse the memory pages in the SM BTL, minimizing page faults as well as TLB misses.
>
> If the sharp increase in the number of page faults is indeed to be blamed on the SM BTL, this is more than worrisome, as it might indicate wrong usage of the reserved memory pages (like a FIFO instead of a LIFO). Can you provide us with more precise information regarding this, please?
>
> Thanks,
> george.
>
>>> I now have much better deterministic performance, in that I no longer see outlier measurements (jobs that usually take 3 ms would sometimes take 15 ms).
>>
>> I'm not sure I grok that; are you benchmarking an entire *job* (i.e., a single "mpirun") that varies between 3 and 15 milliseconds? If so, I'd say that both are pretty darn good, because mpirun incurs a lot of overhead for launching and completing jobs. Furthermore, benchmarking an entire job that lasts significantly less than 1 second is probably not the most stable measurement, regardless of page faults -- there are lots of other distributed and OS effects that can cause a jump from 3 to 15 milliseconds.
>>
>>> I did not notice a performance penalty using a network stack.
>>
>> Depends on the app. Some MPI apps are latency bound; some are not.
>>
>> Latency-bound applications will definitely benefit from faster point-to-point performance. Shared memory will definitely have the fastest point-to-point latency compared to any network stack (i.e., hundreds of nanoseconds vs. 1+ microsecond).

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users