On 8/30/2012 10:28 PM, Yong Qin wrote:
> On Thu, Aug 30, 2012 at 5:12 AM, Jeff Squyres <jsquy...@cisco.com> wrote:
>> On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:
>>
>>> This issue has been observed on OMPI 1.6 and 1.6.1 with the openib btl
>>> but not on 1.4.5 (the tcp btl is always fine). The application is VASP,
>>> and only one specific dataset has been identified during testing; the
>>> OS is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that
>>> when a certain type of load is put on OMPI 1.6.x, the khugepaged thread
>>> runs at 100% CPU, and it looks to me like OMPI is waiting for some
>>> memory to become available and thus appears to hang. Reducing the
>>> number of processes per node sometimes eases the problem a bit, but not
>>> always. So I did some further testing by playing around with the
>>> kernel's transparent hugepage support.
>>>
>>> 1. Disable transparent hugepage support completely (echo never >
>>> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This allows the
>>> program to progress as normal (as in 1.4.5). Total run time for an
>>> iteration is 3036.03 s.
>>
>> I'll admit that we have not tested using transparent hugepages. I wonder
>> if there's some kind of bad interaction going on here...
>
> Transparent hugepages are "transparent", meaning they are applied
> automatically to all applications unless the kernel is explicitly told
> otherwise. I highly suspect they are not working properly in this case.
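For anyone who wants to reproduce the settings discussed in the thread, the
RHEL 6 sysfs knobs being toggled can be inspected and set roughly like this
(a sketch assuming the SL 6.2 "redhat_transparent_hugepage" layout quoted
above; mainline kernels expose the same knobs under
/sys/kernel/mm/transparent_hugepage instead):

    # Show the current THP mode and defrag settings (RHEL 6 naming).
    cat /sys/kernel/mm/redhat_transparent_hugepage/enabled
    cat /sys/kernel/mm/redhat_transparent_hugepage/defrag
    cat /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag

    # Test #1 from the thread: disable THP completely (needs root).
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/enabled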
Like Jeff said - I don't think we've ever tested OMPI with transparent
huge pages.

>>
>> What exactly does changing this setting do?
>
> Here (http://lwn.net/Articles/423592/) is pretty good documentation on
> what these settings do to the behaviour of THP. I don't think I can
> explain it better than the article, so I will leave it to you to
> digest. :)
>
>>
>>> 2. Disable the VM defrag effort (echo never >
>>> /sys/kernel/mm/redhat_transparent_hugepage/defrag). This allows the
>>> program to run as well, but the performance is horrible. The same
>>> iteration takes 4967.40 s.
>>>
>>> 3. Disable defrag in khugepaged (echo no >
>>> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag). This
>>> allows the program to run; the performance is worse than #1 but better
>>> than #2. The same iteration takes 3348.10 s.
>>>
>>> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Similar
>>> performance to #3.
>>>
>>> So my question is: this looks to me like it has to do with the memory
>>> management in the openib btl -- are we using huge pages in 1.6.x? If
>>> so, is there a better way to resolve or work around this within OMPI
>>> itself, without disabling transparent hugepage support? We'd like to
>>> keep hugepage support if possible.
>>
>> Mellanox -- can you comment on this?

Actually, I don't think THP was really tested with OFED. I can think of
lots of ways things can go wrong there. This might be a good question to
address to the linux-rdma mailing list.

-- YK
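Before changing anything, the symptom described above can be confirmed with a
couple of quick checks, and tests #2 and #3 from the thread can be applied
without disabling THP outright. A sketch, again assuming the RHEL 6 /
redhat_transparent_hugepage paths used in the thread:

    # Is khugepaged the thread pegging a CPU?
    top -b -n 1 -p "$(pgrep khugepaged)"

    # How much anonymous memory is currently backed by huge pages?
    grep AnonHugePages /proc/meminfo

    # Tests #2 and #3 from the thread: keep THP enabled but turn off
    # direct and khugepaged defragmentation (needs root).
    echo never > /sys/kernel/mm/redhat_transparent_hugepage/defrag
    echo no > /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag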