On Aug 29, 2012, at 2:25 PM, Yong Qin wrote:

> This issue has been observed on OMPI 1.6 and 1.6.1 with the openib btl but
> not on 1.4.5 (the tcp btl is always fine). The application is VASP and
> only one specific dataset is identified during the testing, and the OS
> is SL 6.2 with kernel 2.6.32-220.23.1.el6.x86_64. The issue is that
> when a certain type of load is put on OMPI 1.6.x, the khugepaged thread
> always runs with 100% CPU load, and it looks to me like OMPI is
> waiting for some memory to become available and thus appears to be hung.
> Reducing the number of per-node processes sometimes eases the problem a
> bit, but not always. So I did some further testing by playing around with
> the kernel transparent hugepage support.
>
> 1. Disable transparent hugepage support completely (echo never >
> /sys/kernel/mm/redhat_transparent_hugepage/enabled). This allows the
> program to progress as normal (as in 1.4.5). Total run time for an
> iteration is 3036.03 s.
I'll admit that we have not tested using transparent hugepages.  I wonder if there's some kind of bad interaction going on here...  What exactly does changing this setting do?

> 2. Disable the VM defrag effort (echo never >
> /sys/kernel/mm/redhat_transparent_hugepage/defrag). This allows the
> program to run as well, but the performance is horrible. The same
> iteration takes 4967.40 s.
>
> 3. Disable defrag in khugepaged (echo no >
> /sys/kernel/mm/redhat_transparent_hugepage/khugepaged/defrag). This
> allows the program to run, and the performance is worse than #1 but
> better than #2. The same iteration takes 3348.10 s.
>
> 4. Disable both VM defrag and khugepaged defrag (#2 + #3). Similar
> performance to #3.
>
> So my question is: it looks to me like this has to do with the memory
> management in the openib btl. Are we using huge pages in 1.6.x? If so,
> is there a better way to resolve or work around this within OMPI itself,
> without disabling transparent hugepage support? We'd like to keep
> hugepage support if possible.

Mellanox -- can you comment on this?

> Also, is this related to the registered memory imbalance issue that Jeff
> was mentioning recently
> (http://blogs.cisco.com/performance/registered-memory-imbalances/)?
> Judging from the symptoms, we definitely have that issue with this
> dataset, but I wouldn't expect it to hang in khugepaged. Or is this just
> a corner case?

It *could* be... but I really have no idea (I haven't thought about huge page support w.r.t. registered memory exhaustion / imbalance).  Mellanox?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
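
For reference, the sysfs knobs toggled in tests #1-#3 above can be inspected and set roughly as follows. This is only a sketch using the RHEL/SL 6 paths quoted in the report (upstream kernels use /sys/kernel/mm/transparent_hugepage/ instead), and it must be run as root:

    # Distro-specific THP directory, taken from the commands in the report.
    THP=/sys/kernel/mm/redhat_transparent_hugepage

    # Show the current settings; for enabled/defrag the active value is
    # printed in [brackets].
    cat $THP/enabled $THP/defrag $THP/khugepaged/defrag

    # Test #1: disable THP entirely.
    echo never > $THP/enabled

    # Test #2: leave THP on but disable direct (VM) defrag.
    #echo never > $THP/defrag

    # Test #3: leave THP on but stop khugepaged from defragging.
    #echo no > $THP/khugepaged/defrag

These settings do not survive a reboot; on kernels that support it, booting with transparent_hugepage=never (or re-applying the echo in an rc script) is the usual way to make test #1 permanent.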
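Separately, to bracket whether the openib BTL's memory registration (rather than THP itself) is involved, one could compare a few runs with OMPI 1.6-era MCA parameters. This is a hypothetical triage sketch, not something from the thread; the ./vasp binary name and the -np 16 process count are placeholders:

    # List the parameters the openib BTL exposes on this particular build.
    ompi_info --param btl openib

    # Baseline with the tcp BTL, which the report says always works.
    # (./vasp and -np 16 are placeholders for the real job.)
    mpirun --mca btl tcp,self,sm -np 16 ./vasp

    # openib BTL, but with the leave-pinned registration cache disabled, to
    # see whether registration behavior changes the khugepaged symptom.
    mpirun --mca mpi_leave_pinned 0 -np 16 ./vasp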