On 10/16/2014 07:32 PM, Jeff Squyres (jsquyres) wrote:
> Gus --
>
> Can you send the output of configure and your config.log?

Hi Jeff.

Sure. This is for the OMPI 1.8.3 build with Intel compilers that I've been using to compile and run IMB. The config.log is attached. The configure command and environment are (is that what you meant by "the output of configure"?):

    export CC=icc
    export CXX=icpc
    export FC=ifort
    export CFLAGS='-msse2 -fp-model precise -Wall'
    export CXXFLAGS=${CFLAGS}
    export FCFLAGS='-msse2 -fp-model precise -warn all'

    ../configure \
        --prefix=${MYINSTALLDIR} \
        --with-tm=/opt/torque/4.2.5/gnu-4.4.7 \
        --with-verbs=/usr \
        --with-knem=/opt/knem-1.1.1 \
        2>&1 | tee configure_${build_id}.log

Many thanks,
Gus
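
A minimal sketch of how one might check whether knem was picked up by a build like the one above. The function names are hypothetical, and the `/dev/knem` device path is an assumption about how the knem kernel module exposes itself on the host; only `grep`, `command -v`, and `ompi_info --all` are used.

```shell
#!/bin/sh
# Sketch only: three quick checks for knem support.
# All function names below are hypothetical helpers, not part of Open MPI.

# 1. Did configure mention knem at all? Prints the number of matching
#    lines in a saved configure log (path passed as $1).
knem_in_configure_log() {
    grep -i -c 'knem' "$1"
}

# 2. Is the knem device node present on this host?
#    Assumption: a loaded knem module exposes a character device,
#    by default at /dev/knem. An alternative path may be passed as $1.
knem_device_present() {
    [ -c "${1:-/dev/knem}" ]
}

# 3. Does the installed Open MPI report any knem-related settings?
knem_in_ompi_info() {
    command -v ompi_info >/dev/null 2>&1 || {
        echo "ompi_info not found on PATH" >&2
        return 1
    }
    ompi_info --all | grep -i 'knem'
}
```

For example, `knem_in_configure_log configure_${build_id}.log` counts the knem lines in the log written by the `tee` above, and `knem_in_ompi_info` should print nothing if the installed sm BTL was built without knem.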

> On Oct 16, 2014, at 4:24 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> On 10/16/2014 05:38 PM, Nathan Hjelm wrote:
>>
>>> On Thu, Oct 16, 2014 at 05:27:54PM -0400, Gus Correa wrote:
>>>
>>>> Thank you, Aurelien!
>>>>
>>>> Aha, "vader btl", that is new to me! I thought Vader was that man dressed in black in Star Wars, Obi-Wan Kenobi's nemesis. That was a while ago, my kids were children, and Alec Guinness younger than Harrison Ford is today. Oh, how nostalgic code developers can get when it comes to naming things ...
>>>>
>>>> If I am using "vader", it is totally inadvertent. There was no such thing in Open MPI 1.6 and earlier. Now that you mention it, I can see lots of it in the 1.8.3 ompi_info output. In addition, my stderr files show messages like this:
>>>>
>>>> imb.e38352:[1,5]<stddiag>:[node13:16334] mca: bml: Not using sm btl to [[59987,1],26] on node node13 because vader btl has higher exclusivity (65536 > 65535)
>>>>
>>>> So, you are right, "vader" is taking over and knocking off "sm" (and openib and everybody else). Darn Vader! Probably knem is going down the tubes along with sm, right?
>>>
>>> Depends. If there is a reason to continue supporting knem then vader will be updated to support it. I don't currently see a reason to at this time though (since sm continues to live for now).
>>
>> Right now knem is not working in OMPI 1.8.3, even if I turn off vader, and leave only sm,self,openib. I just sent another email documenting that.
>>
>>>> I was used to sm, openib, self and tcp BTLs. I normally just do "btl = ^tcp" in the MCA parameters file, to stick to sm, openib, and self. That worked fine in 1.6.5 (and earlier), and knem worked flawlessly there. The same settings in 1.8.3 don't bring up the knem functionality. So, this seems to be yet another change in 1.8.3 that I need to learn.
>>>>
>>>> Can you or some other list subscriber elaborate a bit about this 'vader' btl? The Open MPI FAQ doesn't have anything about it. What is it after all? Does it play the same role as "sm", i.e., an intra-node btl? Considering the name, is "vader" good or bad?
>>>> Or better: In which circumstances is "vader" good and when is it bad?
>>>
>>> Vader is a btl I originally wrote to support Cray's XPMEM shared memory interface. It was designed to be cleaner than btl/sm and to have better small message latency, bandwidth, and message rates. Because its latency is so much better than sm I removed the XPMEM requirement and added CMA support.
>>
>> I presume this requires kernel 3.X, as Aurelien pointed out. As a matter of policy, and to keep your user base broad, I would suggest keeping a generous range of backwards compatible support built into OMPI. This would be sm, knem, etc, which I suppose can coexist with vader, or not? I can't speak for others but we run production codes in standard Linux distributions (CentOS 6.X, 5.X) with 2.6.Y kernels. I suppose other people have similar situations.
>>
>>>> Should I give in to the dark side of the force and keep "vader" turned on, or should I just do something like "btl = ^tcp,^vader" ?
>>>
>>> You can turn off vader if you want to use knem. I would run some tests to see if there is much of a difference between sm/knem and vader though. I don't have any systems that have knem installed so I haven't been able to run these tests myself. I would primarily focus on the memory usage and the bandwidth.
>>>
>>> -Nathan
>>
>> Please, see my last email. Turning vader off and leaving sm on still doesn't make knem work, unless I made some big mistake along the way. I would love to use 1.8.3 in production, as long as sm+knem support works, hence it would be great if somebody pointed out any mistake that I may have made.
>>
>> Also, for large messages, IMB with 1.6.5+sm+knem gives me ~30% speedups w.r.t. 1.8.3+sm+(broken)-knem or w.r.t. 1.8.3+vader, although admittedly due to our 2.6 kernel, no CMA, etc, the environment is not favorable to vader to begin with. [And yet another good reason to fix/keep sm+knem in OMPI 1.8.]
>>
>> Thank you,
>> Gus Correa
>> _______________________________________________
>> users mailing list
>> us...@open-mpi.org
>> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
>> Link to this post: http://www.open-mpi.org/community/lists/users/2014/10/25516.php
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/10/25521.php

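
On the "btl = ^tcp,^vader" question raised in the thread: as far as I know, Open MPI recognizes the caret only at the start of the comma-separated list, where it negates the whole list, so the exclusion would be written `^tcp,vader`. Below is a small sketch under that assumption. `btl_exclude` is a hypothetical helper, and the `mpirun` and file lines are left commented out so the snippet has no side effects; the IMB binary name and process count are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: build an exclusion value for the "btl" MCA parameter.
# Joins all arguments with commas and prefixes the list with ONE caret,
# because the caret applies to the whole list rather than to each entry.
btl_exclude() {
    ( IFS=,; printf '^%s\n' "$*" )
}

btl_exclude tcp vader

# Possible uses (commented out; adjust names and paths to your site):
#
# Per-user MCA parameter file:
#   echo "btl = $(btl_exclude tcp vader)" >> "$HOME/.openmpi/mca-params.conf"
#
# Single run, exclusion form:
#   mpirun --mca btl "$(btl_exclude tcp vader)" -np 2 ./IMB-MPI1 PingPong
#
# Single run, explicit inclusion of the BTLs named in the thread:
#   mpirun --mca btl sm,self,openib -np 2 ./IMB-MPI1 PingPong
```

The explicit inclusion form avoids any ambiguity about which BTLs remain eligible. The exclusion form leaves every other BTL in the build enabled, including any that a newer release adds.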
config.log.bz2
Description: application/bzip