I've been testing an application that turns out to be ~30% slower with OMPI 1.6.5 than with the Red Hat packaged version of 1.5.4. The MCA parameters and the binary are the same in both cases; only the runtime is switched. It's running over openib, and the profile the application prints says that alltoall is a factor of four slower in 1.6.5. (I haven't tried to profile it externally, but I've no reason to doubt what it says.)
How should I go about finding out why and, I hope, fixing it? A possibly relevant side question: is there a way of dumping all the MCA parameters in effect? ompi_info --all doesn't show the collective algorithm parameters, for instance, though I thought I'd got those out of it at one time.