Hi Andrew, Brock, and everyone else,
Andrew Friedley wrote:
> If this is what I think it is, try using this MCA parameter:
> -mca btl_openib_ib_timeout 20
Just FYI, in addition to the above, I retried over the gigabit links
('--mca btl tcp,self', right?) and that failed too, so at least in /my/
case, the problem isn't related to the IB fabric. I'm recompiling
OpenMPI-1.2.4 with PGI-6.2-5 right now, and rebuilding CCSM against it
will take an hour or two, but I'll send a status update after that. I'm
98% certain that this configuration has worked before on a 32-bit Xeon
with gigabit links, so while there are still lots of variables, it
should help me narrow things down.
Cheers,
- Brian
Brian Dobbins
Yale University HPC