Right, that's the maximum number of open MX channels, i.e. processes
that can run on the node using MX. With MX (1.2.0c I think), I get
weird messages if I run a second mpirun quickly after the first one
failed. The myrinet guys, I'm quite sure, can explain why and how.
Somehow, when an application segfaults while the MX port is open,
things are not cleaned up right away. It takes a few seconds (not more
than one minute) to have everything running correctly after that.

Supposedly I am a "myrinet guy" ;-) Yeah, the endpoint cleanup stuff could take a few seconds after an ungraceful exit. But if you're seeing behavior that looks like something you ought not to be getting, please let us know!
-reese
Myricom, Inc.

