[OMPI users] Compiling and Building OPENMPI for checkpointing using self

2009-06-06 Thread Kritiraj Sajadah
Hi all, I have successfully installed and configured Open MPI to perform checkpointing using the BLCR mechanism. However, I now want to try checkpointing using self. Has anyone done that? If so, I would very much appreciate it if any of you could send me the steps necessary to enable self

[OMPI users] MPI inside MPI

2009-06-06 Thread Carlos Henrique da Silva Santos
Dear all, I developed an application using Open MPI in C++. This application should internally start (via a system call) another application, which is also developed in C++ with Open MPI. When this external application is called with the C system function, the following messages are shown: [localhost.localdom

Re: [OMPI users] oob-tcp problem, unreachable in orted_comm

2009-06-06 Thread Ralph Castain
Yeah, I've started seeing this on clusters where the TCP stack is a little congested. We default to trying 60 times to send a message, but it is done in rapid succession and doesn't really provide a lot of time. Try setting -mca oob_tcp_peer_retries 1000 (or some number much bigger than 60)
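
The suggestion above could be applied on the command line roughly as follows. This is only a sketch: the process count and the program name ./my_app are placeholders that do not come from the thread, and 1000 is simply the example value mentioned in the reply.

```sh
# Raise the OOB TCP send retry count above the default of 60.
# "-np 4" and "./my_app" are hypothetical placeholders for the real job.
mpirun -mca oob_tcp_peer_retries 1000 -np 4 ./my_app
```

MCA parameters can generally also be set through the environment (here, OMPI_MCA_oob_tcp_peer_retries=1000), which may be more convenient when the mpirun line is generated by a batch script.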

[OMPI users] oob-tcp problem, unreachable in orted_comm

2009-06-06 Thread Åke Sandgren
Just got this in a user job. Any idea why it complains like this? The original error was the infamous "RETRY EXCEEDED ERROR", but instead of killing the job it showed this and never died. I have never seen this happen before. Open MPI 1.3.2, built with Intel 10.1. This binary is used a lot (+50% of th