Re: [OMPI users] OFED-1.5rc1 with OpenMPI and IB

2009-12-07 Thread Jeff Squyres
You need to check how the defaults are set on your systems. Sometimes a daemon is started with low limits (e.g., 64) and any shells/windows that the daemon spawns then inherit those low limits. For example, see the Open MPI FAQ on this topic: http://www.open-mpi.org/faq/?cat
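Since the limit is inherited per process, one quick check is to query RLIMIT_MEMLOCK from inside a program launched the same way your MPI jobs are; a minimal sketch (note that ulimit -l reports kilobytes while getrlimit() returns bytes):

    /* memlock_check.c - print the locked-memory limit this process
     * inherited; compile with: gcc memlock_check.c -o memlock_check */
    #include <stdio.h>
    #include <sys/resource.h>

    int main(void)
    {
        struct rlimit rl;
        if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
            perror("getrlimit");
            return 1;
        }
        if (rl.rlim_cur == RLIM_INFINITY)
            printf("memlock soft limit: unlimited\n");
        else
            printf("memlock soft limit: %llu KB\n",
                   (unsigned long long)(rl.rlim_cur / 1024));
        return 0;
    }

Running this under ssh, under your X11 terminal, and under your resource manager's launcher will show which environment is imposing the low limit.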

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-07 Thread George Bosilca
There are many papers published on this subject. Google Scholar with a search for "system noise" will give you a starting point. george. On Dec 7, 2009, at 10:13 , Douglas Guptill wrote: >> In most MPI applications if even one task is sharing its CPU with >> other processes, like users doing

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-07 Thread Douglas Guptill
On Mon, Dec 07, 2009 at 08:21:46AM -0500, Richard Treumann wrote: > The need for a "better" timeout depends on what else there is for the CPU > to do. > > If you get creative and shift from {99% MPI_Wait, 1% OS_idle_process} to > {1% MPI_Wait, 99% OS_idle_process} at a cost of only a few extra >
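There is no MPI_Wait variant that takes a timeout, so the usual workaround is to poll with MPI_Test against a wall-clock deadline from MPI_Wtime. A minimal sketch (the function name and return convention are illustrative, not part of any MPI API):

    /* Poll MPI_Test until the request completes or the deadline passes.
     * Returns 1 on completion, 0 on timeout. */
    #include <mpi.h>

    int wait_with_timeout(MPI_Request *req, MPI_Status *status, double seconds)
    {
        double deadline = MPI_Wtime() + seconds;
        int done = 0;
        while (!done && MPI_Wtime() < deadline)
            MPI_Test(req, &done, status);   /* non-blocking completion check */
        return done;
    }

As written this loop spins at 100% CPU while it waits, which is exactly the trade-off discussed in the reply below.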

Re: [OMPI users] OFED-1.5rc1 with OpenMPI and IB

2009-12-07 Thread Stefan Kuhne
Stefan Kuhne wrote: > Stefan Kuhne wrote: > Hello, >> I'll try it on Monday. >> > with: > user@head:~$ ulimit -l > unlimited > user@head:~$ > > it works. It works in ssh and FreeNX, but a terminal on real X11 still reports 64. But I need X11 for testing an MPE issue. Regards, Stefan Kuhne

Re: [OMPI users] Mimicking timeout for MPI_Wait

2009-12-07 Thread Richard Treumann
The need for a "better" timeout depends on what else there is for the CPU to do. If you get creative and shift from {99% MPI_Wait, 1% OS_idle_process} to {1% MPI_Wait, 99% OS_idle_process} at a cost of only a few extra microseconds of added lag on MPI_Wait, you may be pleased by the CPU load statistics
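One way to realize the shift Treumann describes (a small latency cost in exchange for a nearly idle CPU) is to sleep between MPI_Test calls instead of spinning. A sketch, with the 1 ms interval as an arbitrary placeholder to tune against your latency budget:

    /* Polling wait that yields the CPU between completion checks, moving
     * CPU time from busy-waiting to the OS idle process at the cost of up
     * to one sleep interval of extra latency. */
    #include <mpi.h>
    #include <time.h>

    void low_cpu_wait(MPI_Request *req, MPI_Status *status)
    {
        struct timespec ts = { 0, 1000000 };    /* 1 ms between checks */
        int done = 0;
        while (!done) {
            MPI_Test(req, &done, status);
            if (!done)
                nanosleep(&ts, NULL);           /* sleep instead of spinning */
        }
    }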

Re: [OMPI users] Job fails after hours of running on a specific node

2009-12-07 Thread Sangamesh B
Hello Pasha, As the error was not repeating frequently, I didn't look into the issue for a long time. But now I have started to diagnose it: Initially I tested with ibv_rc_pingpong (master node to all compute nodes using a for loop). It works for each of the nodes. The files generated o