Re: [OMPI users] LS-DYNA profiling [was: OpenMPI Hangs, No Error]

2010-07-14 Thread Robert Walters
Hello all, I have finally solved the issue, or as it should be said, discovered my oversight. And it's a mistake that will have me mad at myself for a while. I'm new to MPI, though, and not versed in the MPP communications of LS-DYNA at all, so it was an oversight easily made. The key t

Re: [OMPI users] first cluster [was trouble using openmpi under slurm]

2010-07-14 Thread Jeff Squyres
On Jul 9, 2010, at 12:43 PM, Douglas Guptill wrote:
> After some lurking and reading, I plan this:
> Debian (lenny)
> + fai - for compute-node operating system install
> + Torque - job scheduler/manager
> + MPI (Intel MPI) - for the application
> +

Re: [OMPI users] OpenMPI how large its buffer size ?

2010-07-14 Thread Jeff Squyres
+1 on all that has been said. As Eugene stated: this is not an internal Open MPI bug. Your application is calling some form of an MPI receive with a buffer that is too small. The MPI specification defines this as a truncation error; hence, Open MPI gives you an MPI_ERR_TRUNCATE. You can fix the

Re: [OMPI users] perhaps an openmpi bug, how best to identify?

2010-07-14 Thread Jeff Squyres
On Jul 12, 2010, at 11:14 AM, Olivier Marsden wrote:
> Hi again,
> after testing as suggested, it is indeed a massive slowdown rather than
> a full-blown machine hang.
Ok.
> Would the next test be to run with debug flags for openmpi ?
You might want to run with mpirun --mca mpi_yield_when_

[OMPI users] LS-DYNA profiling [was: OpenMPI Hangs, No Error]

2010-07-14 Thread Eugene Loh
I started today reading e-mail quickly and out of order.  So, I'm going back to an earlier message now, but still with the new Subject heading, which better reflects where you are in your progress.  I'm extracting some questions from this thread, from bottom/old to top/new: 1)  What tools to u

[OMPI users] LS-DYNA profiling [was: OpenMPI Hangs, No Error]

2010-07-14 Thread Eugene Loh
I took the liberty of changing the subject line. Yes, MPI_Barrier waits until all other processes in the communicator catch up. So, a long barrier time usually indicates that there is some "load imbalance": one or more processes reach the synchronization point well before the others. Other commun

Re: [OMPI users] OpenMPI Hangs, No Error

2010-07-14 Thread Robert Walters
Also, I finally got some graphical output from Sun Studio Analyzer. I see MPI_Recv and MPI_Wait taking a lot of time, but I would think that is OK: this program does heavy number crunching, and I would expect it to need to Wait or wait to Receive very often since there is a decent amount of tim

Re: [OMPI users] Killing openmpi job via programming language

2010-07-14 Thread Ralph Castain
You need to call MPI_Abort, not Finalize. Finalize will block until all procs call it. Abort causes the system to terminate everyone immediately.

On Jul 14, 2010, at 5:06 AM, Saygin Arkan wrote:
> Hi,
> I'm executing an mpi program, using C++ bindings.
>
> if( rank == 0)
> {
> ...
> ...
> if( !

[OMPI users] Killing openmpi job via programming language

2010-07-14 Thread Saygin Arkan
Hi,
I'm executing an mpi program, using C++ bindings.

if( rank == 0)
{
    ...
    ...
    if( !isFileFound){
        LOG4CXX_ERROR(log, "There are not any files related with the given probe ID");
        Finalize();
        exit(0);
    }
}

Here rank zero stops working, I print the error log. But th

Re: [OMPI users] [openib] segfault when using openib btl

2010-07-14 Thread Eloi Gaudry
Hi Rolf, thanks for your input. You're right, I missed the coll_tuned_use_dynamic_rules option. I'll check whether the segmentation fault disappears when using the basic bcast linear algorithm with the proper command line you provided. Regards, Eloi On Tuesday 13 July 2010 20:39:59 Rolf vandeVaar