On Tue, 2009-05-19 at 14:01 -0400, Noam Bernstein wrote:
> With one of them, apparently, CP2K will silently go on if the file is
> missing, but then lock up in an MPI call (maybe it leaves some
> variables uninitialized, and then uses them in the call).

I'm glad you got to the bottom of it.
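For reference, that failure mode is easy to reproduce: if one rank skips a
collective that the others enter, the rest block forever. A minimal sketch in
C -- the per-rank file name here is purely illustrative, not CP2K's actual
layout:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, val = 0, sum;
        char fname[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* hypothetical per-rank input file */
        snprintf(fname, sizeof(fname), "input.%d", rank);
        FILE *f = fopen(fname, "r");
        if (f != NULL) {
            fscanf(f, "%d", &val);
            fclose(f);
            /* only ranks that found their file reach the collective */
            MPI_Allreduce(&val, &sum, 1, MPI_INT, MPI_SUM,
                          MPI_COMM_WORLD);
        }
        /* if any rank's file was missing, the ranks that did call
           MPI_Allreduce wait in it forever -- the symptom seen here */

        MPI_Finalize();
        return 0;
    }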
On May 19, 2009, at 12:13 PM, Ashley Pittman wrote:
> That is indeed odd, but it shouldn't be too hard to track down. How
> often does the failure occur? Presumably when you say you have three
> invocations of the program they communicate via files; is the location
> of these files changing?

Yeay.
On May 19, 2009, at 12:13 PM, Ashley Pittman wrote:
On Tue, 2009-05-19 at 11:01 -0400, Noam Bernstein wrote:
> I'd suspect the filesystem too, except that it's hung up in an MPI
> call. As I said before, the whole thing is bizarre. It doesn't matter
> where the executable is, just what CWD is (i.e. I can do mpirun
> /scratch/exec or mpirun
On May 19, 2009, at 12:13 PM, Ashley Pittman wrote:
> Finally, if you could run it with "--mca btl ^ofed" to rule out the
> ofed stack causing the problem, that would be useful. You'd need to
> check the syntax here.

--mca btl ^openib

We're stuck with that old name for now -- see http://www.open
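So the test run would look something like this (the process count is
illustrative):

    mpirun --mca btl ^openib -np 8 cp2k.popt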
On May 19, 2009, at 9:32 AM, Ashley Pittman wrote:
> Can you confirm that *all* processes are in PMPI_Allreduce at some
> point? The collectives commonly get blamed for a lot of hangs, and
> it's not always the correct place to look.

For the openmpi run, every single process showed one of those two
On May 19, 2009, at 9:13 AM, Noam Bernstein wrote:
> The MPI code isn't calling fork or system. The serial code is calling
> system("mpirun cp2k.popt"). That runs to completion, processes the
> output files, and calls system("mpirun cp2k.popt") again, and so on.
> Is that in fact likely to be a problem
On Mon, 2009-05-18 at 17:05 -0400, Noam Bernstein wrote:
> The code is complicated, the input files are big and lead to long
> computation times, so I don't think I'll be able to make a simple
> test case. Instead I attached to the hanging processes (all 8 of
> them) with gdb during the hang
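A non-interactive way to grab a backtrace from every stuck rank at once,
assuming pgrep and gdb are available (the process name is taken from the
run above):

    for pid in $(pgrep -f cp2k.popt); do
        gdb -p "$pid" -batch -ex "thread apply all bt"
    done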
On May 19, 2009, at 8:29 AM, Jeff Squyres wrote:
> fork() support in OpenFabrics has always been dicey -- it can lead to
> random behavior like this. Supposedly it works in a specific set of
> circumstances, but I don't have a recent enough kernel on my machines
> to test.
>
> It's best not to use calls to system() if they can be avoided.
Indeed,
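For completeness: libibverbs does ship a fork() workaround,
ibv_fork_init(), which as far as I know has to be called before any
verbs resources exist, i.e. before MPI_Init. Whether it helps on a
given kernel/OFED combination is exactly the uncertainty described
above, so treat this as a sketch rather than a fix:

    #include <infiniband/verbs.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* Ask libibverbs to make fork() safer; must run before any
           InfiniBand resources are allocated, hence before MPI_Init.
           Effectiveness depends on the kernel/OFED version. */
        if (ibv_fork_init() != 0) {
            /* no fork protection available: avoid system()/fork() */
        }

        MPI_Init(&argc, &argv);
        /* ... application, possibly calling system() ... */
        MPI_Finalize();
        return 0;
    }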
Hi all - I have a bizarre OpenMPI hanging problem. I'm running an MPI
code called CP2K (related to, but not the same as cpmd). The
complications of the software aside, here are the observations:

At the base is a serial code that uses system() calls to repeatedly
invoke mpirun cp2k.popt
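A minimal sketch of that driver pattern in C (the loop bound, process
count, and convergence handling are illustrative, not the actual code):

    #include <stdio.h>
    #include <stdlib.h>

    int main(void)
    {
        /* hypothetical outer loop; the real driver decides when to
           stop by processing the CP2K output files */
        for (int iter = 0; iter < 100; iter++) {
            /* launch the parallel job and block until it exits */
            int rc = system("mpirun -np 8 cp2k.popt");
            if (rc != 0) {
                fprintf(stderr, "mpirun exited with status %d\n", rc);
                return 1;
            }
            /* ... read output files, write the next input set ... */
        }
        return 0;
    }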