[OMPI users] --bynode vs --byslot

2008-05-23 Thread Cally K
Hi, I have a question about --bynode and --byslot that I would like to clarify. Say, for example, I have a hostfile:

#Hostfile
__
node0
node1 slots=2 max_slots=2
node2 slots=2 max_slots=2
node3 slots=4 max_slots=4
___

There are 4 nodes and 9 slots,
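To make the question concrete, here is a rough Python sketch of how the two policies would be expected to place ranks on the hostfile quoted in the post. This is a simulation, not Open MPI code. It assumes that node0 (which lists no slot count) contributes 1 slot, that --byslot fills a node's slots before moving to the next node, that --bynode deals out one rank per node in round-robin order, and that max_slots is a hard cap. The function names `map_byslot` and `map_bynode` are made up for illustration.

```python
# Slot counts taken from the hostfile in the post. node0 has no explicit
# slot count, so it is ASSUMED here to provide 1 slot.
HOSTS = [("node0", 1), ("node1", 2), ("node2", 2), ("node3", 4)]


def _check_capacity(hosts, nprocs):
    """Refuse to oversubscribe, treating max_slots as a hard limit (assumption)."""
    total = sum(slots for _, slots in hosts)
    if nprocs > total:
        raise ValueError(f"{nprocs} ranks requested but only {total} slots")


def map_byslot(hosts, nprocs):
    """Hypothetical model of --byslot: fill each node before moving on."""
    _check_capacity(hosts, nprocs)
    placement = []
    for name, slots in hosts:
        for _ in range(slots):
            if len(placement) == nprocs:
                return placement
            placement.append(name)
    return placement


def map_bynode(hosts, nprocs):
    """Hypothetical model of --bynode: one rank per node per pass,
    skipping nodes that have no free slots left."""
    _check_capacity(hosts, nprocs)
    used = {name: 0 for name, _ in hosts}
    placement = []
    while len(placement) < nprocs:
        for name, slots in hosts:
            if len(placement) == nprocs:
                break
            if used[name] < slots:
                used[name] += 1
                placement.append(name)
    return placement


if __name__ == "__main__":
    # The list index is the MPI rank; the value is the node it would land on.
    for label, mapper in (("byslot", map_byslot), ("bynode", map_bynode)):
        print(label, list(enumerate(mapper(HOSTS, 9))))
```

Under these assumptions, a 9-rank job ends up using every slot either way; only the rank-to-node order differs. Whether a given Open MPI release behaves exactly like this should be checked against the mpirun man page for that version.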

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jim Kusznir
I've asked for verification, but I recall the original verbal complaint claiming the wall time was random and sometimes as short as 2 minutes into a job. They have said they've run more tests with more instrumentation on their code, and it always fails in a random place. Same job, different resu

Re: [OMPI users] More OpenMPI errors: how to debug?

2008-05-23 Thread Jim Kusznir
Well, it turns out that the path OpenMPI looks for things seems at least partially hard-coded. I've got some "weird pathing" here on my rocks cluster: /opt is local; /share/apps is exported from the headnode and available on all nodes. On the head node, /opt is symlinked to /share/apps. I set my

Re: [OMPI users] More OpenMPI errors: how to debug?

2008-05-23 Thread Jeff Squyres
On May 22, 2008, at 12:52 PM, Jim Kusznir wrote: I installed openmpi 1.2.6 on my system, but now my users are complaining about even more errors. I'm getting this: [compute-0-23.local:26164] [NO-NAME] ORTE_ERROR_LOG: Not found in file runtime/orte_init_stage1.c at line 182

Re: [OMPI users] OpenMPI+PGI errors

2008-05-23 Thread Jeff Squyres
This may be a dumb question, but is there a chance that his job is running beyond 30 minutes, and PBS/Torque/whatever is killing it? On May 20, 2008, at 4:23 PM, Jim Kusznir wrote: Hello all: I've got a user on our ROCKS 4.3 cluster that's having some strange errors. I have other users usin

Re: [OMPI users] Open MPI Linux Expectations

2008-05-23 Thread George Bosilca
I build on Debian 4.0 and run on Suse 10 and Fedora Core 6. The only thing I had to enforce is the availability of the corresponding libc library (the one I build with) on the target OS. Moreover, as my nodes have different processors, I have to enforce strict x86 code. george. On May 22