Re: [OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-28 Thread Gus Correa
Hi Ralph Thank you for your attention to this again. More inline. Ralph Castain wrote: I've never seen that problem before, and I wasn't able to track down what "keyval parser" is being referenced. :-( There was a posting in 2006, but somehow it was hijacked and there is no clear solution,

Re: [OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-28 Thread Ralph Castain
I've never seen that problem before, and I wasn't able to track down what "keyval parser" is being referenced. :-( I have a very simple suggestion, however. Just edit the .txt files and remove the comments. If you are getting a workable executable, then I would think it must be reading past th

Re: [OMPI users] keyval parser: error 1 reading file mpicc-wrapper-data.txt

2011-03-28 Thread Gus Correa
Hi Ralph, list. Sorry, I spoke too soon. The problem still seems to be there, in all compiler wrappers (i.e., it probably traces back to the opal_wrapper): $ mpicc -o hello_c hello_c.c [mycomputer.mydomain:09148] keyval parser: error 1 reading file /my/path/to/openmpi/1.4.3/gnu-intel/openmpi/mpi

Re: [OMPI users] Cannot launch slots on more than 2 remote machines

2011-03-28 Thread Igor
Thank you for your help! The issue is definitely the firewall. I guess, since I don't plan on having any communication between "slave" nodes of my cluster (SPMD with no cross-talk), and it is fairly small, I'll stick with option 2 for now. On Mon, Mar 28, 2011 at 3:43 PM, Ralph Castain wrote: > I

Re: [OMPI users] Cannot launch slots on more than 2 remote machines

2011-03-28 Thread Ralph Castain
It is hanging because your last nodes are not receiving the launch command. The daemons receive a message from mpirun telling them what to launch. That message is sent via a tree-like routing algorithm. So mpirun sends to the first two daemons, each of which relays it on to some number of daemon
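The relay behaviour described above can be illustrated with a small sketch. It is not Open MPI's actual routing code: it assumes a plain binary fan-out (rank 0 stands for mpirun, and each rank relays to ranks 2r+1 and 2r+2), and the names `children`, `reached`, and `daemon_links_open` are hypothetical. It only shows why daemons beyond the first two never see the launch message when daemon-to-daemon traffic is blocked.

```python
from collections import deque


def children(rank, n, fanout=2):
    """Ranks that `rank` relays to in an assumed fixed fan-out tree of n ranks."""
    first = rank * fanout + 1
    return [c for c in range(first, first + fanout) if c < n]


def reached(n, daemon_links_open, fanout=2):
    """Return the sorted ranks that receive the launch message.

    Rank 0 models mpirun; ranks 1..n-1 model daemons. When
    daemon_links_open is False (e.g. a firewall between compute nodes),
    daemons cannot relay, so only mpirun's direct children are reached.
    """
    seen = {0}
    queue = deque([0])
    while queue:
        rank = queue.popleft()
        if rank != 0 and not daemon_links_open:
            continue  # this daemon's outgoing relays are blocked
        for child in children(rank, n, fanout):
            seen.add(child)
            queue.append(child)
    return sorted(seen)


if __name__ == "__main__":
    # mpirun plus four remote daemons
    print(reached(5, daemon_links_open=True))
    print(reached(5, daemon_links_open=False))
```

With daemon-to-daemon links blocked, only the two daemons that mpirun contacts directly are reached, which is consistent with the "works on two machines, hangs on more" symptom reported in this thread.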

[OMPI users] Cannot launch slots on more than 2 remote machines

2011-03-28 Thread Igor
Hello, First off, complete MPI newbie here. I have installed openmpi-1.4.3-1.fc13.i686 on an IBM blade cluster running Fedora. I can open as many slots as I want on remote machines, as long as I only connect to two machines (doesn't matter which two). For example, I run my mpi task from "cluster"

Re: [OMPI users] gadget2 infiniband openmpi hang

2011-03-28 Thread Gretchen
The gadget code hangs at the same spot (i.e. number of steps completed AND same section of code) when I run with --mca btl_openib_cpc_include rdmacm (code is doing MPI_Sendrecv). Thanks, Gretchen List-Post: users@lists.open-mpi.org Date: Thu, 17 Mar 2011 12:45:32 -0400 > From: Jeff Squyres > Su

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
I ran ompi_info --param btl sm, and this is the output: MCA btl: parameter "btl_base_debug" (current value: "0") If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output MCA btl: parameter "btl" (current value: )

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Ralph Castain
The fact that this exactly matches the time you measured with shared memory is suspicious. My guess is that you aren't actually using shared memory at all. Does your "ompi_info" output show shared memory as being available? Jeff or others may be able to give you some params that would let you ch

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
What happens with 2 processes on the same node with tcp? With --mca btl self,tcp my app runs in 23s. 2011/3/28 Jeff Squyres (jsquyres) > Ah, I didn't catch before that there were more variables than just tcp vs. > shmem. > > What happens with 2 processes on the same node with tcp? > > Eg, when b

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince
On 3/28/2011 3:29 AM, Michele Marena wrote: Each node has two processors (no dual-core). That seems to imply that the two processors share a memory space and a single memory bus, and that the question is not about what I originally guessed. -- Tim Prince

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Tim Prince
On 3/28/2011 3:44 AM, Jeff Squyres (jsquyres) wrote: Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? In fact, I made the gues

Re: [OMPI users] help with hybrid openmp and mpi

2011-03-28 Thread Jeff Squyres (jsquyres)
Your program is invalid for several reasons - try correcting all the compiler warnings first. Sent from my phone. No type good. On Mar 27, 2011, at 10:08 PM, "job hunter" wrote: > Hi all, >I explained the problem I'm facing @ http://www.ideone.com/EGMMn > please help > thanks >

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Jeff Squyres (jsquyres)
Ah, I didn't catch before that there were more variables than just tcp vs. shmem. What happens with 2 processes on the same node with tcp? Eg, when both procs are on the same node, are you thrashing caches or memory? Sent from my phone. No type good. On Mar 28, 2011, at 6:27 AM, "Michele Mar

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
Each node has two processors (no dual-core). 2011/3/28 Michele Marena > However, I thank you Tim, Ralph and Jeff. > My sequential application runs in 24s (wall clock time). > My parallel application runs in 13s with two processes on different nodes. > With shared memory, when two processes are o

Re: [OMPI users] Shared Memory Performance Problem.

2011-03-28 Thread Michele Marena
However, I thank you Tim, Ralph and Jeff. My sequential application runs in 24s (wall clock time). My parallel application runs in 13s with two processes on different nodes. With shared memory, when two processes are on the same node, my app runs in 23s. I don't understand why. 2011/3/28 Jeff Squyr
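To put the reported timings in perspective, the following sketch computes speedup and parallel efficiency from the three wall-clock figures quoted above (24s sequential, 13s on two nodes, 23s on one node). The function names are hypothetical, and the definitions used are the usual ones: speedup = T_seq / T_par, efficiency = speedup / number of processes.

```python
def speedup(t_seq, t_par):
    """Sequential wall-clock time divided by parallel wall-clock time."""
    return t_seq / t_par


def efficiency(t_seq, t_par, nprocs):
    """Speedup per process; 1.0 would be ideal linear scaling."""
    return speedup(t_seq, t_par) / nprocs


if __name__ == "__main__":
    t_seq = 24.0  # seconds, sequential run (from the message)
    runs = {
        "2 procs, different nodes": 13.0,
        "2 procs, same node": 23.0,
    }
    for label, t_par in runs.items():
        s = speedup(t_seq, t_par)
        e = efficiency(t_seq, t_par, 2)
        print(f"{label}: speedup {s:.2f}, efficiency {e:.2f}")
```

The two-node run gets close to a 2x speedup, while the same-node run is barely faster than the sequential one. That pattern suggests the two processes are contending for something on the shared node rather than being slowed by the transport itself.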

Re: [OMPI users] OMPI error terminate w/o reasons

2011-03-28 Thread Jeff Squyres
On Mar 27, 2011, at 9:14 PM, Jack Bryan wrote: > The job queue has a time budget, which has been set in my job script. > > For example, my current job queue is 24 hours. > > But, my program got SIGKILL (signal 9) within not more than 2 hours since it > began to run. Torque has decided to kil