Hi Ralph
Thank you for your attention to this again.
More inline.
Ralph Castain wrote:
I've never seen that problem before,
and I wasn't able to track down what "keyval parser" is being referenced. :-(
There was a posting in 2006, but the thread was hijacked and
no clear solution emerged.
I've never seen that problem before, and I wasn't able to track down what
"keyval parser" is being referenced. :-(
I have a very simple suggestion, however. Just edit the .txt files and remove
the comments.
If you are getting a workable executable, then I would think it must be reading
past th
Hi Ralph, list
Sorry, I spoke too soon.
The problem still seems to be there, in all of the compiler wrappers
(i.e., it probably comes down to the opal_wrapper):
$ mpicc -o hello_c hello_c.c
[mycomputer.mydomain:09148] keyval parser: error 1 reading file
/my/path/to/openmpi/1.4.3/gnu-intel/openmpi/mpi
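(Following Ralph's suggestion above, one way to try his fix is to delete the
comment lines from whichever .txt file the error names; this is only a sketch,
and the placeholder below stands for the file path printed in your own error
output.)
$ sed -i.bak '/^[[:space:]]*#/d' FILE_NAMED_IN_THE_ERROR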
Thank you for your help! The issue is definitely the firewall. I
guess, since I don't plan on having any communication between "slave"
nodes of my cluster (SPMD with no cross-talk), and it is fairly small,
I'll stick with option 2 for now.
On Mon, Mar 28, 2011 at 3:43 PM, Ralph Castain wrote:
> I
It is hanging because your last nodes are not receiving the launch command.
The daemons receive a message from mpirun telling them what to launch. That
message is sent via a tree-like routing algorithm. So mpirun sends to the first
two daemons, each of which relays it on to some number of daemon
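As an aside, here is a minimal sketch (not Open MPI's actual routed code, just
an illustration assuming a simple binary tree) of the fan-out Ralph describes,
with mpirun as rank 0 and each daemon relaying the launch message to up to two
children:

/* Illustrative only: a binary-tree relay pattern. Rank 0 plays the role of
 * mpirun; every other rank is a daemon that forwards the launch message to
 * its children 2*me+1 and 2*me+2, if those ranks exist. */
#include <stdio.h>

int main(void)
{
    const int tree_size = 10;   /* mpirun plus daemons; assumed size for illustration */
    for (int me = 0; me < tree_size; me++) {
        int left  = 2 * me + 1;
        int right = 2 * me + 2;
        if (left < tree_size)
            printf("rank %d relays the launch message to rank %d\n", me, left);
        if (right < tree_size)
            printf("rank %d relays the launch message to rank %d\n", me, right);
    }
    return 0;
}

If one of the leaves never receives the relayed message (for example because a
firewall blocks daemon-to-daemon traffic), the launch hangs exactly as
described.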
Hello,
First off, complete MPI newbie here. I have installed
openmpi-1.4.3-1.fc13.i686 on an IBM blade cluster running Fedora. I
can open as many slots as I want on remote machines, as long as I only
connect to two machines (doesn't matter which two).
For example, I run my mpi task from "cluster"
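(For context, "slots" here refers to the per-host slot counts given to Open MPI,
for example in a hostfile along these lines; the host names and counts below are
placeholders, not taken from the poster's setup.)
node01 slots=8
node02 slots=8
node03 slots=8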
The gadget code hangs at the same spot (i.e. number of steps completed AND
same section of code) when I run with --mca btl_openib_cpc_include rdmacm
(code is doing MPI_Sendrecv).
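For reference, a minimal MPI_Sendrecv ring exchange looks roughly like the
sketch below. This is not the GADGET code itself, just a small reproducer one
could run with and without --mca btl_openib_cpc_include rdmacm to see whether
the hang follows the connection method:

/* Minimal MPI_Sendrecv ring exchange: each rank sends its rank number to the
 * right neighbor and receives from the left neighbor. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size, sendval, recvval;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    sendval = rank;
    MPI_Sendrecv(&sendval, 1, MPI_INT, (rank + 1) % size, 0,
                 &recvval, 1, MPI_INT, (rank + size - 1) % size, 0,
                 MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    printf("rank %d received %d\n", rank, recvval);
    MPI_Finalize();
    return 0;
}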
Thanks,
Gretchen
Date: Thu, 17 Mar 2011 12:45:32 -0400
> From: Jeff Squyres
> Su
I ran ompi_info --param btl sm and this is the output:
MCA btl: parameter "btl_base_debug" (current value: "0")
If btl_base_debug is 1 standard debug is output,
if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: )
The fact that this exactly matches the time you measured with shared memory is
suspicious. My guess is that you aren't actually using shared memory at all.
Does your "ompi_info" output show shared memory as being available? Jeff or
others may be able to give you some params that would let you ch
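(One way to confirm whether the shared-memory BTL is actually selected for the
same-node run is to restrict the BTL list explicitly, e.g. with the command
below; ./my_app again stands in for the application.)
$ mpirun -np 2 --mca btl self,sm ./my_app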
What happens with 2 processes on the same node with tcp?
With --mca btl self,tcp my app runs in 23s.
2011/3/28 Jeff Squyres (jsquyres)
> Ah, I didn't catch before that there were more variables than just tcp vs.
> shmem.
>
> What happens with 2 processes on the same node with tcp?
>
> Eg, when b
On 3/28/2011 3:29 AM, Michele Marena wrote:
Each node has two processors (no dual-core).
which seems to imply that the two processors share memory space and a
single memory bus, and the question is not about what I originally guessed.
--
Tim Prince
On 3/28/2011 3:44 AM, Jeff Squyres (jsquyres) wrote:
Ah, I didn't catch before that there were more variables than just tcp
vs. shmem.
What happens with 2 processes on the same node with tcp?
Eg, when both procs are on the same node, are you thrashing caches or
memory?
In fact, I made the gues
Your program is invalid for several reasons - try correcting all the compiler
warnings first.
Sent from my phone. No type good.
On Mar 27, 2011, at 10:08 PM, "job hunter" wrote:
> Hi all,
> I explained the problem I'm facing @ http://www.ideone.com/EGMMn
> please help
> thanks
>
Ah, I didn't catch before that there were more variables than just tcp vs.
shmem.
What happens with 2 processes on the same node with tcp?
Eg, when both procs are on the same node, are you thrashing caches or memory?
Sent from my phone. No type good.
On Mar 28, 2011, at 6:27 AM, "Michele Mar
Each node has two processors (no dual-core).
2011/3/28 Michele Marena
> In any case, thank you Tim, Ralph, and Jeff.
> My sequential application runs in 24s (wall clock time).
> My parallel application runs in 13s with two processes on different nodes.
> With shared memory, when two processes are o
In any case, thank you Tim, Ralph, and Jeff.
My sequential application runs in 24s (wall clock time).
My parallel application runs in 13s with two processes on different nodes.
With shared memory, when two processes are on the same node, my app runs in
23s.
I don't understand why.
2011/3/28 Jeff Squyr
On Mar 27, 2011, at 9:14 PM, Jack Bryan wrote:
> The job queue has a time budget, which has been set in my job script.
>
> For example, my current job's time limit is 24 hours.
>
> But my program got a SIGKILL (signal 9) less than 2 hours after it began to
> run.
Torque has decided to kil
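(If the 24-hour budget refers to the wall-clock limit, it is typically requested
in a Torque/PBS job script with a directive like the one below; this is only a
generic illustration, since the actual script was not posted.)
#PBS -l walltime=24:00:00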