Hello Ralph & Jeff,
This is the same issue, but this time the job is running on a single node.
The two systems on which the jobs are run have the same hardware/OS
configuration. The only differences are:
One node has 4 GB RAM and is part of the InfiniBand-connected nodes.
The other node ha
Hi!
Valgrind seems to think that there is a use of an uninitialized value in
mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) {
Backtracking that, I found that mem_node is not set during initialization
in mca_mpool_sm_init.
The resources parameter is never used and the mpool_module->mem_node is
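A minimal, self-contained C illustration of the bug class described here and the fix it implies, namely giving mem_node a defined default in the init path (struct and function names are simplified stand-ins, not the actual Open MPI sources):

  #include <stdio.h>
  #include <stdlib.h>

  /* Simplified stand-in for mca_mpool_sm_module_t; not the real struct. */
  typedef struct {
      int mem_node;  /* NUMA node to allocate on, or -1 for "any" */
  } mpool_sm_t;

  static mpool_sm_t *mpool_init(void)
  {
      mpool_sm_t *m = malloc(sizeof(*m));
      if (m != NULL) {
          /* The implied fix: assign mem_node a defined default here, so
           * the ">= 0" test in the alloc path never reads uninitialized
           * memory. */
          m->mem_node = -1;
      }
      return m;
  }

  static void mpool_alloc(mpool_sm_t *m)
  {
      if (m->mem_node >= 0) {      /* the branch Valgrind flagged */
          printf("NUMA-aware allocation on node %d\n", m->mem_node);
      } else {
          printf("default allocation path\n");
      }
  }

  int main(void)
  {
      mpool_sm_t *m = mpool_init();
      if (m == NULL) return 1;
      mpool_alloc(m);
      free(m);
      return 0;
  }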
Hi Sangamesh,
As far as I can tell, there should be no difference if you run CPMD on a
single node, whether with or without IB. One easy thing that you could do is
to repeat your runs on the InfiniBand node(s) with and without InfiniBand,
using --mca btl ^tcp and --mca btl ^openib respectively. But si
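For reference, the two suggested runs might look like the following (the binary and input names are hypothetical; only the --mca btl arguments come from the advice above):

  # Keep InfiniBand by excluding the TCP BTL:
  mpirun --mca btl ^tcp -np 4 ./cpmd.x wavefunction.inp

  # Drop to TCP by excluding the openib BTL:
  mpirun --mca btl ^openib -np 4 ./cpmd.x wavefunction.inp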
Can you try Open MPI 1.3?
Lenny.
On 3/10/09, Tee Wen Kai wrote:
>
> Hi,
>
> I am using version 1.2.8.
>
> Thank you.
>
> Regards,
> Wenkai
>
> --- On Mon, 9/3/09, Ralph Castain wrote:
>
>
> From: Ralph Castain
> Subject: Re: [OMPI users] Problem with MPI_Comm_spawn_multiple &
> MPI_Info_free
Hi,
Can you try the Open MPI 1.3 version?
On 3/9/09, Prasanna Ranganathan wrote:
>
> Hi all,
>
> I have a distributed program running on 400+ nodes and using Open MPI. I
> have previously run the same binary with nearly the same setup successfully.
> However, in my last two runs the program seems t
Hello,
We have a number of applications built with Open MPI 1.2 in a shared
multi-user environment. The Open MPI library upgrade has always been
transparent and painless within the v1.2 branch. Now we would like to
switch to Open MPI 1.3 just as seamlessly. However, an application built with
ompi
Latest status: 1.4a1r20757 (yesterday);
the job now starts with a little output but quickly runs into trouble with
a lot of
'oob-tcp: Communication retries exceeded. Can not communicate with peer'
errors, e.g.:
[s0158:22513] [[41245,0],4]-[[41245,0],21] oob-tcp: Communication retries exceeded.
Åke Sandgren wrote:
Hi!
Valgrind seems to think that there is a use of an uninitialized value in
mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) {
Backtracking that, I found that mem_node is not set during initialization
in mca_mpool_sm_init.
The resources parameter is never used and the mp
On Tue, 2009-03-10 at 09:23 -0800, Eugene Loh wrote:
> Åke Sandgren wrote:
>
> >Hi!
> >
> >Valgrind seems to think that there is a use of an uninitialized value in
> >mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) {
> >Backtracking that, I found that mem_node is not set during initialization
Hi,
I downloaded 1.3 and tried to compile it with gcc (4.2.0) and g95 (0.91)
on RHEL AS 4, but I got the following errors. Any help is greatly
appreciated. (If I compile with gcc (3.4.5) and g95 (0.91), it passes.)
...
libtool: compile: /usr/local/gcc/bin/gcc -DHAVE_CONFIG_H -I. -
I../../../../o
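For reproducing, a hedged sketch of a configure invocation that pins those compilers (the /usr/local/gcc path is taken from the libtool line above; the install prefix is an assumption):

  ./configure CC=/usr/local/gcc/bin/gcc CXX=/usr/local/gcc/bin/g++ \
              F77=g95 FC=g95 --prefix=/usr/local/openmpi-1.3
  make all install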
Not really. I've run much bigger jobs than this without problems, so I
don't think there is a fundamental issue here.
It looks like the TCP fabric between the various nodes is breaking
down. I note in the enclosed messages that the problems are all with
communication between daemons 4 and 21. We keep
Can you send all the information listed here:
http://www.open-mpi.org/community/help/
On Mar 10, 2009, at 2:07 PM, Yu Chen wrote:
Hi,
I downloaded 1.3 and tried to compile it with gcc (4.2.0) and g95 (0.91)
on RHEL AS 4, but I got the following errors. Any help is greatly
appreciated. (If I c
Unfortunately, binary compatibility between Open MPI release versions
has never been guaranteed (even between subreleases).
That being said, we have fixed this issue and expect to support binary
compatibility between Open MPI releases starting with v1.3.2 (v1.3.1
should be released soon; we
Maybe I know why now, but it's not pleasant. For example, two machines in the same
cluster have their Ethernet interfaces configured as follows:
Machine s0157
eth2      Link encap:Ethernet  HWaddr 00:1E:68:DA:74:A8
          BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
Ick. We currently don't have a way to allow you to ignore an interface
on a node-by-node basis. If you do:
-mca oob_tcp_if_exclude eth0
we will exclude that private Ethernet. The catch is that we will
exclude "eth0" on -every- node. On the two machines you note here,
that will still let us
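For example (the process count and binary name are invented; oob_tcp_if_exclude is the parameter Ralph names, and btl_tcp_if_exclude is its BTL-side counterpart in case MPI traffic also strays onto eth0):

  # Excludes eth0 on every node; when overriding the BTL exclude list,
  # keep the default loopback exclusion too:
  mpirun -mca oob_tcp_if_exclude eth0 \
         -mca btl_tcp_if_exclude lo,eth0 \
         -np 16 ./a.out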
You *could* have a per-machine MCA parameter config file that could be
locally staged on each machine and set up with the exclude for whatever
you need on *that* node. Ugly, but it could work...?
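Concretely, that per-machine file could be the standard MCA parameter file under each node's local installation prefix ($HOME/.openmpi/mca-params.conf is the per-user equivalent; the exact path below is an assumption about the local install):

  # /opt/openmpi/etc/openmpi-mca-params.conf, staged only on the node
  # whose eth0 should be ignored:
  oob_tcp_if_exclude = eth0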
On Mar 10, 2009, at 4:26 PM, Ralph Castain wrote:
Ick. We currently don't have a way to allow you
I queued up a job to try this; I will let you know.
I do have the authority to ifdown those rogue eth0 interfaces, as they are
only an artifact of our install (no cables), and I will do that afterwards.
Thanks.
On Tue, 10 Mar 2009, Ralph Castain wrote:
Ick. We currently don't have a way to allow you to ignore
Thank you, it's very good news. If the issue has been fixed, then does
it mean that v1.3.2 will allow running applications compiled with v1.2.9?
Or is it that, starting with v1.3.2, subsequent releases will be backward
compatible with each other?
Jeff Squyres wrote:
Unfortunately, binary compatib
On Mar 10, 2009, at 6:53 PM, Serge wrote:
Thank you, it's very good news. If the issue has been fixed, then does
it mean that v1.3.2 will allow running applications compiled with
v1.2.9?
Or is it that, starting with v1.3.2, subsequent releases will be backward
compatible with each other?
The l