Re: [OMPI users] Lower performance on a Gigabit node compared to infiniband node

2009-03-10 Thread Sangamesh B
Hello Ralph & Jeff, This is the same issue - but this time the job is running on a single node. The two systems on which the jobs are run, have the same hardware/OS configuration. The only differences are: One node has 4 GB RAM and it is part of infiniband connected nodes. The other node ha

[OMPI users] valgrind complaint in openmpi 1.3 (mca_mpool_sm_alloc)

2009-03-10 Thread Åke Sandgren
Hi! Valgrind seems to think that there is an use of uninitialized value in mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) { Backtracking that i found that mem_node is not set during initializing in mca_mpool_sm_init. The resources parameter is never used and the mpool_module->mem_node is

Re: [OMPI users] Lower performance on a Gigabit node compared to infiniband node

2009-03-10 Thread Igor Kozin
Hi Sangamesh, As far as I can tell there should be no difference if you run CPMD on a single node whether with or without ib. One easy thing that you could do is to repeat your runs on the infiniband node(s) with and without infiniband using --mca btl ^tcp and --mca btl ^openib respectively. But si

Re: [OMPI users] Problem with MPI_Comm_spawn_multiple & MPI_Info_fre

2009-03-10 Thread Lenny Verkhovsky
can you try Open MPI 1.3, Lenny. On 3/10/09, Tee Wen Kai wrote: > > Hi, > > I am using version 1.2.8. > > Thank you. > > Regards, > Wenkai > > --- On *Mon, 9/3/09, Ralph Castain * wrote: > > > From: Ralph Castain > Subject: Re: [OMPI users] Problem with MPI_Comm_spawn_multiple & > MPI_Info_free

Re: [OMPI users] OpenMPI program getting stuck at poll()

2009-03-10 Thread Lenny Verkhovsky
Hi, can you try Open MPI 1.3 version. On 3/9/09, Prasanna Ranganathan wrote: > > Hi all, > > I have a distributed program running on 400+ nodes and using OpenMPI. I > have run the same binary with nearly the same setup successfully previously. > However in my last two runs the program seems t

[OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-03-10 Thread Serge
Hello, We have a number of applications built with Open MPI 1.2 in a shared multi-user environment. The Open MPI library upgrade has been always transparent and painless within the v1.2 branch. Now we would like to switch to Open MPI 1.3 as seamlessly. However, an application built with ompi

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
Latest status - 1.4a1r20757 (yesterday); the job now starts with a little output but quickly runs into trouble with a lot of 'oob-tcp: Communication retries exceeded. Can not communicate with peer ' errors? e.g. [s0158:22513] [[41245,0],4]-[[41245,0],21] oob-tcp: Communication retries exceeded.

Re: [OMPI users] valgrind complaint in openmpi 1.3 (mca_mpool_sm_alloc)

2009-03-10 Thread Eugene Loh
Åke Sandgren wrote: Hi! Valgrind seems to think that there is an use of uninitialized value in mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) { Backtracking that i found that mem_node is not set during initializing in mca_mpool_sm_init. The resources parameter is never used and the mp

Re: [OMPI users] valgrind complaint in openmpi 1.3 (mca_mpool_sm_alloc)

2009-03-10 Thread Åke Sandgren
On Tue, 2009-03-10 at 09:23 -0800, Eugene Loh wrote: > Åke Sandgren wrote: > > >Hi! > > > >Valgrind seems to think that there is an use of uninitialized value in > >mca_mpool_sm_alloc, i.e. the if(mpool_sm->mem_node >= 0) { > >Backtracking that i found that mem_node is not set during initializing

[OMPI users] pcart_coords_f.c:74: error: lvalue required as decrement operand

2009-03-10 Thread Yu Chen
Hi, I downloaded 1.3, and try to compile it with gcc(4.2.0) and g95(0.91) on RHEL AS 4. But I got following errors. Any help are greatly appreciated. (If I compile with gcc(3.4.5) and g95(0.91), it passes.) ... libtool: compile: /usr/local/gcc/bin/gcc -DHAVE_CONFIG_H -I. - I../../../../o

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Ralph Castain
Not really. I've run much bigger jobs than this without problem, so I don't think there is a fundamental issue here. It looks like the TCP fabric between the various nodes is breaking down. I note in the enclosed messages that the problems are all with comm between daemons 4 and 21. We keep

Re: [OMPI users] pcart_coords_f.c:74: error: lvalue required as decrement operand

2009-03-10 Thread Jeff Squyres
Can you send all the information listed here: http://www.open-mpi.org/community/help/ On Mar 10, 2009, at 2:07 PM, Yu Chen wrote: Hi, I downloaded 1.3, and try to compile it with gcc(4.2.0) and g95(0.91) on RHEL AS 4. But I got following errors. Any help are greatly appreciated. (If I c

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-03-10 Thread Jeff Squyres
Unfortunately, binary compatibility between Open MPI release versions has never been guaranteed (even between subreleases). That being said, we have fixed this issue and expect to support binary compatibility between Open MPI releases starting with v1.3.2 (v1.3.1 should be released soon; we

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
Maybe I know why now but it's not pleasant, e.g. 2 machines in the same cluster have their ethernets such as: Machine s0157 eth2 Link encap:Ethernet HWaddr 00:1E:68:DA:74:A8 BROADCAST MULTICAST MTU:1500 Metric:1 RX packets:0 errors:0 dropped:0 overruns:0 frame:0

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Ralph Castain
Ick. We don't have a way currently to allow you to ignore an interface on a node-by-node basis. If you do: -mca oob_tcp_if_exclude eth0 we will exclude that private Ethernet. The catch is that we will exclude "eth0" on -every- node. On the two machines you note here, that will still let us

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Jeff Squyres
You *could* have a per-machine mca param config file that could be locally staged on each machine and setup with the exclude for whatever you need on *that* node. Ugly, but it could work...? On Mar 10, 2009, at 4:26 PM, Ralph Castain wrote: Ick. We don't have a way currently to allow you

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
I queued up a job to try this - will let you know. I do have the authority to ifdown those rogue eth0 as they are only an artifact of our install (no cables) and will do that afterwards. Thanks. On Tue, 10 Mar 2009, Ralph Castain wrote: Ick. We don't have a way currently to allow you to ignore

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-03-10 Thread Serge
Thank you, it's very good news. If the issue has been fixed, then does it mean that v1.3.2 will allow to run applications compiled with v1.2.9? Or is it starting with v1.3.2 and subsequent releases will be backward compatible with each other? Jeff Squyres wrote: Unfortunately, binary compatib

Re: [OMPI users] Upgrade from Open MPI 1.2 to 1.3

2009-03-10 Thread Jeff Squyres
On Mar 10, 2009, at 6:53 PM, Serge wrote: Thank you, it's very good news. If the issue has been fixed, then does it mean that v1.3.2 will allow to run applications compiled with v1.2.9? Or is it starting with v1.3.2 and subsequent releases will be backward compatible with each other? The l