Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Ralph Castain
Patch is built and under review... Thanks again Ralph On Dec 2, 2009, at 5:37 PM, Nicolas Bock wrote: > Thanks > > On Wed, Dec 2, 2009 at 17:04, Ralph Castain wrote: > Yeah, that's the one all right! Definitely missing from 1.3.x. > > Thanks - I'll build a patch for the next bug-fix release >

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
Oh bugger, I did miss the obvious. The "old" code which I had ifdef'd out contained an actual construction of the list itself. OBJ_CONSTRUCT(&opal_if_list, opal_list_t); If I make sure I do one of those, I now get a different set of messages but we are back to running again. mpirun -v -
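
For context, a minimal sketch of the construct-then-append idiom being discussed (not a drop-in patch; the opal_if_t shown here is a stand-in for the record defined inside opal/util/if.c, assuming an embedded opal_list_item_t named super as its first member):

    #include "opal/class/opal_list.h"

    /* Stand-in for the interface record in opal/util/if.c; the real one
     * also carries the interface name, index, address and netmask. */
    typedef struct {
        opal_list_item_t super;   /* must be first so the record can sit on a list */
        /* if_name, if_index, if_addr, if_mask, ... */
    } opal_if_t;

    static opal_list_t opal_if_list;

    static void example_build_list(opal_if_t *intf_ptr)
    {
        /* The list object has to be constructed once before anything
         * is appended to it. */
        OBJ_CONSTRUCT(&opal_if_list, opal_list_t);

        /* Append through the embedded list item, not a raw cast of the
         * whole struct. */
        opal_list_append(&opal_if_list, &intf_ptr->super);
    }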

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I would be leery of the hard-coded stuff. Indeed, so I changed it to: intf.if_mask = prefix( sin_addr->sin_addr.s_addr); which seems to match what the "old" code was doing: still blowing up though. > Reason: the IPv6 code has been a continual source of trouble, > while the IPv4 code has wor
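
For reference, a hypothetical stand-in for the prefix() helper used above (the real one lives in opal/util/if.c); it just shows the usual conversion of an IPv4 netmask in network byte order into a CIDR prefix length by counting leading one bits:

    #include <stdint.h>
    #include <arpa/inet.h>   /* ntohl() */

    static uint32_t prefix(uint32_t netmask)
    {
        uint32_t mask = ntohl(netmask);   /* host byte order */
        uint32_t plen = 0;

        while (mask & 0x80000000u) {      /* count leading one bits */
            plen++;
            mask <<= 1;
        }
        return plen;                      /* e.g. 255.255.255.0 -> 24 */
    }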

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Ralph Castain
I would be leery of the hard-coded stuff. Reason: the IPv6 code has been a continual source of trouble, while the IPv4 code has worked quite well. Could be a lot of reasons, especially the fact that the IPv6 code is hardly exercised by the devel team...so changes that cause problems are rarely

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Nicolas Bock
Thanks On Wed, Dec 2, 2009 at 17:04, Ralph Castain wrote: > Yeah, that's the one all right! Definitely missing from 1.3.x. > > Thanks - I'll build a patch for the next bug-fix release > > > On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote: > > > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I believe this line is incorrect: > >>opal_list_append(&opal_if_list, (opal_list_item_t*) >> intf_ptr); > > It needs to be > > opal_list_append(&opal_if_list, &intf_ptr->super); Didn't seem to change things. Any thoughts on the: /* * hardcoded netmask, adri

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Ralph Castain
Yeah, that's the one all right! Definitely missing from 1.3.x. Thanks - I'll build a patch for the next bug-fix release On Dec 2, 2009, at 4:37 PM, Abhishek Kulkarni wrote: > On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain wrote: >> Indeed - that is very helpful! Thanks! >> Looks like we aren't

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Ralph Castain
I believe this line is incorrect: >opal_list_append(&opal_if_list, (opal_list_item_t*) intf_ptr); It needs to be opal_list_append(&opal_if_list, &intf_ptr->super); On Dec 2, 2009, at 4:46 PM, kevin.buck...@ecs.vuw.ac.nz wrote: >> I have actually already taken the IPv6 block and si

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> I have actually already taken the IPv6 block and simply tried to > replace any IPv6 stuff with IPv4 "equivalents", e.g.: At the risk of showing a lot of ignorance, here's the block I cobbled together based on the IPv6 block. I have tried to keep it looking as close to the original IPv6 block as p

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Abhishek Kulkarni
On Wed, Dec 2, 2009 at 5:00 PM, Ralph Castain wrote: > Indeed - that is very helpful! Thanks! > Looks like we aren't cleaning up high enough - missing the directory level. > I seem to recall seeing that error go by and that someone fixed it on our > devel trunk, so this is likely a repair that did

[OMPI users] Application Schema for LAM to OpenMPI

2009-12-02 Thread Nathan Glenn
Currently, I am in the process of converting an MPMD program of mine from LAM to OpenMPI. The old LAM setup used an application schema to handle the launching of the server and remote processes on all the nodes in the cluster; however, I have run into an issue due to the difference in how mpirun w

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Ralph Castain
Indeed - that is very helpful! Thanks! Looks like we aren't cleaning up high enough - missing the directory level. I seem to recall seeing that error go by and that someone fixed it on our devel trunk, so this is likely a repair that didn't get moved over to the release branch as it should have
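
For anyone reproducing this, a minimal sketch of the kind of driver that exercises the session-directory cleanup being discussed; the child executable name, process count and loop count are placeholders, not taken from the original report:

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        /* Spawn a short-lived child job over and over.  Each iteration
         * creates (and should remove) a session directory for the
         * spawned processes.  The child is assumed to call
         * MPI_Comm_disconnect() on its parent communicator and exit. */
        for (int i = 0; i < 1000; i++) {
            MPI_Comm child;

            MPI_Comm_spawn("./child", MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
            MPI_Comm_disconnect(&child);
        }

        MPI_Finalize();
        return 0;
    }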

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Kevin . Buckley
> Given that it is working for us at the moment, and my current > priorities, I doubt I'll get to this over the next 2-3 weeks. > So if you have time and care to look at it before then, please > do! I have actually already taken the IPv6 block and simply tried to replace any IPv6 stuff with IPv4

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Nicolas Bock
On Wed, Dec 2, 2009 at 14:23, Ralph Castain wrote: > Hmm... if you are willing to keep trying, could you perhaps let it run for > a brief time, ctrl-z it, and then do an ls on a directory from a process > that has already terminated? The pids will be in order, so just look for an > early number (

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Ralph Castain
Hmm... if you are willing to keep trying, could you perhaps let it run for a brief time, ctrl-z it, and then do an ls on a directory from a process that has already terminated? The pids will be in order, so just look for an early number (not mpirun or the parent, of course). It would help if yo

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Nicolas Bock
On Wed, Dec 2, 2009 at 12:12, Ralph Castain wrote: > > On Dec 2, 2009, at 10:24 AM, Nicolas Bock wrote: > > > > On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote: > >> >> >> On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote: >> >>> You may want to check your limits as defined by the shell/system

Re: [OMPI users] MPI::WORLD_COMM.Send Complex class structure defined in boost::ptr_vector

2009-12-02 Thread Jeff Squyres
Boost.MPI is probably your best bet. It exposes some nice C++ functionality on top of MPI. On Dec 2, 2009, at 2:37 PM, Ivan Marin wrote: > Hello all, > > I'm developing a groundwater simulation application that will use openmpi to > distribute the data and solve a linear system. The problem i

[OMPI users] MPI::WORLD_COMM.Send Complex class structure defined in boost::ptr_vector

2009-12-02 Thread Ivan Marin
Hello all, I'm developing a groundwater simulation application that will use openmpi to distribute the data and solve a linear system. The problem is that my primary data structure is composed of a base class and derived classes, and they are inserted in a boost::ptr_vector, as they are of differe

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Ralph Castain
On Dec 2, 2009, at 10:24 AM, Nicolas Bock wrote: > > > On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote: > > > On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote: > You may want to check your limits as defined by the shell/system. I can also > run this for as long as I'm willing to let it r

Re: [OMPI users] Program deadlocks, on simple send/recv loop

2009-12-02 Thread Brock Palen
On Dec 1, 2009, at 11:15 AM, Ashley Pittman wrote: On Tue, 2009-12-01 at 10:46 -0500, Brock Palen wrote: The attached code is an example where openmpi/1.3.2 will lock up if run on 48 cores over IB (4 cores per node). The code loops over recv from all processors on rank 0 and sends from all oth
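
For readers without the attachment, a minimal sketch of the communication pattern being described (buffer size, tag and iteration count are placeholders; this is not the attached test itself):

    #include <mpi.h>

    #define COUNT 1024

    int main(int argc, char **argv)
    {
        int rank, size;
        double buf[COUNT];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        for (int i = 0; i < COUNT; i++) buf[i] = rank;

        /* Rank 0 receives one message from every other rank per
         * iteration; all other ranks send to rank 0. */
        for (int iter = 0; iter < 10000; iter++) {
            if (rank == 0) {
                for (int src = 1; src < size; src++) {
                    MPI_Recv(buf, COUNT, MPI_DOUBLE, src, 0,
                             MPI_COMM_WORLD, MPI_STATUS_IGNORE);
                }
            } else {
                MPI_Send(buf, COUNT, MPI_DOUBLE, 0, 0, MPI_COMM_WORLD);
            }
        }

        MPI_Finalize();
        return 0;
    }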

Re: [OMPI users] MPI_Comm_spawn lots of times

2009-12-02 Thread Nicolas Bock
On Tue, Dec 1, 2009 at 20:58, Nicolas Bock wrote: > > > On Tue, Dec 1, 2009 at 18:03, Ralph Castain wrote: > >> You may want to check your limits as defined by the shell/system. I can >> also run this for as long as I'm willing to let it run, so something else >> appears to be going on. >> >> >>

Re: [OMPI users] ompi-restart using different nodes

2009-12-02 Thread Jonathan Ferland
Hi Josh, In case it helps, I am running 1.3.3 compiled as follows: ../configure --enable-ft-thread --with-ft=cr --enable-mpi-threads --with-blcr=... --with-blcr-libdir=... --disable-openib-rdmacm --prefix= I ran my application like this: mpirun -am ft-enable-cr --hostfile host -np 2 ./a.out

Re: [OMPI users] ompi-restart using different nodes

2009-12-02 Thread Josh Hursey
Though I do not test this scenario (using hostfiles) very often, it used to work. The ompi-restart command takes a --hostfile (or --machinefile) argument that is passed directly to the mpirun command. I wonder if something broke recently with this handoff. I can certainly checkpoint with on

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Jeff Squyres
Sorry to jump into this late -- yes, opal/util/if.c is the exact place for this stuff. Ralph is exactly correct that this code has been touched by multiple people over a few years, so it's possible that it's a little krufty. I certainly hope it isn't working by accident -- but given the contex

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-02 Thread Ralph Castain
Given that it is working for us at the moment, and my current priorities, I doubt I'll get to this over the next 2-3 weeks. So if you have time and care to look at it before then, please do! Thanks On Dec 1, 2009, at 8:45 PM, kevin.buck...@ecs.vuw.ac.nz wrote: >> Interesting - especially since

[OMPI users] ompi-restart using different nodes

2009-12-02 Thread Jonathan Ferland
Hi, I am trying to use BLCR checkpointing with Open MPI. I am currently able to run my application using a hostfile, checkpoint the run, and then restart the application using the same hostfile. What I would like to do is restart the application with a different hostfile, but this leads to

Re: [OMPI users] Elementary question on openMPI application location when using PBS submission

2009-12-02 Thread Joshua Hursey
The --preload-* options to 'mpirun' currently use the ssh/scp commands (or rsh/rcp via an MCA parameter) to move files from the machine local to the 'mpirun' command to the compute nodes during launch. This assumes that you have Open MPI already installed on all of the machines. It was an option

Re: [OMPI users] Exceedingly high virtual memory consumption of MPI environment when setting a higher "ulimit -s"

2009-12-02 Thread David Singleton
I think the issue is that if you *don't* specifically use pthread_attr_setstacksize, the pthread library will (can?) give each thread a stack of size equal to the stacksize rlimit. You are correct - this is not specifically an Open MPI issue although if it is Open MPI spawning the threads, maybe i
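
As an aside, the workaround being hinted at is for whoever creates the thread to set an explicit stack size instead of inheriting the rlimit; a minimal sketch (the 1 MiB figure is arbitrary):

    #include <pthread.h>

    static void *worker(void *arg)
    {
        /* ... thread work ... */
        (void)arg;
        return NULL;
    }

    int main(void)
    {
        pthread_t tid;
        pthread_attr_t attr;

        /* Give the thread an explicit 1 MiB stack rather than letting it
         * inherit whatever "ulimit -s" happens to be. */
        pthread_attr_init(&attr);
        pthread_attr_setstacksize(&attr, 1024 * 1024);

        pthread_create(&tid, &attr, worker, NULL);
        pthread_join(tid, NULL);
        pthread_attr_destroy(&attr);
        return 0;
    }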

Re: [OMPI users] MPI Processes and Auto Vectorization

2009-12-02 Thread Terry Frankcombe
On Tue, 2009-12-01 at 05:47 -0800, Tim Prince wrote: > amjad ali wrote: > > Hi, > > thanks T.Prince, > > > > Your saying: > > "I'll just mention that we are well into the era of 3 levels of > > programming parallelization: vectorization, threaded parallel (e.g. > > OpenMP), and process parallel
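
To make the three levels concrete, a toy sketch (array size and decomposition are arbitrary): MPI splits the work across processes, OpenMP splits each process's share across threads, and the inner loop body is simple enough for the compiler to vectorize:

    #include <mpi.h>

    #define N 1000000
    static double a[N], b[N];

    int main(int argc, char **argv)
    {
        int rank, size;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        /* Process-level decomposition (MPI). */
        int chunk = N / size;
        int lo = rank * chunk;
        int hi = (rank == size - 1) ? N : lo + chunk;

        /* Thread-level decomposition (OpenMP); the loop body is a simple
         * stride-1 update the compiler can vectorize (SIMD). */
        #pragma omp parallel for
        for (int i = lo; i < hi; i++) {
            a[i] = 2.0 * b[i] + 1.0;
        }

        MPI_Finalize();
        return 0;
    }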

Re: [OMPI users] Program deadlocks, on simple send/recv loop

2009-12-02 Thread Eugene Loh
John R. Cary wrote: Jeff Squyres wrote: (for the web archives) Brock and I talked about this .f90 code a bit off list -- he's going to investigate with the test author a bit more because both of us are a bit confused by the F90 array syntax used. Attached is a simple send/recv code writte

Re: [OMPI users] mpirun is using one PBS node only

2009-12-02 Thread Belaid MOA
>PBS loves to read the nodes' list backwards. > If you want to start with WN1, > put it last on the Torque/PBS "nodes" file. Nice to know. Thanks Gus for the tip! Best Regards. ~Belaid.