Re: [OMPI users] openmpi 1.2.8 on Xgrid noob issue

2011-08-04 Thread Jeff Squyres
I'm afraid our Xgrid support has lagged, and Apple hasn't shown much interest in MPI + Xgrid support -- much less HPC. :-\ Have you seen the FAQ items about Xgrid? http://www.open-mpi.org/faq/?category=osx#xgrid-howto On Aug 4, 2011, at 4:16 AM, Christopher Jones wrote: > Hi there, > > I'

Re: [OMPI users] Program hangs on send when run with nodes on remote machine

2011-08-04 Thread Jeff Squyres
I notice that in the worker, you have: eth2 Link encap:Ethernet HWaddr 00:1b:21:77:c5:d4 inet addr:192.168.1.155 Bcast:192.168.1.255 Mask:255.255.255.0 inet6 addr: fe80::21b:21ff:fe77:c5d4/64 Scope:Link UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-04 Thread Anthony Chan
If you want to debug this on BGP, you could set BG_COREDUMPONERROR=1 and look at the backtrace in the lightweight core files (you probably need to recompile everything with -g). A.Chan - Original Message - > Hi Dmitry, > Thanks for a prompt and fairly detailed response. I have also > fo

Re: [OMPI users] OpenMPI causing WRF to crash

2011-08-04 Thread Jeff Squyres
Signal 15 is usually SIGTERM on Linux, meaning that some external entity probably killed the job. The OMPI error message you describe is also typical for that kind of scenario -- i.e., a process exited without calling MPI_Finalize could mean that it called exit() or some external process killed

[OMPI users] Program hangs on send when run with nodes on remote machine

2011-08-04 Thread Keith Manville
I am having trouble running my MPI program on multiple nodes. I can run multiple processes on a single node, and I can spawn processes on remote nodes, but when I call Send from a remote node, the node never returns, even though there is an appropriate Recv waiting. I'm pretty sure this is an is

[OMPI users] openmpi 1.2.8 on Xgrid noob issue

2011-08-04 Thread Christopher Jones
Hi there, I'm currently trying to set up a small Xgrid between two Mac Pros (a single quad-core and a 2 duo core), where both are directly connected via an Ethernet cable. I've set up Xgrid using password authentication (rather than Kerberos), and from what I can tell in the Xgrid admin