Re: [OMPI users] OpenMPI data transfer error

2011-07-26 Thread Ashley Pittman
On 26 Jul 2011, at 19:59, Jack Bryan wrote: > Any help is appreciated. Your best option is to distill this down to a short example program which shows what's happening vs. what you think should be happening. Ashley. -- Ashley Pittman, Bath, UK. Padb - A parallel job inspection tool for clu
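A distilled test along the lines Ashley suggests might look like the following sketch in C (the payload values, tag, and file name are illustrative, not taken from the original code):

    /* minimal_sendrecv.c -- distilled master/worker transfer test (illustrative) */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, size, dest;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            /* master: send a distinct value to every worker */
            for (dest = 1; dest < size; dest++) {
                int payload = 100 + dest;
                MPI_Send(&payload, 1, MPI_INT, dest, 7, MPI_COMM_WORLD);
            }
        } else {
            /* worker: report what actually arrived, and from which source/tag */
            int payload = -1;
            MPI_Status status;
            MPI_Recv(&payload, 1, MPI_INT, 0, 7, MPI_COMM_WORLD, &status);
            printf("rank %d got %d (expected %d) from source %d, tag %d\n",
                   rank, payload, 100 + rank, status.MPI_SOURCE, status.MPI_TAG);
        }

        MPI_Finalize();
        return 0;
    }

Built with mpicc and run with, e.g., mpirun -np 4 ./minimal_sendrecv, the expected and received values can be compared directly on each worker.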

Re: [OMPI users] Seg fault with PBS Pro 10.4

2011-07-26 Thread Ralph Castain
I don't believe we ever got anywhere with this due to lack of response. If you get some info on what happened to tm_init, please pass it along. Best guess: something changed in a recent PBS Pro release. Since none of us have access to it, we don't know what's going on. :-( On Jul 26, 2011, at

Re: [OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Ralph Castain
On Jul 26, 2011, at 3:56 PM, Gus Correa wrote: > Thank you very much, Ralph. > > Heck, it had to be something stupid like this. > Sorry for taking your time. > Yes, switching from "slots" to "slot" fixes the rankfile problem, > and both cases work. > > I must have been carried along by the host

[OMPI users] OpenMPI data transfer error

2011-07-26 Thread Jack Bryan
Hi, I am using Open MPI to do data transfer from the master node to worker nodes. But the worker node gets data which is not what it should get. I have checked the destination node rank, taskTag, and datatype; all of them are correct. I did an experiment: Node 0 sends data to nodes 1, 2, 3. Only nod

Re: [OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Gus Correa
Thank you very much, Ralph. Heck, it had to be something stupid like this. Sorry for taking your time. Yes, switching from "slots" to "slot" fixes the rankfile problem, and both cases work. I must have been carried along by the hostfile syntax, where the "slots" reign, but when it comes to bindi
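For reference, the two keywords side by side (host names and counts are illustrative):

    # hostfile: "slots" (plural) gives the per-host slot count
    node01 slots=8
    node02 slots=8

    # rankfile: "slot" (singular) pins a given rank to a socket:core
    rank 0=node01 slot=0:0
    rank 1=node02 slot=1:0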

Re: [OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Ralph Castain
I normally hide my eyes when rankfiles appear, but since you provide so much help on this list yourself... :-) I believe the problem is that you have the keyword "slots" wrong - it is supposed to be "slot": rank 1=host1 slot=1:0,1 rank 0=host2 slot=0:* rank 2=host4 slot=1-2 rank
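Laid out one entry per line, and with placeholder names for the rankfile and the executable, the corrected syntax is passed to mpirun roughly as follows (any remaining rank entries follow the same pattern):

    rank 0=host2 slot=0:*
    rank 1=host1 slot=1:0,1
    rank 2=host4 slot=1-2

    mpirun -np <nprocs> -rf my_rankfile ./my_app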

[OMPI users] Rankfile problem with Open MPI 1.4.3

2011-07-26 Thread Gus Correa
Dear Open MPI pros, I am having trouble getting the mpiexec rankfile option right. I would appreciate any help to solve the problem. Also, is there a way to tell Open MPI to print out its own numbering of the "slots", and perhaps how they're mapped to the socket:core pair? I am using Open MPI 1.4.
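On the question of printing the mapping, mpirun's report-bindings option makes Open MPI report, per rank, which socket:core each launched process was bound to; a sketch (rankfile and executable names are placeholders):

    mpirun -np <nprocs> --report-bindings -rf my_rankfile ./my_app

The bindings are reported as the processes start up, which makes it easy to compare against the rankfile entries.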

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
On Jul 26, 2011, at 1:58 PM, Reuti wrote: allocation_rule $fill_up >>> >>> Here you specify to fill one machine after the other completely before >>> gathering slots from the next machine. You can change this to $round_robin >>> to get one slot from each node before taking a second from
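For reference, the allocation rule lives in the SGE parallel environment configuration, edited with qconf -mp <pe_name>; a sketch of the relevant fields, with the PE name and slot count being illustrative:

    pe_name            orte
    slots              999
    allocation_rule    $round_robin    # or $fill_up
    control_slaves     TRUE
    job_is_first_task  FALSE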

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Reuti
On 26.07.2011 at 21:51, Ralph Castain wrote: > > On Jul 26, 2011, at 1:39 PM, Reuti wrote: > >> Hi, >> >> On 26.07.2011 at 21:19, Lane, William wrote: > >>> I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 >>> slots w/both the btl_tcp_if_exclude and btl_tcp_if_incl

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
On Jul 26, 2011, at 1:39 PM, Reuti wrote: > Hi, > > On 26.07.2011 at 21:19, Lane, William wrote: > >> I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 >> slots w/both the btl_tcp_if_exclude and btl_tcp_if_include switches >> passed to mpirun. >> >> SGE always allocat

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
On Jul 26, 2011, at 1:19 PM, Lane, William wrote: > Ralph, > > I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 > slots w/both the btl_tcp_if_exclude and btl_tcp_if_include switches > passed to mpirun. Understood - just pointing out that this is an error, even though w

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Reuti
Hi, On 26.07.2011 at 21:19, Lane, William wrote: > I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 > slots w/both the btl_tcp_if_exclude and btl_tcp_if_include switches > passed to mpirun. > > SGE always allocates the qsub jobs from the 24 slot nodes first -- up to th

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Lane, William
Ralph, I can successfully run the MPI testcode via OpenMPI 1.3.3 on less than 87 slots w/both the btl_tcp_if_exclude and btl_tcp_if_include switches passed to mpirun. SGE always allocates the qsub jobs from the 24 slot nodes first -- up to the 96 slots that these 4 nodes have available (on the

[OMPI users] Seg fault with PBS Pro 10.4

2011-07-26 Thread Wood, Justin Contractor, SAIC
I'm having a problem using OpenMPI under PBS Pro 10.4. I tried both 1.4.3 and 1.5.3, both behave the same. I'm able to run just fine if I don't use PBS and go direct to the nodes. Also, if I run under PBS and use only 1 node, it works fine, but as soon as I span nodes, I get the following: [
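Two low-effort checks that may help narrow this down (the commands are standard Open MPI tooling; the verbosity level is arbitrary): confirm which TM (PBS) components are actually built into the installation, and raise the launcher verbosity to see how far the TM startup gets when the job spans nodes.

    # list the TM components present in this Open MPI installation
    ompi_info | grep -i " tm"

    # launch a trivial multi-node job with the process-launch framework set to be verbose
    mpirun --mca plm_base_verbose 10 -np 2 hostname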

Re: [OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Ralph Castain
A few thoughts: * including both btl_tcp_if_include and btl_tcp_if_exclude is problematic as they are mutually exclusive options. I'm not sure which one will take precedence. I would suggest only using one of them. * the default mapping algorithm is byslot - i.e., OMPI will place procs on each
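A sketch of the first suggestion, keeping only the include list (the interface name eth0 is an assumption -- substitute the interface the cluster nodes actually share):

    mpirun -np 87 --mca btl_tcp_if_include eth0 ./testcode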

[OMPI users] Can run OpenMPI testcode on 86 or fewer slots in cluster, but nothing more than that

2011-07-26 Thread Lane, William
Please help me resolve the following problem with a 306-node Rocks cluster using SGE. Please note I can run the job successfully on <87 slots, but not any more than that. We're running SGE and I'm submitting my jobs via the SGE CLI utility qsub and the following lines from a script: mpirun -n $
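A minimal sketch of the usual submission pattern under a tight SGE integration, assuming a parallel environment named orte (the PE name, slot count, and executable are placeholders; the original script is not reproduced here):

    #!/bin/bash
    #$ -cwd
    #$ -pe orte 87
    # NSLOTS is set by SGE to the number of slots actually granted
    mpirun -np $NSLOTS ./testcode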