[OMPI users] delimiter in appfile

2012-09-03 Thread Siegmar Gross
Hi, I get strange results if I use a tab instead of a space as a delimiter in an appfile. Perhaps I've missed something, but I don't recall reading that tabs are not allowed. Tab between 2 and -host. -np 2 -host tyr.informatik.hs-fulda.de rank_size tyr small_prog 144 mpiexec -app app_ra
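To illustrate why a tab might trip up a parser that expects spaces, here is a minimal sketch. It is not Open MPI's actual appfile parsing code, whose behaviour is not shown in this thread; the two helper functions are hypothetical and only contrast splitting on a literal space with splitting on any whitespace.

```python
# One appfile line from the report, with a tab between "2" and "-host".
line = "-np 2\t-host tyr.informatik.hs-fulda.de rank_size"


def split_on_spaces(text):
    """Hypothetical tokenizer that treats only ' ' as a delimiter."""
    return [tok for tok in text.split(" ") if tok]


def split_on_whitespace(text):
    """Hypothetical tokenizer that accepts spaces and tabs alike."""
    return text.split()


# With space-only splitting, "2" and "-host" stay glued together by the tab.
print(split_on_spaces(line))
# With whitespace splitting, the tab separates them like a space would.
print(split_on_whitespace(line))
```

If the real parser behaved like the first helper, the count and the `-host` option would arrive as a single token, which could explain odd results; that is an assumption, not a confirmed diagnosis.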

[OMPI users] problem with rankfile

2012-09-03 Thread Siegmar Gross
Hi, the man page for "mpiexec" shows the following: cat myrankfile rank 0=aa slot=1:0-2 rank 1=bb slot=0:0,1 rank 2=cc slot=1-2 mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out So that Rank 0 runs on node aa, bound to socket 1, cores 0-2. Ra
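As a reading aid for the rankfile lines quoted above, the following sketch parses the three forms that appear in the man page example (`slot=socket:cores` and `slot=cores`). It is only an illustration under that assumption: the function names are hypothetical, and the real rankfile grammar accepted by Open MPI is richer than what is handled here.

```python
import re

# Matches lines such as "rank 0=aa slot=1:0-2" (only the forms quoted above).
LINE = re.compile(r"^rank\s+(\d+)\s*=\s*(\S+)\s+slot\s*=\s*(\S+)$")


def expand(spec):
    """Expand a core list such as '0-2' or '0,1' into a list of ints."""
    cores = []
    for part in spec.split(","):
        if "-" in part:
            low, high = part.split("-")
            cores.extend(range(int(low), int(high) + 1))
        else:
            cores.append(int(part))
    return cores


def parse_rankfile_line(line):
    """Return rank, host, socket (or None), and cores for one rankfile line."""
    match = LINE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized rankfile line: {line!r}")
    rank, host, slot = match.groups()
    if ":" in slot:
        # "socket:cores" form, e.g. "1:0-2"
        socket_id, cores = slot.split(":", 1)
        return {"rank": int(rank), "host": host,
                "socket": int(socket_id), "cores": expand(cores)}
    # Plain core list, e.g. "1-2"; no socket given
    return {"rank": int(rank), "host": host,
            "socket": None, "cores": expand(slot)}


for text in ("rank 0=aa slot=1:0-2", "rank 1=bb slot=0:0,1", "rank 2=cc slot=1-2"):
    print(parse_rankfile_line(text))
```

The first line yields node aa, socket 1, cores 0 to 2, which matches the man page's description of rank 0.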

Re: [OMPI users] problem with rankfile

2012-09-03 Thread Ralph Castain
Are *all* the machines Sparc? Or just the 3rd one (rs0)? On Sep 3, 2012, at 12:43 PM, Siegmar Gross wrote: > Hi, > > the man page for "mpiexec" shows the following: > > cat myrankfile > rank 0=aa slot=1:0-2 > rank 1=bb slot=0:0,1 > rank 2=cc slot=1-2 >

Re: [OMPI users] delimiter in appfile

2012-09-03 Thread Ralph Castain
Possible - yes. Likely to happen immediately - less so as most of us are quite busy right now. I'll add it to the "requested feature" list, but can make no promises on if/when it might happen. Certainly won't be included in anything prior to the upcoming 1.7 series. On Sep 3, 2012, at 12:42 PM

[OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-03 Thread Andrea Negri
I have asked my admin and he said that no log messages were present in /var/log, apart from my login on the compute node. No killed processes, no full stack errors, the memory is ok, 1GB is used and 2GB are free. Actually I don't know what kind of problem it could be, does someone have ideas? Or at least

[OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Reuti
Hi all, I just compiled Open MPI 1.6.1 and before digging any deeper: does anyone else notice, that the command: $ mpiexec -n 4 -machinefile mymachines ./mpihello will ignore the argument "-machinefile mymachines" and use the file "openmpi-default-hostfile" instead all the time? == SGE issue

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-03 Thread Ralph Castain
It looks to me like the network is losing connections - your error messages all state "no route to host", which implies that the network interface dropped out. On Sep 3, 2012, at 1:39 PM, Andrea Negri wrote: > I have asked to my admin and he said that no log messages were present > in /var/log,

Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Ralph Castain
On Sep 3, 2012, at 2:12 PM, Reuti wrote: > Hi all, > > I just compiled Open MPI 1.6.1 and before digging any deeper: does anyone > else notice, that the command: > > $ mpiexec -n 4 -machinefile mymachines ./mpihello > > will ignore the argument "-machinefile mymachines" and use the file > "

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-09-03 Thread Ralph Castain
Give the attached patch a try - this works for me, but I'd like it verified before it goes into the next 1.6 release (singleton comm_spawn is so rarely used that it can easily be overlooked for some time). Thx Ralph singleton_comm_spawn.diff Description: Binary data On Aug 31, 2012, at 3:32

Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Reuti
Hi Ralph, Am 03.09.2012 um 23:34 schrieb Ralph Castain: > > On Sep 3, 2012, at 2:12 PM, Reuti wrote: > >> Hi all, >> >> I just compiled Open MPI 1.6.1 and before digging any deeper: does anyone >> else notice, that the command: >> >> $ mpiexec -n 4 -machinefile mymachines ./mpihello >> >>

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-03 Thread Andrea Negri
In which ways can I check the failure of the ethernet connections?

Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Ralph Castain
On Sep 3, 2012, at 2:40 PM, Reuti wrote: > Hi Ralph, > > Am 03.09.2012 um 23:34 schrieb Ralph Castain: > >> >> On Sep 3, 2012, at 2:12 PM, Reuti wrote: >> >>> Hi all, >>> >>> I just compiled Open MPI 1.6.1 and before digging any deeper: does anyone >>> else notice, that the command: >>>

Re: [OMPI users] some mpi processes "disappear" on a cluster of servers

2012-09-03 Thread Ralph Castain
This is something you probably need to work on with your sys admin - it sounds like there is something unreliable in your network, and that's usually a somewhat hard thing to diagnose. On Sep 3, 2012, at 2:49 PM, Andrea Negri wrote: > In which ways can I check the failure of the ethernet conn
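As a rough starting point for the question above, the sketch below repeatedly tries a TCP connection to each compute node and records the error, so that intermittent "no route to host" failures show up with a timestamp. It is an assumption-laden illustration, not a recommended diagnostic from this thread: the function names are hypothetical, and it assumes an SSH daemon listening on port 22 on every node.

```python
import socket
import time


def probe(host, port=22, timeout=3.0):
    """Try one TCP connection; return (ok, message)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except OSError as exc:
        # Covers "No route to host", refused connections, timeouts, DNS errors.
        return False, f"{type(exc).__name__}: {exc}"


def watch(hosts, rounds=10, pause=5.0, port=22):
    """Probe every host several times and print only the failures."""
    for _ in range(rounds):
        for host in hosts:
            ok, message = probe(host, port)
            if not ok:
                print(time.strftime("%H:%M:%S"), host, message)
        time.sleep(pause)
```

Calling `watch(["node01", "node02"])` (placeholder host names) while an MPI job runs may show whether a particular node drops off the network; interface counters and switch logs remain something to check with the system administrator.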

Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Reuti
Am 04.09.2012 um 00:07 schrieb Ralph Castain: > I'm leaning towards fixing it - it came due to discussions on how to handle > hostfile when there is an allocation. For now, though, that should work. Oh, did I miss this on the list? If there is a hostfile given as argument, it should override th

Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken

2012-09-03 Thread Ralph Castain
On Sep 3, 2012, at 3:50 PM, Reuti wrote: > Am 04.09.2012 um 00:07 schrieb Ralph Castain: > >> I'm leaning towards fixing it - it came due to discussions on how to handle >> hostfile when there is an allocation. For now, though, that should work. > > Oh, did I miss this on the list? If there i

Re: [OMPI users] MPI::Intracomm::Spawn and cluster configuration

2012-09-03 Thread Brian Budge
Great. I'll try applying this tomorrow and I'll let you know if it works for me. Brian On Mon, Sep 3, 2012 at 2:36 PM, Ralph Castain wrote: > Give the attached patch a try - this works for me, but I'd like it verified > before it goes into the next 1.6 release (singleton comm_spawn is so rar