Re: [OMPI users] problem with rankfile in openmpi-1.8.2rc3

2014-08-07 Thread Siegmar Gross
Hello Ralph, > Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 > and see if that works I get even more warnings with the new option. It seems that I always get the bindings only for the local machine. I used Solaris Sparc (tyr), Solaris x86_64 (sunpc1), and Linux x86_64 (li

Re: [OMPI users] problem with rankfile in openmpi-1.8.2rc3

2014-08-07 Thread Ralph Castain
Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 and see if that works On Aug 7, 2014, at 4:04 AM, Siegmar Gross wrote: > Hi, > >> I can't replicate - this worked fine for me. I'm at a loss as >> to how you got that error as it would require some strange >> error in the

Re: [OMPI users] problem with rankfile in openmpi-1.8.2rc3

2014-08-07 Thread Siegmar Gross
Hi, > I can't replicate - this worked fine for me. I'm at a loss as > to how you got that error as it would require some strange > error in the report-bindings option. If you remove that option > from your cmd line, does the problem go away? Yes. tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf r

Re: [OMPI users] problem with rankfile in openmpi-1.8.2rc3

2014-08-05 Thread Ralph Castain
I can't replicate - this worked fine for me. I'm at a loss as to how you got that error, as it would require some strange error in the report-bindings option. If you remove that option from your cmd line, does the problem go away? On Aug 5, 2014, at 12:56 AM, Siegmar Gross wrote: > Hi, > > ye

[OMPI users] problem with rankfile in openmpi-1.8.2rc3

2014-08-05 Thread Siegmar Gross
Hi, yesterday I installed openmpi-1.8.2rc3 on my machines (Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE Linux 12.1 x86_64) with Sun C 5.12. I get an error if I use a rankfile for all three architectures. The error message depends on the local machine that I use to run "mpiexec". I get a di

Re: [OMPI users] problem with rankfile in openmpi-1.7.4rc2r30323

2014-01-23 Thread Ralph Castain
Okay, so this is a Sparc issue, not a rankfile one. I'm afraid my lack of time and access to that platform will mean this won't get fixed for 1.7.4, but I'll try to take a look at it when time permits. On Jan 22, 2014, at 10:52 PM, Siegmar Gross wrote: > Dear Ralph, > > the same problems oc

Re: [OMPI users] problem with rankfile in openmpi-1.7.4rc2r30323

2014-01-23 Thread Siegmar Gross
Dear Ralph, the same problems occur without rankfiles. tyr fd1026 102 which mpicc /usr/local/openmpi-1.7.4_64_cc/bin/mpicc tyr fd1026 103 mpiexec --report-bindings -np 2 \ -host tyr,sunpc1 hostname tyr fd1026 104 /opt/solstudio12.3/bin/sparcv9/dbx \ /usr/local/openmpi-1.7.4_64_cc/bin/mpiexe

Re: [OMPI users] problem with rankfile in openmpi-1.7.4rc2r30323

2014-01-22 Thread Ralph Castain
Hard to know how to address all that, Siegmar, but I'll give it a shot. See below. On Jan 22, 2014, at 5:34 AM, Siegmar Gross wrote: > Hi, > > yesterday I installed openmpi-1.7.4rc2r30323 on our machines > ("Solaris 10 x86_64", "Solaris 10 Sparc", and "openSUSE Linux > 12.1 x86_64" with Sun C

[OMPI users] problem with rankfile in openmpi-1.7.4rc2r30323

2014-01-22 Thread Siegmar Gross
Hi, yesterday I installed openmpi-1.7.4rc2r30323 on our machines ("Solaris 10 x86_64", "Solaris 10 Sparc", and "openSUSE Linux 12.1 x86_64" with Sun C 5.12). My rankfile "rf_linpc_sunpc_tyr" contains the following lines. rank 0=linpc0 slot=0:0-1;1:0-1 rank 1=linpc1 slot=0:0-1 rank 2=sunpc1 slot=1
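
The slot list for rank 0 above, `0:0-1;1:0-1`, binds a single rank to cores on two sockets. As a rough illustration of what such a list denotes, the hypothetical Python helper below expands it into explicit (socket, core) pairs. It is not Open MPI's parser: accepting both `;` and `,` as separators between socket groups is an assumption based on the rankfiles quoted in this thread (this message uses `;`, the 1.6-era messages use `,`), and bare lists without a socket prefix are simply rejected.

```python
def expand_range(spec):
    """Expand "0-2" to [0, 1, 2] and "1" to [1]."""
    lo, _, hi = spec.partition("-")
    return list(range(int(lo), int(hi or lo) + 1))


def expand_slot_list(slot_list):
    """Expand a socket:core slot list into sorted (socket, core) pairs.

    Hypothetical sketch: groups may be separated by ";" or ",". A token
    containing ":" starts a new socket; a bare token adds cores to the
    current socket (so "0:0,1" means socket 0, cores 0 and 1).
    """
    pairs = set()
    for group in slot_list.split(";"):
        socket = None  # each ";"-group must name its own socket
        for token in group.split(","):
            if ":" in token:
                sock, _, cores = token.partition(":")
                socket = int(sock)
            else:
                cores = token
            if socket is None:
                raise ValueError("no socket given before " + repr(token))
            pairs.update((socket, core) for core in expand_range(cores))
    return sorted(pairs)


print(expand_slot_list("0:0-1;1:0-1"))
```

Both `0:0-1;1:0-1` and `0:0-1,1:0-1` expand to the same four pairs under this reading.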

[OMPI users] problem with rankfile in openmpi-1.7.2rc3r28550

2013-05-24 Thread Siegmar Gross
Hi, I installed openmpi-1.7.2rc3r28550 on "openSuSE Linux 12.1", "Solaris 10 x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 32- and 64-bit versions. Unfortunately "rank_files" don't work as expected. sunpc1 rankfiles 109 more rf_ex_sunpc_linpc # mpiexec -report-bindings -rf rf_ex_sunpc_linp

Re: [OMPI users] problem with rankfile and openmpi-1.6.4rc3r27923

2013-01-29 Thread Ralph Castain
Aha - I'm able to replicate it, will fix. On Jan 29, 2013, at 11:57 AM, Ralph Castain wrote: > Using an svn checkout of the current 1.6 branch, it works fine for me: > > [rhc@odin ~/v1.6]$ cat rf > rank 0=odin127 slot=0:0-1,1:0-1 > rank 1=odin128 slot=1 > > [rhc@odin ~/v1.6]$ mpirun -n 2 -rf .

Re: [OMPI users] problem with rankfile and openmpi-1.6.4rc3r27923

2013-01-29 Thread Ralph Castain
Using an svn checkout of the current 1.6 branch, it works fine for me: [rhc@odin ~/v1.6]$ cat rf rank 0=odin127 slot=0:0-1,1:0-1 rank 1=odin128 slot=1 [rhc@odin ~/v1.6]$ mpirun -n 2 -rf ./rf --report-bindings hostname [odin127.cs.indiana.edu:12078] MCW rank 0 bound to socket 0[core 0-1] socket 1

[OMPI users] problem with rankfile and openmpi-1.6.4rc3r27923

2013-01-29 Thread Siegmar Gross
Hi, today I installed openmpi-1.6.4rc3r27923. Unfortunately, I still have a problem with rankfiles when I start a process on a remote machine. tyr rankfiles 114 ssh linpc1 ompi_info | grep "Open MPI:" Open MPI: 1.6.4rc3r27923 tyr rankfiles 115 cat rf_linpc1 rank 0=linpc1 slot

Re: [OMPI users] problem with rankfile in openmpi-1.6.4rc2

2013-01-25 Thread Ralph Castain
Found it! A trivial error (missing a break in a switch statement) that only impacts things if multiple sockets are specified in the slot_list. CMR filed to include the fix in 1.6.4. Thanks for your patience, Ralph On Jan 24, 2013, at 7:50 PM, Ralph Castain wrote: > I built the current 1.6 branc
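
Because the bug only mattered when a single slot list named more than one socket, a quick text check can tell whether a given rankfile line exercises that code path at all. The sketch below is hypothetical: it only looks for `N:` socket prefixes in the slot string and says nothing about whether a particular Open MPI build contains the fix.

```python
import re


def sockets_named(slot_list):
    """Return the socket ids that appear as "N:" prefixes in a slot list."""
    return {int(s) for s in re.findall(r"(\d+):", slot_list)}


def hits_multi_socket_path(slot_list):
    """True if one slot list names more than one distinct socket."""
    return len(sockets_named(slot_list)) > 1


for spec in ("0:0-1,1:0-1", "0:0-1", "1"):
    print(spec, hits_multi_socket_path(spec))
```

Siegmar's `slot=0:0-1,1:0-1` names sockets 0 and 1, so it falls into the affected case; Ralph's `slot=1` names no socket and does not.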

Re: [OMPI users] problem with rankfile in openmpi-1.6.4rc2

2013-01-24 Thread Ralph Castain
I built the current 1.6 branch (which hasn't seen any changes that would impact this function) and was able to execute it just fine on a single socket machine. I then gave it your slot-list, which of course failed as I don't have two active sockets (one is empty), but it appeared to parse the li

[OMPI users] problem with rankfile in openmpi-1.6.4rc2

2013-01-19 Thread Siegmar Gross
Hi, I have installed openmpi-1.6.4rc2 and still have a problem with my rankfile. linpc1 rankfiles 113 ompi_info | grep "Open MPI:" Open MPI: 1.6.4rc2r27861 linpc1 rankfiles 114 cat rf_linpc1 rank 0=linpc1 slot=0:0-1,1:0-1 linpc1 rankfiles 115 mpiexec -report-bindings -np 1 \ -

[OMPI users] problem with rankfile

2013-01-11 Thread Siegmar Gross
Hi, do you know when you will have time to solve the problem with a rankfile? In the past you told me that my rankfile was correct. linpc1 rankfiles 120 ompi_info | grep "Open MPI:" Open MPI: 1.6.4a1r27766 linpc1 rankfiles 121 mpiex

Re: [OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Ralph Castain
I filed a bug fix for this one. However, there is something you should note: if you fail to provide a "-np N" argument to mpiexec, we assume you want ALL available slots filled. The rankfile will contain only those procs that you want specifically bound. The remaining procs will be unbound. So with
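
The rule described above can be modelled in a few lines. This is a simplified, hypothetical sketch: it takes the total slot count as given, ignores mapping policy entirely, and only captures the stated point that omitting `-np` launches one process per available slot while only the ranks listed in the rankfile are bound.

```python
def plan_ranks(total_slots, rankfile_ranks, np=None):
    """Return (bound, unbound) rank lists under the rule described above.

    total_slots: slots available in the allocation (assumed known).
    rankfile_ranks: rank numbers that appear in the rankfile.
    np: value passed to -np, or None if the option was omitted.
    """
    nprocs = total_slots if np is None else np  # no -np: fill every slot
    listed = set(rankfile_ranks)
    bound = [r for r in range(nprocs) if r in listed]
    unbound = [r for r in range(nprocs) if r not in listed]
    return bound, unbound


print(plan_ranks(8, [0, 1]))
print(plan_ranks(8, [0, 1], np=2))
```

With the two 4-slot hosts from Siegmar's hostfile and a rankfile listing ranks 0 and 1, omitting `-np` leaves ranks 2 through 7 running unbound, while `-np 2` starts only the two bound ranks.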

Re: [OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Ralph Castain
I saw your earlier note about this too. Just a little busy right now, but hope to look at it soon. Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used code path. On Oct 3, 2012, at 3:03 AM, Siegmar Gross wrote: > Hi, > > I want to test process bindings with a ran

[OMPI users] problem with rankfile and openmpi-1.6.2

2012-10-03 Thread Siegmar Gross
Hi, I want to test process bindings with a rankfile in openmpi-1.6.2. Both machines are dual-processor dual-core machines running Solaris 10 x86_64. tyr fd1026 138 cat host_sunpc0_1 sunpc0 slots=4 sunpc1 slots=4 tyr fd1026 139 cat rankfile rank 0=sunpc0 slot=0:0-1,1:0-1 rank 1=sunpc1 slot=0:0-
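
The hostfile above declares `slots=4` on each of two hosts, which gives eight slots in total. A minimal, hypothetical parser for this simple `host slots=N` form is sketched below; counting a host without a `slots=` field as one slot is an assumption of this sketch, not a statement about every Open MPI version.

```python
def count_slots(hostfile_text):
    """Map each host in a simple hostfile to its slot count.

    Assumes one host per line as "host slots=N"; "#" starts a comment.
    A host without slots= counts as 1 slot (an assumption of this sketch).
    """
    slots = {}
    for raw in hostfile_text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        host, *options = line.split()
        n = 1
        for opt in options:
            if opt.startswith("slots="):
                n = int(opt[len("slots="):])
        slots[host] = slots.get(host, 0) + n
    return slots


hosts = count_slots("sunpc0 slots=4\nsunpc1 slots=4\n")
print(hosts, sum(hosts.values()))
```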

[OMPI users] problem with rankfile in openmpi-1.6.2

2012-10-01 Thread Siegmar Gross
Hi, I installed openmpi-1.6.2 on our heterogeneous platform (Solaris 10 Sparc, Solaris 10 x86_64, and Linux x86_64). tyr small_prog 125 mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 \ -bysocket -bind-to-core date Mon Oct 1 07:53:15 CEST 2012 [sunpc0:02084] MCW rank 0 bound to socket 0[co

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Jeff Squyres
We actually include hwloc v1.3.2 in the OMPI v1.6 series. Can you download and try that on your machines? http://www.open-mpi.org/software/hwloc/v1.3/ In particular try the hwloc-bind executable (outside of OMPI), and see if binding works properly on your machines. I typically run a te
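
The idea here is to test binding outside of Open MPI. hwloc-bind is the tool named above; purely as a Linux-only analogue (this is not hwloc and not the test described in the message), the sketch below uses Python's `os.sched_setaffinity` to pin the current process to one CPU and read the mask back. On Solaris these calls do not exist, so the function returns `None` there and hwloc-bind remains the relevant test.

```python
import os


def try_bind_first_cpu():
    """Pin this process to one allowed CPU and check the kernel agrees.

    Returns True/False on Linux, or None where sched_setaffinity is
    unavailable (e.g. Solaris). The original mask is always restored.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    original = os.sched_getaffinity(0)
    target = {min(original)}  # pick a CPU we are already allowed to use
    try:
        os.sched_setaffinity(0, target)
        return os.sched_getaffinity(0) == target
    finally:
        os.sched_setaffinity(0, original)  # undo the temporary binding


print(try_bind_first_cpu())
```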

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Ralph Castain
Hmmm...well, let's try to isolate this a little. Would you mind installing a copy of the current trunk on this machine and trying it? I ask because I'd like to better understand if the problem is in the actual binding mechanism (i.e., hwloc), or in the code that computes where to bind the proce

Re: [OMPI users] problem with rankfile

2012-09-10 Thread Siegmar Gross
Hi, > > are the following outputs helpful to find the error with > > a rankfile on Solaris? > > If you can't bind on the new Solaris machine, then the rankfile > won't do you any good. It looks like we are getting the incorrect > number of cores on that machine - is it possible that it has > hard

Re: [OMPI users] problem with rankfile

2012-09-07 Thread Ralph Castain
On Sep 7, 2012, at 5:41 AM, Siegmar Gross wrote: > Hi, > > are the following outputs helpful to find the error with > a rankfile on Solaris? If you can't bind on the new Solaris machine, then the rankfile won't do you any good. It looks like we are getting the incorrect number of cores on th

Re: [OMPI users] problem with rankfile

2012-09-07 Thread Siegmar Gross
Hi, are the following outputs helpful to find the error with a rankfile on Solaris? I wrapped long lines so that they are easier to read. Have you had time to look at the segmentation fault with a rankfile which I reported in my last email (see below)? "tyr" is a two processor single core machine

Re: [OMPI users] problem with rankfile

2012-09-05 Thread Ralph Castain
I couldn't really say for certain - I don't see anything obviously wrong with your syntax, and the code appears to be working or else it would fail on the other nodes as well. The fact that it fails solely on that machine seems suspect. Set aside the rankfile for the moment and try to just bind

Re: [OMPI users] problem with rankfile

2012-09-05 Thread Siegmar Gross
Hi, I'm new to rankfiles, so I experimented a little with different options. I thought that the following entry would be similar to an entry in an appfile and that MPI could place the process with rank 0 on any core of any processor. rank 0=tyr.informatik.hs-fulda.de Unfortunately it's not all

Re: [OMPI users] problem with rankfile

2012-09-04 Thread Siegmar Gross
Hi, > Are *all* the machines Sparc? Or just the 3rd one (rs0)? Yes, both machines are Sparc. I tried first in a homogeneous environment. tyr fd1026 106 psrinfo -v Status of virtual processor 0 as of: 09/04/2012 07:32:14 on-line since 08/31/2012 15:44:42. The sparcv9 processor operates at 160

Re: [OMPI users] problem with rankfile

2012-09-03 Thread Ralph Castain
Are *all* the machines Sparc? Or just the 3rd one (rs0)? On Sep 3, 2012, at 12:43 PM, Siegmar Gross wrote: > Hi, > > the man page for "mpiexec" shows the following: > > cat myrankfile > rank 0=aa slot=1:0-2 > rank 1=bb slot=0:0,1 > rank 2=cc slot=1-2 >

[OMPI users] problem with rankfile

2012-09-03 Thread Siegmar Gross
Hi, the man page for "mpiexec" shows the following: cat myrankfile rank 0=aa slot=1:0-2 rank 1=bb slot=0:0,1 rank 2=cc slot=1-2 mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out So that Rank 0 runs on node aa, bound to socket 1, cores 0-2. Ra
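
The man-page example reads `rank 0=aa slot=1:0-2` as node aa, socket 1, cores 0-2. The hypothetical helper below splits such lines into rank, host and binding to make that reading explicit. It handles only single-socket lists like the three in the example; how Open MPI interprets a bare list such as `slot=1-2` is explained in the part of the man page cut off above, so the sketch simply returns those numbers without a socket.

```python
import re

# Hypothetical pattern for lines like "rank 0=aa slot=1:0-2".
RANK_LINE = re.compile(r"^rank\s+(\d+)=(\S+)\s+slot=(\S+)$")


def expand(spec):
    """Expand a core list such as "0-2" or "0,1" into a list of ints."""
    cores = []
    for part in spec.split(","):
        lo, _, hi = part.partition("-")
        cores.extend(range(int(lo), int(hi or lo) + 1))
    return cores


def parse_rankfile_line(line):
    """Return (rank, host, binding) for a single-socket rankfile line.

    binding is ("socket", socket_id, cores) for "S:cores", or
    ("cores", numbers) for a bare list with no socket prefix.
    """
    m = RANK_LINE.match(line.strip())
    if m is None:
        raise ValueError("not a rankfile line: " + line)
    rank, host, slots = int(m.group(1)), m.group(2), m.group(3)
    if slots.count(":") > 1:
        raise ValueError("multi-socket slot lists are not handled here")
    if ":" in slots:
        sock, _, cores = slots.partition(":")
        return rank, host, ("socket", int(sock), expand(cores))
    return rank, host, ("cores", expand(slots))


for text in ("rank 0=aa slot=1:0-2", "rank 1=bb slot=0:0,1", "rank 2=cc slot=1-2"):
    print(parse_rankfile_line(text))
```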