Hello Ralph,
> Try replacing --report-bindings with -mca hwloc_base_report_bindings 1
> and see if that works
I get even more warnings with the new option. It seems that I
always get the bindings only for the local machine. I used
Solaris Sparc (tyr), Solaris x86_64 (sunpc1), and Linux x86_64
(li
Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 and see
if that works
On Aug 7, 2014, at 4:04 AM, Siegmar Gross wrote:
> Hi,
>
>> I can't replicate - this worked fine for me. I'm at a loss as
>> to how you got that error as it would require some strange
>> error in the
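For reference, a minimal sketch of the two invocations being compared here, with the
host names taken from earlier messages in this thread and "hostname" standing in for
any test program:

  mpiexec --report-bindings -np 2 -host tyr,sunpc1 hostname
  mpiexec -mca hwloc_base_report_bindings 1 -np 2 -host tyr,sunpc1 hostname

If the MCA parameter behaves as the suggestion implies, both forms should print one
"MCW rank ... bound to ..." line per process.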
Hi,
> I can't replicate - this worked fine for me. I'm at a loss as
> to how you got that error as it would require some strange
> error in the report-bindings option. If you remove that option
> from your cmd line, does the problem go away?
Yes.
tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf r
I can't replicate - this worked fine for me. I'm at a loss as to how you got
that error as it would require some strange error in the report-bindings option.
If you remove that option from your cmd line, does the problem go away?
On Aug 5, 2014, at 12:56 AM, Siegmar Gross wrote:
> Hi,
>
> ye
Hi,
yesterday I installed openmpi-1.8.2rc3 on my machines
(Solaris 10 Sparc, Solaris 10 x86_64, and openSUSE
Linux 12.1 x86_64) with Sun C 5.12. I get an error
if I use a rankfile for all three architectures.
The error message depends on the local machine that
I use to run "mpiexec". I get a di
Okay, so this is a Sparc issue, not a rankfile one. I'm afraid my lack of time
and access to that platform will mean this won't get fixed for 1.7.4, but I'll
try to take a look at it when time permits.
On Jan 22, 2014, at 10:52 PM, Siegmar Gross wrote:
> Dear Ralph,
>
> the same problems oc
Dear Ralph,
the same problems occur without rankfiles.
tyr fd1026 102 which mpicc
/usr/local/openmpi-1.7.4_64_cc/bin/mpicc
tyr fd1026 103 mpiexec --report-bindings -np 2 \
-host tyr,sunpc1 hostname
tyr fd1026 104 /opt/solstudio12.3/bin/sparcv9/dbx \
/usr/local/openmpi-1.7.4_64_cc/bin/mpiexe
Hard to know how to address all that, Siegmar, but I'll give it a shot. See
below.
On Jan 22, 2014, at 5:34 AM, Siegmar Gross wrote:
> Hi,
>
> yesterday I installed openmpi-1.7.4rc2r30323 on our machines
> ("Solaris 10 x86_64", "Solaris 10 Sparc", and "openSUSE Linux
> 12.1 x86_64" with Sun C
Hi,
yesterday I installed openmpi-1.7.4rc2r30323 on our machines
("Solaris 10 x86_64", "Solaris 10 Sparc", and "openSUSE Linux
12.1 x86_64" with Sun C 5.12). My rankfile "rf_linpc_sunpc_tyr"
contains the following lines.
rank 0=linpc0 slot=0:0-1;1:0-1
rank 1=linpc1 slot=0:0-1
rank 2=sunpc1 slot=1
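Read together with the mpiexec man page excerpt quoted near the end of this thread,
these lines ask for rank 0 on linpc0 bound to cores 0-1 of both sockets, rank 1 on
linpc1 bound to cores 0-1 of socket 0, and rank 2 on sunpc1 bound to slot 1 (a single
core). A sketch of an invocation that exercises the file (the preview may cut off
further ranks, so -np 3 and "hostname" are only illustrative):

  mpiexec --report-bindings -np 3 -rf rf_linpc_sunpc_tyr hostname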
Hi
I installed openmpi-1.7.2rc3r28550 on "openSuSE Linux 12.1", "Solaris 10
x86_64", and "Solaris 10 sparc" with "Sun C 5.12" in 32- and 64-bit
versions. Unfortunately "rank_files" don't work as expected.
sunpc1 rankfiles 109 more rf_ex_sunpc_linpc
# mpiexec -report-bindings -rf rf_ex_sunpc_linp
Aha - I'm able to replicate it, will fix.
On Jan 29, 2013, at 11:57 AM, Ralph Castain wrote:
> Using an svn checkout of the current 1.6 branch, it works fine for me:
>
> [rhc@odin ~/v1.6]$ cat rf
> rank 0=odin127 slot=0:0-1,1:0-1
> rank 1=odin128 slot=1
>
> [rhc@odin ~/v1.6]$ mpirun -n 2 -rf .
Using an svn checkout of the current 1.6 branch, it works fine for me:
[rhc@odin ~/v1.6]$ cat rf
rank 0=odin127 slot=0:0-1,1:0-1
rank 1=odin128 slot=1
[rhc@odin ~/v1.6]$ mpirun -n 2 -rf ./rf --report-bindings hostname
[odin127.cs.indiana.edu:12078] MCW rank 0 bound to socket 0[core 0-1] socket 1
Hi
today I installed openmpi-1.6.4rc3r27923. Unfortunately I
still have a problem with rankfiles when I start a process on a
remote machine.
tyr rankfiles 114 ssh linpc1 ompi_info | grep "Open MPI:"
Open MPI: 1.6.4rc3r27923
tyr rankfiles 115 cat rf_linpc1
rank 0=linpc1 slot
Found it! A trivial error (a missing break in a switch statement) that only
affects things if multiple sockets are specified in the slot_list. A CMR has been
filed to include the fix in 1.6.4.
Thanks for your patience
Ralph
On Jan 24, 2013, at 7:50 PM, Ralph Castain wrote:
> I built the current 1.6 branc
I built the current 1.6 branch (which hasn't seen any changes that would impact
this function) and was able to execute it just fine on a single socket machine.
I then gave it your slot-list, which of course failed as I don't have two
active sockets (one is empty), but it appeared to parse the li
Hi
I have installed openmpi-1.6.4rc2 and still have a problem with my
rankfile.
linpc1 rankfiles 113 ompi_info | grep "Open MPI:"
Open MPI: 1.6.4rc2r27861
linpc1 rankfiles 114 cat rf_linpc1
rank 0=linpc1 slot=0:0-1,1:0-1
linpc1 rankfiles 115 mpiexec -report-bindings -np 1 \
-
Hi
do you know when you will have time to solve the problem with a
rankfile? In the past you told me that my rankfile is correct.
linpc1 rankfiles 120 ompi_info | grep "Open MPI:"
Open MPI: 1.6.4a1r27766
linpc1 rankfiles 121 mpiex
I filed a bug fix for this one. However, there is something you should note.
If you fail to provide a "-np N" argument to mpiexec, we assume you want ALL
available slots filled. The rankfile will contain only those procs that you
want specifically bound. The remaining procs will be unbound.
So with
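A small sketch of the behaviour described above, using a hypothetical hostfile and
rankfile (the file names, hosts, and slot counts are only illustrative, and it is
assumed here that a hostfile and a rankfile can be combined this way):

  cat hosts
  linpc0 slots=4
  linpc1 slots=4
  cat rf
  rank 0=linpc0 slot=0:0

  mpiexec -hostfile hosts -rf rf hostname
  (no -np: all eight slots are filled; only rank 0 is bound, the other procs run unbound)
  mpiexec -np 1 -hostfile hosts -rf rf hostname
  (exactly one proc is launched, bound to socket 0, core 0 of linpc0)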
I saw your earlier note about this too. Just a little busy right now, but hope
to look at it soon.
Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used
code path.
On Oct 3, 2012, at 3:03 AM, Siegmar Gross wrote:
> Hi,
>
> I want to test process bindings with a ran
Hi,
I want to test process bindings with a rankfile in openmpi-1.6.2. Both
machines are dual-processor dual-core machines running Solaris 10 x86_64.
tyr fd1026 138 cat host_sunpc0_1
sunpc0 slots=4
sunpc1 slots=4
tyr fd1026 139 cat rankfile
rank 0=sunpc0 slot=0:0-1,1:0-1
rank 1=sunpc1 slot=0:0-
Hi,
I installed openmpi-1.6.2 on our heterogeneous platform (Solaris 10
Sparc, Solaris 10 x86_64, and Linux x86_64).
tyr small_prog 125 mpiexec -report-bindings -np 4 -host sunpc0,sunpc1 \
-bysocket -bind-to-core date
Mon Oct 1 07:53:15 CEST 2012
[sunpc0:02084] MCW rank 0 bound to socket 0[co
We actually include hwloc v1.3.2 in the OMPI v1.6 series.
Can you download and try that on your machines?
http://www.open-mpi.org/software/hwloc/v1.3/
In particular, try the hwloc-bind executable (outside of OMPI) and see if
binding works properly on your machines. I typically run a te
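A quick stand-alone check along those lines, assuming the hwloc 1.3 command-line
tools are in the PATH (the exact mask printed depends on the machine):

  hwloc-bind socket:0.core:1 -- hwloc-bind --get

If binding works outside of Open MPI, the inner hwloc-bind inherits the binding set
by the outer one and prints a CPU mask covering only core 1 of socket 0.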
Hmmm...well, let's try to isolate this a little. Would you mind installing a
copy of the current trunk on this machine and trying it?
I ask because I'd like to better understand if the problem is in the actual
binding mechanism (i.e., hwloc), or in the code that computes where to bind the
proce
Hi,
> > are the following outputs helpful to find the error with
> > a rankfile on Solaris?
>
> If you can't bind on the new Solaris machine, then the rankfile
> won't do you any good. It looks like we are getting the incorrect
> number of cores on that machine - is it possible that it has
> hard
On Sep 7, 2012, at 5:41 AM, Siegmar Gross wrote:
> Hi,
>
> are the following outputs helpful to find the error with
> a rankfile on Solaris?
If you can't bind on the new Solaris machine, then the rankfile won't do you
any good. It looks like we are getting the incorrect number of cores on th
Hi,
are the following outputs helpful to find the error with
a rankfile on Solaris? I wrapped long lines so that they
are easier to read. Have you had time to look at the
segmentation fault with a rankfile which I reported in my
last email (see below)?
"tyr" is a two processor single core machine
I couldn't really say for certain - I don't see anything obviously wrong with
your syntax, and the code appears to be working or else it would fail on the
other nodes as well. The fact that it fails solely on that machine seems
suspect.
Set aside the rankfile for the moment and try to just bind
Hi,
I'm new to rankfiles, so I played a little bit with different
options. I thought that the following entry would be similar to an
entry in an appfile and that MPI could place the process with rank 0
on any core of any processor.
rank 0=tyr.informatik.hs-fulda.de
Unfortunately it's not all
Hi,
> Are *all* the machines Sparc? Or just the 3rd one (rs0)?
Yes, both machines are Sparc. I first tried it in a homogeneous
environment.
tyr fd1026 106 psrinfo -v
Status of virtual processor 0 as of: 09/04/2012 07:32:14
on-line since 08/31/2012 15:44:42.
The sparcv9 processor operates at 160
Are *all* the machines Sparc? Or just the 3rd one (rs0)?
On Sep 3, 2012, at 12:43 PM, Siegmar Gross wrote:
> Hi,
>
> the man page for "mpiexec" shows the following:
>
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
>
Hi,
the man page for "mpiexec" shows the following:
cat myrankfile
rank 0=aa slot=1:0-2
rank 1=bb slot=0:0,1
rank 2=cc slot=1-2
mpirun -H aa,bb,cc,dd -rf myrankfile ./a.out
So that
Rank 0 runs on node aa, bound to socket 1, cores 0-2.
Ra
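To see whether those placements actually happen, the same man page example can be
run with the binding report turned on (a sketch; aa through dd are the placeholder
host names from the man page):

  mpirun -H aa,bb,cc,dd -rf myrankfile --report-bindings ./a.out

Each process should then be reported with the socket/core set requested for its
rank in myrankfile.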