Hello Ralph,
> Try replacing --report-bindings with -mca hwloc_base_report_bindings 1
> and see if that works
I get even more warnings with the new option. It seems that I
always get the bindings only for the local machine. I used
Solaris Sparc (tyr), Solaris x86_64 (sunpc1), and Linux x86_64
(li
Try replacing --report-bindings with -mca hwloc_base_report_bindings 1 and see
if that works
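For reference, the MCA-parameter form goes on the same command line where --report-bindings would; a minimal sketch, reusing the hosts mentioned elsewhere in this thread:

  mpiexec -mca hwloc_base_report_bindings 1 -np 2 -host tyr,sunpc1 hostname

Both spellings are meant to produce the same binding report; the MCA parameter is simply the lower-level way of requesting it, which is why it is suggested here as a cross-check.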
On Aug 7, 2014, at 4:04 AM, Siegmar Gross wrote:

> Hi,
>
>> I can't replicate - this worked fine for me. I'm at a loss as
>> to how you got that error as it would require some strange
>> error in the
Hi,
> I can't replicate - this worked fine for me. I'm at a loss as
> to how you got that error as it would require some strange
> error in the report-bindings option. If you remove that option
> from your cmd line, does the problem go away?
Yes.
tyr openmpi_1.7.x_or_newer 468 mpiexec -np 4 -rf r
I can't replicate - this worked fine for me. I'm at a loss as to how you got
that error as it would require some strange error in the report-bindings option.
If you remove that option from your cmd line, does the problem go away?
On Aug 5, 2014, at 12:56 AM, Siegmar Gross wrote:
> Hi,
>
> ye
Okay, so this is a Sparc issue, not a rankfile one. I'm afraid my lack of time
and access to that platform will mean this won't get fixed for 1.7.4, but I'll
try to take a look at it when time permits.
On Jan 22, 2014, at 10:52 PM, Siegmar Gross wrote:
> Dear Ralph,
>
> the same problems oc
Dear Ralph,
the same problems occur without rankfiles.
tyr fd1026 102 which mpicc
/usr/local/openmpi-1.7.4_64_cc/bin/mpicc
tyr fd1026 103 mpiexec --report-bindings -np 2 \
-host tyr,sunpc1 hostname
tyr fd1026 104 /opt/solstudio12.3/bin/sparcv9/dbx \
/usr/local/openmpi-1.7.4_64_cc/bin/mpiexe
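A typical dbx session for getting a backtrace out of mpiexec might look roughly like the following; the path and arguments are the ones from the commands above, and run/where are the standard Solaris Studio dbx subcommands:

  /opt/solstudio12.3/bin/sparcv9/dbx /usr/local/openmpi-1.7.4_64_cc/bin/mpiexec
  (dbx) run --report-bindings -np 2 -host tyr,sunpc1 hostname
  (dbx) where

where prints the call stack once the failure is hit, which is usually enough to see which component the problem comes from.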
Hard to know how to address all that, Siegmar, but I'll give it a shot. See
below.
On Jan 22, 2014, at 5:34 AM, Siegmar Gross wrote:
> Hi,
>
> yesterday I installed openmpi-1.7.4rc2r30323 on our machines
> ("Solaris 10 x86_64", "Solaris 10 Sparc", and "openSUSE Linux
> 12.1 x86_64" with Sun C
Aha - I'm able to replicate it, will fix.
On Jan 29, 2013, at 11:57 AM, Ralph Castain wrote:
> Using an svn checkout of the current 1.6 branch, it works fine for me:
>
> [rhc@odin ~/v1.6]$ cat rf
> rank 0=odin127 slot=0:0-1,1:0-1
> rank 1=odin128 slot=1
>
> [rhc@odin ~/v1.6]$ mpirun -n 2 -rf .
Using an svn checkout of the current 1.6 branch, it works fine for me:
[rhc@odin ~/v1.6]$ cat rf
rank 0=odin127 slot=0:0-1,1:0-1
rank 1=odin128 slot=1
[rhc@odin ~/v1.6]$ mpirun -n 2 -rf ./rf --report-bindings hostname
[odin127.cs.indiana.edu:12078] MCW rank 0 bound to socket 0[core 0-1] socket
1
Found it! A trivial error (missing a break in a switch statement) that only
impacts things if multiple sockets are specified in the slot_list. CMR filed to
include the fix in 1.6.4
Thanks for your patience
Ralph
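To make the trigger condition concrete, here is the rankfile from the earlier test again, annotated according to the fix note above: only an entry whose slot_list spans more than one socket exercises the broken switch case.

  rank 0=odin127 slot=0:0-1,1:0-1    (two sockets named in one slot_list: hits the bug)
  rank 1=odin128 slot=1              (single slot: unaffected)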
On Jan 24, 2013, at 7:50 PM, Ralph Castain wrote:
> I built the current 1.6 branc
I built the current 1.6 branch (which hasn't seen any changes that would impact
this function) and was able to execute it just fine on a single socket machine.
I then gave it your slot-list, which of course failed as I don't have two
active sockets (one is empty), but it appeared to parse the li
I filed a bug fix for this one. However, something you should note.
If you fail to provide a "-np N" argument to mpiexec, we assume you want ALL
available slots filled. The rankfile will contain only those procs that you
want specifically bound. The remaining procs will be unbound.
So with
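A sketch of that behaviour, assuming a made-up host nodeA that offers four slots and a rankfile that names only two ranks:

  cat rf
  rank 0=nodeA slot=0:0
  rank 1=nodeA slot=0:1

  mpiexec -rf ./rf ./a.out           # no -np: all four slots are filled; ranks 0-1 bound per the rankfile, the rest unbound
  mpiexec -np 2 -rf ./rf ./a.out     # only the two ranks listed in the rankfile are launched, both bound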
I saw your earlier note about this too. Just a little busy right now, but hope
to look at it soon.
Your rankfile looks fine, so undoubtedly a bug has crept into this rarely-used
code path.
On Oct 3, 2012, at 3:03 AM, Siegmar Gross wrote:
> Hi,
>
> I want to test process bindings with a ran
We actually include hwloc v1.3.2 in the OMPI v1.6 series.
Can you download and try that on your machines?
http://www.open-mpi.org/software/hwloc/v1.3/
In particular try the hwloc-bind executable (outside of OMPI), and see if
binding works properly on your machines. I typically run a te
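A minimal standalone check with the hwloc tools might look like this (assuming hwloc 1.3.x is installed and on the PATH; the exact location syntax is described in the hwloc-bind man page):

  hwloc-bind socket:0.core:0 -- hwloc-bind --get    # bind to core 0 of socket 0, then print the binding actually in effect
  hwloc-bind core:1 -- hostname                     # run a trivial command bound to core 1

If these report or apply the wrong binding outside of Open MPI, the problem is in hwloc itself rather than in Open MPI's placement code.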
Hmmm...well, let's try to isolate this a little. Would you mind installing a
copy of the current trunk on this machine and trying it?
I ask because I'd like to better understand if the problem is in the actual
binding mechanism (i.e., hwloc), or in the code that computes where to bind the
proce
Hi,
> > are the following outputs helpful to find the error with
> > a rankfile on Solaris?
>
> If you can't bind on the new Solaris machine, then the rankfile
> won't do you any good. It looks like we are getting the incorrect
> number of cores on that machine - is it possible that it has
> hard
On Sep 7, 2012, at 5:41 AM, Siegmar Gross wrote:
> Hi,
>
> are the following outputs helpful to find the error with
> a rankfile on Solaris?
If you can't bind on the new Solaris machine, then the rankfile won't do you
any good. It looks like we are getting the incorrect number of cores on th
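One way to check whether the core count is being mis-detected is to compare hwloc's view of the machine with what Solaris itself reports, for example:

  lstopo            # hwloc's view of sockets, cores and hardware threads
  psrinfo -pv       # Solaris's view of physical processors and their virtual CPUs

If the two disagree, that points to the detection problem suspected above.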
Hi,
are the following outputs helpful to find the error with
a rankfile on Solaris? I wrapped long lines so that they
are easier to read. Have you had time to look at the
segmentation fault with a rankfile which I reported in my
last email (see below)?
"tyr" is a two processor single core machine
I couldn't really say for certain - I don't see anything obviously wrong with
your syntax, and the code appears to be working or else it would fail on the
other nodes as well. The fact that it fails solely on that machine seems
suspect.
Set aside the rankfile for the moment and try to just bind
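For the 1.6 series under discussion, binding without any rankfile can be requested directly on the mpirun command line; a minimal sketch (option names as in the 1.6-era mpirun, so treat them as an assumption if your build differs):

  mpirun -np 2 --bind-to-core --report-bindings hostname
  mpirun -np 2 --bysocket --bind-to-socket --report-bindings hostname

If plain binding like this already fails on the Solaris box, the rankfile is not the culprit.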
Hi,
I'm new to rankfiles, so I played around a little bit with different
options. I thought that the following entry would be similar to an
entry in an appfile and that MPI could place the process with rank 0
on any core of any processor.
rank 0=tyr.informatik.hs-fulda.de
Unfortunately it's not all
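If an entry without a slot field is rejected, the man-page syntax quoted elsewhere in this thread suggests spelling out an explicit slot instead, for example:

  rank 0=tyr.informatik.hs-fulda.de slot=0    # pin rank 0 to slot 0 on that host rather than leaving the placement open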
Hi,
> Are *all* the machines Sparc? Or just the 3rd one (rs0)?
Yes, both machines are Sparc. I first tried it in a homogeneous
environment.
tyr fd1026 106 psrinfo -v
Status of virtual processor 0 as of: 09/04/2012 07:32:14
on-line since 08/31/2012 15:44:42.
The sparcv9 processor operates at 160
Are *all* the machines Sparc? Or just the 3rd one (rs0)?
On Sep 3, 2012, at 12:43 PM, Siegmar Gross wrote:
> Hi,
>
> the man page for "mpiexec" shows the following:
>
> cat myrankfile
> rank 0=aa slot=1:0-2
> rank 1=bb slot=0:0,1
> rank 2=cc slot=1-2
>
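For completeness, a rankfile like the man-page example above would typically be used with an invocation along these lines (myrankfile is the name from the excerpt; ./a.out stands in for an arbitrary application):

  mpirun -np 3 -rf myrankfile --report-bindings ./a.out

Per the man page, slot=1:0-2 means socket 1, cores 0 through 2, while slot=1-2 with no socket qualifier refers to slot numbers 1 and 2.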