Re: [OMPI users] Fwd: [EXTERNAL] Re: How to find MPI ranks located in remote nodes?

2014-11-28 Thread Nick Papior Andersen
So I reworked the idea and got it working.
I also got it to compile.
The non-standard flags are now prefixed with OMPI_, while the standard ones keep the MPI_ prefix.
I also added two more split types.
The manual is also updated.

>> Note to devs:
I had problems right after the autogen.pl script.
Procedure:
$> git clone .. ompi
$> cd ompi
$> ./autogen.pl
My build versions:
m4: 1.4.17
automake: 1.14
autoconf: 2.69
libtool: 2.4.3
The autogen step completes successfully (the autogen output is attached if
needed).
$> mkdir build
$> cd build
$> ../configure --with-platform=optimized
I have attached the config.log (note that I have tested it with both the
shipped 1.9.1 and 1.10.0 hwloc)
$> make all
Error message is:
make[2]: Entering directory '/home/nicpa/test/build/opal/libltdl'
CDPATH="${ZSH_VERSION+.}:" && cd ../../../opal/libltdl && /bin/bash
/home/nicpa/test/config/missing aclocal-1.14 -I ../../config
aclocal-1.14: error: ../../config/autogen_found_items.m4:308: file
'opal/mca/backtrace/configure.m4' does not exist
This error message is the same as the one reported here:
http://www.open-mpi.org/community/lists/devel/2013/07/12504.php
My work-around is simple. It has to do with the ACLOCAL_AMFLAGS variable
created in build/opal/libltdl/Makefile:
OLD:
ACLOCAL_AMFLAGS = -I ../../config
CORRECT:
ACLOCAL_AMFLAGS = -I ../../
Either the configure script creates the wrong include paths for the m4
scripts, or the m4 scripts are not copied fully to the config directory.
OK, it works and the fix is simple; I just wonder why it is needed.
<< End note to devs

First here is my test system 1:
$> hwloc-info
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 2 L2Cache (type #4)
depth 4: 2 L1dCache (type #4)
depth 5: 2 L1iCache (type #4)
depth 6: 2 Core (type #5)
depth 7: 4 PU (type #6)
Special depth -3: 2 Bridge (type #9)
Special depth -4: 4 PCI Device (type #10)
Special depth -5: 5 OS Device (type #11)
and my test system 2:
depth 0: 1 Machine (type #1)
depth 1: 1 Socket (type #3)
depth 2: 1 L3Cache (type #4)
depth 3: 4 L2Cache (type #4)
depth 4: 4 L1dCache (type #4)
depth 5: 4 L1iCache (type #4)
depth 6: 4 Core (type #5)
depth 7: 8 PU (type #6)
Special depth -3: 3 Bridge (type #9)
Special depth -4: 3 PCI Device (type #10)
Special depth -5: 4 OS Device (type #11)
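
For reference, a minimal C sketch that prints a similar depth summary directly
from the hwloc API; this is an illustration only and not part of the patch
discussed here:

#include <stdio.h>
#include <hwloc.h>

int main(void)
{
    hwloc_topology_t topo;
    int depth, d;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    /* Walk the normal depths and print the object count and type at each. */
    depth = hwloc_topology_get_depth(topo);
    for (d = 0; d < depth; d++) {
        printf("depth %d: %u %s\n", d,
               hwloc_get_nbobjs_by_depth(topo, d),
               hwloc_obj_type_string(hwloc_get_depth_type(topo, d)));
    }

    hwloc_topology_destroy(topo);
    return 0;
}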

Here is an excerpt of what it can do (I have attached a Fortran program
that creates a communicator using all the types):

Example of MPI_Comm_Split_Type

Currently using 4 nodes.

Comm using CU Node: 2 local rank: 2 out of 4 ranks
Comm using CU Node: 3 local rank: 3 out of 4 ranks
Comm using CU Node: 1 local rank: 1 out of 4 ranks
Comm using CU Node: 0 local rank: 0 out of 4 ranks

Comm using Host Node: 0 local rank: 0 out of 4 ranks
Comm using Host Node: 2 local rank: 2 out of 4 ranks
Comm using Host Node: 3 local rank: 3 out of 4 ranks
Comm using Host Node: 1 local rank: 1 out of 4 ranks

Comm using Board Node: 2 local rank: 2 out of 4 ranks
Comm using Board Node: 3 local rank: 3 out of 4 ranks
Comm using Board Node: 1 local rank: 1 out of 4 ranks
Comm using Board Node: 0 local rank: 0 out of 4 ranks

Comm using Node Node: 0 local rank: 0 out of 4 ranks
Comm using Node Node: 1 local rank: 1 out of 4 ranks
Comm using Node Node: 2 local rank: 2 out of 4 ranks
Comm using Node Node: 3 local rank: 3 out of 4 ranks

Comm using Shared Node: 0 local rank: 0 out of 4 ranks
Comm using Shared Node: 3 local rank: 3 out of 4 ranks
Comm using Shared Node: 1 local rank: 1 out of 4 ranks
Comm using Shared Node: 2 local rank: 2 out of 4 ranks

Comm using Numa Node: 0 local rank: 0 out of 1 ranks
Comm using Numa Node: 2 local rank: 0 out of 1 ranks
Comm using Numa Node: 3 local rank: 0 out of 1 ranks
Comm using Numa Node: 1 local rank: 0 out of 1 ranks

Comm using Socket Node: 1 local rank: 0 out of 1 ranks
Comm using Socket Node: 2 local rank: 0 out of 1 ranks
Comm using Socket Node: 3 local rank: 0 out of 1 ranks
Comm using Socket Node: 0 local rank: 0 out of 1 ranks

Comm using L3 Node: 0 local rank: 0 out of 1 ranks
Comm using L3 Node: 3 local rank: 0 out of 1 ranks
Comm using L3 Node: 1 local rank: 0 out of 1 ranks
Comm using L3 Node: 2 local rank: 0 out of 1 ranks

Comm using L2 Node: 2 local rank: 0 out of 1 ranks
Comm using L2 Node: 3 local rank: 0 out of 1 ranks
Comm using L2 Node: 1 local rank: 0 out of 1 ranks
Comm using L2 Node: 0 local rank: 0 out of 1 ranks

Comm using L1 Node: 0 local rank: 0 out of 1 ranks
Comm using L1 Node: 1 local rank: 0 out of 1 ranks
Comm using L1 Node: 2 local rank: 0 out of 1 ranks
Comm using L1 Node: 3 local rank: 0 out of 1 ranks

Comm using Core Node: 0 local rank: 0 out of 1 ranks
Comm using Core Node: 3 local rank: 0 out of 1 ranks
Comm using Core Node: 1 local rank: 0 out of 1 ranks
Comm using Core Node: 2 local rank: 0 out of 1 ranks

Comm using HW Node: 2 local rank: 0 out of 1 ranks
Comm using HW Node: 3 local rank: 0 out of 1 ranks
Comm using HW Node: 1 local rank: 0 out of 1 ranks
Comm using HW Node: 0 local rank: 0 out of 1 ranks
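
For readers without the attachment, here is a minimal C sketch of the kind of
call behind each line above; it uses the standard MPI_COMM_TYPE_SHARED split,
and the non-standard types discussed in this thread would be passed in its
place:

#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int world_rank, local_rank, local_size;
    MPI_Comm node_comm;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* One communicator per shared-memory node (standard MPI-3 split type);
     * an extension type would replace MPI_COMM_TYPE_SHARED here. */
    MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                        MPI_INFO_NULL, &node_comm);
    MPI_Comm_rank(node_comm, &local_rank);
    MPI_Comm_size(node_comm, &local_size);

    printf("Comm using Shared Node: %d local rank: %d out of %d ranks\n",
           world_rank, local_rank, local_size);

    MPI_Comm_free(&node_comm);
    MPI_Finalize();
    return 0;
}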

This is the output o

Re: [OMPI users] Fwd: [EXTERNAL] Re: How to find MPI ranks located in remote nodes?

2014-11-28 Thread George Bosilca
The same functionality can be trivially achieved at the user level using
Adam's approach. If we provide a shortcut in Open MPI, we should emphasize
that this is an MPI extension, and offer other MPI implementations the
opportunity to provide compatible support.

Thus, I would name all new types MPIX_ instead of OMPI_ and remove them
from the default mpi.h (or "include mpi"), forcing users to use mpiext.h
and "include mpiext" in order to access them.
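
A sketch of what this looks like from the user's side, assuming the extension
header keeps Open MPI's existing mpi-ext.h name and that the proposed
constants are exposed as preprocessor macros (both are assumptions of this
sketch, not statements from the message above):

#include <mpi.h>
#if defined(OPEN_MPI)
#include <mpi-ext.h>   /* extensions live outside the default mpi.h */
#endif

int split_by_socket_or_node(MPI_Comm comm, MPI_Comm *out)
{
#if defined(MPIX_COMM_TYPE_SOCKET)
    /* hypothetical extension constant, used only when it is advertised */
    return MPI_Comm_split_type(comm, MPIX_COMM_TYPE_SOCKET, 0,
                               MPI_INFO_NULL, out);
#else
    /* portable fallback: the standard MPI-3 shared-memory split */
    return MPI_Comm_split_type(comm, MPI_COMM_TYPE_SHARED, 0,
                               MPI_INFO_NULL, out);
#endif
}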

  George.



Re: [OMPI users] "default-only MCA variable"?

2014-11-28 Thread Dave Love
Gilles Gouaillardet writes:

> It could be because configure did not find the knem headers, hence knem is
> not supported and this MCA parameter is read-only.

Yes, in that case (though knem was meant to be used; it's annoying that
configure doesn't abort when it can't find something you've explicitly asked
for, and I didn't immediately need it).  However, I got the same message for
at least mpi_abort_print_stack with that parameter set.

This didn't happen with OMPI 1.6 and there's no obvious way to turn it
off.



Re: [OMPI users] "default-only MCA variable"?

2014-11-28 Thread Dave Love
Gustavo Correa writes:

> Hi Dave, Gilles, list
>
> There is a problem with knem in OMPI 1.8.3.
> A fix is supposed to come on OMPI 1.8.4.
> Please, see this long thread:
> http://www.open-mpi.org/community/lists/users/2014/10/25511.php
>
> Note also, as documented in the thread, 
> that in the OMPI 1.8 series "vader" replaces "sm" as the default intranode 
> btl.

Thanks.  I share the frustration (though my real ire currently is
directed at Red Hat for the MPI damage in RHEL 6.6).



Re: [OMPI users] mpi_wtime implementation

2014-11-28 Thread George Bosilca
https://github.com/open-mpi/ompi/pull/292

  George.


On Thu, Nov 27, 2014 at 7:45 AM, Jeff Squyres (jsquyres) wrote:

> Gilles' concern is correct: we should never return timer values that go
> backwards.
>
> Perhaps the TSC-based WTIME should only be used in a process that is bound
> to a single core...?
>
> An MCA param can be used to force the switch between gettimeofday() and
> TSC, if someone really wants to take their chances with TSC when not bound
> to core (or bound to something wider than a core).
>
>
>
> On Nov 27, 2014, at 5:41 AM, Alex A. Granovsky wrote:
>
> > AFAIK, Linux synchronizes all CPU timers on boot. The skew is normally
> no more than 50-100 CPU cycles.
> >
> > The reasons why you can observe larger differences are:
> >
> > 1) Main. The CPUs do not have the "constant TSC" feature. Without this
> feature, the timer frequency changes across different power states of the CPU
> or core.
> > 2) Secondary. Some motherboards can overclock CPUs depending on load
> using the FSB clock generator. This results in CPU timers ticking faster or
> slower than expected, even with the "constant TSC" feature (which is then no
> longer constant).
> >
> > Kind regards,
> > Alex Granovsky
> >
> >
> >
> > -Original Message- From: Gilles Gouaillardet
> > Sent: Thursday, November 27, 2014 1:13 PM
> > To: Open MPI Users
> > Subject: Re: [OMPI users] mpi_wtime implementation
> >
> > Folks,
> >
> > One drawback of retrieving time with rdtsc is that the value is
> > core-specific: if a task is not bound to a core, then the value returned
> > by MPI_Wtime() might go backward.
> >
> > If I run the following program with
> > taskset -c 1 ./time
> >
> > and then move it across cores
> > (taskset -cp 0  ; taskset -cp 2 ; ...)
> > then the program can abort. In my environment, I can measure up to a 150ms
> > difference.
> >
> > /* some mtt tests will abort if this condition is met */
> >
> >
> > I was unable to observe this behavior with gettimeofday().
> >
> > /* though it could occur when ntpd synchronizes the clock */
> >
> > Is there any plan to make the timer function selectable via an MCA param?
> > Or to automatically fall back to gettimeofday if a task is not bound to a
> > core?
> >
> > Cheers,
> >
> > Gilles
> >
> > $ cat time.c
> > #include <stdio.h>
> > #include <mpi.h>
> >
> > int main (int argc, char *argv[]) {
> >     double t = 0;
> >     MPI_Init(&argc, &argv);
> >     for (;;) {
> >         double _t = MPI_Wtime();
> >         /* abort as soon as the timer appears to go backward */
> >         if (_t < t) {
> >             fprintf(stderr, "going back in time %lf < %lf\n", _t, t);
> >             MPI_Abort(MPI_COMM_WORLD, 1);
> >         }
> >         t = _t;
> >     }
> >     MPI_Finalize();
> >     return 0;
> > }
> >
> > On 2014/11/25 1:59, Dave Goodell (dgoodell) wrote:
> >> On Nov 24, 2014, at 12:06 AM, George Bosilca wrote:
> >>
> >>> https://github.com/open-mpi/ompi/pull/285 is a potential answer. I
> would like to hear Dave Goodell comment on this before pushing it upstream.
> >>>
> >>>  George.
> >> I'll take a look at it today.  My notification settings were messed up
> when you originally CCed me on the PR, so I didn't see this until now.
> >>
> >> -Dave
> >>
> >
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>
>
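
As a footnote to the thread above, here is a minimal C sketch of the property
being asked for (a per-process timer that never appears to go backwards),
built on the POSIX CLOCK_MONOTONIC clock purely as an illustration; it is not
the implementation in the pull request, and the clamping wrapper below is not
thread-safe:

#include <stdio.h>
#include <time.h>

/* Build: cc wtime.c -o wtime (older glibc may need -lrt for clock_gettime). */

/* Return seconds from CLOCK_MONOTONIC, clamped so that successive calls
 * within one process never go backwards. */
double monotonic_wtime(void)
{
    static double last = 0.0;
    struct timespec ts;
    double now;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    now = (double) ts.tv_sec + ts.tv_nsec * 1e-9;

    if (now < last)   /* should not happen with CLOCK_MONOTONIC */
        now = last;
    last = now;
    return now;
}

int main(void)
{
    double t0 = monotonic_wtime();
    double t1 = monotonic_wtime();
    printf("elapsed: %g s\n", t1 - t0);
    return 0;
}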