[OMPI users] IB question (slightly off topic)

2016-03-12 Thread dpchoudh .
Hello all

I have a question that, I realize, is somewhat off topic for this list,
but I do not know whom else to ask. Hopefully the community
here will help me out.

I know that InfiniBand is a 'standard' interface (standardized by IETF?
IEEE? or some similar body), much like Ethernet.

However, I do see that the adapters come in different 'flavors' (with
different feature sets?), such as QLogic PSM or Mellanox ConnectX, that have
*user space* "drivers", and even Open MPI treats them differently (preferring
QLogic PSM over other IB as the default behavior).

For someone very new to the InfiniBand world, what are the differences? How
can they be different and yet conform to the (supposed) standard?

Any pointer to appropriate literature is appreciated.

Thanks in advance
Durga

Life is complex. It has real and imaginary parts.


Re: [OMPI users] IB question (slightly off topic)

2016-03-12 Thread Gilles Gouaillardet
In my understanding, the standard mainly covers the hardware and wire-level
details.
For example, Mellanox and QLogic InfiniBand adapters can use the same cables
and switches.
IIRC, they can also share the same subnet manager and communicate via IPoIB.

When performance matters, Mellanox uses IB verbs and QLogic uses the PSM
library.
I am not sure what you mean by "ompi prefers PSM over other IB":
assuming QLogic hardware can also work with IB verbs, then yes, PSM is faster
on QLogic, so Open MPI will prefer PSM.
Mellanox InfiniBand cannot use PSM, so Open MPI uses IB verbs there.
Note that Mellanox also provides optimized proprietary libraries (hcoll, mxm,
...) that can be used for enhanced performance.
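
As an illustration only, here is roughly how that selection can be steered
explicitly from the mpirun command line (assuming an Open MPI 1.10-era build
that includes both the psm MTL and the openib BTL; component names can differ
between versions):

    # force the QLogic/PSM path (cm PML + psm MTL)
    mpirun --mca pml cm --mca mtl psm -np 4 ./a.out

    # force the verbs path (ob1 PML + openib BTL), e.g. on Mellanox hardware
    mpirun --mca pml ob1 --mca btl openib,self,sm -np 4 ./a.out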

FWIW and IIRC, Intel bought the InfiniBand assets from QLogic a few years
ago.

Cheers,

Gilles

On Saturday, March 12, 2016, dpchoudh .  wrote:

> Hello all
>
> I have a question, that I do realize, is somewhat off topic to this list.
> But I do not know who to approach for an answer. Hopefully the community
> here will help me out.
>
> I know that Infiniband is a 'standard' interface (standardized by IETF?
> IEEE? or some similar body), much like Ethernet.
>
> However, I do see that they come in different 'flavors', (and have
> different feature set?) such as Qlogic PSM or Mellanox ConnectX, that have
> *user space* "drivers" and even OpenMPI treats them differently (preferring
> Qlogic PSM over other IB, as a default behavior).
>
> For someone very new to the Infiniband world, what are the differences?
> How can they be different and yet confirm to the (supposed) standard?
>
> Any pointer to appropriate literature is appreciated.
>
> Thanks in advance
> Durga
>
> Life is complex. It has real and imaginary parts.
>


Re: [OMPI users] Communication problem (on one node) when network interface is down

2016-03-12 Thread Gilles Gouaillardet
Also, the loopback interface is somewhat special:
although all nodes have the same IP, 127.0.0.1, this interface cannot be
used for inter-node communication.
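
For a purely single-node run, combining that with Jeff's suggestion below,
something along these lines might work (a sketch only; the exact behavior of
these MCA parameters depends on the Open MPI version):

    # clear the default exclusion of the loopback interface
    mpirun -np 2 --mca btl tcp,self --mca btl_tcp_if_exclude '' ./a.out

    # or explicitly restrict the TCP BTL to the loopback interface
    mpirun -np 2 --mca btl tcp,self --mca btl_tcp_if_include lo ./a.out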

On Saturday, March 12, 2016, Jeff Squyres (jsquyres) 
wrote:

> It's set by default in btl_tcp_if_exclude (because in most cases, you *do*
> want to exclude the loopback interface -- it's much slower than other
> shared memory types of scenarios).  But this value can certainly be
> overridden:
>
> mpirun --mca btl_tcp_if_exclude '' 
>
>
>
> > On Mar 11, 2016, at 11:15 AM, dpchoudh .  > wrote:
> >
> > Hello all
> >
> > From a user standpoint, that does not seem right to me. Why should one
> need any kind of network at all if one is entirely dealing with a single
> node? Is there any particular reason OpenMPI does not/cannot use the lo
> (loopback) interface? I'd think it is there for exactly this kind of
> situation.
> >
> > Thanks
> > Durga
> >
> > Life is complex. It has real and imaginary parts.
> >
> > On Fri, Mar 11, 2016 at 6:08 AM, Gilles Gouaillardet <
> gilles.gouaillar...@gmail.com > wrote:
> > Spawned tasks cannot use the sm nor the vader BTL, so you need another one
> > (tcp, openib, ...).
> > The self BTL is only for send/recv with oneself (i.e. it does not work
> > for communication between processes, even within a node).
> >
> > I am pretty sure the lo interface is always discarded by Open MPI, so I
> > have no solution off the top of my head that involves Open MPI.
> > Maybe your best bet is to use a "dummy" interface, for example tap or tun,
> > or even a bridge.
> >
> > Cheers,
> >
> > Gilles
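
Regarding the "dummy" interface suggestion above, a sketch of how such an
interface could be created on Linux with iproute2 (the interface name mpi0
and the address are arbitrary examples):

    ip link add mpi0 type dummy
    ip addr add 192.168.250.1/24 dev mpi0
    ip link set mpi0 up
    mpirun --mca btl tcp,self --mca btl_tcp_if_include mpi0 -np 2 ./a.out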
> >
> >
> >
> > On Friday, March 11, 2016, Rémy Grünblatt  > wrote:
> > Hello,
> > I'm having communications problem between two processes (with one being
> > spawned by the other, on the *same* physical machine). Everything works
> > as expected when I have network interface such as eth0 or wlo1 up, but
> > as soon as they are down, I get errors (such as « At least one pair of
> > MPI processes are unable to reach each other for MPI communications […]
> »).
> > I tried to specify a set of mca parameters including the btl "self"
> > parameter and including the lo interface in btl_tcp_if_include list, as
> > advised by https://www.open-mpi.org/faq/?category=tcp but I didn't reach
> > any working state for this code with "external" network interface down.
> >
> > Got any idea about what I might do wrong ?
> >
> > Example code that triggers the problem: https://ptpb.pw/YOjr.tar.gz
> > Ompi_info:  https://ptpb.pw/Vt_V.txt
> > Full log: https://ptpb.pw/JCXn.txt
> >
> > Rémy
> >
> >
> >
>
>
> --
> Jeff Squyres
> jsquy...@cisco.com 
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
>


Re: [OMPI users] Error with MPI_Register_datarep

2016-03-12 Thread Éric Chamberland

Hi Gilles,


On 16-03-10 23:14, Gilles Gouaillardet wrote:

Eric,

My short answer is no.

The long answer is:

- from MPI_Register_datarep()

   /* The io framework is only initialized lazily.  If it hasn't
   already been initialized, do so now (note that MPI_FILE_OPEN
   and MPI_FILE_DELETE are the only two places that it will be
   initialized). */

- from mca_io_base_register_datarep()
/* Find the maximum additional number of bytes required by all io
   components for requests and make that the request size */

    OPAL_LIST_FOREACH(cli, &ompi_io_base_framework.framework_components,
                      mca_base_component_list_item_t) {
        ...
    }

In your case, since neither MPI_File_open nor MPI_File_delete is invoked,
the ompio component could be disabled.
But that would mean the io component selection also depends on whether
MPI_Register_datarep() has been invoked beforehand. I can foresee users
complaining about IO performance discrepancies just because of one line
(i.e. an MPI_Register_datarep invocation) in their code.

OK, my situation is that I want a datarep only so that 32-bit code (where
long int is 4 bytes) can work with files written by 64-bit code using the
"native" datarep...


So I want to activate the datarep functionality only in the 32-bit build
of the code...
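
For illustration, a rough sketch of what such a registration might look like
in the 32-bit build. The datarep name "long64", the callback names, and the
handling of the "position" argument (applied to userbuf, as in the Using
MPI-2 example) are assumptions; the write-side conversion is omitted and
only MPI_LONG is handled:

    #include <mpi.h>
    #include <stdint.h>

    /* read conversion: the file holds 8-byte values written by a 64-bit
       build, memory uses the 4-byte long of the 32-bit build */
    static int read_longs(void *userbuf, MPI_Datatype datatype, int count,
                          void *filebuf, MPI_Offset position, void *extra_state)
    {
        long          *mem  = (long *)userbuf;
        const int64_t *file = (const int64_t *)filebuf;
        for (int i = 0; i < count; i++)
            mem[position + i] = (long)file[i];  /* narrowing: assumes values fit */
        return MPI_SUCCESS;
    }

    /* extent callback: every long occupies 8 bytes in the file representation */
    static int long_extent(MPI_Datatype datatype, MPI_Aint *file_extent,
                           void *extra_state)
    {
        *file_extent = 8;
        return MPI_SUCCESS;
    }

    void register_long64_datarep(void)
    {
        /* write conversion left as MPI_CONVERSION_FN_NULL here; a real
           converter would widen 4-byte longs to 8 bytes on write */
        MPI_Register_datarep("long64", read_longs, MPI_CONVERSION_FN_NULL,
                             long_extent, NULL);
    }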


Now, I continued my tests with "--mca io ^ompio", but I hit another 
wall:  when I try to use the datarep just to test it, I now have this 
message:


ERROR Returned by MPI: 51
ERROR_string Returned by MPI: MPI_ERR_UNSUPPORTED_DATAREP: data 
representation not supported


which is pretty similar to MPICH output...

So I am completely stuck on implementing a solution to read/write
"native" 64-bit datarep files from a 32-bit architecture...


Isn't that in the MPI-2 standard?  Does it mean that no MPI
implementation is standard compliant?  >:)


Now if MPI_File_open is invoked first, that means
MPI_Register_datarep will fail or succeed depending on the selected io
component (and IIRC, that could be filesystem dependent within the
same application).


I am open to suggestions, but so far I do not see a better one (other
than implementing this in OMPIO).
The patch for v1.10 can be downloaded at
https://github.com/ggouaillardet/ompi-release/commit/1589278200d9fb363d61fa20fb39a4c2fa78c942.patch

With it, the application will not crash, but will fail "nicely" in
MPI_Register_datarep.



In reality I want a solution to read/write files with the same API (MPI
collective calls)... and produce files that are compatible between 32-bit
and 64-bit architectures with respect to the long int issue, without any
loss of precision or performance for the "native" 64-bit case...


About 4 years ago I saw the datarep example in the "Using MPI-2" book,
and that led me to think I could easily implement a solution to
read/write files in a format compatible between architectures, even while
choosing the "native" datarep on 64-bit architectures, which are the only
ones really used on clusters and by our users until now...  I made the
decision to code once, with all collective I/O calls, knowing I would be
able to convert int32 to int64 only when needed...
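
For context, the datarep name only comes into play when setting the file
view; a minimal sketch of the write path (the file name is arbitrary, and
"native" could be replaced by any registered datarep string):

    #include <mpi.h>

    void write_longs(MPI_Comm comm, long *data, int n)
    {
        MPI_File fh;
        MPI_File_open(comm, "data.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);
        /* the data representation ("native", "external32", or a registered
           one) is chosen here */
        MPI_File_set_view(fh, 0, MPI_LONG, MPI_LONG, "native", MPI_INFO_NULL);
        MPI_File_write_all(fh, data, n, MPI_LONG, MPI_STATUS_IGNORE);
        MPI_File_close(&fh);
    }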


Now I feel I made a bad choice, since no MPI implementation has this
working... or maybe there is a simple workaround?  Is there an
"external64" available?  Is anything written into the file about the
datarep?  If not, then "native" could still be as fast as an "external64",
since no conversion would be done, but under 32-bit architectures there
would be some I/O performance loss, since more conversions would occur.


Thanks for helping me understand!

Eric


Cheers,

Gilles

On 3/11/2016 12:11 PM, Éric Chamberland wrote:

Thanks Gilles!

It works... I will continue my tests with that command line...

Until OMPIO supports this, is there a way to put a call into the code
to disable ompio the same way "--mca io ^ompio" does?
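
One possibility (not an official API, just relying on the fact that Open MPI
reads OMPI_MCA_* environment variables at startup) might be to set the
parameter programmatically before MPI_Init; a sketch:

    #include <stdlib.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        /* must be set before MPI_Init so the io framework sees it */
        setenv("OMPI_MCA_io", "^ompio", 1);
        MPI_Init(&argc, &argv);
        /* ... MPI-IO work ... */
        MPI_Finalize();
        return 0;
    }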


Thanks,

Eric

On 16-03-10 20:13, Gilles Gouaillardet wrote:

Eric,

I will fix the crash (fwiw, it is already fixed in v2.x and master)

Note this program cannot currently run "as is".
By default, there are two components in the io framework: ROMIO and OMPIO.
MPI_Register_datarep tries to register the datarep with all
components,
and succeeds only if the datarep was successfully registered with all of
them.


OMPIO does not currently support this
(and the stub is missing in v1.10, which is why the app crashes).

Your test is successful if you blacklist ompio:

mpirun --mca io ^ompio ./int64
or
OMPI_MCA_io=^ompio ./int64

and you do not even need a patch for that :-)


Cheers,

Gilles

On 3/11/2016 4:47 AM, Éric Chamberland wrote:

Hi,

I have a segfault while trying to use MPI_Register_datarep with 
openmpi-1.10.2:


mpic++ -g -o int64 int64.cc
./int64
[melkor:24426] *** Process received signal ***
[melkor:24426] Signal: Segmentation fault (11)
[melkor:24426] Signal code: Address not mapped (1)
[melkor:24426] Failing at address: (nil)
[melkor:24426] [