We can always build complicated solutions, but in some cases a sane and
simple solution exists. Let me clear up some of the misinformation in this
thread.

The MPI standard is clear about what kind of conversion is allowed and how it
should be done (for more info read Chapter 4): no type conversion is
allowed (don't send a long and expect a short); for everything else,
truncation to a sane value is the rule. This is nothing new; the rules are
similar to those in other data conversion standards such as XDR. Thus, if you
send an MPI_LONG from a machine where long is 8 bytes to an MPI_LONG on a
machine where it is 4 bytes, you will get a valid number when possible, and
LONG_MAX or LONG_MIN on the target machine otherwise. For floating point data
the rules are more complicated due to potential exponent and mantissa length
mismatches, but in general, if the data is representable on the target
architecture, a sane value is obtained; otherwise, the data is replaced with
one of the extremes. This also applies to file operations, as long as the
external32 data representation is used.
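
As a concrete illustration of the file case, here is a minimal sketch (mine,
not from the standard or from Open MPI) of selecting the external32
representation for a file so that the stored longs remain portable across
architectures; the file name, offsets, and write pattern are arbitrary, and
error checking is omitted.

    /* Sketch: write native longs through the external32 representation.
     * Compile with mpicc; run with any number of ranks. */
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        long values[4] = {1, 2, 3, 4};   /* native longs, whatever their size */
        MPI_File fh;

        MPI_File_open(MPI_COMM_WORLD, "data.bin",
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, MPI_INFO_NULL, &fh);

        /* "external32" asks MPI to store the data in the standard,
         * architecture-independent representation, so a reader on a
         * different architecture recovers the same values. */
        MPI_File_set_view(fh, 0, MPI_LONG, MPI_LONG, "external32",
                          MPI_INFO_NULL);

        /* The offset is counted in etype (MPI_LONG) units because of the view. */
        MPI_File_write_at(fh, (MPI_Offset)rank * 4, values, 4, MPI_LONG,
                          MPI_STATUS_IGNORE);

        MPI_File_close(&fh);
        MPI_Finalize();
        return 0;
    }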

The datatype engine in Open MPI supports all of these conversions, as long as
the source and target machines are correctly identified. This identification
is only enabled when OMPI is compiled with support for heterogeneous
architectures.
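
Nothing special is required in the application itself; a plain exchange such
as the sketch below (ranks and tag chosen arbitrarily by me) works across a
mixed 4-byte/8-byte long cluster, provided both sides run an Open MPI build
that was configured with heterogeneous support.

    /* Sketch: the user code is identical to the homogeneous case; the
     * datatype engine converts the representation when the peers differ. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            long value = 42L;            /* long may be 8 bytes here */
            MPI_Send(&value, 1, MPI_LONG, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            long value;                  /* long may be 4 bytes here */
            MPI_Recv(&value, 1, MPI_LONG, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("received %ld\n", value);
        }

        MPI_Finalize();
        return 0;
    }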

  George.


On Wed, Apr 4, 2018 at 11:35 AM, George Reeke <re...@mail.rockefeller.edu>
wrote:

> Dear colleagues,
>    FWIW, years ago I was looking at this problem and developed my
> own solution (for C programs) with this structure:
> --Be sure your code that works with ambiguous-length types like
> 'long' can handle different sizes.  I have replacement unambiguous
> typedef names like 'si32', 'ui64' etc. for the usual signed and
> unsigned fixed-point numbers.
> --Run your source code through a utility that analyzes a specified
> set of variables, structures, and unions that will be used in
> messages and builds tables giving their included types.  Include
> these tables in your makefiles.
> --Replace malloc, calloc, realloc, free with my own versions,
> where you pass a type argument pointing into this table along
> with number of items, etc.  There are separate memory pools for
> items that will be passed often, rarely, or never, just to make
> things more efficient.
> --Do all these calls on the rank 0 processor at program startup and
> call a special broadcast routine that sets up data structures on
> all the other processors to manage the conversions.
> --Replace mpi message passing and broadcast calls with new routines
> that use the type information (stored by malloc, calloc, etc.) to
> determine what variables to lengthen or shorten or swap on arrival
> at the destination.  Regular mpi message passing is used inside
> these routines and can be used natively for variables that do not
> ever need length changes or byte swapping (i.e. text).  I have a
> simple set of routines to gather statistics across nodes with sum,
> max, etc. operations, but not too fancy.  I do not have versions of
> any of the mpi operations that collect or distribute matrices, etc.
> --A little routine must be written for every union.  This is called
> from the package when a union is received to determine which
> member is present so the right conversion can be done.
> --There was a hook to handle IBM (hex exponent) vs IEEE floating
> point, but the code never got written.
>    Because this is all very complicated and demanding on the
> programmer, I am not making it publicly available, but will be
> glad to send it privately to anyone who really thinks they can
> use it and is willing to get their hands dirty.
>    George Reeke (private email: re...@rockefeller.edu)
>
> On Tue, 2018-04-03 at 23:39 +0000, Jeff Squyres (jsquyres) wrote:
> > On Apr 2, 2018, at 1:39 PM, dpchoudh . <dpcho...@gmail.com> wrote:
> > >
> > > Sorry for a pedantic follow-up:
> > >
> > > Is this (heterogeneous cluster support) something that is specified by
> > > the MPI standard (perhaps as an optional component)?
> >
> > The MPI standard states that if you send a message, you should receive
> > the same values at the receiver.  E.g., if you sent int=3, you should
> > receive int=3, even if one machine is big endian and the other machine is
> > little endian.
> >
> > It does not specify what happens when data sizes are different (e.g., if
> > type X is 4 bytes on one side and 8 bytes on the other) -- there are no
> > good answers on what to do there.
> >
> > > Do people know if
> > > MPICH, MVAPICH, Intel MPI etc. support it? (I do realize this is an
> > > Open MPI forum)
>
>
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users