On Oct 31, 2007, at 5:52 PM, Oleg Morajko wrote:

Let me clarify the context of the problem. I'm implementing an MPI piggyback mechanism that should allow attaching extra data to any MPI message. The idea is to wrap MPI communication calls with the PMPI interface (or with dynamic instrumentation, or whatever) and add/receive the extra data in an inexpensive way. The best solution I have found so far is dynamic datatype wrapping. That is, when a user calls MPI_Send(datatype, count), I dynamically create a new structure type that contains an array [count] of datatype plus the extra data. To avoid copying the original send buffer, I use absolute addresses to define the displacements in the structure. This works fine for all P2P calls and MPI_Bcast, and it definitely has performance benefits compared to copying buffers or sending an additional message in a different communicator. Or would you expect something different?

The only problem is collective calls like MPI_Gather, where the root process receives an array of data items. There is no problem wrapping the message on the sender side (for each task), but the question is how to define a datatype that points both to the original receive buffer and to the extra buffer for the piggybacked data AND has an adequate extent to work as an array element.

The real problem is that a structure datatype { original data, extra data } does not have a constant displacement between the original data and the extra data. E.g., consider the case where the original data is the receive buffer in MPI_Gather and the extra data is an array of ints somewhere else in memory. So such a structure cannot be used directly as the element type of an array.

I guess I don't see why this is a problem...? If you're already making a specific datatype for this communication, MPI's datatype primitives are flexible enough to allow what you describe. Keep in mind that you can nest datatypes (e.g., with TYPE_CREATE_STRUCT).
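
For what it's worth, here is a minimal sketch of the send-side wrap described above (hypothetical names, a single piggybacked int assumed, MPI-3 const-correct prototypes, error checking omitted):

#include <mpi.h>

static int my_piggyback_value = 0;   /* whatever extra data is being carried */

/* PMPI-style interception: the tool defines MPI_Send itself and forwards
   to PMPI_Send with a wrapped datatype. */
int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
             int dest, int tag, MPI_Comm comm)
{
    MPI_Datatype wrapped;
    int          blocklens[2] = { count, 1 };
    MPI_Datatype types[2]     = { datatype, MPI_INT };
    MPI_Aint     displs[2];
    int          rc;

    /* Absolute addresses, so the user's buffer is never copied. */
    MPI_Get_address(buf, &displs[0]);
    MPI_Get_address(&my_piggyback_value, &displs[1]);

    /* Nest the user's datatype inside a 2-element struct:
       [count x datatype] followed by the piggyback int. */
    MPI_Type_create_struct(2, blocklens, displs, types, &wrapped);
    MPI_Type_commit(&wrapped);

    /* With absolute displacements the buffer argument is MPI_BOTTOM. */
    rc = PMPI_Send(MPI_BOTTOM, 1, wrapped, dest, tag, comm);

    MPI_Type_free(&wrapped);
    return rc;
}

The matching MPI_Recv wrapper would build the analogous struct around the receive buffer and the slot where the piggybacked int should land.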

But for collectives, I think you need to decide exactly what information you want to generate / save. Specifically, if you're piggybacking on collectives, you are stuck using the same communication pattern as that collective. I.e., if the application calls MPI_REDUCE with MPI_SUM, I imagine you'll have a difficult time piggybacking your data on that reduction without it being summed across all the processes.

There are a few other canonical solutions to the "need to save extra data about every communication" problem:

- for small messages, do what you're doing: a) make a new/specific datatype for p2p messages or b) memcpy the user+extra data into a small contiguous buffer and then just send that (and memcpy out on the receiver). If making datatypes is cheap in MPI, then a) is effectively the same as b), and potentially more optimized/tuned. (A rough pack/unpack sketch of option b follows this list.)

- for large messages, don't bother making a new datatype -- just send around another message with your extra data. The performance impact will be minimal because it's already a long message; don't force the MPI to do additional copies with a non-contiguous datatype if you can avoid it.

- for collectives, if you can't piggyback (e.g., REDUCE with SUM and others), just send around another short message. Yes, you'll take a performance hit for this.

- depending on what data you're piggybacking / collecting, it may be possible to implement a "lazy" collection scheme in the meta/PMPI layer. E.g., when you send separate messages with your meta data, always use non-blocking sends. The receiver PMPI layer can lazily collect this data and match it with application sends/receives after the fact (i.e., don't be trapped into thinking that you have to do the match exactly when the application data is actually sent or received -- it can be done after that).
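
To make the first bullet's option b) concrete, here is a rough sketch of the pack/unpack path for small messages (the single-int payload and the helper names are made up for illustration; error checking omitted):

#include <mpi.h>
#include <stdlib.h>
#include <string.h>

/* Pack user data plus one piggybacked int into a contiguous scratch buffer
   and send it as MPI_BYTE; intended only for messages below some
   small-message threshold, where the extra copy is cheap. */
static int send_packed(const void *buf, int nbytes, int piggyback,
                       int dest, int tag, MPI_Comm comm)
{
    char *scratch = malloc((size_t)nbytes + sizeof(int));
    int   rc;

    memcpy(scratch, buf, (size_t)nbytes);               /* user payload */
    memcpy(scratch + nbytes, &piggyback, sizeof(int));  /* extra data   */
    rc = MPI_Send(scratch, nbytes + (int)sizeof(int), MPI_BYTE,
                  dest, tag, comm);
    free(scratch);
    return rc;
}

/* The receiving side: unpack into the user's buffer and the piggyback slot. */
static int recv_packed(void *buf, int nbytes, int *piggyback,
                       int src, int tag, MPI_Comm comm)
{
    char      *scratch = malloc((size_t)nbytes + sizeof(int));
    MPI_Status status;
    int        rc;

    rc = MPI_Recv(scratch, nbytes + (int)sizeof(int), MPI_BYTE,
                  src, tag, comm, &status);
    memcpy(buf, scratch, (size_t)nbytes);               /* user payload */
    memcpy(piggyback, scratch + nbytes, sizeof(int));   /* extra data   */
    free(scratch);
    return rc;
}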

Hope that helps.


Any solution? It could be complex, I don't mind ;)


On 11/1/07, George Bosilca <bosi...@eecs.utk.edu> wrote:

The MPI standard defines the upper bound and the lower bound for
similar problems. However, even with all the functions in the MPI
standard we cannot describe all types of data. There is always a
solution, but sometimes one has to ask whether the performance gain is
worth the complexity introduced.


As I said, there is always a solution. In fact, there are 2 solutions:
one somehow optimal, the other ... as bad as you can imagine.

The bad approach:
  1. Use MPI_Type_struct to create exactly what you want, element by
element (i.e., a single pair per type). This can work in all cases.

The better approach:
  2. If sizeof(int) == sizeof(double), then the displacement inside
each tuple (double_i, int_i) is constant. Therefore, you can start by
creating one "single element" type and then use, for each send, the
correct displacement in the array (added to the send buffer and,
respectively, to the receive one).
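
A rough sketch of the second approach, assuming sizeof(int) == sizeof(double) and using made-up names (error checking omitted):

#include <mpi.h>

#define MAX_SIZE 1024

static double weights[MAX_SIZE];
static int    values[MAX_SIZE];

/* Build one reusable "pair" type whose second member sits at the constant
   offset (&values[0] - &weights[0]) from the first. */
static MPI_Datatype make_pair_type(void)
{
    MPI_Datatype pair_type;
    int          blocklens[2] = { 1, 1 };
    MPI_Datatype types[2]     = { MPI_DOUBLE, MPI_INT };
    MPI_Aint     base, addr_values, displs[2];

    MPI_Get_address(&weights[0], &base);
    MPI_Get_address(&values[0], &addr_values);
    displs[0] = 0;                  /* each pair "starts" at weights[i] */
    displs[1] = addr_values - base; /* offset to the matching int       */

    MPI_Type_create_struct(2, blocklens, displs, types, &pair_type);
    MPI_Type_commit(&pair_type);
    return pair_type;
}

/* Send the pair { weights[i], values[i] } in one call.  Passing &weights[i]
   as the buffer shifts both members by i*sizeof(double), so the int member
   lands on &values[i] only when sizeof(int) == sizeof(double), which is
   exactly the assumption above. */
static void send_pair(MPI_Datatype pair_type, int i, int dest, int tag)
{
    MPI_Send(&weights[i], 1, pair_type, dest, tag, MPI_COMM_WORLD);
}

The same pair_type works on the receive side by passing &weights[i] of the destination arrays as the receive buffer.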

   george.

On Oct 31, 2007, at 1:40 PM, Oleg Morajko wrote:

> Hello,
>
> I have the following problem. There are two arrays somewhere in the
> program:
>
> double weights [MAX_SIZE];
> ...
> int       values [MAX_SIZE];
> ...
>
> I need to be able to send a single pair { weights[i], values[i] }
> with a single MPI_Send call, or receive it directly into both arrays
> at a given index i. How can I define a datatype that spans this
> pair over both arrays?
>
> The only additional constraint is the fact that the memory location
> of both arrays is fixed and cannot be changed, and I should avoid
> extra copies.
>
> Is it possible?
>
> Any help welcome,
> Oleg Morajko
>
>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users


--
Jeff Squyres
Cisco Systems
