Re: [OMPI users] About the Open-MPI point-to-point messaging layers

Jeff Squyres Mon, 2 Jul 2012 17:21:30 -0400

On Jun 30, 2012, at 9:46 PM, Sébastien Boisvert wrote:

> I really like Open-MPI and its Modular Component Architecture.
> The --mca parameters are so useful for learning and testing things !


Good!

> So here are my questions.
> 
> I know that the default point-to-point messaging layer is ob1
> (the Obi-Wan Kenobi PML). I know that there is also the PML
> cm (the Connor MacLeod PML).
> 
> From what I understand, the force is strong with Obi-Wan Kenobi, so he
> can use various byte transfer layers (BTLs).
> And there can be only one highlander (probably Connor MacLeod) so
> when I use the MTL psm, I can not use any of the BTLs because Connor
> MacLeod can only be alone at the end.

Exactly.
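
As a rough illustration of that split, here is a small Python sketch that builds the corresponding mpirun command lines. The helper name mpirun_args and the program ./my_app are hypothetical, and the sketch assumes an Open MPI build that actually contains the named components (self, sm, tcp, psm).

```python
def mpirun_args(pml, transports, nprocs, program):
    """Build an mpirun argument list for a given PML (illustrative helper)."""
    # ob1 is driven by BTLs, cm by an MTL; each has its own MCA parameter.
    kind = {"ob1": "btl", "cm": "mtl"}[pml]
    if pml == "cm" and len(transports) != 1:
        # "There can be only one": cm uses a single MTL and no BTLs.
        raise ValueError("the cm PML uses exactly one MTL")
    return ["mpirun", "-np", str(nprocs),
            "--mca", "pml", pml,
            "--mca", kind, ",".join(transports),
            program]


# ob1 can combine several BTLs; cm gets a single MTL.
print(" ".join(mpirun_args("ob1", ["self", "sm", "tcp"], 4, "./my_app")))
print(" ".join(mpirun_args("cm", ["psm"], 4, "./my_app")))
```

Running ompi_info on your installation shows which PML, BTL, and MTL components were actually built.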

> But what about the PML csum ?
> 
> What exactly is the PML csum (checksum) doing ?

csum is a clone of ob1 that adds checksumming as a data check.  It is 
helpful in environments where you're not entirely sure whether your underlying 
"reliable" transport is silently corrupting data under the covers.
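
To make the idea concrete, here is a minimal Python sketch of end-to-end checksumming. It is not csum's code: the choice of CRC32, the 4-byte header layout, and the function names are all assumptions made for illustration.

```python
import zlib


def pack(payload: bytes) -> bytes:
    """Sender side: prepend a 4-byte CRC32 of the payload."""
    return zlib.crc32(payload).to_bytes(4, "big") + payload


def unpack(frame: bytes) -> bytes:
    """Receiver side: recompute the CRC32 and compare before delivering."""
    expected = int.from_bytes(frame[:4], "big")
    payload = frame[4:]
    if zlib.crc32(payload) != expected:
        raise ValueError("checksum mismatch: payload corrupted in transit")
    return payload


def flip_bit(frame: bytes, index: int) -> bytes:
    """Simulate silent corruption by flipping one bit in byte `index`."""
    damaged = bytearray(frame)
    damaged[index] ^= 0x01
    return bytes(damaged)


frame = pack(b"hello")
print(unpack(frame))            # the intact frame is delivered
try:
    unpack(flip_bit(frame, 6))  # a corrupted payload byte is detected
except ValueError as err:
    print(err)
```

The cost is an extra pass over every message on both sides, which is why this kind of check is only worth it when you do not fully trust the transport.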

That being said, I'm not sure how much csum (hahah! Apple Mail keeps 
autocorrecting that to "scum" :-) ) has kept up with all the recent ob1 
advances.  So it may actually be lagging a bit.  As I understand it, csum will 
likely not be included in v1.7.

> Which code is the PML csum using for actually transferring stuff between
> MPI ranks ? BTLs or MTLs or something else or nothing ?

BTLs.

> I have searched the web a little but have not found much about it.

It was created by a vendor for a very specific purpose on a very specific 
network. It hasn't seen much use since then.

> If I use the MTL psm, can the PML csum be used to detect message
> corruption ? I guess the answer is no because csum is not Connor MacLeod.
> 
> I have read that when the MTL psm is used, all the Open-MPI BTL objects are
> disabled.

Correct.

> What code would the PML dr use to move bytes around should it
> be stable and production-ready ?

dr was meant to be a fault-tolerant version of ob1, but it was never finished.  
So, sadly, it also didn't keep up with all the changes in ob1 over the years.

> And my final question:
> 
> When a company designs a new interconnect, why choose the MTL architecture
> (and thus the PML cm) instead of the BTL architecture (with the ob1 PML) ?

BTLs are relatively easy to write.  They work for any old byte-pushing network.

MTLs require a bit more MPI co-design with the network.  MTLs are for networks 
that can either natively perform MPI-style message matching on the network or 
emulate it well enough (e.g., PSM does it all in software, as does MXM).  
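
For readers unfamiliar with "MPI-style message matching", here is a simplified Python sketch of what a matching engine does: incoming messages are matched against posted receives by (communicator, source, tag), with wildcards, and messages that arrive early sit in an unexpected queue. The class and method names are hypothetical, and this is only a model of the semantics, not how PSM, MXM, or any MTL implements it.

```python
ANY_SOURCE = -1  # stand-ins for MPI_ANY_SOURCE / MPI_ANY_TAG
ANY_TAG = -1


class MatchingEngine:
    def __init__(self):
        self.posted = []      # receives posted before their message arrived
        self.unexpected = []  # messages that arrived before a matching receive

    @staticmethod
    def _matches(recv, envelope):
        r_comm, r_src, r_tag = recv
        m_comm, m_src, m_tag = envelope
        return (r_comm == m_comm
                and r_src in (ANY_SOURCE, m_src)
                and r_tag in (ANY_TAG, m_tag))

    def post_recv(self, comm, source, tag):
        """Post a receive; return the payload if a message is already waiting."""
        recv = (comm, source, tag)
        for i, (envelope, payload) in enumerate(self.unexpected):
            if self._matches(recv, envelope):
                self.unexpected.pop(i)  # oldest matching message wins
                return payload
        self.posted.append(recv)
        return None

    def arrive(self, comm, source, tag, payload):
        """Handle an incoming message; return the receive it matched, if any."""
        envelope = (comm, source, tag)
        for i, recv in enumerate(self.posted):
            if self._matches(recv, envelope):
                return self.posted.pop(i)  # oldest matching receive wins
        self.unexpected.append((envelope, payload))
        return None
```

With ob1, this matching is done by the PML in software above the BTLs; with cm, it is delegated to the MTL and the network library underneath it.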

> It seems to me that ob1 and BTLs are mature and that BTLs self and sm are
> quite useful and bug-free for what I know. New code should only do the case
> when the two MPI processes are on different nodes, right ?

Correct.

-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to: 
http://www.cisco.com/web/about/doing_business/legal/cri/
