OMPI Users,

I was wondering: is there a good way to "tune" vader to get around an
intermittent hang in MPI_Wait?

I ask because I recently found that with Open MPI 2.1.x, on both my
desktop and the supercomputer I have access to, the model seems to
"deadlock" at an MPI_Wait call whenever vader is enabled. If I run as:

  mpirun --mca btl self,sm,tcp

on my desktop it works. When I moved to my cluster, I tried the more
generic:

  mpirun --mca btl ^vader

since the cluster uses openib, and with that things work as well. At
least, I hope that's how one turns off vader in MCA speak. (Note: this
deadlock is a bit sporadic, but I now have a case which seems to trigger
it reproducibly.)
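
If it helps confirm the selection, I believe the BTL framework can be
made to report which components it loads at run time, e.g.:

  mpirun --mca btl ^vader --mca btl_base_verbose 100

and one can then grep the output for vader. (btl_base_verbose is a real
MCA parameter as far as I know, though the exact output format may vary.)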

Now, I know vader is supposed to be the "better" shared-memory transport,
so I'd rather use it, and thought maybe I could twiddle some tuning knobs.
So I looked at:

  https://www.open-mpi.org/faq/?category=sm

and there I saw question 6, "How do I know what MCA parameters are
available for tuning MPI performance?". But when I try the commands listed
there (minus some stray HTML/CSS tags on the page):

(1081) $ ompi_info --param btl sm
                 MCA btl: sm (MCA v2.1.0, API v3.0.0, Component v2.1.0)
(1082) $ ompi_info --param mpool sm
(1083) $

Huh. I expected more output, but searching around the Open MPI FAQs made
me think I should use:

  ompi_info --param btl sm --level 9

which does spit out a lot (it seems ompi_info only shows level-1 parameters
by default), though the equivalent for mpool sm still prints nothing.
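
For reference (and in case I'm doing it wrong), my understanding is that
any parameter ompi_info reports can be set on the command line, in the
environment, or in a config file; using btl_sm_eager_limit purely as a
stand-in example, these three should be equivalent:

  mpirun --mca btl_sm_eager_limit 8192
  export OMPI_MCA_btl_sm_eager_limit=8192
  echo "btl_sm_eager_limit = 8192" >> $HOME/.openmpi/mca-params.conf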

Any ideas on which of the many knobs is best to try turning? Perhaps
something whose default differs between sm and vader? I tried
"ompi_info --param btl vader --level 9" as well, but it prints nothing.
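
Since the per-component query comes up empty, perhaps grepping the full
dump is a fallback (assuming ompi_info --all behaves here as it does for
me elsewhere):

  ompi_info --all | grep btl_vader

which should at least list the vader parameters and their defaults.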

I will note that this code runs just fine with Open MPI 2.0.2, as well as
with Intel MPI and SGI MPT, so I'm thinking the code itself is okay and
that something changed between Open MPI 2.0.x and 2.1.x. I see two entries
about vader in the Open MPI 2.1.0 announcement, but nothing specific about
how to "revert" them if they are even causing the problem (a guess at a
workaround follows the list):

- Fix regression that lowered the memory maximum message bandwidth for
  large messages on some BTL network transports, such as openib, sm,
  and vader.


- The vader BTL is now more efficient in terms of memory usage when
  using XPMEM.
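
That second entry makes me wonder about vader's single-copy path. As a
pure guess at a "revert" (assuming the btl_vader_single_copy_mechanism
parameter exists in 2.1.x; I couldn't confirm it with ompi_info here),
perhaps:

  mpirun --mca btl_vader_single_copy_mechanism none

would fall back to plain copy-in/copy-out and sidestep whatever changed.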


Thanks for any help,
Matt


-- 
Matt Thompson

Man Among Men
Fulcrum of History