Thanks! I tried it, but it didn't solve my problem. Maybe eager/rndv is not
the cause after all.

The reason I want to always use eager mode is to keep a sender from being
slowed down by a receiver that is not ready. On InfiniBand I can prevent the
sender from slowing down by forcing eager mode, just like your approach, but
I cannot reproduce this on OPA. Based on the experiments below, it seems to
me that the sender is still delayed to some extent for reasons other than
eager/rndv.
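For concreteness, what I tried on OPA, following your earlier suggestion,
is roughly the sketch below. The threshold value is just a placeholder;
exporting the variable in the job script or via "mpirun -x" should be
equivalent, assuming PSM2 picks up its environment when the MTL initializes
inside MPI_Init.

    /* Sketch only: raise the PSM2 rendezvous threshold so that messages up
     * to ~200 KB (placeholder value) stay in eager mode.  The variable has
     * to be in the environment before MPI_Init(), which is when the PSM2
     * MTL / libpsm2 initializes; normally I just export it in the job
     * script instead of calling setenv() here. */
    #include <mpi.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        setenv("PSM2_MQ_RNDV_HFI_THRESH", "200000", 1);  /* placeholder */

        MPI_Init(&argc, &argv);
        /* ... the actual application ... */
        MPI_Finalize();
        return 0;
    }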

I designed a simple test (see hello_world.c in the attachment) with one
sender rank (r0) and one receiver rank (r1). r0 always runs at full speed;
r1 runs at full speed in one case and at half speed in the other. To run r1
at half speed, I colocate a third rank r2 with r1 on the same core (see the
rankfile). I then compare the completion time at r0 to see whether it is
slowed down when r1 is "unready to receive". It is: r0 does take noticeably
longer. What surprises me is that the delay varies significantly when I
change the message length, which is different from what I observed
previously when eager/rndv was the cause.

So my question is: do you know of other factors that can delay an MPI_Send()
when the receiver is not ready to receive?




On Wed, Aug 10, 2016 at 11:48 PM, Cabral, Matias A <
matias.a.cab...@intel.com> wrote:

> To remain in eager mode you need to increase the value of
> PSM2_MQ_RNDV_HFI_THRESH.
>
>
> PSM2_MQ_EAGER_SDMA_SZ is the threshold at which PSM2 switches from PIO
> (which uses the CPU) to using the SDMA engines.  This summary may help:
>
>
>
> PIO Eager Mode:     0 bytes -> PSM2_MQ_EAGER_SDMA_SZ - 1
>
> SDMA Eager Mode:    PSM2_MQ_EAGER_SDMA_SZ -> PSM2_MQ_RNDV_HFI_THRESH - 1
>
> RNDZ Expected:      PSM2_MQ_RNDV_HFI_THRESH -> largest supported value
>
>
>
> Regards,
>
>
>
> _MAC
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org] *On Behalf Of 
> *Xiaolong
> Cui
> *Sent:* Wednesday, August 10, 2016 7:19 PM
> *To:* Open MPI Users <users@lists.open-mpi.org>
> *Subject:* Re: [OMPI users] runtime performance tuning for Intel OMA
> interconnect
>
>
>
> Hi Matias,
>
>
>
> Thanks a lot, that's very helpful!
>
>
>
> What I need, indeed, is to always use eager mode. But I didn't find any
> information about PSM2_MQ_EAGER_SDMA_SZ online. Would you please elaborate
> on "Just in case PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA, always in eager
> mode"?
>
>
>
> Thanks!
>
> Michael
>
>
>
> On Wed, Aug 10, 2016 at 3:59 PM, Cabral, Matias A <
> matias.a.cab...@intel.com> wrote:
>
> Hi Michael,
>
>
>
> When Open MPI runs on Omni-Path it will choose the PSM2 MTL by default,
> which uses libpsm2.so. Strictly speaking, it can also run over the openib
> BTL, but the performance is so significantly impacted that this is not only
> discouraged, no amount of tuning would make sense. Regarding the PSM2 MTL,
> it currently supports only two MCA parameters ("mtl_psm2_connect_timeout"
> and "mtl_psm2_priority"), which are not what you are looking for. Instead,
> you can set values directly in the PSM2 library with environment variables.
> Further info is in the Programmer's Guide:
>
>
>
> http://www.intel.com/content/dam/support/us/en/documents/network-and-i-o/fabric-products/Intel_PSM2_PG_H76473_v3_0.pdf
>
> More docs:
>
>
>
> https://www-ssl.intel.com/content/www/us/en/support/network-and-i-o/fabric-products/000016242.html?wapkw=psm2
>
>
>
> Now, for your parameters:
>
>
>
> btl = openib,vader,self  -> Ignore this one
>
> btl_openib_eager_limit = 160000   -> I don't clearly see the difference
> from the parameter below; however, they are set to the same value. Just in
> case PSM2_MQ_EAGER_SDMA_SZ changes PIO to SDMA, always in eager mode.
>
> btl_openib_rndv_eager_limit = 160000  -> PSM2_MQ_RNDV_HFI_THRESH
>
> btl_openib_max_send_size = 160000   -> does not apply to PSM2
>
> btl_openib_receive_queues = P,128,256,192,128:S,2048,1024,1008,64:S,12288,1024,1008,64:S,160000,1024,512,512  -> does not apply for PSM2.
>
>
>
> Thanks,
>
> Regards,
>
>
>
> _MAC
>
> BTW, you should change the subject: OMA -> OPA
>
>
>
>
>
> *From:* users [mailto:users-boun...@lists.open-mpi.org
> <users-boun...@lists.open-mpi.org>] *On Behalf Of *Xiaolong Cui
> *Sent:* Tuesday, August 09, 2016 2:22 PM
> *To:* users@lists.open-mpi.org
> *Subject:* [OMPI users] runtime performance tuning for Intel OMA
> interconnect
>
>
>
> I used to tune the performance of Open MPI on InfiniBand by changing the
> MCA parameters of the openib component (see
> https://www.open-mpi.org/faq/?category=openfabrics). Now I have migrated to
> a new cluster that deploys Intel's Omni-Path interconnect, and my previous
> approach does not work any more. Does anyone know how to tune the
> performance for Omni-Path (i.e., which Open MPI component to change)?
>
>
>
> The version of Open MPI is openmpi-1.10.2-hfi. I have included the output
> of ompi_info and the openib parameters that I used to change. Thanks!
>
>
>
> Sincerely,
>
> Michael
>
>
> _______________________________________________
> users mailing list
> users@lists.open-mpi.org
> https://rfd.newmexicoconsortium.org/mailman/listinfo/users
_______________________________________________
users mailing list
users@lists.open-mpi.org
https://rfd.newmexicoconsortium.org/mailman/listinfo/users
