George Bosilca wrote:

A fix for this problem is now available on the trunk. Please use any revision after 14963 and your problem will vanish [I hope!]. There are now some additional parameters that let you select which Myrinet network to use in case several are available (--mca btl_mx_if_include and --mca btl_mx_if_exclude). Even multi-rail configurations should now work over MX.

I have tried the nightly snapshot openmpi-1.3a1r14981, and it (almost)
seems to work. As is, when run in combination with MX-1.2.0j and
the FMA mapper, it currently produces the following error on each
node:

mx_get_info(MX_LINE_SPEED) failed with status 35 (Bad info length)

However, with the small patch below, multi-cluster jobs indeed seem
to be running fine (using MX locally). I'll do some more testing
later this week.

Thanks a lot for the fix!
Kees


*** ./ompi/mca/btl/mx/btl_mx_component.c.orig   2007-06-11 17:12:11.000000000 +0200
--- ./ompi/mca/btl/mx/btl_mx_component.c        2007-06-11 17:13:34.000000000 +0200
***************
*** 310,316 ****
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED, NULL, 0,
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );
--- 310,317 ----
  #if defined(MX_HAS_NET_TYPE)
      {
          int value;
!         if( (status = mx_get_info( mx_btl->mx_endpoint, MX_LINE_SPEED,
!                                    &nic_id, sizeof(nic_id),
                                     &value, sizeof(int))) != MX_SUCCESS ) {
              opal_output( 0, "mx_get_info(MX_LINE_SPEED) failed with status %d (%s)\n",
                           status, mx_strerror(status) );
