>>>>> "Mike" == Mike Dubman <mi...@dev.mellanox.co.il> writes:

    Mike> So, it seems you have an old OFED without this parameter.
    Mike> Can you install the latest Mellanox OFED, or check which
    Mike> community OFED has it?

Rio is using the kernel.org drivers that are part of Ubuntu's 3.13.x
kernel, and log_num_mtt is not a parameter in those drivers. In fact,
log_num_mtt has never been a parameter in the kernel.org sources (I just
checked the git commit history). It is no longer needed either, since
the following commit (which is also part of OFED 3.12, by the way; Mike,
it seems Mellanox OFED is behind in this respect):
-----------------------------------------------------------
commit db5a7a65c05867cb6ff5cb6d556a0edfce631d2d
Author: Roland Dreier <rol...@purestorage.com>
Date:   Mon Mar 5 10:05:28 2012 -0800

    mlx4_core: Scale size of MTT table with system RAM

    The current driver defaults to 1M MTT segments, where each segment holds
    8 MTT entries.  This limits the total memory registered to 8M * PAGE_SIZE
    which is 32GB with 4K pages.  Since systems that have much more memory
    are pretty common now (at least among systems with InfiniBand hardware),
    this limit ends up getting hit in practice quite a bit.

    Handle this by having the driver allocate at least enough MTT entries to
    cover 2 * totalram pages.

    Signed-off-by: Roland Dreier <rol...@purestorage.com>
-----------------------------------------------------------
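
To make the arithmetic in the commit message concrete, here is a minimal
userspace sketch (not driver code; the function name mtt_max_reg_bytes
is made up for illustration). It assumes, as the commit message
describes, that each MTT entry maps one page, so the registerable memory
is segments * entries per segment * page size:

```c
#include <stdint.h>

/*
 * Upper bound on registerable memory for a fixed-size MTT table.
 * Illustrative helper only: the table has 2^log_num_mtt segments of
 * 2^log_mtts_per_seg entries, and each entry maps one page.
 */
static uint64_t mtt_max_reg_bytes(unsigned log_num_mtt,
                                  unsigned log_mtts_per_seg,
                                  uint64_t page_size)
{
    uint64_t segments = UINT64_C(1) << log_num_mtt;
    uint64_t entries  = segments << log_mtts_per_seg;

    return entries * page_size;
}
```

With the old defaults described in the commit message (2^20 segments,
2^3 entries per segment, 4K pages), mtt_max_reg_bytes(20, 3, 4096)
evaluates to 2^35 bytes, i.e. the 32GB limit.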

The relevant code segment (drivers/net/ethernet/mellanox/mlx4/profile.c):

-----------------------------------------------------------
        /*
         * We want to scale the number of MTTs with the size of the
         * system memory, since it makes sense to register a lot of
         * memory on a system with a lot of memory.  As a heuristic,
         * make sure we have enough MTTs to cover twice the system
         * memory (with PAGE_SIZE entries).
         *
         * This number has to be a power of two and fit into 32 bits
         * due to device limitations, so cap this at 2^31 as well.
         * That limits us to 8TB of memory registration per HCA with
         * 4KB pages, which is probably OK for the next few months.
         */
        si_meminfo(&si);
        request->num_mtt =
                roundup_pow_of_two(max_t(unsigned, request->num_mtt,
                                         min(1UL << (31 - log_mtts_per_seg),
                                             si.totalram >> (log_mtts_per_seg - 1))));
-----------------------------------------------------------
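
The expression above is dense, so here is a rough userspace model of it
(an illustration only: roundup_pow2 and scaled_num_mtt are names
invented here, and 64-bit integers are used throughout instead of the
kernel's max_t(unsigned, ...) cast). As in the driver snippet, the
result is counted in MTT segments:

```c
#include <stdint.h>

/*
 * Smallest power of two >= n, for n in [1, 2^63].
 * Userspace stand-in for the kernel's roundup_pow_of_two().
 */
static uint64_t roundup_pow2(uint64_t n)
{
    uint64_t p = 1;

    while (p < n)
        p <<= 1;
    return p;
}

/*
 * Model of the heuristic: keep the requested number of segments unless
 * twice the system memory needs more, and cap the table at 2^31 entries.
 * log_mtts_per_seg must be in 1..7 (the module parameter's range),
 * otherwise the shifts below are not meaningful.
 */
static uint64_t scaled_num_mtt(uint64_t requested_segs,
                               uint64_t totalram_pages,
                               unsigned log_mtts_per_seg)
{
    /* Device limit: 2^31 entries, expressed in segments. */
    uint64_t cap = UINT64_C(1) << (31 - log_mtts_per_seg);
    /* Segments needed to cover 2 * totalram pages. */
    uint64_t want = totalram_pages >> (log_mtts_per_seg - 1);
    uint64_t lim = want < cap ? want : cap;
    uint64_t segs = requested_segs > lim ? requested_segs : lim;

    return roundup_pow2(segs);
}
```

For example, with 2^24 pages of RAM (64GB at 4K pages), a request of
2^20 segments and log_mtts_per_seg = 3 (8 entries per segment, as in the
commit message), the model yields 2^22 segments, i.e. 2^25 MTT entries,
enough to register 128GB. With 8GB of RAM the requested 2^20 segments
are simply kept.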

So the point here is that OpenMPI should check the mlx4 driver version
and not print a false warning when a newer driver is in use. I haven't
checked whether this is already fixed in the OpenMPI code repositories,
but it is not fixed in 1.8.2rc4 (see static uint64_t calculate_max_reg
in ompi/mca/btl/openib/btl_openib.c). The OpenMPI FAQ should also be
corrected accordingly.
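
To illustrate the kind of check meant here, below is a hypothetical
sketch; it is not OpenMPI's actual calculate_max_reg code, and the
helper names read_int_param and mtt_limit_from_params are invented. It
derives an MTT-based registration limit only when the driver really
exposes log_num_mtt and log_mtts_per_seg under a parameters directory
such as /sys/module/mlx4_core/parameters, and otherwise reports that no
limit can be derived this way. Note that a missing log_num_mtt does not
by itself prove that the driver scales the MTT table, since older
kernel.org drivers lacked the parameter too, so a real fix would also
have to look at the driver or kernel version:

```c
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper: read an integer module parameter from a
 * sysfs-style file. Returns 1 and stores the value on success, 0 if the
 * file is missing or does not start with an integer.
 */
static int read_int_param(const char *path, long *value)
{
    FILE *fp = fopen(path, "r");
    int ok;

    if (fp == NULL)
        return 0;
    ok = (fscanf(fp, "%ld", value) == 1);
    fclose(fp);
    return ok;
}

/*
 * Hypothetical policy: compute a limit only if both parameters exist.
 * Returns 1 and fills *max_reg on success; returns 0 when the
 * parameters are absent or out of range, in which case the caller has
 * no basis for a log_num_mtt warning.
 */
static int mtt_limit_from_params(const char *dir, uint64_t page_size,
                                 uint64_t *max_reg)
{
    char path[512];
    long log_num_mtt, log_mtts_per_seg;

    snprintf(path, sizeof(path), "%s/log_num_mtt", dir);
    if (!read_int_param(path, &log_num_mtt))
        return 0;

    snprintf(path, sizeof(path), "%s/log_mtts_per_seg", dir);
    if (!read_int_param(path, &log_mtts_per_seg))
        return 0;

    /* Reject values that would make the shift below meaningless. */
    if (log_num_mtt < 0 || log_mtts_per_seg < 0 ||
        log_num_mtt + log_mtts_per_seg > 40)
        return 0;

    *max_reg = (UINT64_C(1) << (log_num_mtt + log_mtts_per_seg)) * page_size;
    return 1;
}
```

On a system like Rio's, where the parameters directory has no
log_num_mtt entry, mtt_limit_from_params returns 0 and the caller would
skip the warning instead of assuming a default table size.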

Rio, as a note for you: you can safely ignore the warning.

Cheers,

Roland

-------
http://www.q-leap.com / http://qlustar.com
          --- HPC / Storage / Cloud Linux Cluster OS ---

    Mike> On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota
    Mike> <rioyok...@mac.com> wrote:

    >> Here is what "modinfo mlx4_core" gives
    >>
    >> filename:       /lib/modules/3.13.0-34-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
    >> version:        2.2-1
    >> license:        Dual BSD/GPL
    >> description:    Mellanox ConnectX HCA low-level driver
    >> author:         Roland Dreier
    >> srcversion:     3AE29A0A6538EBBE9227361
    >> alias:          pci:v000015B3d00001010sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Fsv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Esv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Dsv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Csv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Bsv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000100Asv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001009sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001008sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001007sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001006sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001005sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001004sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001003sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00001002sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000676Esv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006746sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006764sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000675Asv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006372sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006750sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006368sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000673Csv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006732sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006354sv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d0000634Asv*sd*bc*sc*i*
    >> alias:          pci:v000015B3d00006340sv*sd*bc*sc*i*
    >> depends:
    >> intree:         Y
    >> vermagic:       3.13.0-34-generic SMP mod_unload modversions
    >> signer:         Magrathea: Glacier signing key
    >> sig_key:        50:0B:C5:C8:7D:4B:11:5C:F3:C1:50:4F:7A:92:E2:33:C6:14:3D:58
    >> sig_hashalgo:   sha512
    >> parm:           debug_level:Enable debug tracing if > 0 (int)
    >> parm:           msi_x:attempt to use MSI-X if nonzero (int)
    >> parm:           num_vfs:enable #num_vfs functions if num_vfs > 0 num_vfs=port1,port2,port1+2 (array of byte)
    >> parm:           probe_vf:number of vfs to probe by pf driver (num_vfs > 0) probe_vf=port1,port2,port1+2 (array of byte)
    >> parm:           log_num_mgm_entry_size:log mgm size, that defines the num of qp per mcg, for example: 10 gives 248.range: 7 <= log_num_mgm_entry_size <= 12. To activate device managed flow steering when available, set to -1 (int)
    >> parm:           enable_64b_cqe_eqe:Enable 64 byte CQEs/EQEs when the FW supports this (default: True) (bool)
    >> parm:           log_num_mac:Log2 max number of MACs per ETH port (1-7) (int)
    >> parm:           log_num_vlan:Log2 max number of VLANs per ETH port (0-7) (int)
    >> parm:           use_prio:Enable steering by VLAN priority on ETH ports (0/1, default 0) (bool)
    >> parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)
    >> parm:           port_type_array:Array of port types: HW_DEFAULT (0) is default 1 for IB, 2 for Ethernet (array of int)
    >> parm:           enable_qos:Enable Quality of Service support in the HCA (default: off) (bool)
    >> parm:           internal_err_reset:Reset device on internal errors if non-zero (default 1, in SRIOV mode default is 0) (int)
    >>
    >> Most likely you have installed an old OFED which does not have
    >> this parameter.
    >>
    >> Try:
    >>
    >> # modinfo mlx4_core
    >>
    >> and see if it is there. I would suggest installing the latest
    >> OFED or Mellanox OFED.
    >>
    >>
    >> On Mon, Aug 18, 2014 at 9:53 PM, Rio Yokota <rioyok...@mac.com>
    >> wrote:
    >>
    >>> I get "ofed_info: command not found". Note that I don't install
    >>> the entire OFED, but do a component wise installation by doing
    >>> "apt-get install infiniband-diags ibutils ibverbs-utils
    >>> libmlx4-dev" for the drivers and utilities.
    >>>
    >>> Hi, what ofed version do you use?  (ofed_info -s)
    >>>
    >>>
    >>> On Sun, Aug 17, 2014 at 7:16 PM, Rio Yokota <rioyok...@mac.com>
    >>> wrote:
    >>>
    >>>> I have recently upgraded from Ubuntu 12.04 to 14.04 and OpenMPI
    >>>> gives the following warning upon execution, which did not
    >>>> appear before the upgrade.
    >>>>
    >>>> WARNING: It appears that your OpenFabrics subsystem is
    >>>> configured to only allow registering part of your physical
    >>>> memory. This can cause MPI jobs to run with erratic
    >>>> performance, hang, and/or crash.
    >>>>
    >>>> Everything that I could find on Google suggests changing
    >>>> log_num_mtt, but I cannot do this for the following reasons:
    >>>> 1. There is no log_num_mtt in /sys/module/mlx4_core/parameters/
    >>>> 2. Adding "options mlx4_core log_num_mtt=24" to
    >>>>    /etc/modprobe.d/mlx4.conf doesn't seem to change anything
    >>>> 3. I am not sure how I can restart the driver because there is
    >>>>    no "/etc/init.d/openibd" file (I've rebooted the system but
    >>>>    it didn't do anything to create log_num_mtt)
    >>>>
    >>>> [Template information]
    >>>> 1. OpenFabrics is from the Ubuntu distribution using "apt-get
    >>>>    install infiniband-diags ibutils ibverbs-utils libmlx4-dev"
    >>>> 2. OS is Ubuntu 14.04 LTS
    >>>> 3. Subnet manager is from the Ubuntu distribution using
    >>>>    "apt-get install opensm"
    >>>> 4. Output of ibv_devinfo is:
    >>>> hca_id: mlx4_0
    >>>>     transport: InfiniBand (0)
    >>>>     fw_ver: 2.10.600
    >>>>     node_guid: 0002:c903:003d:52b0
    >>>>     sys_image_guid: 0002:c903:003d:52b3
    >>>>     vendor_id: 0x02c9
    >>>>     vendor_part_id: 4099
    >>>>     hw_ver: 0x0
    >>>>     board_id: MT_1100120019
    >>>>     phys_port_cnt: 1
    >>>>         port: 1
    >>>>             state: PORT_ACTIVE (4)
    >>>>             max_mtu: 4096 (5)
    >>>>             active_mtu: 4096 (5)
    >>>>             sm_lid: 1
    >>>>             port_lid: 1
    >>>>             port_lmc: 0x00
    >>>>             link_layer: InfiniBand
    >>>> 5. Output of ifconfig for IB is
    >>>> ib0 Link encap:UNSPEC HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
    >>>>     inet addr:192.168.1.1 Bcast:192.168.1.255 Mask:255.255.255.0
    >>>>     inet6 addr: fe80::202:c903:3d:52b1/64 Scope:Link
    >>>>     UP BROADCAST RUNNING MULTICAST MTU:2044 Metric:1
    >>>>     RX packets:26 errors:0 dropped:0 overruns:0 frame:0
    >>>>     TX packets:34 errors:0 dropped:16 overruns:0 carrier:0
    >>>>     collisions:0 txqueuelen:256
    >>>>     RX bytes:5843 (5.8 KB) TX bytes:4324 (4.3 KB)
    >>>> 6. ulimit -l is "unlimited"
    >>>>
    >>>> Thanks, Rio