Dear Roland,

Thank you so much. This was very helpful.

Best,
Rio

>>>>>> "Mike" == Mike Dubman <mi...@dev.mellanox.co.il> writes:
> 
>    Mike> So it seems you have an old OFED without this parameter.
>    Mike> Can you install the latest Mellanox OFED, or check which
>    Mike> community OFED has it?
> 
> Rio is using the kernel.org drivers that are part of Ubuntu/3.13.x, and
> log_num_mtt is not a parameter in those drivers. In fact, log_num_mtt
> has never been a parameter in the kernel.org sources (I just checked the
> git commit history). And it's not needed anymore either, since the
> following commit (which is also part of OFED 3.12, by the way; Mike, it
> seems Mellanox OFED is behind in this respect):
> -----------------------------------------------------------
> commit db5a7a65c05867cb6ff5cb6d556a0edfce631d2d
> Author: Roland Dreier <rol...@purestorage.com>
> Date:   Mon Mar 5 10:05:28 2012 -0800
> 
>    mlx4_core: Scale size of MTT table with system RAM
> 
>    The current driver defaults to 1M MTT segments, where each segment holds
>    8 MTT entries.  This limits the total memory registered to 8M * PAGE_SIZE
>    which is 32GB with 4K pages.  Since systems that have much more memory
>    are pretty common now (at least among systems with InfiniBand hardware),
>    this limit ends up getting hit in practice quite a bit.
> 
>    Handle this by having the driver allocate at least enough MTT entries to
>    cover 2 * totalram pages.
> 
>    Signed-off-by: Roland Dreier <rol...@purestorage.com>
> -----------------------------------------------------------
> 
> The relevant code segment (drivers/net/ethernet/mellanox/mlx4/profile.c):
> 
> -----------------------------------------------------------
>        /*
>         * We want to scale the number of MTTs with the size of the
>         * system memory, since it makes sense to register a lot of
>         * memory on a system with a lot of memory.  As a heuristic,
>         * make sure we have enough MTTs to cover twice the system
>         * memory (with PAGE_SIZE entries).
>         *
>         * This number has to be a power of two and fit into 32 bits
>         * due to device limitations, so cap this at 2^31 as well.
>         * That limits us to 8TB of memory registration per HCA with
>         * 4KB pages, which is probably OK for the next few months.
>         */
>        si_meminfo(&si);
>        request->num_mtt =
>                roundup_pow_of_two(max_t(unsigned, request->num_mtt,
>                                         min(1UL << (31 - log_mtts_per_seg),
>                                             si.totalram >> (log_mtts_per_seg - 1))));
> -----------------------------------------------------------
> 
> So the point here is that OpenMPI should check the mlx4 driver version
> and not emit false warnings when newer drivers are used. I didn't check
> whether this is already fixed in the OpenMPI code repositories, but it is
> not fixed in 1.8.2rc4 (see static uint64_t calculate_max_reg in
> ompi/mca/btl/openib/btl_openib.c). The OpenMPI FAQ should also be
> corrected accordingly.
> 
> Rio, as a note for you: you can safely ignore the warning.
> 
> Cheers,
> 
> Roland
> 
> -------
> http://www.q-leap.com / http://qlustar.com
>          --- HPC / Storage / Cloud Linux Cluster OS ---
> 
>    Mike> On Tue, Aug 19, 2014 at 9:34 AM, Rio Yokota
>    Mike> <rioyok...@mac.com> wrote:
> 
>>> Here is what "modinfo mlx4_core" gives
>>> 
>>> filename:       /lib/modules/3.13.0-34-generic/kernel/drivers/net/ethernet/mellanox/mlx4/mlx4_core.ko
>>> version:        2.2-1
>>> license:        Dual BSD/GPL
>>> description:    Mellanox ConnectX HCA low-level driver
>>> author:         Roland Dreier
>>> srcversion:     3AE29A0A6538EBBE9227361
>>> alias:          pci:v000015B3d00001010sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Fsv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Esv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Dsv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Csv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Bsv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000100Asv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001009sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001008sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001007sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001006sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001005sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001004sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001003sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00001002sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000676Esv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006746sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006764sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000675Asv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006372sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006750sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006368sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000673Csv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006732sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006354sv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d0000634Asv*sd*bc*sc*i*
>>> alias:          pci:v000015B3d00006340sv*sd*bc*sc*i*
>>> depends:
>>> intree:         Y
>>> vermagic:       3.13.0-34-generic SMP mod_unload modversions
>>> signer:         Magrathea: Glacier signing key
>>> sig_key:        50:0B:C5:C8:7D:4B:11:5C:F3:C1:50:4F:7A:92:E2:33:C6:14:3D:58
>>> sig_hashalgo:   sha512
>>> parm:           debug_level:Enable debug tracing if > 0 (int)
>>> parm:           msi_x:attempt to use MSI-X if nonzero (int)
>>> parm:           num_vfs:enable #num_vfs functions if num_vfs > 0
>>> num_vfs=port1,port2,port1+2 (array of byte)
>>> parm:           probe_vf:number of vfs to probe by pf driver (num_vfs > 0)
>>> probe_vf=port1,port2,port1+2 (array of byte)
>>> parm:           log_num_mgm_entry_size:log mgm size, that defines the num of qp per mcg, for example: 10 gives 248.range: 7 <= log_num_mgm_entry_size <= 12. To activate device managed flow steering when available, set to -1 (int)
>>> parm:           enable_64b_cqe_eqe:Enable 64 byte CQEs/EQEs when the FW supports this (default: True) (bool)
>>> parm:           log_num_mac:Log2 max number of MACs per ETH port (1-7) (int)
>>> parm:           log_num_vlan:Log2 max number of VLANs per ETH port (0-7) (int)
>>> parm:           use_prio:Enable steering by VLAN priority on ETH ports (0/1, default 0) (bool)
>>> parm:           log_mtts_per_seg:Log2 number of MTT entries per segment (1-7) (int)
>>> parm:           port_type_array:Array of port types: HW_DEFAULT (0) is default 1 for IB, 2 for Ethernet (array of int)
>>> parm:           enable_qos:Enable Quality of Service support in the HCA (default: off) (bool)
>>> parm:           internal_err_reset:Reset device on internal errors if non-zero (default 1, in SRIOV mode default is 0) (int)
>>> 
>>> Most likely you are installing an old OFED which does not have this
>>> parameter.
>>> 
>>> Try:
>>> 
>>> # modinfo mlx4_core
>>> 
>>> and see if it is there. I would suggest installing the latest OFED or
>>> Mellanox OFED.
>>> 
>>> 
>>> On Mon, Aug 18, 2014 at 9:53 PM, Rio Yokota <rioyok...@mac.com>
>>> wrote:
>>> 
>>>> I get "ofed_info: command not found". Note that I don't install
>>>> the entire OFED, but do a component-wise installation with
>>>> "apt-get install infiniband-diags ibutils ibverbs-utils
>>>> libmlx4-dev" for the drivers and utilities.
>>>> 
>>>> Hi, what ofed version do you use?  (ofed_info -s)
>>>> 
>>>> 
>>>> On Sun, Aug 17, 2014 at 7:16 PM, Rio Yokota <rioyok...@mac.com>
>>>> wrote:
>>>> 
>>>>> I have recently upgraded from Ubuntu 12.04 to 14.04 and OpenMPI
>>>>> gives the following warning upon execution, which did not
>>>>> appear before the upgrade.
>>>>> 
>>>>> WARNING: It appears that your OpenFabrics subsystem is
>>>>> configured to only allow registering part of your physical
>>>>> memory. This can cause MPI jobs to run with erratic
>>>>> performance, hang, and/or crash.
>>>>> 
>>>>> Everything I could find on Google suggests changing
>>>>> log_num_mtt, but I cannot do this for the following reasons:
>>>>> 1. There is no log_num_mtt in /sys/module/mlx4_core/parameters/
>>>>> 2. Adding "options mlx4_core log_num_mtt=24" to
>>>>>    /etc/modprobe.d/mlx4.conf doesn't seem to change anything
>>>>> 3. I am not sure how to restart the driver because there is no
>>>>>    "/etc/init.d/openibd" file (I've rebooted the system, but that
>>>>>    did not make log_num_mtt appear)
>>>>> 
>>>>> [Template information]
>>>>> 1. OpenFabrics is from the Ubuntu distribution using "apt-get
>>>>>    install infiniband-diags ibutils ibverbs-utils libmlx4-dev"
>>>>> 2. OS is Ubuntu 14.04 LTS
>>>>> 3. Subnet manager is from the Ubuntu distribution using "apt-get
>>>>>    install opensm"
>>>>> 4. Output of ibv_devinfo is:
>>>>>    hca_id: mlx4_0
>>>>>            transport:          InfiniBand (0)
>>>>>            fw_ver:             2.10.600
>>>>>            node_guid:          0002:c903:003d:52b0
>>>>>            sys_image_guid:     0002:c903:003d:52b3
>>>>>            vendor_id:          0x02c9
>>>>>            vendor_part_id:     4099
>>>>>            hw_ver:             0x0
>>>>>            board_id:           MT_1100120019
>>>>>            phys_port_cnt:      1
>>>>>                    port:   1
>>>>>                            state:          PORT_ACTIVE (4)
>>>>>                            max_mtu:        4096 (5)
>>>>>                            active_mtu:     4096 (5)
>>>>>                            sm_lid:         1
>>>>>                            port_lid:       1
>>>>>                            port_lmc:       0x00
>>>>>                            link_layer:     InfiniBand
>>>>> 5. Output of ifconfig for IB is:
>>>>>    ib0   Link encap:UNSPEC  HWaddr 80-00-00-48-FE-80-00-00-00-00-00-00-00-00-00-00
>>>>>          inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
>>>>>          inet6 addr: fe80::202:c903:3d:52b1/64 Scope:Link
>>>>>          UP BROADCAST RUNNING MULTICAST  MTU:2044  Metric:1
>>>>>          RX packets:26 errors:0 dropped:0 overruns:0 frame:0
>>>>>          TX packets:34 errors:0 dropped:16 overruns:0 carrier:0
>>>>>          collisions:0 txqueuelen:256
>>>>>          RX bytes:5843 (5.8 KB)  TX bytes:4324 (4.3 KB)
>>>>> 6. ulimit -l is "unlimited"
>>>>> 
>>>>> Thanks,
>>>>> Rio
