Hi John:
    Thanks for your reply. Yes, here are the details.

ibdev2netdev

mlx4_0 port 1 ==> ib0 (Down)
mlx4_0 port 2 ==> ib1 (Up)
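
For completeness, the port state and link layer can be double-checked with ibstat from infiniband-diags (a small sketch; the device and port number are taken from the output above):

# query port 2 of the mlx4_0 HCA; this should report "State: Active" and
# "Link layer: InfiniBand", matching the one port ibdev2netdev shows as Up
ibstat mlx4_0 2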



sh show-gids.sh

DEV     PORT    INDEX   GID                                             IPv4            VER     DEV
---     ----    -----   ---                                             ------------    ---     ---
mlx4_0  1       0       fe80:0000:0000:0000:e41d:2d03:0072:ed71                         v1
mlx4_0  2       0       fe80:0000:0000:0000:e41d:2d03:0072:ed72                         v1

n_gids_found=2
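
The same GID entries can also be read directly from sysfs (a minimal sketch; the device/port/index values match the show-gids output above):

# GID index 0 on port 2 of mlx4_0; should print the fe80:... link-local GID listed above
cat /sys/class/infiniband/mlx4_0/ports/2/gids/0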


On Thu, Jul 19, 2018 at 6:43 PM John Hearns <hear...@googlemail.com> wrote:

> ms_async_rdma_port_num = 2
>
> Do you have dual port cards?
>
>
> On 19 July 2018 at 11:25, Will Zhao <zhao6...@gmail.com> wrote:
>
>> Hi all:
>>     Has anyone successfully set up Ceph with RDMA over IB?
>>
>> By following the instructions:
>>
>> (https://community.mellanox.com/docs/DOC-2721)
>>
>> (https://community.mellanox.com/docs/DOC-2693)
>>
>> (http://hwchiu.com/2017-05-03-ceph-with-rdma.html)
>>
>>
>>
>> I'm trying to configure Ceph with the RDMA feature in the following environment:
>>
>>
>>
>> CentOS Linux release 7.2.1511 (Core)
>>
>> MLNX_OFED_LINUX-4.4-1.0.0.0:
>>
>> Mellanox Technologies MT27500 Family [ConnectX-3]
>>
>>
>>
>> rping works between all nodes, and I added these lines to ceph.conf to enable
>> RDMA:
>>
>>
>>
>> public_network = 10.10.121.0/24
>>
>> cluster_network = 10.10.121.0/24
>>
>> ms_type = async+rdma
>>
>> ms_async_rdma_device_name = mlx4_0
>>
>> ms_async_rdma_port_num = 2
>>
>>
>>
>> The IB network uses addresses from 10.10.121.0/24, and the "ibdev2netdev" command
>> shows port 2 is up.
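>>
>> (As a side note, the messenger/RDMA settings a running monitor actually picked up
>> can be verified through its admin socket; a small sketch, assuming the mon is named
>> node1 as in the log below:)
>>
>> # dump the loaded config from the mon's admin socket and filter the RDMA-related options
>> sudo ceph daemon mon.node1 config show | grep -E 'ms_type|ms_async_rdma'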
>>
>> Error occurs when running "ceph-deploy --overwrite-conf mon
>> create-initial", ceph-deploy log details:
>>
>>
>>
>> [2018-07-12 17:53:48,943][ceph_deploy.conf][DEBUG ] found configuration
>> file at: /home/user1/.cephdeploy.conf
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ] Invoked (1.5.37):
>> /usr/bin/ceph-deploy --overwrite-conf mon create-initial
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ] ceph-deploy options:
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
>> username                      : None
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
>> verbose                       : False
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
>> overwrite_conf                : True
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]
>> subcommand                    : create-initial
>>
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO  ]  quiet
>>               : False
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> cd_conf                       : <ceph_deploy.conf.cephdeploy.Conf object at
>> 0x27e6210>
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> cluster                       : ceph
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> func                          : <function mon at 0x2a7d2a8>
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> ceph_conf                     : None
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> default_release               : False
>>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO  ]
>> keyrings                      : None
>>
>> [2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] Deploying mon, cluster
>> ceph hosts node1
>>
>> [2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] detecting platform for
>> host node1 ...
>>
>> [2018-07-12 17:53:49,005][node1][DEBUG ] connection detected need for sudo
>>
>> [2018-07-12 17:53:49,039][node1][DEBUG ] connected to host: node1
>>
>> [2018-07-12 17:53:49,040][node1][DEBUG ] detect platform information from
>> remote host
>>
>> [2018-07-12 17:53:49,073][node1][DEBUG ] detect machine type
>>
>> [2018-07-12 17:53:49,078][node1][DEBUG ] find the location of an
>> executable
>>
>> [2018-07-12 17:53:49,079][ceph_deploy.mon][INFO  ] distro info: CentOS
>> Linux 7.2.1511 Core
>>
>> [2018-07-12 17:53:49,079][node1][DEBUG ] determining if provided host has
>> same hostname in remote
>>
>> [2018-07-12 17:53:49,079][node1][DEBUG ] get remote short hostname
>>
>> [2018-07-12 17:53:49,080][node1][DEBUG ] deploying mon to node1
>>
>> [2018-07-12 17:53:49,080][node1][DEBUG ] get remote short hostname
>>
>> [2018-07-12 17:53:49,081][node1][DEBUG ] remote hostname: node1
>>
>> [2018-07-12 17:53:49,083][node1][DEBUG ] write cluster configuration to
>> /etc/ceph/{cluster}.conf
>>
>> [2018-07-12 17:53:49,084][node1][DEBUG ] create the mon path if it does
>> not exist
>>
>> [2018-07-12 17:53:49,085][node1][DEBUG ] checking for done path:
>> /var/lib/ceph/mon/ceph-node1/done
>>
>> [2018-07-12 17:53:49,085][node1][DEBUG ] create a done file to avoid
>> re-doing the mon deployment
>>
>> [2018-07-12 17:53:49,086][node1][DEBUG ] create the init path if it does
>> not exist
>>
>> [2018-07-12 17:53:49,089][node1][INFO  ] Running command: sudo systemctl
>> enable ceph.target
>>
>> [2018-07-12 17:53:49,365][node1][INFO  ] Running command: sudo systemctl
>> enable ceph-mon@node1
>>
>> [2018-07-12 17:53:49,588][node1][INFO  ] Running command: sudo systemctl
>> start ceph-mon@node1
>>
>> [2018-07-12 17:53:51,762][node1][INFO  ] Running command: sudo ceph
>> --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>>
>> [2018-07-12 17:53:51,979][node1][DEBUG ]
>> ********************************************************************************
>>
>> [2018-07-12 17:53:51,979][node1][DEBUG ] status for monitor: mon.node1
>>
>> [2018-07-12 17:53:51,980][node1][DEBUG ] {
>>
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "election_epoch": 3,
>>
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "extra_probe_peers": [],
>>
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "feature_map": {
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]     "mon": {
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]       "group": {
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "features":
>> "0x1ffddff8eea4fffb",
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "num": 1,
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "release": "luminous"
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]       }
>>
>> [2018-07-12 17:53:51,981][node1][DEBUG ]     }
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]   },
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]   "features": {
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_con":
>> "2305244844532236283",
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_mon": [
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]       "kraken",
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]       "luminous"
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     ],
>>
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "required_con":
>> "153140804152475648",
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     "required_mon": [
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]       "kraken",
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]       "luminous"
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     ]
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]   },
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]   "monmap": {
>>
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     "created": "2018-07-12
>> 17:41:24.243749",
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     "epoch": 1,
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     "features": {
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       "optional": [],
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       "persistent": [
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]         "kraken",
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]         "luminous"
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       ]
>>
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     },
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "fsid":
>> "9317bc6a-ea20-4376-a390-52afa0b81353",
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "modified": "2018-07-12
>> 17:41:24.243749",
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "mons": [
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]       {
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "addr": "
>> 10.10.121.25:6789/0",
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "name": "node1",
>>
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "public_addr": "
>> 10.10.121.25:6789/0",
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]         "rank": 0
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]       }
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]     ]
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   },
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "name": "node1",
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "outside_quorum": [],
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "quorum": [
>>
>> [2018-07-12 17:53:51,986][node1][DEBUG ]     0
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   ],
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "rank": 0,
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "state": "leader",
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "sync_provider": []
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ] }
>>
>> [2018-07-12 17:53:51,987][node1][DEBUG ]
>> ********************************************************************************
>>
>> [2018-07-12 17:53:51,987][node1][INFO  ] monitor: mon.node1 is running
>>
>> [2018-07-12 17:53:51,989][node1][INFO  ] Running command: sudo ceph
>> --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>>
>> [2018-07-12 17:53:52,156][ceph_deploy.mon][INFO  ] processing monitor
>> mon.node1
>>
>> [2018-07-12 17:53:52,194][node1][DEBUG ] connection detected need for sudo
>>
>> [2018-07-12 17:53:52,230][node1][DEBUG ] connected to host: node1
>>
>> [2018-07-12 17:53:52,231][node1][DEBUG ] detect platform information from
>> remote host
>>
>> [2018-07-12 17:53:52,265][node1][DEBUG ] detect machine type
>>
>> [2018-07-12 17:53:52,270][node1][DEBUG ] find the location of an
>> executable
>>
>> [2018-07-12 17:53:52,273][node1][INFO  ] Running command: sudo ceph
>> --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>>
>> [2018-07-12 17:53:52,439][ceph_deploy.mon][INFO  ] mon.node1 monitor has
>> reached quorum!
>>
>> [2018-07-12 17:53:52,440][ceph_deploy.mon][INFO  ] all initial monitors
>> are running and have formed quorum
>>
>> [2018-07-12 17:53:52,440][ceph_deploy.mon][INFO  ] Running gatherkeys...
>>
>> [2018-07-12 17:53:52,441][ceph_deploy.gatherkeys][INFO  ] Storing keys in
>> temp directory /tmp/tmp8bdYT6
>>
>> [2018-07-12 17:53:52,477][node1][DEBUG ] connection detected need for sudo
>>
>> [2018-07-12 17:53:52,510][node1][DEBUG ] connected to host: node1
>>
>> [2018-07-12 17:53:52,511][node1][DEBUG ] detect platform information from
>> remote host
>>
>> [2018-07-12 17:53:52,552][node1][DEBUG ] detect machine type
>>
>> [2018-07-12 17:53:52,558][node1][DEBUG ] get remote short hostname
>>
>> [2018-07-12 17:53:52,559][node1][DEBUG ] fetch remote file
>>
>> [2018-07-12 17:53:52,562][node1][INFO  ] Running command: sudo
>> /usr/bin/ceph --connect-timeout=25 --cluster=ceph
>> --admin-daemon=/var/run/ceph/ceph-mon.node1.asok mon_status
>>
>> [2018-07-12 17:53:52,731][node1][INFO  ] Running command: sudo
>> /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon.
>> --keyring=/var/lib/ceph/mon/ceph-node1/keyring auth get client.admin
>>
>> [2018-07-12 17:54:18,059][node1][ERROR ] "ceph auth get-or-create for
>> keytype admin returned 1
>>
>> [2018-07-12 17:54:18,059][node1][DEBUG ] Cluster connection interrupted
>> or timed out
>>
>> [2018-07-12 17:54:18,059][node1][ERROR ] Failed to return 'admin' key
>> from host node1
>>
>> [2018-07-12 17:54:18,059][ceph_deploy.gatherkeys][ERROR ] Failed to
>> connect to host:node1
>>
>> [2018-07-12 17:54:18,060][ceph_deploy.gatherkeys][INFO  ] Destroy temp
>> directory /tmp/tmp8bdYT6
>>
>> [2018-07-12 17:54:18,060][ceph_deploy][ERROR ] RuntimeError: Failed to
>> connect any mon
>>
>>
>>
>> The ceph-mon service is up but cannot be reached, and "ceph -s" returns the same
>> type of error:
>>
>>
>>
>> 2018-07-13 10:44:21.169536 7fa570d4e700  0 monclient(hunting):
>> authenticate timed out after 300
>>
>> 2018-07-13 10:44:21.169579 7fa570d4e700  0 librados: client.admin
>> authentication error (110) Connection timed out
>>
>> [errno 110] error connecting to the cluster
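>>
>> (The timeout suggests the client never gets a reply from the mon over the RDMA
>> messenger. To see where it stalls, messenger and monclient debugging can be raised
>> for a one-off run; a rough sketch, passing standard Ceph debug options on the
>> command line:)
>>
>> # one-off run with verbose messenger and monclient logging on the client side
>> ceph -s --debug-ms 1 --debug-monc 10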
>>
>>
>>
>> I'm running Ceph version 12.2.4 (luminous, stable). Do you have any suggestions
>> about this issue?
>>
>>
>>
>> Thx
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
