Hi John, thanks for your reply. Yes, here are the details:
ibdev2netdev
mlx4_0 port 1 ==> ib0 (Down)
mlx4_0 port 2 ==> ib1 (Up)

sh show-gids.sh
DEV     PORT  INDEX  GID                                       IPv4  VER  DEV
---     ----  -----  ---                                       ----  ---  ---
mlx4_0  1     0      fe80:0000:0000:0000:e41d:2d03:0072:ed71         v1
mlx4_0  2     0      fe80:0000:0000:0000:e41d:2d03:0072:ed72         v1
n_gids_found=2
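
In case it helps, this is how I double-check which port and GID the messenger should use. It is only a sketch; ibstat and ibv_devinfo are assumed to be available from the MLNX_OFED install (only ibdev2netdev and show-gids appear above):

# Confirm port state and link layer for the device/port that ceph.conf points at.
ibdev2netdev                  # ib1 (mlx4_0 port 2) is the interface that is Up here
ibstat mlx4_0 2               # expect "State: Active" and "Link layer: InfiniBand"
ibv_devinfo -d mlx4_0 -i 2    # the same information from the verbs layer
sh show-gids.sh               # GID table; index 0 on port 2 is the link-local v1 GID

mlx4_0 and port 2 are the values that ms_async_rdma_device_name and ms_async_rdma_port_num refer to in the config quoted below.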
On Thu, Jul 19, 2018 at 6:43 PM John Hearns <hear...@googlemail.com> wrote:
> ms_async_rdma_port_num = 2
>
> Do you have dual port cards?
>
> On 19 July 2018 at 11:25, Will Zhao <zhao6...@gmail.com> wrote:
>> Hi all,
>> Has anyone successfully set up Ceph with RDMA over IB?
>>
>> I followed these instructions:
>> https://community.mellanox.com/docs/DOC-2721
>> https://community.mellanox.com/docs/DOC-2693
>> http://hwchiu.com/2017-05-03-ceph-with-rdma.html
>>
>> I'm trying to configure Ceph with the RDMA feature on the following environment:
>>
>> CentOS Linux release 7.2.1511 (Core)
>> MLNX_OFED_LINUX-4.4-1.0.0.0
>> Mellanox Technologies MT27500 Family [ConnectX-3]
>>
>> rping works between all the nodes.
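>> A minimal sketch of that connectivity test, for reference (these are the standard librdmacm rping flags; 10.10.121.25 is node1's address on the IB network, the other nodes are analogous):
>>
>> rping -s -a 10.10.121.25 -v          # on node1: run the rping server
>> rping -c -a 10.10.121.25 -C 10 -v    # on each other node: 10 pings against node1
>>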
>> I added these lines to ceph.conf to enable RDMA:
>>
>> public_network = 10.10.121.0/24
>> cluster_network = 10.10.121.0/24
>> ms_type = async+rdma
>> ms_async_rdma_device_name = mlx4_0
>> ms_async_rdma_port_num = 2
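>>
>> To confirm that a running daemon has actually picked these up, its admin socket can be queried, e.g. (the socket path is the one ceph-deploy uses for the monitor in the log below):
>>
>> sudo ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok config show | grep -E 'ms_type|ms_async_rdma'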
>>
>> The IB network uses 10.10.121.0/24 addresses, and the "ibdev2netdev" command shows that port 2 is up.
>>
>> The error occurs when running "ceph-deploy --overwrite-conf mon create-initial"; ceph-deploy log details:
>>
>> [2018-07-12 17:53:48,943][ceph_deploy.conf][DEBUG ] found configuration file at: /home/user1/.cephdeploy.conf
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ] Invoked (1.5.37): /usr/bin/ceph-deploy --overwrite-conf mon create-initial
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ] ceph-deploy options:
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]  username        : None
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]  verbose         : False
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]  overwrite_conf  : True
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]  subcommand      : create-initial
>> [2018-07-12 17:53:48,944][ceph_deploy.cli][INFO ]  quiet           : False
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  cd_conf         : <ceph_deploy.conf.cephdeploy.Conf object at 0x27e6210>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  cluster         : ceph
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  func            : <function mon at 0x2a7d2a8>
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  ceph_conf       : None
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  default_release : False
>> [2018-07-12 17:53:48,945][ceph_deploy.cli][INFO ]  keyrings        : None
>> [2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] Deploying mon, cluster ceph hosts node1
>> [2018-07-12 17:53:48,947][ceph_deploy.mon][DEBUG ] detecting platform for host node1 ...
>> [2018-07-12 17:53:49,005][node1][DEBUG ] connection detected need for sudo
>> [2018-07-12 17:53:49,039][node1][DEBUG ] connected to host: node1
>> [2018-07-12 17:53:49,040][node1][DEBUG ] detect platform information from remote host
>> [2018-07-12 17:53:49,073][node1][DEBUG ] detect machine type
>> [2018-07-12 17:53:49,078][node1][DEBUG ] find the location of an executable
>> [2018-07-12 17:53:49,079][ceph_deploy.mon][INFO ] distro info: CentOS Linux 7.2.1511 Core
>> [2018-07-12 17:53:49,079][node1][DEBUG ] determining if provided host has same hostname in remote
>> [2018-07-12 17:53:49,079][node1][DEBUG ] get remote short hostname
>> [2018-07-12 17:53:49,080][node1][DEBUG ] deploying mon to node1
>> [2018-07-12 17:53:49,080][node1][DEBUG ] get remote short hostname
>> [2018-07-12 17:53:49,081][node1][DEBUG ] remote hostname: node1
>> [2018-07-12 17:53:49,083][node1][DEBUG ] write cluster configuration to /etc/ceph/{cluster}.conf
>> [2018-07-12 17:53:49,084][node1][DEBUG ] create the mon path if it does not exist
>> [2018-07-12 17:53:49,085][node1][DEBUG ] checking for done path: /var/lib/ceph/mon/ceph-node1/done
>> [2018-07-12 17:53:49,085][node1][DEBUG ] create a done file to avoid re-doing the mon deployment
>> [2018-07-12 17:53:49,086][node1][DEBUG ] create the init path if it does not exist
>> [2018-07-12 17:53:49,089][node1][INFO ] Running command: sudo systemctl enable ceph.target
>> [2018-07-12 17:53:49,365][node1][INFO ] Running command: sudo systemctl enable ceph-mon@node1
>> [2018-07-12 17:53:49,588][node1][INFO ] Running command: sudo systemctl start ceph-mon@node1
>> [2018-07-12 17:53:51,762][node1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>> [2018-07-12 17:53:51,979][node1][DEBUG ] ********************************************************************************
>> [2018-07-12 17:53:51,979][node1][DEBUG ] status for monitor: mon.node1
>> [2018-07-12 17:53:51,980][node1][DEBUG ] {
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "election_epoch": 3,
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "extra_probe_peers": [],
>> [2018-07-12 17:53:51,980][node1][DEBUG ]   "feature_map": {
>> [2018-07-12 17:53:51,981][node1][DEBUG ]     "mon": {
>> [2018-07-12 17:53:51,981][node1][DEBUG ]       "group": {
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "features": "0x1ffddff8eea4fffb",
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "num": 1,
>> [2018-07-12 17:53:51,981][node1][DEBUG ]         "release": "luminous"
>> [2018-07-12 17:53:51,981][node1][DEBUG ]       }
>> [2018-07-12 17:53:51,981][node1][DEBUG ]     }
>> [2018-07-12 17:53:51,982][node1][DEBUG ]   },
>> [2018-07-12 17:53:51,982][node1][DEBUG ]   "features": {
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_con": "2305244844532236283",
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "quorum_mon": [
>> [2018-07-12 17:53:51,982][node1][DEBUG ]       "kraken",
>> [2018-07-12 17:53:51,982][node1][DEBUG ]       "luminous"
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     ],
>> [2018-07-12 17:53:51,982][node1][DEBUG ]     "required_con": "153140804152475648",
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     "required_mon": [
>> [2018-07-12 17:53:51,983][node1][DEBUG ]       "kraken",
>> [2018-07-12 17:53:51,983][node1][DEBUG ]       "luminous"
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     ]
>> [2018-07-12 17:53:51,983][node1][DEBUG ]   },
>> [2018-07-12 17:53:51,983][node1][DEBUG ]   "monmap": {
>> [2018-07-12 17:53:51,983][node1][DEBUG ]     "created": "2018-07-12 17:41:24.243749",
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     "epoch": 1,
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     "features": {
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       "optional": [],
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       "persistent": [
>> [2018-07-12 17:53:51,984][node1][DEBUG ]         "kraken",
>> [2018-07-12 17:53:51,984][node1][DEBUG ]         "luminous"
>> [2018-07-12 17:53:51,984][node1][DEBUG ]       ]
>> [2018-07-12 17:53:51,984][node1][DEBUG ]     },
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "fsid": "9317bc6a-ea20-4376-a390-52afa0b81353",
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "modified": "2018-07-12 17:41:24.243749",
>> [2018-07-12 17:53:51,985][node1][DEBUG ]     "mons": [
>> [2018-07-12 17:53:51,985][node1][DEBUG ]       {
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "addr": "10.10.121.25:6789/0",
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "name": "node1",
>> [2018-07-12 17:53:51,985][node1][DEBUG ]         "public_addr": "10.10.121.25:6789/0",
>> [2018-07-12 17:53:51,986][node1][DEBUG ]         "rank": 0
>> [2018-07-12 17:53:51,986][node1][DEBUG ]       }
>> [2018-07-12 17:53:51,986][node1][DEBUG ]     ]
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   },
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "name": "node1",
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "outside_quorum": [],
>> [2018-07-12 17:53:51,986][node1][DEBUG ]   "quorum": [
>> [2018-07-12 17:53:51,986][node1][DEBUG ]     0
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   ],
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "rank": 0,
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "state": "leader",
>> [2018-07-12 17:53:51,987][node1][DEBUG ]   "sync_provider": []
>> [2018-07-12 17:53:51,987][node1][DEBUG ] }
>> [2018-07-12 17:53:51,987][node1][DEBUG ] ********************************************************************************
>> [2018-07-12 17:53:51,987][node1][INFO ] monitor: mon.node1 is running
>> [2018-07-12 17:53:51,989][node1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>> [2018-07-12 17:53:52,156][ceph_deploy.mon][INFO ] processing monitor mon.node1
>> [2018-07-12 17:53:52,194][node1][DEBUG ] connection detected need for sudo
>> [2018-07-12 17:53:52,230][node1][DEBUG ] connected to host: node1
>> [2018-07-12 17:53:52,231][node1][DEBUG ] detect platform information from remote host
>> [2018-07-12 17:53:52,265][node1][DEBUG ] detect machine type
>> [2018-07-12 17:53:52,270][node1][DEBUG ] find the location of an executable
>> [2018-07-12 17:53:52,273][node1][INFO ] Running command: sudo ceph --cluster=ceph --admin-daemon /var/run/ceph/ceph-mon.node1.asok mon_status
>> [2018-07-12 17:53:52,439][ceph_deploy.mon][INFO ] mon.node1 monitor has reached quorum!
>> [2018-07-12 17:53:52,440][ceph_deploy.mon][INFO ] all initial monitors are running and have formed quorum
>> [2018-07-12 17:53:52,440][ceph_deploy.mon][INFO ] Running gatherkeys...
>> [2018-07-12 17:53:52,441][ceph_deploy.gatherkeys][INFO ] Storing keys in temp directory /tmp/tmp8bdYT6
>> [2018-07-12 17:53:52,477][node1][DEBUG ] connection detected need for sudo
>> [2018-07-12 17:53:52,510][node1][DEBUG ] connected to host: node1
>> [2018-07-12 17:53:52,511][node1][DEBUG ] detect platform information from remote host
>> [2018-07-12 17:53:52,552][node1][DEBUG ] detect machine type
>> [2018-07-12 17:53:52,558][node1][DEBUG ] get remote short hostname
>> [2018-07-12 17:53:52,559][node1][DEBUG ] fetch remote file
>> [2018-07-12 17:53:52,562][node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --admin-daemon=/var/run/ceph/ceph-mon.node1.asok mon_status
>> [2018-07-12 17:53:52,731][node1][INFO ] Running command: sudo /usr/bin/ceph --connect-timeout=25 --cluster=ceph --name mon. --keyring=/var/lib/ceph/mon/ceph-node1/keyring auth get client.admin
>> [2018-07-12 17:54:18,059][node1][ERROR ] "ceph auth get-or-create for keytype admin returned 1
>> [2018-07-12 17:54:18,059][node1][DEBUG ] Cluster connection interrupted or timed out
>> [2018-07-12 17:54:18,059][node1][ERROR ] Failed to return 'admin' key from host node1
>> [2018-07-12 17:54:18,059][ceph_deploy.gatherkeys][ERROR ] Failed to connect to host:node1
>> [2018-07-12 17:54:18,060][ceph_deploy.gatherkeys][INFO ] Destroy temp directory /tmp/tmp8bdYT6
>> [2018-07-12 17:54:18,060][ceph_deploy][ERROR ] RuntimeError: Failed to connect any mon
>>
>> The ceph-mon service is up but cannot be reached; "ceph -s" returns the same kind of error:
>>
>> 2018-07-13 10:44:21.169536 7fa570d4e700 0 monclient(hunting): authenticate timed out after 300
>> 2018-07-13 10:44:21.169579 7fa570d4e700 0 librados: client.admin authentication error (110) Connection timed out
>> [errno 110] error connecting to the cluster
>>
>> I'm running ceph version 12.2.4 (luminous, stable). Do you have any suggestions about this issue?
>>
>> Thx
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com