Can you attach the OSDMap (ceph osd getmap -o <mapfile>)?
-Sam
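For anyone following along, a minimal sketch of pulling the map and inspecting it offline with osdmaptool (the /tmp path is illustrative):

————————————————
# Dump the cluster's current OSDMap to a local file:
ceph osd getmap -o /tmp/osdmap

# Print a human-readable summary (epoch, pools, OSD up/in states):
osdmaptool --print /tmp/osdmap
————————————————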
On Tue, Apr 26, 2016 at 2:07 AM, Henrik Svensson <henrik.svens...@sectra.com> wrote:

> Hi!
>
> We have a three-node Ceph cluster with 10 OSDs on each node.
>
> We bought 3 new machines with an additional 30 disks that should reside
> in another location. Before adding these machines we modified the
> default CRUSH table.
>
> After modifying the (default) CRUSH table with these commands, the
> cluster went down:
>
> ————————————————
> ceph osd crush add-bucket dc1 datacenter
> ceph osd crush add-bucket dc2 datacenter
> ceph osd crush add-bucket availo datacenter
> ceph osd crush move dc1 root=default
> ceph osd crush move lkpsx0120 root=default datacenter=dc1
> ceph osd crush move lkpsx0130 root=default datacenter=dc1
> ceph osd crush move lkpsx0140 root=default datacenter=dc1
> ceph osd crush move dc2 root=default
> ceph osd crush move availo root=default
> ceph osd crush add-bucket sectra root
> ceph osd crush move dc1 root=sectra
> ceph osd crush move dc2 root=sectra
> ceph osd crush move dc3 root=sectra
> ceph osd crush move availo root=sectra
> ceph osd crush remove default
> ————————————————
>
> I tried to revert the CRUSH map, but with no luck:
>
> ————————————————
> ceph osd crush add-bucket default root
> ceph osd crush move lkpsx0120 root=default
> ceph osd crush move lkpsx0130 root=default
> ceph osd crush move lkpsx0140 root=default
> ceph osd crush remove sectra
> ————————————————
>
> After restarting the cluster (and even the machines), no OSD came up
> again. But ceph osd tree gave this output, stating that certain OSDs
> are up (although the processes are not running):
>
> ————————————————
> # id    weight  type name           up/down  reweight
> -1      163.8   root default
> -2      54.6        host lkpsx0120
> 0       5.46            osd.0       down     0
> 1       5.46            osd.1       down     0
> 2       5.46            osd.2       down     0
> 3       5.46            osd.3       down     0
> 4       5.46            osd.4       down     0
> 5       5.46            osd.5       down     0
> 6       5.46            osd.6       down     0
> 7       5.46            osd.7       down     0
> 8       5.46            osd.8       down     0
> 9       5.46            osd.9       down     0
> -3      54.6        host lkpsx0130
> 10      5.46            osd.10      down     0
> 11      5.46            osd.11      down     0
> 12      5.46            osd.12      down     0
> 13      5.46            osd.13      down     0
> 14      5.46            osd.14      down     0
> 15      5.46            osd.15      down     0
> 16      5.46            osd.16      down     0
> 17      5.46            osd.17      down     0
> 18      5.46            osd.18      up       1
> 19      5.46            osd.19      up       1
> -4      54.6        host lkpsx0140
> 20      5.46            osd.20      up       1
> 21      5.46            osd.21      down     0
> 22      5.46            osd.22      down     0
> 23      5.46            osd.23      down     0
> 24      5.46            osd.24      down     0
> 25      5.46            osd.25      up       1
> 26      5.46            osd.26      up       1
> 27      5.46            osd.27      up       1
> 28      5.46            osd.28      up       1
> 29      5.46            osd.29      up       1
> ————————————————
>
> The monitor starts/restarts OK (only one monitor exists). But when
> starting one OSD and watching with ceph -w, nothing shows.
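As a general aside, rebuilding buckets by hand is hard to get exactly right; the usual safeguard is to save the binary CRUSH map before editing, so a bad change can be reverted in one step. A minimal sketch, with an illustrative file name:

————————————————
# Before any CRUSH surgery, keep a binary backup of the live map:
ceph osd getcrushmap -o crushmap.before-edit

# To revert a bad change, re-inject the saved map wholesale:
ceph osd setcrushmap -i crushmap.before-edit
————————————————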
>
> Here is the ceph mon_status:
>
> ————————————————
> { "name": "lkpsx0120",
>   "rank": 0,
>   "state": "leader",
>   "election_epoch": 1,
>   "quorum": [ 0 ],
>   "outside_quorum": [],
>   "extra_probe_peers": [],
>   "sync_provider": [],
>   "monmap": { "epoch": 4,
>     "fsid": "9244194a-5e10-47ae-9287-507944612f95",
>     "modified": "0.000000",
>     "created": "0.000000",
>     "mons": [
>       { "rank": 0,
>         "name": "lkpsx0120",
>         "addr": "10.15.2.120:6789\/0" } ] } }
> ————————————————
>
> Here is the ceph.conf file:
>
> ————————————————
> [global]
> fsid = 9244194a-5e10-47ae-9287-507944612f95
> mon_initial_members = lkpsx0120
> mon_host = 10.15.2.120
> #debug osd = 20
> #debug ms = 1
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> filestore_xattr_use_omap = true
> osd_crush_chooseleaf_type = 1
> osd_pool_default_size = 2
> public_network = 10.15.2.0/24
> cluster_network = 10.15.4.0/24
> rbd_cache = true
> rbd_cache_size = 67108864
> rbd_cache_max_dirty = 50331648
> rbd_cache_target_dirty = 33554432
> rbd_cache_max_dirty_age = 2
> rbd_cache_writethrough_until_flush = true
> ————————————————
>
> Here is the decompiled CRUSH map:
>
> ————————————————
> # begin crush map
> tunable choose_local_tries 0
> tunable choose_local_fallback_tries 0
> tunable choose_total_tries 50
> tunable chooseleaf_descend_once 1
>
> # devices
> device 0 osd.0
> device 1 osd.1
> device 2 osd.2
> device 3 osd.3
> device 4 osd.4
> device 5 osd.5
> device 6 osd.6
> device 7 osd.7
> device 8 osd.8
> device 9 osd.9
> device 10 osd.10
> device 11 osd.11
> device 12 osd.12
> device 13 osd.13
> device 14 osd.14
> device 15 osd.15
> device 16 osd.16
> device 17 osd.17
> device 18 osd.18
> device 19 osd.19
> device 20 osd.20
> device 21 osd.21
> device 22 osd.22
> device 23 osd.23
> device 24 osd.24
> device 25 osd.25
> device 26 osd.26
> device 27 osd.27
> device 28 osd.28
> device 29 osd.29
>
> # types
> type 0 osd
> type 1 host
> type 2 chassis
> type 3 rack
> type 4 row
> type 5 pdu
> type 6 pod
> type 7 room
> type 8 datacenter
> type 9 region
> type 10 root
>
> # buckets
> host lkpsx0120 {
>     id -2    # do not change unnecessarily
>     # weight 54.600
>     alg straw
>     hash 0   # rjenkins1
>     item osd.0 weight 5.460
>     item osd.1 weight 5.460
>     item osd.2 weight 5.460
>     item osd.3 weight 5.460
>     item osd.4 weight 5.460
>     item osd.5 weight 5.460
>     item osd.6 weight 5.460
>     item osd.7 weight 5.460
>     item osd.8 weight 5.460
>     item osd.9 weight 5.460
> }
> host lkpsx0130 {
>     id -3    # do not change unnecessarily
>     # weight 54.600
>     alg straw
>     hash 0   # rjenkins1
>     item osd.10 weight 5.460
>     item osd.11 weight 5.460
>     item osd.12 weight 5.460
>     item osd.13 weight 5.460
>     item osd.14 weight 5.460
>     item osd.15 weight 5.460
>     item osd.16 weight 5.460
>     item osd.17 weight 5.460
>     item osd.18 weight 5.460
>     item osd.19 weight 5.460
> }
> host lkpsx0140 {
>     id -4    # do not change unnecessarily
>     # weight 54.600
>     alg straw
>     hash 0   # rjenkins1
>     item osd.20 weight 5.460
>     item osd.21 weight 5.460
>     item osd.22 weight 5.460
>     item osd.23 weight 5.460
>     item osd.24 weight 5.460
>     item osd.25 weight 5.460
>     item osd.26 weight 5.460
>     item osd.27 weight 5.460
>     item osd.28 weight 5.460
>     item osd.29 weight 5.460
> }
> root default {
>     id -1    # do not change unnecessarily
>     # weight 163.800
>     alg straw
>     hash 0   # rjenkins1
>     item lkpsx0120 weight 54.600
>     item lkpsx0130 weight 54.600
>     item lkpsx0140 weight 54.600
> }
>
> # rules
> rule replicated_ruleset {
>     ruleset 0
>     type replicated
>     min_size 1
>     max_size 10
>     step take default
>     step chooseleaf firstn 0 type host
>     step emit
> }
>
> # end crush map
> ————————————————
>
> The operating system is Debian 8.0 and the Ceph version is 0.80.7, as
> stated in the crash log.
>
> We increased the log level and tried to start osd.1 as an example. All
> OSDs we tried to start hit the same problem and die.
>
> The log file from OSD 1 (ceph-osd.1.log) can be found here:
> https://www.dropbox.com/s/dqunlufh0qtked5/ceph-osd.1.log.zip?dl=0
>
> As of now, all systems are down, including the KVM cluster that depends
> on Ceph.
>
> Best regards,
>
> Henrik
> ------------------------------
> Henrik Svensson
> OpIT
> Sectra AB
> Teknikringen 20, 58330 Linköping, Sweden
> E-mail: henrik.svens...@sectra.com
> Phone: +46 (0)13 352 884
> Cellular: +46 (0)70 395141
> Web: www.sectra.com <http://www.sectra.com/medical/>
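A sketch of the decompile/edit/recompile loop that produces a text map like the one above, with an offline sanity check before anything is injected back into the cluster (crushtool ships with Ceph; file names are illustrative):

————————————————
# Export the live map and decompile it to editable text:
ceph osd getcrushmap -o crushmap.bin
crushtool -d crushmap.bin -o crushmap.txt

# Edit crushmap.txt, then recompile:
crushtool -c crushmap.txt -o crushmap.new

# Dry-run the placements the new map would produce for rule 0 with
# 2 replicas (matching osd_pool_default_size above):
crushtool -i crushmap.new --test --show-statistics --rule 0 --num-rep 2

# Only after the test looks sane, push the map back:
ceph osd setcrushmap -i crushmap.new
————————————————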
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com