Hi Greg,

Following are the test outcomes on the EC profiles (n = k + m):
1. Kraken filestore and bluestore with m=1: recovery does not start.
2. Jewel filestore and bluestore with m=1: recovery happens.
3. Kraken bluestore, all default configuration, with m=1: no recovery.
4. Kraken bluestore with m=2: recovery happens when one OSD is down, and also when 2 OSDs fail.

So, the issue seems to be in the ceph-kraken release. Your views… (A rough, illustrative sketch of the pool/profile commands and the min_size knob Greg refers to in the quoted thread is included at the bottom of this mail.)

Thanks,
Muthu

On 31 January 2017 at 14:18, Muthusamy Muthiah <muthiah.muthus...@gmail.com> wrote:
> Hi Greg,
>
> Now we can see that the same problem exists for kraken-filestore also.
> Attached the requested osdmap and crushmap.
>
> OSD.1 was stopped with the following procedure, and the OSD map for one PG is displayed.
>
> ceph osd dump | grep cdvr_ec
> 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool stripe_width 4128
>
> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
>
> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap
> osdmaptool: osdmap file '/tmp/osdmap'
> object 'object1' -> 2.2bc -> [20,47,1,36]
>
> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
> osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
>
> [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
>
> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
>
> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap1
> osdmaptool: osdmap file '/tmp/osdmap1'
> object 'object1' -> 2.2bc -> [20,47,2147483647,36]
>
> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
> osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
>
> [root@ca-cn2 ~]# ceph osd tree
> 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
> ID WEIGHT    TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 327.47314 root default
> -2  65.49463     host ca-cn4
>  3   5.45789         osd.3         up  1.00000          1.00000
>  5   5.45789         osd.5         up  1.00000          1.00000
> 10   5.45789         osd.10        up  1.00000          1.00000
> 16   5.45789         osd.16        up  1.00000          1.00000
> 21   5.45789         osd.21        up  1.00000          1.00000
> 27   5.45789         osd.27        up  1.00000          1.00000
> 30   5.45789         osd.30        up  1.00000          1.00000
> 35   5.45789         osd.35        up  1.00000          1.00000
> 42   5.45789         osd.42        up  1.00000          1.00000
> 47   5.45789         osd.47        up  1.00000          1.00000
> 51   5.45789         osd.51        up  1.00000          1.00000
> 53   5.45789         osd.53        up  1.00000          1.00000
> -3  65.49463     host ca-cn3
>  2   5.45789         osd.2         up  1.00000          1.00000
>  6   5.45789         osd.6         up  1.00000          1.00000
> 11   5.45789         osd.11        up  1.00000          1.00000
> 15   5.45789         osd.15        up  1.00000          1.00000
> 20   5.45789         osd.20        up  1.00000          1.00000
> 25   5.45789         osd.25        up  1.00000          1.00000
> 29   5.45789         osd.29        up  1.00000          1.00000
> 33   5.45789         osd.33        up  1.00000          1.00000
> 38   5.45789         osd.38        up  1.00000          1.00000
> 40   5.45789         osd.40        up  1.00000          1.00000
> 45   5.45789         osd.45        up  1.00000          1.00000
> 49   5.45789         osd.49        up  1.00000          1.00000
> -4  65.49463     host ca-cn5
>  0   5.45789         osd.0         up  1.00000          1.00000
>  7   5.45789         osd.7         up  1.00000          1.00000
> 12   5.45789         osd.12        up  1.00000          1.00000
> 17   5.45789         osd.17        up  1.00000          1.00000
> 23   5.45789         osd.23        up  1.00000          1.00000
> 26   5.45789         osd.26        up  1.00000          1.00000
> 32   5.45789         osd.32        up  1.00000          1.00000
> 34   5.45789         osd.34        up  1.00000          1.00000
> 41   5.45789         osd.41        up  1.00000          1.00000
> 46   5.45789         osd.46        up  1.00000          1.00000
> 52   5.45789         osd.52        up  1.00000          1.00000
> 56   5.45789         osd.56        up  1.00000          1.00000
> -5  65.49463     host ca-cn1
>  4   5.45789         osd.4         up  1.00000          1.00000
>  9   5.45789         osd.9         up  1.00000          1.00000
> 14   5.45789         osd.14        up  1.00000          1.00000
> 19   5.45789         osd.19        up  1.00000          1.00000
> 24   5.45789         osd.24        up  1.00000          1.00000
> 36   5.45789         osd.36        up  1.00000          1.00000
> 43   5.45789         osd.43        up  1.00000          1.00000
> 50   5.45789         osd.50        up  1.00000          1.00000
> 55   5.45789         osd.55        up  1.00000          1.00000
> 57   5.45789         osd.57        up  1.00000          1.00000
> 58   5.45789         osd.58        up  1.00000          1.00000
> 59   5.45789         osd.59        up  1.00000          1.00000
> -6  65.49463     host ca-cn2
>  1   5.45789         osd.1       down        0          1.00000
>  8   5.45789         osd.8         up  1.00000          1.00000
> 13   5.45789         osd.13        up  1.00000          1.00000
> 18   5.45789         osd.18        up  1.00000          1.00000
> 22   5.45789         osd.22        up  1.00000          1.00000
> 28   5.45789         osd.28        up  1.00000          1.00000
> 31   5.45789         osd.31        up  1.00000          1.00000
> 37   5.45789         osd.37        up  1.00000          1.00000
> 39   5.45789         osd.39        up  1.00000          1.00000
> 44   5.45789         osd.44        up  1.00000          1.00000
> 48   5.45789         osd.48        up  1.00000          1.00000
> 54   5.45789         osd.54        up  1.00000          1.00000
>
>      health HEALTH_ERR
>             69 pgs are stuck inactive for more than 300 seconds
>             69 pgs incomplete
>             69 pgs stuck inactive
>             69 pgs stuck unclean
>             512 requests are blocked > 32 sec
>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>             election epoch 8, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>         mgr active: ca-cn4 standbys: ca-cn2, ca-cn5, ca-cn3, ca-cn1
>      osdmap e406: 60 osds: 59 up, 59 in; 69 remapped pgs
>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>       pgmap v23018: 1024 pgs, 1 pools, 3892 GB data, 7910 kobjects
>             6074 GB used, 316 TB / 322 TB avail
>                  955 active+clean
>                   69 remapped+incomplete
>
> Thanks,
> Muthu
>
> On 31 January 2017 at 02:54, Gregory Farnum <gfar...@redhat.com> wrote:
>
>> You might also check out "ceph osd tree" and crush dump and make sure
>> they look the way you expect.
>>
>> On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>> > On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
>> > <muthiah.muthus...@gmail.com> wrote:
>> >> Hi All,
>> >>
>> >> Also tried EC profile 3+1 on the 5 node cluster with bluestore enabled. When
>> >> an OSD is down the cluster goes to ERROR state even though the cluster is n+1.
>> >> No recovery happening.
>> >>
>> >>      health HEALTH_ERR
>> >>             75 pgs are stuck inactive for more than 300 seconds
>> >>             75 pgs incomplete
>> >>             75 pgs stuck inactive
>> >>             75 pgs stuck unclean
>> >>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>> >>             election epoch 10, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>> >>         mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>> >>      osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
>> >>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>> >>       pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
>> >>             39976 GB used, 282 TB / 322 TB avail
>> >>                  941 active+clean
>> >>                   75 remapped+incomplete
>> >>                    8 active+clean+scrubbing
>> >>
>> >> This seems to be an issue with bluestore; recovery is not happening properly with EC.
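
A side note on the outputs above: 2147483647 in the osdmaptool result is CRUSH's ITEM_NONE placeholder, i.e. CRUSH could not fill that slot of the PG after osd.1 was stopped, and the same slot shows up as NONE in the acting set. A rough way to see what such an incomplete PG is actually waiting on (a sketch only; PG 2.2bc and pool cdvr_ec are just the examples from above):

ceph health detail | grep incomplete
ceph pg 2.2bc query          # look at the "recovery_state" section for what peering is waiting on
ceph osd pool get cdvr_ec min_size

The osd dump above shows min_size 4 on a size 4 (3+1) pool, so with one shard missing a PG cannot satisfy min_size and would stay inactive until the down OSD is marked out and its shard is backfilled elsewhere, which lines up with Greg's min_size comment further down the thread.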
>> >
>> > It's possible but it seems a lot more likely this is some kind of
>> > config issue. Can you share your osd map ("ceph osd getmap")?
>> > -Greg
>> >
>> >>
>> >> Thanks,
>> >> Muthu
>> >>
>> >> On 24 January 2017 at 12:57, Muthusamy Muthiah <muthiah.muthus...@gmail.com>
>> >> wrote:
>> >>>
>> >>> Hi Greg,
>> >>>
>> >>> We use EC 4+1 on a 5 node cluster in production deployments with filestore,
>> >>> and it does recovery and peering when one OSD goes down. After a few mins,
>> >>> another OSD from the node where the faulty OSD sits takes over its PGs
>> >>> temporarily and all PGs go to active+clean state. The cluster also does not
>> >>> go down during this recovery process.
>> >>>
>> >>> Only on bluestore do we see the cluster going to error state when one OSD
>> >>> is down.
>> >>> We are still validating this and will let you know additional findings.
>> >>>
>> >>> Thanks,
>> >>> Muthu
>> >>>
>> >>> On 21 January 2017 at 02:06, Shinobu Kinjo <ski...@redhat.com> wrote:
>> >>>>
>> >>>> `ceph pg dump` should show you something like:
>> >>>>
>> >>>>  * active+undersized+degraded ... [NONE,3,2,4,1]    3    [NONE,3,2,4,1]
>> >>>>
>> >>>> Sam,
>> >>>>
>> >>>> Am I wrong? Or is it up to something else?
>> >>>>
>> >>>>
>> >>>> On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum <gfar...@redhat.com>
>> >>>> wrote:
>> >>>> > I'm pretty sure the default configs won't let an EC PG go active with
>> >>>> > only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
>> >>>> > certain). Running an "n+1" EC config is just not a good idea.
>> >>>> > For testing you could probably adjust this with the equivalent of
>> >>>> > min_size for EC pools, but I don't know the parameters off the top of
>> >>>> > my head.
>> >>>> > -Greg
>> >>>> >
>> >>>> > On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
>> >>>> > <muthiah.muthus...@gmail.com> wrote:
>> >>>> >> Hi,
>> >>>> >>
>> >>>> >> We are validating kraken 11.2.0 with bluestore on a 5 node cluster with
>> >>>> >> EC 4+1.
>> >>>> >>
>> >>>> >> When an OSD is down, peering is not happening and the ceph health status
>> >>>> >> moves to ERR state after a few mins. This was working in previous
>> >>>> >> development releases. Is any additional configuration required in v11.2.0?
>> >>>> >>
>> >>>> >> Following is our ceph configuration:
>> >>>> >>
>> >>>> >> mon_osd_down_out_interval = 30
>> >>>> >> mon_osd_report_timeout = 30
>> >>>> >> mon_osd_down_out_subtree_limit = host
>> >>>> >> mon_osd_reporter_subtree_level = host
>> >>>> >>
>> >>>> >> and the recovery parameters are set to default.
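
The values actually loaded by the running daemons can be double-checked over the admin socket (a sketch; run on the node hosting the given mon, and mon.ca-cn1 here is just an example from this cluster):

ceph daemon mon.ca-cn1 config get mon_osd_down_out_interval
ceph daemon mon.ca-cn1 config show | grep mon_osd

With mon_osd_down_out_interval = 30, a down OSD should be marked out roughly 30 seconds after it is reported down, which is the point at which backfill onto the remaining OSDs is expected to start.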
>> >>>> >>
>> >>>> >> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>> >>>> >>
>> >>>> >> {
>> >>>> >>     "choose_local_tries": 0,
>> >>>> >>     "choose_local_fallback_tries": 0,
>> >>>> >>     "choose_total_tries": 50,
>> >>>> >>     "chooseleaf_descend_once": 1,
>> >>>> >>     "chooseleaf_vary_r": 1,
>> >>>> >>     "chooseleaf_stable": 1,
>> >>>> >>     "straw_calc_version": 1,
>> >>>> >>     "allowed_bucket_algs": 54,
>> >>>> >>     "profile": "jewel",
>> >>>> >>     "optimal_tunables": 1,
>> >>>> >>     "legacy_tunables": 0,
>> >>>> >>     "minimum_required_version": "jewel",
>> >>>> >>     "require_feature_tunables": 1,
>> >>>> >>     "require_feature_tunables2": 1,
>> >>>> >>     "has_v2_rules": 1,
>> >>>> >>     "require_feature_tunables3": 1,
>> >>>> >>     "has_v3_rules": 0,
>> >>>> >>     "has_v4_buckets": 0,
>> >>>> >>     "require_feature_tunables5": 1,
>> >>>> >>     "has_v5_rules": 0
>> >>>> >> }
>> >>>> >>
>> >>>> >> ceph status:
>> >>>> >>
>> >>>> >>      health HEALTH_ERR
>> >>>> >>             173 pgs are stuck inactive for more than 300 seconds
>> >>>> >>             173 pgs incomplete
>> >>>> >>             173 pgs stuck inactive
>> >>>> >>             173 pgs stuck unclean
>> >>>> >>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>> >>>> >>             election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>> >>>> >>         mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
>> >>>> >>      osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
>> >>>> >>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>> >>>> >>       pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
>> >>>> >>             85199 GB used, 238 TB / 322 TB avail
>> >>>> >>                 1868 active+clean
>> >>>> >>                  173 remapped+incomplete
>> >>>> >>                    7 active+clean+scrubbing
>> >>>> >>
>> >>>> >> MON log:
>> >>>> >>
>> >>>> >> 2017-01-20 09:25:54.715684 7f55bcafb700  0 log_channel(cluster) log [INF] :
>> >>>> >> osd.54 out (down for 31.703786)
>> >>>> >> 2017-01-20 09:25:54.725688 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1120
>> >>>> >> crush map has features 288250512065953792, adjusting msgr requires
>> >>>> >> 2017-01-20 09:25:54.729019 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> >>>> >> osdmap e1120: 60 osds: 59 up, 59 in
>> >>>> >> 2017-01-20 09:25:54.735987 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> >>>> >> pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6
>> >>>> >> active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail;
>> >>>> >> 21825 B/s rd, 163 MB/s wr, 2046 op/s
>> >>>> >> 2017-01-20 09:25:55.737749 7f55bf4d5700  0 mon.ca-cn1@0(leader).osd e1121
>> >>>> >> crush map has features 288250512065953792, adjusting msgr requires
>> >>>> >> 2017-01-20 09:25:55.744338 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> >>>> >> osdmap e1121: 60 osds: 59 up, 59 in
>> >>>> >> 2017-01-20 09:25:55.749616 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> >>>> >> pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144
>> >>>> >> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> >>>> >> 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
>> >>>> >> 2017-01-20 09:25:56.768721 7f55bf4d5700  0 log_channel(cluster) log [INF] :
>> >>>> >> pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126
>> >>>> >> incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB /
>> >>>> >> 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
>> >>>> >>
>> >>>> >> Thanks,
>> >>>> >> Muthu
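
Following up on Greg's min_size pointer in the quoted thread: the osd dump output above shows min_size as an ordinary attribute of the EC pool, so for a lab test it can be inspected and, if needed, lowered to k so that PGs can go active with one shard missing. A sketch only, with illustrative names and values, not commands that were run on this cluster:

ceph osd dump | grep cdvr_ec               # shows "size" and "min_size" of the EC pool
ceph osd pool get cdvr_ec min_size
ceph osd pool set cdvr_ec min_size 3       # k for the 3+1 profile; for testing only

ceph osd erasure-code-profile set ec_3_1 k=3 m=1     # illustrative profile name
ceph osd erasure-code-profile get ec_3_1
ceph osd pool create cdvr_ec_test 1024 1024 erasure ec_3_1

With m=1 there is no redundancy left while a shard is missing, so running with min_size = k is only useful for checking whether recovery starts at all; it matches Greg's point that an n+1 EC profile is not a good idea for production.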
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com