Hi Greg,

The problem is in Kraken: when a pool is created with an EC profile, min_size is set equal to the erasure size.
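For the m=1 pools this means size and min_size end up equal, so a single missing shard drops a PG below min_size, which would explain the incomplete PGs further down. As a rough sketch of what we are trying on our side (pool name as in the dumps below; the target value simply mirrors what Jewel reports for the 4+1 pool):

ceph osd pool get cdvr_ec min_size        # shows the value Kraken assigned
ceph osd pool set cdvr_ec min_size 4      # e.g. mirror Jewel's value for the 4+1 pool; adjust per profile
ceph osd dump | grep cdvr_ec              # re-check the pool line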
For a 3+1 profile, the pool status is:

pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool stripe_width 4128

For a 4+1 profile:

pool 5 'cdvr_ec' erasure size 5 min_size 5 crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096

For a 3+2 profile:

pool 3 'cdvr_ec' erasure size 5 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 412 flags hashpspool stripe_width 4128

Whereas on the Jewel release, for EC 4+1:

pool 30 'cdvr_ec' *erasure size 5 min_size 4* crush_ruleset 1 object_hash rjenkins pg_num 4096 pgp_num 4096

We are trying to modify min_size (as sketched above) and verify the status. Is there a reason behind this change in Ceph Kraken, or is it a bug?

Thanks,
Muthu

On 31 January 2017 at 18:17, Muthusamy Muthiah <muthiah.muthus...@gmail.com> wrote:

> Hi Greg,
>
> Following are the test outcomes on EC profile (n = k + m):
>
> 1. Kraken filestore and bluestore with m=1: recovery does not start.
> 2. Jewel filestore and bluestore with m=1: recovery happens.
> 3. Kraken bluestore, all default configuration, m=1: no recovery.
> 4. Kraken bluestore with m=2: recovery happens when one OSD is down and also when two OSDs fail.
>
> So, the issue seems to be on the ceph-kraken release. Your views…
>
> Thanks,
> Muthu
>
> On 31 January 2017 at 14:18, Muthusamy Muthiah <muthiah.muthus...@gmail.com> wrote:
>
>> Hi Greg,
>>
>> Now we could see the same problem exists for kraken-filestore also.
>> Attached the requested osdmap and crushmap.
>>
>> OSD.1 was stopped with the following procedure and the OSD map for a PG is displayed.
>>
>> ceph osd dump | grep cdvr_ec
>> 2017-01-31 08:39:44.827079 7f323d66c700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>> 2017-01-31 08:39:44.848901 7f323d66c700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>> pool 2 'cdvr_ec' erasure size 4 min_size 4 crush_ruleset 1 object_hash rjenkins pg_num 1024 pgp_num 1024 last_change 234 flags hashpspool stripe_width 4128
>>
>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap
>>
>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap
>> osdmaptool: osdmap file '/tmp/osdmap'
>> object 'object1' -> 2.2bc -> [20,47,1,36]
>>
>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>> osdmap e402 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) -> up ([20,47,1,36], p20) acting ([20,47,1,36], p20)
>>
>> [root@ca-cn2 ~]# systemctl stop ceph-osd@1.service
>>
>> [root@ca-cn2 ~]# ceph osd getmap -o /tmp/osdmap1
>>
>> [root@ca-cn2 ~]# osdmaptool --pool 2 --test-map-object object1 /tmp/osdmap1
>> osdmaptool: osdmap file '/tmp/osdmap1'
>> object 'object1' -> 2.2bc -> [20,47,2147483647,36]
>>
>> [root@ca-cn2 ~]# ceph osd map cdvr_ec object1
>> osdmap e406 pool 'cdvr_ec' (2) object 'object1' -> pg 2.bac5debc (2.2bc) -> up ([20,47,39,36], p20) acting ([20,47,NONE,36], p20)
>>
>> [root@ca-cn2 ~]# ceph osd tree
>> 2017-01-31 08:42:19.606876 7f4ed856a700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>> 2017-01-31 08:42:19.628358 7f4ed856a700 -1 WARNING: the following dangerous and experimental features are enabled: bluestore,rocksdb
>> ID WEIGHT    TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
>> -1 327.47314 root default
>> -2  65.49463     host ca-cn4
>>  3   5.45789         osd.3         up  1.00000          1.00000
>>  5   5.45789         osd.5         up  1.00000          1.00000
>> 10   5.45789         osd.10        up  1.00000          1.00000
>> 16   5.45789         osd.16        up  1.00000          1.00000
>> 21   5.45789         osd.21        up  1.00000          1.00000
>> 27   5.45789         osd.27        up  1.00000          1.00000
>> 30   5.45789         osd.30        up  1.00000          1.00000
>> 35   5.45789         osd.35        up  1.00000          1.00000
>> 42   5.45789         osd.42        up  1.00000          1.00000
>> 47   5.45789         osd.47        up  1.00000          1.00000
>> 51   5.45789         osd.51        up  1.00000          1.00000
>> 53   5.45789         osd.53        up  1.00000          1.00000
>> -3  65.49463     host ca-cn3
>>  2   5.45789         osd.2         up  1.00000          1.00000
>>  6   5.45789         osd.6         up  1.00000          1.00000
>> 11   5.45789         osd.11        up  1.00000          1.00000
>> 15   5.45789         osd.15        up  1.00000          1.00000
>> 20   5.45789         osd.20        up  1.00000          1.00000
>> 25   5.45789         osd.25        up  1.00000          1.00000
>> 29   5.45789         osd.29        up  1.00000          1.00000
>> 33   5.45789         osd.33        up  1.00000          1.00000
>> 38   5.45789         osd.38        up  1.00000          1.00000
>> 40   5.45789         osd.40        up  1.00000          1.00000
>> 45   5.45789         osd.45        up  1.00000          1.00000
>> 49   5.45789         osd.49        up  1.00000          1.00000
>> -4  65.49463     host ca-cn5
>>  0   5.45789         osd.0         up  1.00000          1.00000
>>  7   5.45789         osd.7         up  1.00000          1.00000
>> 12   5.45789         osd.12        up  1.00000          1.00000
>> 17   5.45789         osd.17        up  1.00000          1.00000
>> 23   5.45789         osd.23        up  1.00000          1.00000
>> 26   5.45789         osd.26        up  1.00000          1.00000
>> 32   5.45789         osd.32        up  1.00000          1.00000
>> 34   5.45789         osd.34        up  1.00000          1.00000
>> 41   5.45789         osd.41        up  1.00000          1.00000
>> 46   5.45789         osd.46        up  1.00000          1.00000
>> 52   5.45789         osd.52        up  1.00000          1.00000
>> 56   5.45789         osd.56        up  1.00000          1.00000
>> -5  65.49463     host ca-cn1
>>  4   5.45789         osd.4         up  1.00000          1.00000
>>  9   5.45789         osd.9         up  1.00000          1.00000
>> 14   5.45789         osd.14        up  1.00000          1.00000
>> 19   5.45789         osd.19        up  1.00000          1.00000
>> 24   5.45789         osd.24        up  1.00000          1.00000
>> 36   5.45789         osd.36        up  1.00000          1.00000
>> 43   5.45789         osd.43        up  1.00000          1.00000
>> 50   5.45789         osd.50        up  1.00000          1.00000
>> 55   5.45789         osd.55        up  1.00000          1.00000
>> 57   5.45789         osd.57        up  1.00000          1.00000
>> 58   5.45789         osd.58        up  1.00000          1.00000
>> 59   5.45789         osd.59        up  1.00000          1.00000
>> -6  65.49463     host ca-cn2
>>  1   5.45789         osd.1       down        0          1.00000
>>  8   5.45789         osd.8         up  1.00000          1.00000
>> 13   5.45789         osd.13        up  1.00000          1.00000
>> 18   5.45789         osd.18        up  1.00000          1.00000
>> 22   5.45789         osd.22        up  1.00000          1.00000
>> 28   5.45789         osd.28        up  1.00000          1.00000
>> 31   5.45789         osd.31        up  1.00000          1.00000
>> 37   5.45789         osd.37        up  1.00000          1.00000
>> 39   5.45789         osd.39        up  1.00000          1.00000
>> 44   5.45789         osd.44        up  1.00000          1.00000
>> 48   5.45789         osd.48        up  1.00000          1.00000
>> 54   5.45789         osd.54        up  1.00000          1.00000
>>
>>      health HEALTH_ERR
>>             69 pgs are stuck inactive for more than 300 seconds
>>             69 pgs incomplete
>>             69 pgs stuck inactive
>>             69 pgs stuck unclean
>>             512 requests are blocked > 32 sec
>>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>             election epoch 8, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>         mgr active: ca-cn4 standbys: ca-cn2, ca-cn5, ca-cn3, ca-cn1
>>      osdmap e406: 60 osds: 59 up, 59 in; 69 remapped pgs
>>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>       pgmap v23018: 1024 pgs, 1 pools, 3892 GB data, 7910 kobjects
>>             6074 GB used, 316 TB / 322 TB avail
>>                  955 active+clean
>>                   69 remapped+incomplete
>>
>> Thanks,
>> Muthu
>>
>> On 31 January 2017 at 02:54, Gregory Farnum <gfar...@redhat.com> wrote:
>>
>>> You might also check out "ceph osd tree" and crush dump and make sure
>>> they look the way you expect.
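(For reference, this is roughly how the maps mentioned above were pulled; the output file names are only examples:)

ceph osd getmap -o /tmp/osdmap                     # binary osdmap, as attached above
ceph osd getcrushmap -o /tmp/crushmap              # binary crush map
crushtool -d /tmp/crushmap -o /tmp/crushmap.txt    # decompile to text for review
ceph osd crush dump                                # the same information as JSON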
>>>
>>> On Mon, Jan 30, 2017 at 1:23 PM, Gregory Farnum <gfar...@redhat.com> wrote:
>>> > On Sun, Jan 29, 2017 at 6:40 AM, Muthusamy Muthiah
>>> > <muthiah.muthus...@gmail.com> wrote:
>>> >> Hi All,
>>> >>
>>> >> Also tried EC profile 3+1 on 5 node cluster with bluestore enabled. When an OSD is down the cluster goes to ERROR state even when the cluster is n+1. No recovery happening.
>>> >>
>>> >>      health HEALTH_ERR
>>> >>             75 pgs are stuck inactive for more than 300 seconds
>>> >>             75 pgs incomplete
>>> >>             75 pgs stuck inactive
>>> >>             75 pgs stuck unclean
>>> >>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>> >>             election epoch 10, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>> >>         mgr active: ca-cn1 standbys: ca-cn4, ca-cn3, ca-cn5, ca-cn2
>>> >>      osdmap e264: 60 osds: 59 up, 59 in; 75 remapped pgs
>>> >>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>> >>       pgmap v119402: 1024 pgs, 1 pools, 28519 GB data, 21548 kobjects
>>> >>             39976 GB used, 282 TB / 322 TB avail
>>> >>                  941 active+clean
>>> >>                   75 remapped+incomplete
>>> >>                    8 active+clean+scrubbing
>>> >>
>>> >> this seems to be an issue with bluestore, recovery not happening properly with EC.
>>> >
>>> > It's possible but it seems a lot more likely this is some kind of
>>> > config issue. Can you share your osd map ("ceph osd getmap")?
>>> > -Greg
>>> >
>>> >>
>>> >> Thanks,
>>> >> Muthu
>>> >>
>>> >> On 24 January 2017 at 12:57, Muthusamy Muthiah <muthiah.muthus...@gmail.com> wrote:
>>> >>>
>>> >>> Hi Greg,
>>> >>>
>>> >>> We use EC:4+1 on 5 node cluster in production deployments with filestore and it does recovery and peering when one OSD goes down. After few mins, other OSD from a node where the fault OSD exists will take over the PGs temporarily and all PGs goes to active + clean state. Cluster also does not goes down during this recovery process.
>>> >>>
>>> >>> Only on bluestore we see cluster going to error state when one OSD is down.
>>> >>> We are still validating this and let you know additional findings.
>>> >>>
>>> >>> Thanks,
>>> >>> Muthu
>>> >>>
>>> >>> On 21 January 2017 at 02:06, Shinobu Kinjo <ski...@redhat.com> wrote:
>>> >>>>
>>> >>>> `ceph pg dump` should show you something like:
>>> >>>>
>>> >>>> * active+undersized+degraded ... [NONE,3,2,4,1] 3 [NONE,3,2,4,1]
>>> >>>>
>>> >>>> Sam,
>>> >>>>
>>> >>>> Am I wrong? Or is it up to something else?
>>> >>>>
>>> >>>>
>>> >>>> On Sat, Jan 21, 2017 at 4:22 AM, Gregory Farnum <gfar...@redhat.com> wrote:
>>> >>>> > I'm pretty sure the default configs won't let an EC PG go active with
>>> >>>> > only "k" OSDs in its PG; it needs at least k+1 (or possibly more? Not
>>> >>>> > certain). Running an "n+1" EC config is just not a good idea.
>>> >>>> > For testing you could probably adjust this with the equivalent of
>>> >>>> > min_size for EC pools, but I don't know the parameters off the top of
>>> >>>> > my head.
>>> >>>> > -Greg
>>> >>>> >
>>> >>>> > On Fri, Jan 20, 2017 at 2:15 AM, Muthusamy Muthiah
>>> >>>> > <muthiah.muthus...@gmail.com> wrote:
>>> >>>> >> Hi,
>>> >>>> >>
>>> >>>> >> We are validating kraken 11.2.0 with bluestore on 5 node cluster with EC 4+1.
>>> >>>> >>
>>> >>>> >> When an OSD is down, the peering is not happening and ceph health status moved to ERR state after few mins. This was working in previous development releases. Any additional configuration required in v11.2.0
>>> >>>> >>
>>> >>>> >> Following is our ceph configuration:
>>> >>>> >>
>>> >>>> >> mon_osd_down_out_interval = 30
>>> >>>> >> mon_osd_report_timeout = 30
>>> >>>> >> mon_osd_down_out_subtree_limit = host
>>> >>>> >> mon_osd_reporter_subtree_level = host
>>> >>>> >>
>>> >>>> >> and the recovery parameters set to default.
>>> >>>> >>
>>> >>>> >> [root@ca-cn1 ceph]# ceph osd crush show-tunables
>>> >>>> >>
>>> >>>> >> {
>>> >>>> >>     "choose_local_tries": 0,
>>> >>>> >>     "choose_local_fallback_tries": 0,
>>> >>>> >>     "choose_total_tries": 50,
>>> >>>> >>     "chooseleaf_descend_once": 1,
>>> >>>> >>     "chooseleaf_vary_r": 1,
>>> >>>> >>     "chooseleaf_stable": 1,
>>> >>>> >>     "straw_calc_version": 1,
>>> >>>> >>     "allowed_bucket_algs": 54,
>>> >>>> >>     "profile": "jewel",
>>> >>>> >>     "optimal_tunables": 1,
>>> >>>> >>     "legacy_tunables": 0,
>>> >>>> >>     "minimum_required_version": "jewel",
>>> >>>> >>     "require_feature_tunables": 1,
>>> >>>> >>     "require_feature_tunables2": 1,
>>> >>>> >>     "has_v2_rules": 1,
>>> >>>> >>     "require_feature_tunables3": 1,
>>> >>>> >>     "has_v3_rules": 0,
>>> >>>> >>     "has_v4_buckets": 0,
>>> >>>> >>     "require_feature_tunables5": 1,
>>> >>>> >>     "has_v5_rules": 0
>>> >>>> >> }
>>> >>>> >>
>>> >>>> >> ceph status:
>>> >>>> >>
>>> >>>> >>      health HEALTH_ERR
>>> >>>> >>             173 pgs are stuck inactive for more than 300 seconds
>>> >>>> >>             173 pgs incomplete
>>> >>>> >>             173 pgs stuck inactive
>>> >>>> >>             173 pgs stuck unclean
>>> >>>> >>      monmap e2: 5 mons at {ca-cn1=10.50.5.117:6789/0,ca-cn2=10.50.5.118:6789/0,ca-cn3=10.50.5.119:6789/0,ca-cn4=10.50.5.120:6789/0,ca-cn5=10.50.5.121:6789/0}
>>> >>>> >>             election epoch 106, quorum 0,1,2,3,4 ca-cn1,ca-cn2,ca-cn3,ca-cn4,ca-cn5
>>> >>>> >>         mgr active: ca-cn1 standbys: ca-cn2, ca-cn4, ca-cn5, ca-cn3
>>> >>>> >>      osdmap e1128: 60 osds: 59 up, 59 in; 173 remapped pgs
>>> >>>> >>             flags sortbitwise,require_jewel_osds,require_kraken_osds
>>> >>>> >>       pgmap v782747: 2048 pgs, 1 pools, 63133 GB data, 46293 kobjects
>>> >>>> >>             85199 GB used, 238 TB / 322 TB avail
>>> >>>> >>                 1868 active+clean
>>> >>>> >>                  173 remapped+incomplete
>>> >>>> >>                    7 active+clean+scrubbing
>>> >>>> >>
>>> >>>> >> MON log:
>>> >>>> >>
>>> >>>> >> 2017-01-20 09:25:54.715684 7f55bcafb700 0 log_channel(cluster) log [INF] : osd.54 out (down for 31.703786)
>>> >>>> >> 2017-01-20 09:25:54.725688 7f55bf4d5700 0 mon.ca-cn1@0 (leader).osd e1120 crush map has features 288250512065953792, adjusting msgr requires
>>> >>>> >> 2017-01-20 09:25:54.729019 7f55bf4d5700 0 log_channel(cluster) log [INF] : osdmap e1120: 60 osds: 59 up, 59 in
>>> >>>> >> 2017-01-20 09:25:54.735987 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781993: 2048 pgs: 1869 active+clean, 173 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 21825 B/s rd, 163 MB/s wr, 2046 op/s
>>> >>>> >> 2017-01-20 09:25:55.737749 7f55bf4d5700 0 mon.ca-cn1@0 (leader).osd e1121 crush map has features 288250512065953792, adjusting msgr requires
>>> >>>> >> 2017-01-20 09:25:55.744338 7f55bf4d5700 0 log_channel(cluster) log [INF] : osdmap e1121: 60 osds: 59 up, 59 in
>>> >>>> >> 2017-01-20 09:25:55.749616 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781994: 2048 pgs: 29 remapped+incomplete, 1869 active+clean, 144 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 44503 B/s rd, 45681 kB/s wr, 518 op/s
>>> >>>> >> 2017-01-20 09:25:56.768721 7f55bf4d5700 0 log_channel(cluster) log [INF] : pgmap v781995: 2048 pgs: 47 remapped+incomplete, 1869 active+clean, 126 incomplete, 6 active+clean+scrubbing; 63159 GB data, 85201 GB used, 238 TB / 322 TB avail; 20275 B/s rd, 72742 kB/s wr, 665 op/s
>>> >>>> >>
>>> >>>> >> Thanks,
>>> >>>> >> Muthu
>>> >>>> >>
>>> >>>> >> _______________________________________________
>>> >>>> >> ceph-users mailing list
>>> >>>> >> ceph-users@lists.ceph.com
>>> >>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>>> >>
>>> >>>> > _______________________________________________
>>> >>>> > ceph-users mailing list
>>> >>>> > ceph-users@lists.ceph.com
>>> >>>> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>>
>>> >>
>>> >> _______________________________________________
>>> >> ceph-users mailing list
>>> >> ceph-users@lists.ceph.com
>>> >> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>> >>
>>
>
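For anyone following along: these are roughly the checks we run after stopping one OSD to confirm that no recovery starts (the PG id 2.2bc is just the example from the osdmaptool output above):

ceph health detail | grep incomplete      # list the incomplete PGs
ceph pg dump_stuck inactive               # PGs stuck inactive/incomplete
ceph pg 2.2bc query                       # should show NONE in the acting set for the stopped OSD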
_______________________________________________ ceph-users mailing list ceph-users@lists.ceph.com http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com