Sorry, missing the pg dump:

2.1    0 0 0 0 0 0 0 0  stale+peering  2018-07-26 19:38:13.381673  0'0  125:9   [3]  3  [3]  3  0'0  2018-07-26 15:20:08.965357  0'0  2018-07-26 15:20:08.965357  0
2.0    0 0 0 0 0 0 0 0  stale+peering  2018-07-26 19:38:13.345341  0'0  125:13  [3]  3  [3]  3  0'0  2018-07-26 15:20:08.965357  0'0  2018-07-26 15:20:08.965357  0
2      0 0 0 0 0 0 0 0
sum    0 0 0 0 0 0 0 0

OSD_STAT USED  AVAIL TOTAL HB_PEERS PG_SUM PRIMARY_PG_SUM
       3 1051M 1861G 1863G  [0,1,2]    256            256
       2 1051M 1861G 1863G  [0,1,3]      0              0
       1 1051M 3724G 3726G  [0,2,3]      0              0
       0 1051M 1861G 1863G  [1,2,3]      0              0
     sum 4205M 9310G 9315G

For some reason it seems that some PGs are allocated to osd.3 (but stale+peering). This is kind of odd (see the command sketches after the quoted thread below).

On Thu, 26 Jul 2018 at 20:50, Benoit Hudzia <ben...@stratoscale.com> wrote:

> You are correct, the PGs are stale (not allocated).
>
> [root@stratonode1 /]# ceph status
>   cluster:
>     id:     ea0df043-7b25-4447-a43d-e9b2af8fe069
>     health: HEALTH_WARN
>             Reduced data availability: 256 pgs inactive, 256 pgs peering,
>             256 pgs stale
>
>   services:
>     mon: 3 daemons, quorum
>          stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
>     mgr: stratonode1(active), standbys: stratonode2, stratonode3
>     osd: 4 osds: 4 up, 4 in
>
>   data:
>     pools:   1 pools, 256 pgs
>     objects: 0 objects, 0 bytes
>     usage:   4192 MB used, 9310 GB / 9315 GB avail
>     pgs:     100.000% pgs not active
>              256 stale+peering
>
> The PG dump shows all PGs in stale+peering.
>
> However, it is kind of strange that it shows some PGs associated with OSD 3.
>
> So it seems that the PG calculation is not taking the ruleset into account...
>
> Do you think that changing "osd max pg per osd hard ratio" to a huge
> number (1M) would be a valid temporary workaround?
>
> We always allocate pools with dedicated OSDs using the device-class rule
> set, so we never have pools sharing OSDs.
>
> I'll open a bug with Ceph regarding the pg creation check ignoring the
> crush ruleset.
>
> On Thu, 26 Jul 2018 at 17:11, John Spray <jsp...@redhat.com> wrote:
>
>> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <ben...@stratoscale.com>
>> wrote:
>>
>>> Hi,
>>>
>>> We currently segregate Ceph pool PG allocation using the crush device
>>> class ruleset as described in
>>> https://ceph.com/community/new-luminous-crush-device-classes/,
>>> simply using the following command to define the rule:
>>> ceph osd crush rule create-replicated <RULE> default host <DEVICE CLASS>
>>>
>>> However, we noticed that the rule is not strict in certain scenarios.
>>> By that, I mean that if there is no OSD of the specified device class,
>>> Ceph will allocate PGs for this pool to any other OSD available
>>> (creating an issue with the PG calculation when we want to add a new
>>> pool).
>>>
>>> Simple scenario:
>>> 1. Create one pool <pool1>, replication 2, on 4 nodes with 1 OSD each,
>>>    all OSDs belonging to class <pool1>.
>>> 2. Remove all OSDs (delete them).
>>> 3. Create 4 new OSDs (using the same disks but different IDs), but this
>>>    time tag them with class <pool2>.
>>> 4. Try to create pool <pool2> -> the pool creation fails with:
>>>
>>>    Error ERANGE: pg_num 256 size 2 would mean 1024 total pgs, which
>>>    exceeds max 800 (mon_max_pg_per_osd 200 * num_in_osds 4)
>>>
>>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>>> ruleset.
>>
>> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
>> Have you looked at the output of "ceph pg dump" to check that?
>>
>> It sounds more like the pool creation check is simply failing to consider
>> the crush rules and applying a cruder global check.
>>
>> John
>>
>>> Which leads me to the following question: is there a way to make the
>>> crush rule a hard requirement? E.g. if we do not have any OSD matching
>>> the device class, it won't start trying to allocate PGs to OSDs that
>>> don't match it.
>>>
>>> Is there any way to prevent pool 1 from using those OSDs?
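
For reference, a minimal sketch of the per-pool device-class setup described in the quoted scenario, assuming Luminous-era commands; the class name "pool1", the rule name "pool1_rule" and pg_num 256 are illustrative, matching the names used in the messages above:

# Tag the OSDs with the pool's device class (clear any auto-assigned class first)
ceph osd crush rm-device-class osd.0 osd.1 osd.2 osd.3
ceph osd crush set-device-class pool1 osd.0 osd.1 osd.2 osd.3

# Replicated rule restricted to that class, then a pool bound to the rule
ceph osd crush rule create-replicated pool1_rule default host pool1
ceph osd pool create pool1 256 256 replicated pool1_rule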
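
And to answer John's question about where pool 1's PGs actually land, something along these lines should work on Luminous without wading through the full pg dump ("pool1"/"pool1_rule" are placeholders; pg 2.0 is taken from the dump at the top of this mail):

# Which crush rule the pool uses
ceph osd pool ls detail

# Which buckets/OSDs that rule can select (device classes appear as a shadow hierarchy)
ceph osd crush rule dump pool1_rule
ceph osd crush tree --show-shadow

# Up/acting OSDs for the pool's PGs
ceph pg ls-by-pool pool1
ceph pg map 2.0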
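
On the workaround question, a sketch assuming Luminous option names: the ERANGE message quoted above cites mon_max_pg_per_osd (the monitor-side limit applied at pool creation), while "osd max pg per osd hard ratio" is, if I recall correctly, an OSD-side multiplier of that same limit, so bumping only the ratio may not get past the pool-creation check. The value below is purely illustrative, not a recommendation:

# ceph.conf on the monitor nodes (restart the mons, or try injectargs below)
[mon]
mon_max_pg_per_osd = 1000

# Runtime injection; the mons may warn that the change requires a restart to take effect
ceph tell mon.* injectargs '--mon_max_pg_per_osd=1000'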
--
Dr. Benoit Hudzia

Mobile (UK): +44 (0) 75 346 78673
Mobile (IE): +353 (0) 89 219 3675
Email: ben...@stratoscale.com

Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/> | Twitter <https://twitter.com/Stratoscale> | Google+ <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts> | Linkedin <https://www.linkedin.com/company/stratoscale>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com