You are correct, the PGs are stale (not allocated):

[root@stratonode1 /]# ceph status
  cluster:
    id:     ea0df043-7b25-4447-a43d-e9b2af8fe069
    health: HEALTH_WARN
            Reduced data availability: 256 pgs inactive, 256 pgs peering, 256 pgs stale

  services:
    mon: 3 daemons, quorum stratonode1.node.strato,stratonode2.node.strato,stratonode0.node.strato
    mgr: stratonode1(active), standbys: stratonode2, stratonode3
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   1 pools, 256 pgs
    objects: 0 objects, 0 bytes
    usage:   4192 MB used, 9310 GB / 9315 GB avail
    pgs:     100.000% pgs not active
             256 stale+peering

The PG dump shows all PGs in stale+peering. However, it is somewhat strange that it shows some PGs associated with OSD 3, so it seems that the PG calculation is not taking the ruleset into account.

Do you think that changing "osd max pg per osd hard ratio" to a huge number (1M) would be a valid temporary workaround? (See the injectargs sketch below the quoted thread for what I mean.) We always allocate pools with dedicated OSDs using the device class ruleset, so we never have pools sharing OSDs.

I'll open a bug with Ceph regarding the PG creation check ignoring the CRUSH ruleset.

On Thu, 26 Jul 2018 at 17:11, John Spray <jsp...@redhat.com> wrote:

> On Thu, Jul 26, 2018 at 4:57 PM Benoit Hudzia <ben...@stratoscale.com>
> wrote:
>
>> Hi,
>>
>> We currently segregate Ceph pool PG allocation using the CRUSH device
>> class ruleset as described in
>> https://ceph.com/community/new-luminous-crush-device-classes/
>> simply using the following command to define the rule:
>> ceph osd crush rule create-replicated <RULE> default host <DEVICE CLASS>
>>
>> However, we noticed that the rule is not strict in certain scenarios. By
>> that, I mean that if there is no OSD of the specific device class, Ceph
>> will allocate PGs for this pool to any other OSD available (creating an
>> issue with the PG calculation when we want to add a new pool).
>>
>> Simple scenario:
>> 1. Create pool <pool1>, replication 2, with 4 nodes, 1 OSD each,
>>    belonging to class <pool1>.
>> 2. Remove all OSDs (delete them).
>> 3. Create 4 new OSDs (using the same disks but different IDs), this time
>>    tagged with class <pool2>.
>> 4. Try to create pool <pool2> -> this will fail.
>>
>> The pool creation fails with: Error ERANGE: pg_num 256 size 2 would mean
>> 1024 total pgs, which exceeds max 800 (mon_max_pg_per_osd 200 *
>> num_in_osds 4)
>>
>> Pool1 simply started allocating PGs to OSDs that don't belong to the
>> ruleset.
>
> Are you sure pool 1's PGs are actually being placed on the wrong OSDs?
> Have you looked at the output of "ceph pg dump" to check that?
>
> It sounds more like the pool creation check is simply failing to consider
> the crush rules and applying a cruder global check.
>
> John
>
>> Which leads me to the following question: is there a way to make the
>> CRUSH rule a hard requirement? E.g. if we do not have any OSD matching
>> the device class, it won't start trying to allocate PGs to OSDs that
>> don't match it?
>>
>> Is there any way to prevent pool 1 from using those OSDs?
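For completeness, this is roughly the kind of check John is suggesting: confirm per PG which OSDs are in the up/acting set, and confirm which rule the pool maps to and which OSDs actually carry the device class. The pool/rule/class names below are placeholders, same as in the thread above:

    # Per-PG up/acting OSD sets, brief form:
    ceph pg dump pgs_brief

    # Which CRUSH rule the pool uses, what that rule contains, and which
    # OSDs currently belong to the device class:
    ceph osd pool get <pool1> crush_rule
    ceph osd crush rule dump <RULE>
    ceph osd crush class ls-osd <DEVICE CLASS>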
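And a rough sketch of the temporary workaround I have in mind, assuming the Luminous option names; since the error message is driven by mon_max_pg_per_osd, presumably that limit needs raising alongside the hard ratio. The values are placeholders, not recommendations, and this is untested on our side:

    # Raise the limits on the running daemons at runtime:
    ceph tell mon.* injectargs '--mon_max_pg_per_osd 1000'
    ceph tell osd.* injectargs '--osd_max_pg_per_osd_hard_ratio 10'

    # And persist them in ceph.conf under [global] so they survive restarts:
    #   mon max pg per osd            = 1000
    #   osd max pg per osd hard ratio = 10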
-- 
Dr. Benoit Hudzia

Mobile (UK): +44 (0) 75 346 78673
Mobile (IE): +353 (0) 89 219 3675
Email: ben...@stratoscale.com

Web <http://www.stratoscale.com/> | Blog <http://www.stratoscale.com/blog/> | Twitter <https://twitter.com/Stratoscale> | Google+ <https://plus.google.com/u/1/b/108421603458396133912/108421603458396133912/posts> | Linkedin <https://www.linkedin.com/company/stratoscale>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com