Re: [ceph-users] Understanding incomplete PGs

2019-07-06 Thread Torben Hørup
Hi The "ec unable to recover when below min size" thing has very recently been fixed for octopus. See https://tracker.ceph.com/issues/18749 and https://github.com/ceph/ceph/pull/17619 Docs has been updated with a section on this issue http://docs.ceph.com/docs/master/rados/operations/erasure-

Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Kyle
On Friday, July 5, 2019 11:50:44 AM CDT Paul Emmerich wrote:
> * There are virtually no use cases for EC pools with m=1; this is a bad configuration as you can't have both availability and durability.
I'll have to look into this more. The cluster only has 4 hosts, so it might be worth switching
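If switching is on the table, a rough sketch of what a more resilient profile could look like on a 4-host cluster (the profile and pool names below are made up, and k=2/m=2 is just one possible layout, not something settled in this thread):

    # new erasure-code profile: k=2, m=2 tolerates two lost chunks, and
    # crush-failure-domain=host keeps each chunk on a different host
    ceph osd erasure-code-profile set ec-k2-m2 k=2 m=2 crush-failure-domain=host

    # create a fresh pool with that profile (migrating data from the old
    # pool is a separate step and not shown here)
    ceph osd pool create ecpool-new 64 64 erasure ec-k2-m2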

Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Kyle
On Friday, July 5, 2019 11:28:32 AM CDT Caspar Smit wrote:
> Kyle, was the cluster still backfilling when you removed osd 6 or did you only check its utilization?
Yes, still backfilling.
> Running an EC pool with m=1 is a bad idea. EC pool min_size = k+1, so losing a single OSD results

Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Paul Emmerich
* There are virtually no use cases for EC pools with m=1; this is a bad configuration as you can't have both availability and durability.
* Due to weird internal restrictions, EC pools below their min size can't recover; you'll probably have to reduce min_size temporarily to recover it.
* Depending
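A minimal sketch of the temporary min_size change described above, assuming a k=2, m=1 pool named ecpool (the name and values are illustrative, not taken from the thread):

    # default min_size for k=2, m=1 is k+1 = 3; dropping it to k lets the
    # degraded PGs go active so recovery/backfill can run
    ceph osd pool get ecpool min_size
    ceph osd pool set ecpool min_size 2

    # once the PGs are active+clean again, put the safer value back
    ceph osd pool set ecpool min_size 3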

Re: [ceph-users] Understanding incomplete PGs

2019-07-05 Thread Caspar Smit
Kyle, was the cluster still backfilling when you removed osd 6, or did you only check its utilization? Running an EC pool with m=1 is a bad idea: EC pool min_size = k+1, so losing a single OSD results in inaccessible data. Your incomplete PGs are probably all EC pool PGs, please verify. If the ab
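One way to do that verification, as a sketch (exact output varies by release):

    # list incomplete PGs; the number before the dot in each PG id is the pool id
    ceph pg ls incomplete

    # list pools with their ids, type (replicated/erasure) and min_size,
    # then match the pool ids against the incomplete PGs
    ceph osd pool ls detail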

[ceph-users] Understanding incomplete PGs

2019-07-04 Thread Kyle
Hello, I'm working with a small Ceph cluster (about 10 TB, 7-9 OSDs, all BlueStore on LVM) and recently ran into a problem with 17 PGs marked as incomplete after adding/removing OSDs. Here's the sequence of events:
1. 7 OSDs in the cluster, health is OK, all PGs are active+clean
2. 3 new OSDs on
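For digging into why a particular PG is stuck incomplete, a sketch along these lines can help (the PG id 2.1a below is made up):

    # summary of unhealthy PGs and the reason they are flagged
    ceph health detail

    # per-PG peering detail; the recovery_state section shows which OSDs the
    # PG still wants to probe and what is blocking it from going active
    ceph pg 2.1a query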