Thanks for the tips. A single OSD was indeed 95% full, and after removing it there is 24 TB of usable space and everything is working again. :D
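For anyone who runs into the same thing, this is roughly how I found the full OSD and how I'm keeping an eye on the rest during the backfill. Just a sketch, assuming a Luminous-or-newer CLI; the 120 threshold and the balancer settings are generic examples, not something specific to my cluster:

# Per-OSD utilization; look for anything close to the 95% full ratio
ceph osd df tree

# Lists nearfull/backfillfull/full OSDs explicitly
ceph health detail

# One-off rebalance: reweight OSDs above ~120% of the mean utilization
ceph osd reweight-by-utilization 120

# Or let the mgr balancer keep things even
# (upmap mode requires all clients to be Luminous or newer)
ceph balancer mode upmap
ceph balancer on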
I hope that during the backfilling another OSD won't hit 95% as well. It's a bit odd that with ~140 OSDs a single full one can take everything down with it. I would understand it if, since 8+2 erasure coding spreads data over 10 disks, one full disk meant the capacity of the other 9 couldn't be used. But it seems the usable capacity is limited by whichever OSD in the whole cluster has the least free space.

On Thu, May 9, 2019 at 1:25 PM Paul Emmerich <paul.emmer...@croit.io> wrote:
> One full OSD stops everything.
>
> You can change what's considered 'full'; the default is 95%:
>
> ceph osd set-full-ratio 0.95
>
> Never let an OSD run 100% full, as that will lead to lots of real
> problems. 95% is a good default (it's not exact; some metadata might
> not always be accounted for, or it might temporarily need more).
>
> A quick and dirty work-around if only one OSD is full: take it down ;)
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, May 9, 2019 at 2:08 PM Kári Bertilsson <karibert...@gmail.com> wrote:
> >
> > Hello
> >
> > I am running CephFS with 8+2 erasure coding. I had about 40 TB usable free (110 TB raw); one small disk crashed and I added 2x 10 TB disks. Now it's backfilling & recovering with 0 B free and I can't read a single file from the file system...
> >
> > This happened with max backfills at 4, but I have increased max backfills to 128 to hopefully get this over with a little faster, since the system has been unusable for 12 hours anyway. Not sure yet if that was a good idea.
> >
> > 131 TB of raw space was somehow not enough to keep things running. Any tips to avoid this kind of scenario in the future?
> >
> > GLOBAL:
> >     SIZE       AVAIL      RAW USED     %RAW USED
> >     489TiB     131TiB     358TiB       73.17
> > POOLS:
> >     NAME                ID     USED        %USED      MAX AVAIL     OBJECTS
> >     ec82_pool           41     278TiB      100.00     0B            28549450
> >     cephfs_metadata     42     174MiB      0.04       381GiB        666939
> >     rbd                 51     99.3GiB     20.68      381GiB        25530
> >
> >   data:
> >     pools:   3 pools, 704 pgs
> >     objects: 29.24M objects, 278TiB
> >     usage:   358TiB used, 131TiB / 489TiB avail
> >     pgs:     1265432/287571907 objects degraded (0.440%)
> >              12366014/287571907 objects misplaced (4.300%)
> >              536 active+clean
> >              137 active+remapped+backfilling
> >              27  active+undersized+degraded+remapped+backfilling
> >              4   active+remapped+backfill_toofull
> >
> >   io:
> >     client:   64.0KiB/s wr, 0op/s rd, 7op/s wr
> >     recovery: 1.17GiB/s, 113objects/s
> >
> > Is there anything I can do to restore reading? I can understand writing not working, but why is it blocking reading as well? Any tips?
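PS, for the archives: the work-around Paul describes comes down to something like the following. The OSD id 42 and the 0.97 ratio are only placeholders, and whatever you raise the full ratio to, set it back to 0.95 once backfill has freed up space:

# Identify the OSD(s) that tripped the full ratio
ceph health detail

# Option A: mark the full OSD out so its PGs remap elsewhere
ceph osd out 42
# (or take the daemon down entirely: systemctl stop ceph-osd@42)

# Option B: temporarily raise the full ratio to get I/O flowing again
ceph osd set-full-ratio 0.97
# ...and restore the default once there is breathing room
ceph osd set-full-ratio 0.95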