Thanks for the tips

A single OSD was indeed 95% full, and after removing it there are 24 TB of
usable space again and everything is working. :D
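
(For the archives: the quickest way I found to spot the offender was per-OSD
utilization, roughly:

ceph osd df tree     # %USE column shows each OSD's fill level
ceph health detail   # names the specific full/nearfull OSDs

The exact output layout may vary by release.)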

I hope another OSD won't hit 95% during the backfilling as well.
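
To catch it earlier this time I'm doing roughly the following (a sketch; the
ratios are just my guess at sane values, and I believe 0.85/0.90 are the
defaults anyway):

ceph osd set-nearfull-ratio 0.85        # warn well before full
ceph osd set-backfillfull-ratio 0.90    # refuse backfill onto OSDs above 90%
ceph osd df | sort -nk8 | tail          # eyeball the fullest OSDs now and then

(the column number for %USE in "ceph osd df" differs between releases, so
adjust the sort as needed).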

It's a bit odd that with ~140 OSDs a single full one can take everything
down with it.
I would understand that, since 8/2 erasure coding spreads data over 10
disks, a full disk means the free capacity of the other 9 can't be used.
But it seems a pool can only use free capacity based on the OSD with the
least free space in the whole cluster.
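
Apparently MAX AVAIL per pool is calculated from the fullest OSD that the
pool's CRUSH rule can place data on, which would explain that. To keep the
OSDs more even I'll probably try the balancer (a sketch, assuming Luminous
or newer for upmap):

ceph osd set-require-min-compat-client luminous   # upmap needs luminous+ clients
ceph balancer mode upmap
ceph balancer on

Or, on older releases, the cruder reweighting:

ceph osd test-reweight-by-utilization   # dry run first
ceph osd reweight-by-utilization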

On Thu, May 9, 2019 at 1:25 PM Paul Emmerich <paul.emmer...@croit.io> wrote:

> One full OSD stops everything.
>
> You can change what's considered 'full'; the default is 95%:
>
> ceph osd set-full-ratio 0.95
>
> Never let an OSD run 100% full; that will lead to lots of real
> problems. 95% is a good default (it's not exact: some metadata might
> not always be accounted for, or it might temporarily need more).
>
> A quick and dirty work-around if only one OSD is full: take it down ;)
>
> Paul
>
> --
> Paul Emmerich
>
> Looking for help with your Ceph cluster? Contact us at https://croit.io
>
> croit GmbH
> Freseniusstr. 31h
> 81247 München
> www.croit.io
> Tel: +49 89 1896585 90
>
> On Thu, May 9, 2019 at 2:08 PM Kári Bertilsson <karibert...@gmail.com>
> wrote:
> >
> > Hello
> >
> > I am running CephFS with 8/2 erasure coding. I had about 40 TB usable
> > free (110 TB raw); one small disk crashed and I added 2x10 TB disks. Now
> > it's backfilling & recovering with 0 B free, and I can't read a single
> > file from the file system...
> >
> > This happened with max backfills at 4, but I have increased max backfills
> > to 128 to hopefully get this over with a little faster, since the system
> > has been unusable for 12 hours anyway. Not sure yet if that was a good
> > idea.
> >
> > 131 TB of raw space was somehow not enough to keep things running. Any
> > tips to avoid this kind of scenario in the future?
> >
> > GLOBAL:
> >     SIZE       AVAIL      RAW USED     %RAW USED
> >     489TiB     131TiB     358TiB       73.17
> > POOLS:
> >     NAME                ID     USED        %USED     MAX AVAIL     OBJECTS
> >     ec82_pool           41     278TiB      100.00           0B    28549450
> >     cephfs_metadata     42     174MiB        0.04       381GiB      666939
> >     rbd                 51     99.3GiB      20.68       381GiB       25530
> >
> >  data:
> >    pools:   3 pools, 704 pgs
> >    objects: 29.24M objects, 278TiB
> >    usage:   358TiB used, 131TiB / 489TiB avail
> >    pgs:     1265432/287571907 objects degraded (0.440%)
> >             12366014/287571907 objects misplaced (4.300%)
> >             536 active+clean
> >             137 active+remapped+backfilling
> >             27  active+undersized+degraded+remapped+backfilling
> >             4   active+remapped+backfill_toofull
> >
> >  io:
> >    client:   64.0KiB/s wr, 0op/s rd, 7op/s wr
> >    recovery: 1.17GiB/s, 113objects/s
> >
> > Is there anything I can do to restore reading? I can understand writing
> > not working, but why is it blocking reading also? Any tips?