On Sun, Nov 25, 2018 at 2:41 PM Stefan Kooman <ste...@bit.nl> wrote:

> Hi list,
>
> During cluster expansion (adding extra disks to existing hosts) some
> OSDs failed (FAILED assert(0 == "unexpected error", _txc_add_transaction
> error (39) Directory not empty not handled on operation 21 (op 1,
> counting from 0); full details: https://8n1.org/14078/c534). We had the
> "norebalance", "nobackfill", and "norecover" flags set. After we unset
> nobackfill and norecover (to let Ceph fix the degraded PGs), it
> recovered all but 12 objects (2 PGs). We queried those PGs and the
> OSDs that were supposed to hold a copy of them, and the OSDs had
> already been "probed". A day later (~24 hours) the degraded objects
> still had not been recovered. Only after we also unset the
> "norebalance" flag did rebalancing, backfilling, and recovery start;
> the 12 degraded objects were then recovered.
>
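> For reference, this is roughly the command sequence involved (the PG
> ID below is just an example):
>
>   # set before the expansion to hold off data movement
>   ceph osd set norebalance
>   ceph osd set nobackfill
>   ceph osd set norecover
>
>   # unset to let Ceph recover the degraded PGs
>   ceph osd unset nobackfill
>   ceph osd unset norecover
>
>   # inspect a stuck PG; the "already probed" status shows up under
>   # might_have_unfound in the recovery_state section of the output
>   ceph pg 2.18f query
>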
> Is this expected behaviour? I would expect Ceph to always try to fix
> degraded objects first and foremost. Even "pg force-recovery" and "pg
> force-backfill" could not force recovery.
>

I haven't dug into how the norebalance flag works, but I think this is
expected: it presumably prevents OSDs from creating new copies of PGs,
which is exactly what needed to happen here.
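
If you want to double-check, any flags that are currently set show up
in the osd map (and in the health output), e.g.:

  ceph osd dump | grep ^flags
  # -> flags norebalance,nobackfill,norecover,sortbitwise,...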
-Greg


>
> Gr. Stefan
>
> --
> | BIT BV  http://www.bit.nl/        Kamer van Koophandel 09090351
> | GPG: 0xD14839C6                   +31 318 648 688 / i...@bit.nl