My first guess is to ask what your CRUSH rules are. `ceph osd crush rule
dump` along with `ceph osd pool ls detail` would be helpful. A `ceph
status` output captured while the VM RBDs aren't responding might also
explain something.
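
For example, something like the following (a rough sketch; the output
file names are just placeholders) would capture all three in one go:

    # Dump the CRUSH rules and per-pool settings for later analysis.
    ceph osd crush rule dump > crush-rules.json
    ceph osd pool ls detail > pool-detail.txt

    # Run this one while the VMs are actually hung, not after recovery:
    ceph status > ceph-status.txt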

On Thu, Oct 11, 2018 at 1:12 PM Nils Fahldieck - Profihost AG <
n.fahldi...@profihost.ag> wrote:

> Hi everyone,
>
> for some time we have been experiencing service outages in our Ceph
> cluster whenever there is any change to the HEALTH status, e.g. when
> swapping storage devices, adding storage devices, rebooting Ceph
> hosts, during backfills etc.
>
> Just now I had such a situation, where several VMs hung after I
> rebooted one Ceph host. We have 3 replicas for each PG, 3 MONs, 3
> MGRs, 3 MDSs and 71 OSDs spread over 9 hosts.
>
> We use Ceph as a storage backend for our Proxmox VE (PVE) environment.
> The outages manifest as blocked virtual file systems inside the
> virtual machines running in our PVE cluster.
>
> To me it feels similar to stuck and inactive PGs. Honestly, though,
> I'm not really sure how to debug this problem or which log files to
> examine.
>
> OS: Debian 9
> Kernel: 4.12 based upon SLE15-SP1
>
> # ceph version
> ceph version 12.2.8-133-gded2f6836f
> (ded2f6836f6331a58f5c817fca7bfcd6c58795aa) luminous (stable)
>
> Can someone guide me? I'm more than happy to provide more information
> as needed.
>
> Thanks in advance
> Nils
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
