On Tue, Feb 4, 2020 at 15:19, <dhils...@performair.com> wrote:
>
> Rodrigo;
>
> Best bet would be to check logs.  Check the OSD logs on the affected server.  
> Check cluster logs on the MONs.  Check OSD logs on other servers.
>
> Your Ceph version(s) and your OS distribution and version would also be 
> useful to help you troubleshoot this OSD flapping issue.

Looking at the logs I finally found the issue: when I said that there
were no changes in network topology, I was mistaken. I had removed an
unused (or at least I thought so) network card from each server.

These servers had 2 network cards that I installed and configured so
I would have a "public network" and a "cluster network". That was when
I was first installing the Ceph cluster.

After having some problems with this setup I was advised by members
of this list not to use a dual network setup, as it could make
debugging much more difficult. I followed this advice, or at least
tried to.

To make a long story short, Ceph was still trying to use the second
network for some OSDs. After a "ceph config rm global cluster_network"
and a full restart of the cluster, everything started working
again.
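
For anyone who hits the same symptom, here is a rough sketch of the check
that would have caught this earlier: given each OSD's cluster address and
the subnets the hosts can still reach, list the OSDs bound to a network
that no longer exists. The function name, the OSD ids, and the addresses
below are all made up for illustration; on a real cluster the addresses
would have to be collected by hand, for example from "ceph osd dump".

```python
import ipaddress


def osds_on_missing_network(osd_cluster_addrs, available_networks):
    """Return the ids of OSDs whose cluster address is outside every
    network that is still available on the hosts.

    osd_cluster_addrs:  dict of OSD id -> cluster IP address (string)
    available_networks: list of CIDR strings still configured on the hosts
    """
    nets = [ipaddress.ip_network(n) for n in available_networks]
    stale = []
    for osd_id, addr in sorted(osd_cluster_addrs.items()):
        ip = ipaddress.ip_address(addr)
        # An OSD is suspect if its address belongs to none of the networks.
        if not any(ip in net for net in nets):
            stale.append(osd_id)
    return stale


# Hypothetical sample data: the 10.0.0.0/24 "cluster network" was removed
# together with the second network card, but two OSDs still point at it.
osd_cluster_addrs = {0: "192.168.10.11", 1: "10.0.0.11", 2: "10.0.0.12"}
available_networks = ["192.168.10.0/24"]

print(osds_on_missing_network(osd_cluster_addrs, available_networks))
```

Any OSD id printed here is still advertising an address on the removed
network, which would fit the flapping I was seeing.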

Thanks for the help and sorry for the confusion.


Regards,

Rodrigo
_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
