We had a very similar problem, but it was repeatable on Firefly as well. For us, it turned out that the MTU on the switches was not configured for 9000-byte frames everywhere. This prevented the peering process from completing, and things only got worse as data was added.
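A quick way to confirm whether this is your problem is to sweep every host with an oversized, don't-fragment ping; if any hop has a smaller MTU, the ping fails outright instead of fragmenting. This is only a sketch, assuming Linux iputils ping and a 9000-byte jumbo MTU; the addresses are placeholders for your own public and cluster networks (8972 bytes of ICMP payload + 8 bytes of ICMP header + 20 bytes of IP header = a 9000-byte packet):

    #!/bin/bash
    # Placeholder addresses: substitute the public and cluster IPs of every node.
    HOSTS="10.0.0.1 10.0.0.2 10.0.0.3"
    for h in $HOSTS; do
        # -M do sets the don't-fragment bit, so an undersized MTU fails instead of fragmenting.
        if ping -c 2 -W 2 -M do -s 8972 "$h" >/dev/null 2>&1; then
            echo "OK   $h"
        else
            echo "FAIL $h  (some hop is dropping jumbo frames)"
        fi
    done

Step 3 of the checklist below is the same test, run host by host.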
Here is a section I wrote for our internal documentation (I'm going to try to get this into the official documentation when I have some time).

What To Do If A Cluster Just Won't Get Healthy

Sometimes it seems a cluster just won't reach a healthy state. This is quite obvious when some PGs are stuck in a peering state for a long time. If looking through the logs doesn't give you any ideas, here is a list of things to verify (don't skip any steps; they are often overlooked but can leave a cluster stuck).

1. Verify the firewall settings. The firewall should either be off, or TCP ports 6789 and 6800-6899 should allow incoming connections on the public and cluster networks.
   - iptables -L -n

2. Verify that the clocks on all hosts are close in time and that ntpd is running.
   - date; ps aux | grep ntpd

3. Verify that each host can communicate with every other host using jumbo frames on both the public and private networks (the sweep script above automates this).
   - ping -c 4 -s 8970 www.xxx.yyy.zzz

4. Check whether there are zombie OSD entries in the CRUSH map. If there are a number of DNE entries for OSDs, they haven't been cleaned out of the CRUSH map completely (there is a short command sketch at the end of this mail).
   - If you are unsure whether an OSD is completely dead or might come back, you can set its weight to zero: ceph osd crush reweight osd.XX 0.000
   - If you know that the OSD will never come back, you can remove it completely: ceph osd crush remove osd.X

5. Check that you aren't running into an open file limit. Ceph OSDs started by udev or SysV/upstart/systemd should handle this pretty well, but if processes are being started manually, be sure to check.
   - ps aux | grep ceph-osd (the process command line should include a ulimit command)
   - grep "Too many open files" /var/log/ceph/ceph-osd.* (should not return anything)

On Tue, Apr 21, 2015 at 7:03 AM, f...@univ-lr.fr <f...@univ-lr.fr> wrote:
> Hi all,
>
> may there be a problem with the crush function during 'from scratch'
> installation of 0.94.1-0 ?
>
> This has been tested many times, with ceph-deploy-1.5.22-0 or
> ceph-deploy-1.5.23-0. Platform RHEL7.
>
> Each time, the new cluster ends up in a weird state never seen with my
> previously installed versions (0.94, 0.87.1):
> - I've seen things perhaps linked to ceph-deploy-1.5.23-0, either one or
> more monitors being unable to form the cluster (with respawning 'python
> /usr/sbin/ceph-create-keys' messages). But I think that's another part of
> the issue.
> - the main issue is visible as a warning on the health of the PGs as soon
> as the cluster is formed enough to answer a 'ceph -s'.
>
> - here is a 1-mon, almost empty, freshly installed cluster:
>
> ROOT
> ceph -s
>     cluster e581ab43-d0f5-4ea8-811f-94c8df16d044
>      health HEALTH_WARN
>             2 pgs degraded
>             14 pgs peering
>             4 pgs stale
>             2 pgs stuck degraded
>             25 pgs stuck inactive
>             4 pgs stuck stale
>             27 pgs stuck unclean
>             2 pgs stuck undersized
>             2 pgs undersized
>             too few PGs per OSD (3 < min 30)
>      monmap e1: 1 mons at {helga=10.10.10.64:6789/0}
>             election epoch 2, quorum 0 helga
>      osdmap e398: 60 osds: 60 up, 60 in; 2 remapped pgs
>       pgmap v1553: 64 pgs, 1 pools, 0 bytes data, 0 objects
>             2829 MB used, 218 TB / 218 TB avail
>                   37 active+clean
>                   12 peering
>                   11 activating
>                    2 stale+active+undersized+degraded
>                    2 stale+remapped+peering
>
> With time, the number of defects is growing. They literally explode if we
> put objects on it.
>
> - a 'ceph health detail' shows, for example, entries like this one:
> pg 0.22 is stuck inactive since forever, current state peering, last
> acting [18,17,0]
>
> - A query on the PG shows:
> ceph pg 0.22 query
> {
>     "state": "peering",
>     ../..
>     "up": [
>         18,
>         17,
>         0
>     ],
>     "blocked_by": [
>         0,
>         1,
>         5,
>         17
>     ],
>     ../..
> }
>
> If my understanding of the ceph query is correct, OSDs 1, 5 and 17 have
> nothing to do with this PG.... Where do they come from ??
> Couldn't this be part of the "critical issues with CRUSH" 0.94.1 is meant
> to correct ?
>
> Frederic
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
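Regarding step 4 of the checklist above: this is roughly the sequence we use to clear a dead OSD out of the CRUSH map. osd.12 is just an example ID; adapt it, and only run the removal steps once you are sure the disk is never coming back.

    # List CRUSH entries that no longer have a live OSD behind them (shown as DNE).
    ceph osd tree | grep DNE
    # If the OSD might still come back, just take it out of data placement:
    ceph osd crush reweight osd.12 0.0
    # If it is gone for good, remove it from CRUSH, delete its key, and drop it from the osdmap:
    ceph osd crush remove osd.12
    ceph auth del osd.12
    ceph osd rm osd.12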