You'll need to go look at the individual OSDs to determine why they aren't on. All the cluster knows is that the OSDs aren't communicating properly.

-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com
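
On the "is it recovering?" question, the pgmap line quoted below already says a fair amount once the per-state counts are tallied. What follows is only a minimal sketch in Python, not an official Ceph tool: the helper names `parse_pgmap` and `count_with_flag` are made up for illustration, and the regular expression assumes the log format looks exactly like the quoted line. It also assumes that recovery would show up as a `recovering` flag in the PG state string.

```python
import re

# The pgmap line exactly as quoted in the original message.
PGMAP_LINE = (
    "2014-04-29 12:03:49.013808 mon.0 [INF] pgmap v1047121: 98432 pgs: 241 "
    "inactive, 33138 peering, 25 remapped, 60067 down+peering, 3489 "
    "remapped+peering, 1472 down+remapped+peering; 66261 bytes data, 1647 "
    "MB used, 5582 GB / 5583 GB avail"
)


def parse_pgmap(line):
    """Return (total_pgs, {state_string: count}) from one pgmap log line.

    Hypothetical helper: assumes the '<N> pgs: <count> <state>, ...;' layout
    seen in the quoted line.
    """
    match = re.search(r"(\d+) pgs: ([^;]+);", line)
    if match is None:
        raise ValueError("no pgmap summary found in line")
    total = int(match.group(1))
    states = {}
    for item in match.group(2).split(","):
        count, state = item.split()
        states[state] = int(count)
    return total, states


def count_with_flag(states, flag):
    """Sum the PGs whose '+'-joined state string contains exactly `flag`.

    Splitting on '+' keeps 'inactive' from being counted as 'active'.
    """
    return sum(n for state, n in states.items() if flag in state.split("+"))


total, states = parse_pgmap(PGMAP_LINE)
active = count_with_flag(states, "active")
recovering = count_with_flag(states, "recovering")
peering = count_with_flag(states, "peering")
down = count_with_flag(states, "down")

print(f"total={total} active={active} recovering={recovering} "
      f"peering={peering} down={down}")
```

Read this way, none of the 98432 PGs is active or recovering; most are stuck in peering and many are marked down. That is consistent with the cluster waiting on OSDs that are not communicating, not with a recovery in progress, so the individual OSDs are still the place to look.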

On Tue, Apr 29, 2014 at 3:06 AM, Gandalf Corvotempesta <gandalf.corvotempe...@gmail.com> wrote:
> After a simple "service ceph restart" on a server, I'm unable to get
> my cluster up again:
> http://pastebin.com/raw.php?i=Wsmfik2M
>
> Suddenly, some OSDs go up and down randomly.
>
> I don't see any network traffic on the cluster interface.
> How can I detect what Ceph is doing? From the posted output there is
> no way to tell whether Ceph is recovering or not. Showing just a bunch of
> increasing/decreasing numbers doesn't help.
>
> I can see this:
>
> 2014-04-29 12:03:49.013808 mon.0 [INF] pgmap v1047121: 98432 pgs: 241
> inactive, 33138 peering, 25 remapped, 60067 down+peering, 3489
> remapped+peering, 1472 down+remapped+peering; 66261 bytes data, 1647
> MB used, 5582 GB / 5583 GB avail
>
> So what, is it recovering? Is it sleeping? Why is it not recovering?
>
> http://pastebin.com/raw.php?i=2EdugwQa
> Why are all OSDs from hosts osd12 and osd13 down? Both hosts are up and
> running.
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com