The data lives in another container attached to the OSD container as a Docker volume. According to `deis ps -a`, this volume was created two weeks ago, yet all the files in `current` are very recent. I suspect that something removed the files in the data volume after the reboot. Since the reboot was caused by a CoreOS update, the newer Docker version (1.6 -> 1.7) may have introduced the problem, or perhaps the container initialization process somehow removed and recreated the files. I no longer have this data volume, so I can only guess.
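If it happens again, comparing the data container's creation time with the mtimes inside the volume should show whether the files were recreated after the reboot. A rough sketch (the data container name is a placeholder, and the reboot timestamp needs to be substituted):

    docker inspect -f '{{ .Created }}' <data-container>        # when the volume container was created
    ls -la --time-style=full-iso /var/lib/ceph/osd/ceph-0/current
    find /var/lib/ceph/osd/ceph-0/current -newermt '2015-08-29 06:00'   # files touched after the reboot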
2015-08-31 18:28 GMT+03:00 Jan Schermer <j...@schermer.cz>:
> Is it possible that something else was mounted there?
> Or is it possible nothing was mounted there?
> That would explain such behaviour...
>
> Jan
>
> On 31 Aug 2015, at 17:07, Евгений Д. <ineu.m...@gmail.com> wrote:
>
> No, it really was in the cluster. Before reboot cluster had HEALTH_OK.
> Though now I've checked `current` directory and it doesn't contain any
> data:
>
> root@staging-coreos-1:/var/lib/ceph/osd/ceph-0# ls current
> commit_op_seq  meta  nosnap  omap
>
> while other OSDs do. It really looks like something was broken on reboot,
> probably during container start, so it's not really related to Ceph. I'll
> go with OSD recreation.
>
> Thank you.
>
> 2015-08-31 11:50 GMT+03:00 Gregory Farnum <gfar...@redhat.com>:
>
>> On Sat, Aug 29, 2015 at 3:32 PM, Евгений Д. <ineu.m...@gmail.com> wrote:
>> > I'm running 3-node cluster with Ceph (it's Deis cluster, so Ceph daemons are
>> > containerized). There are 3 OSDs and 3 mons. After rebooting all nodes one
>> > by one all monitors are up, but only two OSDs of three are up. 'Down' OSD is
>> > really running but is never marked up/in.
>> > All three mons are reachable from inside the OSD container.
>> > I've run `log dump` for this OSD and found this line:
>> >
>> > Aug 29 06:19:39 staging-coreos-1 sh[7393]: -99> 2015-08-29 06:18:51.855432
>> > 7f5902009700  3 osd.0 0 handle_osd_map epochs [1,90], i have 0, src has
>> > [1,90]
>> >
>> > Is it the reason why OSD cannot connect to the cluster? If yes, why could it
>> > happen? I haven't removed any data from /var/lib/ceph/osd.
>> > Is it possible to bring this OSD back to cluster without completely
>> > recreating it?
>> >
>> > Ceph version is:
>> >
>> > root@staging-coreos-1:/# ceph -v
>> > ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>
>> It's pretty unlikely. I presume (since the OSD has no maps) that it's
>> never actually been up and in the cluster? Or else its data store has
>> been pretty badly corrupted since it doesn't have any of the requisite
>> metadata. In which case you'll probably be best off recreating it
>> (with 3 OSDs I assume all your PGs are still active).
>> -Greg
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
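As said above, I'll go with recreating osd.0. For the record, the steps I have in mind follow the usual manual remove-and-add-OSD procedure; a rough sketch only, since the Deis/ceph container may run mkfs and auth setup itself, and the CRUSH weight is a placeholder:

    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    # recreate with a clean data directory
    ceph osd create                     # should hand back id 0
    ceph-osd -i 0 --mkfs --mkkey
    ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-0/keyring
    ceph osd crush add osd.0 1.0 host=staging-coreos-1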