The data lives in another container attached to the OSD container as a Docker volume. According to `deis ps -a`, this volume was created two weeks ago, yet all the files in `current` are very recent. I suspect that something removed the files in the data volume after the reboot. Since the reboot was caused by a CoreOS update, the newer Docker version (1.6 -> 1.7) may have introduced the problem, or perhaps the container initialization process somehow removed and recreated the files. I no longer have this data volume, so I can only guess.
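If it happens again, comparing the data container's creation time with the mtimes inside the volume should show whether the files were recreated after the reboot. A rough sketch (the data container name is a placeholder, and the reboot timestamp needs to be substituted):

    docker inspect -f '{{ .Created }}' <data-container>        # when the volume container was created
    ls -la --time-style=full-iso /var/lib/ceph/osd/ceph-0/current
    find /var/lib/ceph/osd/ceph-0/current -newermt '2015-08-29 06:00'   # files touched after the reboot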
2015-08-31 18:28 GMT+03:00 Jan Schermer <j...@schermer.cz>:
> Is it possible that something else was mounted there?
> Or is it possible nothing was mounted there?
> That would explain such behaviour...
>
> Jan
>
> On 31 Aug 2015, at 17:07, Евгений Д. <ineu.m...@gmail.com> wrote:
>
> No, it really was in the cluster. Before reboot cluster had HEALTH_OK.
> Though now I've checked `current` directory and it doesn't contain any
> data:
>
> root@staging-coreos-1:/var/lib/ceph/osd/ceph-0# ls current
> commit_op_seq  meta  nosnap  omap
>
> while other OSDs do. It really looks like something was broken on reboot,
> probably during container start, so it's not really related to Ceph. I'll
> go with OSD recreation.
>
> Thank you.
>
> 2015-08-31 11:50 GMT+03:00 Gregory Farnum <gfar...@redhat.com>:
>
>> On Sat, Aug 29, 2015 at 3:32 PM, Евгений Д. <ineu.m...@gmail.com> wrote:
>> > I'm running 3-node cluster with Ceph (it's Deis cluster, so Ceph daemons are
>> > containerized). There are 3 OSDs and 3 mons. After rebooting all nodes one
>> > by one all monitors are up, but only two OSDs of three are up. 'Down' OSD is
>> > really running but is never marked up/in.
>> > All three mons are reachable from inside the OSD container.
>> > I've run `log dump` for this OSD and found this line:
>> >
>> > Aug 29 06:19:39 staging-coreos-1 sh[7393]: -99> 2015-08-29 06:18:51.855432
>> > 7f5902009700  3 osd.0 0 handle_osd_map epochs [1,90], i have 0, src has
>> > [1,90]
>> >
>> > Is it the reason why OSD cannot connect to the cluster? If yes, why could it
>> > happen? I haven't removed any data from /var/lib/ceph/osd.
>> > Is it possible to bring this OSD back to cluster without completely
>> > recreating it?
>> >
>> > Ceph version is:
>> >
>> > root@staging-coreos-1:/# ceph -v
>> > ceph version 0.94.2 (5fb85614ca8f354284c713a2f9c610860720bbf3)
>>
>> It's pretty unlikely. I presume (since the OSD has no maps) that it's
>> never actually been up and in the cluster? Or else its data store has
>> been pretty badly corrupted since it doesn't have any of the requisite
>> metadata. In which case you'll probably be best off recreating it
>> (with 3 OSDs I assume all your PGs are still active).
>> -Greg
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
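As said above, I'll go with recreating osd.0. For the record, the steps I have in mind follow the usual manual remove-and-add-OSD procedure; a rough sketch only, since the Deis/ceph container may run mkfs and auth setup itself, and the CRUSH weight is a placeholder:

    ceph osd out 0
    ceph osd crush remove osd.0
    ceph auth del osd.0
    ceph osd rm 0
    # recreate with a clean data directory
    ceph osd create                     # should hand back id 0
    ceph-osd -i 0 --mkfs --mkkey
    ceph auth add osd.0 osd 'allow *' mon 'allow rwx' -i /var/lib/ceph/osd/ceph-0/keyring
    ceph osd crush add osd.0 1.0 host=staging-coreos-1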