Thanks for the prompt reply.
The OSDs are set up on dedicated devices, and
the mappings are in /etc/fstab. mount shows:

/dev/rssda on /var/lib/ceph/osd/ceph-0 type xfs (rw)


and similar on all other nodes.
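For reference, the fstab entry on each node is along these lines (the
mount options here are illustrative, not copied from the actual file):

/dev/rssda  /var/lib/ceph/osd/ceph-0  xfs  defaults,noatime  0  0
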
Thx,
dk

On Mon, Mar 31, 2014 at 1:12 PM, Gregory Farnum <g...@inktank.com> wrote:

> Well, you killed them as part of the reboot... they should have
> restarted automatically when the system came back up, but that will
> depend on your configuration and how they were set up. (E.g., if each
> OSD gets a dedicated hard drive, make sure the system knows the drive
> is present.)
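> A quick sanity check on each node (just a sketch; the exact init
> integration depends on your distro and how the OSDs were deployed):
>
>   mount | grep /var/lib/ceph/osd     # is each OSD's data disk mounted?
>   ps -ef | grep ceph-osd             # is the daemon actually running?
>   sudo service ceph start osd.0      # start it by hand if not (sysvinit-style)
>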
> What version of the software are you running?
> -Greg
> Software Engineer #42 @ http://inktank.com | http://ceph.com
>
>
> On Mon, Mar 31, 2014 at 1:00 PM, Dan Koren <d...@daterainc.com> wrote:
> > Hi Greg,
> > Thanks for the prompt response.
> > Sure enough, I do see all the OSDs are now down.
> > However, I do not understand the sentence about
> > killing the OSDs. This was an OS-level reboot of the
> > entire cluster; no ceph commands were issued either
> > before or after the restart.
> > Doesn't Ceph recover transparently to the same
> > state it was in before the cluster rebooted?
> > Thx,
> > dk
> >
> > On Mon, Mar 31, 2014 at 12:47 PM, Gregory Farnum <g...@inktank.com>
> wrote:
> >>
> >> If you wait longer, you should see the remaining OSDs get marked down.
> >> We detect down OSDs in two ways (the relevant defaults are sketched below):
> >> 1) OSDs heartbeat each other frequently and issue reports when the
> >> heartbeat responses take too long. (This is the main way.)
> >> 2) OSDs periodically send statistics to the monitors, and if these
> >> statistics do not arrive for a *very* long time (roughly 15 minutes,
> >> by default) the monitor will mark the OSD down.
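> >>
> >> Both timeouts are configurable in ceph.conf; as a rough sketch (the
> >> values shown are my recollection of the defaults, so double-check
> >> them against the docs for your version):
> >>
> >>   [osd]
> >>   osd heartbeat grace = 20       # peers report an OSD down after ~20s of missed heartbeats
> >>   [mon]
> >>   mon osd report timeout = 900   # mon marks an OSD down after ~15 min without stats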
> >>
> >> It looks like the restart happened in such a way that the first OSD
> >> was marked down by the other OSDs within their timeframe (about 30
> >> seconds), but the remaining OSDs were killed so close together that
> >> they never marked each other down.
> >> -Greg
> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
> >>
> >>
> >> On Mon, Mar 31, 2014 at 12:44 PM, Dan Koren <d...@daterainc.com> wrote:
> >> > On a 4 node cluster (admin + 3 mon/osd nodes) I see the following
> >> > shortly
> >> > after rebooting the cluster and waiting for a couple of minutes:
> >> >
> >> > root@rts23:~# ps -ef | grep ceph && ceph osd tree
> >> > root       4183      1  0 12:09 ?        00:00:00 /usr/bin/ceph-mon
> >> > --cluster=ceph -i rts23 -f
> >> > root       5771   5640  0 12:30 pts/0    00:00:00 grep --color=auto ceph
> >> > # id    weight  type name       up/down reweight
> >> > -1      0.94    root default
> >> > -2      0.31            host rts22
> >> > 0       0.31                    osd.0   down    0
> >> > -3      0.31            host rts21
> >> > 1       0.31                    osd.1   up      1
> >> > -4      0.32            host rts23
> >> > 2       0.32                    osd.2   up      1
> >> >
> >> >
> >> > It seems rather odd that ceph reports 2 OSDs up while ps does not show
> >> > any OSD daemons running (ceph osd tree output is the same on all 4
> >> > nodes).
> >> >
> >> > ceph status shows:
> >> >
> >> > root@rts23:~# ceph status
> >> >     cluster 6149cebd-b619-4709-9fec-00fd8bc210a3
> >> >      health HEALTH_WARN 192 pgs degraded; 192 pgs stale; 192 pgs stuck stale;
> >> >             192 pgs stuck unclean; recovery 10242/20484 objects degraded (50.000%);
> >> >             2/2 in osds are down; clock skew detected on mon.rts23
> >> >      monmap e1: 3 mons at {rts21=172.29.0.21:6789/0,rts22=172.29.0.22:6789/0,
> >> >             rts23=172.29.0.23:6789/0}, election epoch 12, quorum 0,1,2 rts21,rts22,rts23
> >> >      osdmap e25: 3 osds: 0 up, 2 in
> >> >       pgmap v445: 192 pgs, 3 pools, 40960 MB data, 10242 objects
> >> >             10305 MB used, 641 GB / 651 GB avail
> >> >             10242/20484 objects degraded (50.000%)
> >> >                  192 stale+active+degraded
> >> >
> >> >
> >> > How can OSDs be "up" when no OSD daemons are running in the cluster?
> >> >
> >> > MTIA,
> >> >
> >> > dk
> >> >
> >> > Dan Koren
> >> > Director of Software
> >> > DATERA | 650.210.7910 | @dateranews
> >> > d...@datera.io
> >> >
> >>
> >
>