On Thu, Aug 23, 2018 at 12:26 AM mj <li...@merit.unu.edu> wrote:

> Hi,
>
> Thanks John and Gregory for your answers.
>
> Gregory's answer worries us. We thought that with a 3/2 pool and one copy
> corrupted, the assumption would be: the two matching copies are correct,
> and the third one needs to be repaired.
>
> Can we determine from this output whether I created corruption in our
> cluster..?
>

No, we can't tell. It's not very likely, though, and given that this is an
rbd pool I'd bet the most likely inconsistency was a lost journal write
which is long since unneeded by the VM anyway. If it's a concern, you could
try tracking down the rbd image (it's got the internal ID 2c191e238e1f29;
I'm not sure what the command is to turn that into the front-end name) and
running an fsck and any available application data scrubs.
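
A rough sketch of the image lookup, assuming the usual format-2 layout where
an image's "block_name_prefix" in "rbd info" output is "rbd_data.<internal
id>", and with <pool> as a placeholder for your rbd pool name:

    for img in $(rbd -p <pool> ls); do
        prefix=$(rbd -p <pool> info "$img" | awk '/block_name_prefix/ {print $2}')
        [ "$prefix" = "rbd_data.2c191e238e1f29" ] && echo "match: $img"
    done

Once you know which image it is, you can run the fsck from inside the VM
that owns it.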

In the future, I believe the most reliable way on Jewel is to simply go look
at the objects and do the vote-counting yourself. Later versions of Ceph
include more checksumming output etc. to make it easier and to more reliably
identify the broken copy to begin with.
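
For the manual vote-counting, a sketch assuming filestore OSDs with the
default data path: on each of the three hosts holding PG 2.1a9 (osds 15, 23
and 6 in your case), run something like

    find /var/lib/ceph/osd/ceph-15/current/2.1a9_head/ \
        -name '*2c191e238e1f29.00000000000c7c9d*' -exec md5sum {} \;

adjusting the OSD id per host, and compare the three checksums; the odd one
out is the suspect copy. Ideally do this while that object isn't being
actively written, so you aren't comparing in-flight data.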
-Greg


>
> > root@pm1:~# ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> > 1 scrub errors
> > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
>
> > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
>
> And also: is jewel (which we're running) considered "the old past", with
> the old non-checksum behaviour?
>
> In case this occurs again... what would be the steps to determine WHICH
> copy is the corrupt one, and how to proceed if it happens to be the
> primary copy of an object?
>
> Upgrading to luminous would prevent this from happening again, I guess.
> We're a bit scared to upgrade, because there seem to be so many issues
> with luminous and upgrading to it.
>
> Having said all this: we are surprised to see this on our cluster, as it
> has been running stably and reliably for over two years. Perhaps it was
> just a one-time glitch.
>
> Thanks for your replies!
>
> MJ
>
> On 08/23/2018 01:06 AM, Gregory Farnum wrote:
> > On Wed, Aug 22, 2018 at 2:46 AM John Spray <jsp...@redhat.com> wrote:
> >
> >     On Wed, Aug 22, 2018 at 7:57 AM mj <li...@merit.unu.edu> wrote:
> >      >
> >      > Hi,
> >      >
> >      > This morning I woke up, seeing my ceph jewel 10.2.10 cluster in
> >      > HEALTH_ERR state. That helps you get out of bed. :-)
> >      >
> >      > Anyway, much to my surprise, all VMs running on the cluster were
> >      > still working like nothing was going on. :-)
> >      >
> >      > Checking a bit more revealed:
> >      >
> >      > > root@pm1:~# ceph -s
> >      > >     cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
> >      > >      health HEALTH_ERR
> >      > >             1 pgs inconsistent
> >      > >             1 scrub errors
> >      > >      monmap e3: 3 mons at {0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
> >      > >             election epoch 296, quorum 0,1,2 0,1,2
> >      > >      osdmap e12662: 24 osds: 24 up, 24 in
> >      > >             flags sortbitwise,require_jewel_osds
> >      > >       pgmap v64045618: 1088 pgs, 2 pools, 14023 GB data, 3680 kobjects
> >      > >             44027 GB used, 45353 GB / 89380 GB avail
> >      > >                 1087 active+clean
> >      > >                    1 active+clean+inconsistent
> >      > >   client io 26462 kB/s rd, 14048 kB/s wr, 6 op/s rd, 383 op/s wr
> >      > > root@pm1:~# ceph health detail
> >      > > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> >      > > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> >      > > 1 scrub errors
> >      > > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> >      > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> >      > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> >      > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> >      > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> >      > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
> >      >
> >      > ok, according to the docs I should do "ceph pg repair 2.1a9". Did
> >      > that, and some minutes later the cluster came back to "HEALTH_OK".
> >      >
> >      > Checking the logs:
> >      > > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> >      > > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> >      > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> >      > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
> >      >
> >      > So, we are fine again, it seems.
> >      >
> >      > But now my question: can anyone tell what happened? Is one of my
> >      > disks dying? In the proxmox gui, all osd disks are SMART status "OK".
> >      >
> >      > Besides that, as the cluster was still running and the fix was
> >      > relatively simple, would a HEALTH_WARN not have been more
> >      > appropriate?
> >
> >     An inconsistent PG generally implies data corruption, which is
> >     usually pretty scary.  Your cluster may have been running okay for the
> >     moment, but things might not be so good if your workload happens to
> >     touch that one inconsistent object.
> >
> >     This is a subjective thing, and sometimes users aren't so worried
> >     about inconsistency:
> >       - known-unreliable hardware, where periodic corruptions are
> >     expected
> >       - pools that are just for dev/test, where corruption is not an
> >     urgent issue
> >
> >     In those cases, they might need to do some external filtering of
> >     health checks, possibly down-grading the PG_DAMAGED check.
> >
> >      > And, since this is a size 3, min 2 pool... shouldn't this have
> >      > been taken care of automatically..? ('self-healing' and all that..?)
> >
> >     The good news is that there's an osd_scrub_auto_repair option
> >     (default is false).
> >
> >     I imagine there was probably some historical debate about whether
> >     that should be on by default; core RADOS folks probably know more.
> >
> >
> > In the past, "recovery" merely forced all the replicas into alignment
> > with the primary. If the primary was the bad copy...well, too bad!
> >
> > Things are much better now that we have checksums in various places and
> > take more care about it. But it's still possible to configure and use
> > Ceph so that we don't know what the right answer is, and these kinds of
> > issues really aren't supposed to turn up, so we don't yet feel
> > comfortable auto-repairing.
> > -Greg
> >
> >
> >     John
> >
> >
> >      > So, I'm having my morning coffee finally, wondering what
> >      > happened... :-)
> >      >
> >      > Best regards to all, have a nice day!
> >      >
> >      > MJ
> >
>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
