On Thu, Aug 23, 2018 at 12:26 AM mj <li...@merit.unu.edu> wrote:
> Hi,
>
> Thanks John and Gregory for your answers.
>
> Gregory's answer worries us. We thought that with a 3/2 pool, and one PG
> corrupted, the assumption would be: the two similar ones are correct,
> and the third one needs to be adjusted.
>
> Can we determine from this output, if I created corruption in our
> cluster..?
>
No, we can't tell. It's not very likely, though, and given that this is
an rbd pool I'd bet the most likely inconsistency was a lost journal
write which is long since unneeded by the VM anyway. If it's a concern,
you could try tracking down the rbd image (it's got the internal ID
2c191e238e1f29; I'm not sure what the command is to turn that into the
front-end name) and running an fsck and any available application data
scrubs.

In future I believe the most reliable way on Jewel is to simply go look
at the objects and do the vote-counting yourself. Later versions of
Ceph include more checksumming output etc. to make it easier and to
more reliably identify the broken copy to begin with. (Sketches of both
the image-name lookup and the manual vote-count are appended at the
bottom of this mail.)

-Greg

> > root@pm1:~# ceph health detail
> > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> > 1 scrub errors
> > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
> >
> > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
>
> And also: jewel (which we're running) is considered "the old past" with
> the old non-checksum behaviour?
>
> In case this occurs again... what would be the steps to determine WHICH
> pg is the corrupt one, and how to proceed if it happens to be the
> primary pg for an object?
>
> Upgrading to luminous would prevent this from happening again I guess.
> We're a bit scared to upgrade, because there seem to be so many issues
> with luminous and upgrading to it.
>
> Having said all this: we are surprised to see this on our cluster, as
> it should be and has been running stable and reliably for over two
> years. Perhaps just a one-time glitch.
>
> Thanks for your replies!
>
> MJ
>
> On 08/23/2018 01:06 AM, Gregory Farnum wrote:
> > On Wed, Aug 22, 2018 at 2:46 AM John Spray <jsp...@redhat.com> wrote:
> >
> > On Wed, Aug 22, 2018 at 7:57 AM mj <li...@merit.unu.edu> wrote:
> > >
> > > Hi,
> > >
> > > This morning I woke up, seeing my ceph jewel 10.2.10 cluster in
> > > HEALTH_ERR state. That helps you getting out of bed.
> > > :-)
> > >
> > > Anyway, much to my surprise, all VMs running on the cluster were
> > > still working like nothing was going on. :-)
> > >
> > > Checking a bit more revealed:
> > >
> > > > root@pm1:~# ceph -s
> > > >     cluster 1397f1dc-7d94-43ea-ab12-8f8792eee9c1
> > > >      health HEALTH_ERR
> > > >             1 pgs inconsistent
> > > >             1 scrub errors
> > > >      monmap e3: 3 mons at {0=10.10.89.1:6789/0,1=10.10.89.2:6789/0,2=10.10.89.3:6789/0}
> > > >             election epoch 296, quorum 0,1,2 0,1,2
> > > >      osdmap e12662: 24 osds: 24 up, 24 in
> > > >             flags sortbitwise,require_jewel_osds
> > > >       pgmap v64045618: 1088 pgs, 2 pools, 14023 GB data, 3680 kobjects
> > > >             44027 GB used, 45353 GB / 89380 GB avail
> > > >                 1087 active+clean
> > > >                    1 active+clean+inconsistent
> > > >   client io 26462 kB/s rd, 14048 kB/s wr, 6 op/s rd, 383 op/s wr
> > > > root@pm1:~# ceph health detail
> > > > HEALTH_ERR 1 pgs inconsistent; 1 scrub errors
> > > > pg 2.1a9 is active+clean+inconsistent, acting [15,23,6]
> > > > 1 scrub errors
> > > > root@pm1:~# zgrep 2.1a9 /var/log/ceph/ceph.log*
> > > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:02:24.755778 osd.15 10.10.89.1:6812/3810 2122 : cluster [INF] 2.1a9 deep-scrub starts
> > > > /var/log/ceph/ceph.log.14.gz:2017-09-11 21:08:10.537249 osd.15 10.10.89.1:6812/3810 2123 : cluster [INF] 2.1a9 deep-scrub ok
> > > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:33:21.156004 osd.15 10.10.89.1:6800/3352 18074 : cluster [INF] 2.1a9 deep-scrub starts
> > > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:40:02.579204 osd.15 10.10.89.1:6800/3352 18075 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > > > /var/log/ceph/ceph.log.1.gz:2018-08-22 04:41:02.720716 osd.15 10.10.89.1:6800/3352 18076 : cluster [ERR] 2.1a9 deep-scrub 0 missing, 1 inconsistent objects
> > >
> > > ok, according to the docs I should do "ceph pg repair 2.1a9". Did that,
> > > and some minutes later the cluster came back to "HEALTH_OK"
> > >
> > > Checking the logs:
> > > > /var/log/ceph/ceph.log:2018-08-22 08:23:09.682792 osd.15 10.10.89.1:6800/3352 18088 : cluster [INF] 2.1a9 repair starts
> > > > /var/log/ceph/ceph.log:2018-08-22 08:29:28.440526 osd.15 10.10.89.1:6800/3352 18089 : cluster [ERR] 2.1a9 shard 23: soid 2:95b8d975:::rbd_data.2c191e238e1f29.00000000000c7c9d:head candidate had a read error
> > > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.790176 osd.15 10.10.89.1:6800/3352 18090 : cluster [ERR] 2.1a9 repair 0 missing, 1 inconsistent objects
> > > > /var/log/ceph/ceph.log:2018-08-22 08:30:18.791718 osd.15 10.10.89.1:6800/3352 18091 : cluster [ERR] 2.1a9 repair 1 errors, 1 fixed
> > >
> > > So, we are fine again, it seems.
> > >
> > > But now my question: can anyone tell what happened? Is one of my
> > > disks dying? In the proxmox gui, all osd disks are SMART status "OK".
> > >
> > > Besides that, as the cluster was still running and the fix was
> > > relatively simple, would a HEALTH_WARN not have been more
> > > appropriate?
> >
> > An inconsistent PG generally implies data corruption, which is usually
> > pretty scary. Your cluster may have been running okay for the moment,
> > but things might not be so good if your workload happens to touch that
> > one inconsistent object.
> >
> > This is a subjective thing, and sometimes users aren't so worried
> > about inconsistency:
> >  - known-unreliable hardware, and are expecting to encounter periodic
> >    corruptions.
> >  - pools that are just for dev/test, where corruption is not an
> >    urgent issue
> >
> > In those cases, they might need to do some external filtering of
> > health checks, possibly down-grading the PG_DAMAGED check.
> >
> > > And, since this is a size 3, min 2 pool... shouldn't this have been
> > > taken care of automatically..? ('self-healing' and all that..?)
> >
> > The good news is that there's an osd_scrub_auto_repair option (default
> > is false).
> >
> > I imagine there was probably some historical debate about whether that
> > should be on by default, core RADOS folks probably know more.
>
> > In the past, "recovery" merely forced all the replicas into alignment
> > with the primary. If the primary was the bad copy... well, too bad!
> >
> > Things are much better now that we have checksums in various places and
> > take more care about it. But it's still possible to configure and use
> > Ceph so that we don't know what the right answer is, and these kinds of
> > issues really aren't supposed to turn up, so we don't yet feel
> > comfortable auto-repairing.
> > -Greg
> >
> > John
> >
> > > So, I'm having my morning coffee finally, wondering what
> > > happened... :-)
> > >
> > > Best regards to all, have a nice day!
> > >
> > > MJ
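
As promised above, a sketch of the image-name lookup. This is untested
and assumes the pool is simply named "rbd" (substitute your actual pool
name); the idea is just to match the block_name_prefix that "rbd info"
reports against the internal ID from the scrub error:

    POOL=rbd    # assumption: adjust to your rbd pool name
    for img in $(rbd -p "$POOL" ls); do
        # each image's data objects are named <block_name_prefix>.<offset>
        prefix=$(rbd -p "$POOL" info "$img" | awk '/block_name_prefix/ {print $2}')
        if [ "$prefix" = "rbd_data.2c191e238e1f29" ]; then
            echo "object prefix belongs to image: $POOL/$img"
        fi
    done

Once you know which image it is, you can run fsck (and any
application-level checks) inside the VM that uses that disk.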
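For the manual vote-count on Jewel, again just a sketch, assuming
default FileStore paths and the acting set [15,23,6] from the health
output above. I believe "rados list-inconsistent-obj" already exists on
Jewel, though its output is much less detailed than on Luminous:

    # show which object(s) the scrub flagged in this PG
    rados list-inconsistent-obj 2.1a9 --format=json-pretty

    # on the host of each acting OSD (15, 23 and 6), find the on-disk
    # copy of the object and checksum it, e.g. for osd.15:
    find /var/lib/ceph/osd/ceph-15/current/2.1a9_head/ \
         -name '*2c191e238e1f29.00000000000c7c9d*' -exec md5sum {} \;

With three copies, the two checksums that agree are almost certainly
the good ones. In this particular case the scrub log already points at
shard 23 ("candidate had a read error"), so I'd also check dmesg and
SMART on the disk behind osd.23.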
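And on the osd_scrub_auto_repair option John mentioned, a quick sketch
of how to inspect it and, if you really want it, flip it at runtime
(given the caveats above about auto-repairing, you may well prefer to
leave it off):

    # check the current value (run on the host that carries osd.15)
    ceph daemon osd.15 config get osd_scrub_auto_repair

    # enable it at runtime on all OSDs; add it under [osd] in ceph.conf
    # if it should survive restarts
    ceph tell osd.* injectargs '--osd_scrub_auto_repair=true'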
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com