Hello guys,
unfortunately I missed the warning on Friday and upgraded my cluster to
12.2.6 on Saturday.
The cluster is in a migration state from filestore to bluestore (10
filestore / 2 bluestore OSDs), and I constantly get inconsistent PGs,
but only on the two bluestore OSDs.
If I run, for example, rados list-inconsistent-obj 2.17
--format=json-pretty, I see these mismatches at the end:
"shards": [
{
"osd": 0,
"primary": true,
"errors": [],
"size": 4194304,
"omap_digest": "0xffffffff"
},
{
"osd": 1,
"primary": false,
"errors": [
"data_digest_mismatch_info"
],
"size": 4194304,
"omap_digest": "0xffffffff",
"data_digest": "0x21b21973"
Is this the issue you are talking about?
I can repair these PGs with ceph pg repair, and it reports the error as
fixed.
But is it really fixed?
Do I have to be afraid that I now have corrupted data?
Would it be an option to set noout on these bluestore OSDs and stop
them? Concretely, I mean something like the sketch below.
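(The OSD ids here are placeholders; I would substitute the ids of my
two bluestore OSDs:)

    ceph osd set noout                        # don't rebalance while they are down
    systemctl stop ceph-osd@10 ceph-osd@11    # stop the two bluestore OSDs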
When do you expect the new 12.2.7 release? Will it fix all the errors?
Thank you in advance for your answers!
Stefan
------ Original Message ------
From: "Sage Weil" <s...@newdream.net>
To: "Glen Baars" <g...@onsitecomputers.com.au>
Cc: "ceph-users@lists.ceph.com" <ceph-users@lists.ceph.com>
Sent: 14.07.2018 19:15:57
Subject: Re: [ceph-users] 12.2.6 CRC errors
On Sat, 14 Jul 2018, Glen Baars wrote:
Hello Ceph users!
Note to users, don't install new servers on Friday the 13th!
We added a new ceph node on Friday and it received the latest 12.2.6
update. I started to see CRC errors and investigated hardware issues. I
have since found that the errors are caused by the 12.2.6 release.
About 80TB has been copied onto this server.
I have set noout, noscrub, and nodeep-scrub, and repaired the affected
PGs (ceph pg repair). This has cleared the errors; a rough sketch of
the commands follows below.
***** I have no idea if this is a good way to fix the issue. From the
bug report, the issue is in deep scrub, so I suppose stopping it will
limit the damage. *****
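(The PG id is a placeholder; I ran the repair once per inconsistent
PG:)

    ceph osd set noout           # keep OSDs in the map
    ceph osd set noscrub         # stop regular scrubs
    ceph osd set nodeep-scrub    # stop deep scrubs, where the bad digests surface
    ceph pg repair <pgid>        # repeated for each inconsistent PG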
Can anyone tell me what to do? A downgrade seems like it won't fix the
issue. Maybe remove this node, rebuild it with 12.2.5, and resync the
data? Or wait a few days for 12.2.7?
I would sit tight for now. I'm working on the right fix and hope to
have something to test shortly, and possibly a release by tomorrow.
The remaining danger is that, for the objects with bad full-object
digests, a read of the entire object will return EIO. It's up to you
whether you want to try to quiesce workloads to avoid that (to prevent
corruption at higher layers) or to avoid a service degradation/outage.
:( Unfortunately I don't have very precise guidance on how likely that
is.
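(If you want to probe a specific suspect object, a full-object read via
the rados CLI should surface the EIO; pool and object names here are
placeholders:)

    rados -p <pool> get <object> /tmp/obj   # errors out if the full-object read fails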
Are you using bluestore only, or is it a mix of bluestore and
filestore?
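(If you're not sure, the object store type of each OSD is in its
metadata:)

    ceph osd metadata | grep osd_objectstore   # "bluestore" or "filestore" per OSD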
sage
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com