Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-09 Thread Bryan Stillwell
> On Apr 8, 2019, at 5:42 PM, Bryan Stillwell wrote: > > >> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote: >> >> On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell >> wrote: >>> >>> There doesn't appear to be any correlation between the OSDs which would >>> point to a hardware issue, and s

Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Bryan Stillwell
> On Apr 8, 2019, at 4:38 PM, Gregory Farnum wrote: > > On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell wrote: >> >> There doesn't appear to be any correlation between the OSDs which would >> point to a hardware issue, and since it's happening on two different >> clusters I'm wondering if the

Re: [ceph-users] Inconsistent PGs caused by omap_digest mismatch

2019-04-08 Thread Gregory Farnum
On Mon, Apr 8, 2019 at 3:19 PM Bryan Stillwell wrote: > > We have two separate RGW clusters running Luminous (12.2.8) that have started > seeing an increase in PGs going active+clean+inconsistent with the reason > being caused by an omap_digest mismatch. Both clusters are using FileStore > and
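
For readers hitting the same symptom, a minimal triage sketch (the PG id 15.3f is only a placeholder) for seeing which objects and shards disagree on omap_digest, and for re-checking after a repair:

  # rados list-inconsistent-obj 15.3f --format=json-pretty
  # ceph pg repair 15.3f
  # ceph pg deep-scrub 15.3f

The first command prints the per-shard omap_digest values for the mismatching objects; the deep-scrub afterwards confirms whether the digests agree again.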

Re: [ceph-users] Inconsistent PGs every few days

2018-08-07 Thread Konstantin Shalygin
Hi, I run a cluster with 7 OSDs. The cluster does not have much traffic on it, but every few days I get a HEALTH_ERR because of inconsistent PGs: root@Sam ~ # ceph status cluster: id: c4bfc288-8ba8-4c3a-b3a6-ed95503f50b7
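
A minimal sequence for narrowing down this kind of HEALTH_ERR (the pool name "rbd" is only an example):

  # ceph health detail
  # rados list-inconsistent-pg rbd
  # ceph pg repair <pgid>

Note that on older releases "ceph pg repair" copies from the primary, so it is worth checking which replica is actually the damaged one before repairing.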

Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread shrey chauhan
Yes, it is a cache pool. Moreover, what are these whiteouts, and when does this mismatch occur? Thanks On Fri, Jun 1, 2018 at 3:51 PM, Brad Hubbard wrote: > On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan > wrote: > > Hi, > > > > I keep getting inconsistent placement groups and every time it's the > > whi

Re: [ceph-users] inconsistent pgs :- stat mismatch in whiteouts

2018-06-01 Thread Brad Hubbard
On Fri, Jun 1, 2018 at 6:41 PM, shrey chauhan wrote: > Hi, > > I keep getting inconsistent placement groups and every time it's the > whiteout. > > > cluster [ERR] 9.f repair stat mismatch, got 1563/1563 objects, 0/0 clones, > 1551/1551 dirty, 78/78 omap, 0/0 pinned, 12/12 hit_set_archive, 0/-9 > w
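
To confirm that the affected PG really belongs to a cache-tier pool (PG 9.f is taken from the log line above; the pool number is inferred from it), something like:

  # ceph pg map 9.f
  # ceph osd dump | grep -E '^pool 9 |tier'

"ceph osd dump" shows the pool flags, cache_mode and tier_of relationship for the pool the PG lives in.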

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-08-08 Thread Lincoln Bryant
Hi all, Apologies for necromancing an old thread, but I was wondering if anyone had any more thoughts on this. We're running v10.2.9 now and still have 3 PGs exhibiting this behavior in our cache pool after scrubs, deep-scrubs, and repair attempts. Some more information below. Thanks much, Li

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-07-03 Thread Rhian Resnick
.0222 From: Gregory Farnum Sent: Monday, July 3, 2017 11:49 AM To: Rhian Resnick Cc: ceph-users Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi On Mon, Jul 3, 2017 at 6:0

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-07-03 Thread Gregory Farnum
Resnick > > Assistant Director Middleware and HPC > > Office of Information Technology > > > Florida Atlantic University > > 777 Glades Road, CM22, Rm 173B > > Boca Raton, FL 33431 > > Phone 561.297.2647 > > Fax 561.297.0222 > > > > > > > From: c

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-07-03 Thread Rhian Resnick
eil, Sage Cc: ceph-users Subject: Re: [ceph-users] Inconsistent pgs with size_mismatch_oi On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant mailto:linco...@uchicago.edu>> wrote: Hi Greg, Curiously, some of these scrub errors went away on their own. The example pg in the original post is no

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Gregory Farnum
On Mon, May 15, 2017 at 3:19 PM Lincoln Bryant wrote: > Hi Greg, > > Curiously, some of these scrub errors went away on their own. The example > pg in the original post is now active+clean, and nothing interesting in the > logs: > > # zgrep "36.277b" ceph-osd.244*gz > ceph-osd.244.log-20170510.gz

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Lincoln Bryant
Hi Greg, Curiously, some of these scrub errors went away on their own. The example pg in the original post is now active+clean, and nothing interesting in the logs: # zgrep "36.277b" ceph-osd.244*gz ceph-osd.244.log-20170510.gz:2017-05-09 06:56:40.739855 7f0184623700 0 log_channel(cluster) log
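
The same check can be extended across all rotated logs on an OSD host (default /var/log/ceph paths and the rotation naming from the listing above are assumed):

  # zgrep "36.277b" /var/log/ceph/ceph-osd.*.log-*.gz
  # grep "36.277b" /var/log/ceph/ceph-osd.*.log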

Re: [ceph-users] Inconsistent pgs with size_mismatch_oi

2017-05-15 Thread Gregory Farnum
On Mon, May 1, 2017 at 9:28 AM, Lincoln Bryant wrote: > Hi all, > > I’ve run across a peculiar issue on 10.2.7. On my 3x replicated cache tiering > cache pool, routine scrubbing suddenly found a bunch of PGs with > size_mismatch_oi errors. From the “rados list-inconsistent-pg tool”[1], I see >
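
For reference, the tool mentioned takes a pool name and lists the affected PG ids, and a companion command gives the per-object detail (the pool name here is illustrative):

  # rados list-inconsistent-pg cachepool
  # rados list-inconsistent-obj 36.277b --format=json-pretty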

Re: [ceph-users] Inconsistent PGs

2016-06-22 Thread Paweł Sadowski
Querying those PGs hangs forever. We ended up using *ceph-objectstore-tool* *mark-complete* on those PGs. On 06/22/2016 11:45 AM, 施柏安 wrote: > Hi, > You can use command 'ceph pg query' to check what's going on with the > pgs which have problem and use "ceph-objectstore-tool" to recover that pg. > >
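
For anyone who ends up in the same place: mark-complete is an offline operation run against one OSD's copy of the PG, and it tells that OSD to treat its copy as authoritative, so data that only existed on other replicas can be lost; exporting the PG first is prudent. A sketch with FileStore-style paths (the OSD id and PG id are just examples):

  # systemctl stop ceph-osd@109
  # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
        --journal-path /var/lib/ceph/osd/ceph-109/journal \
        --pgid 3.2929 --op export --file /root/3.2929.export
  # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
        --journal-path /var/lib/ceph/osd/ceph-109/journal \
        --pgid 3.2929 --op mark-complete
  # systemctl start ceph-osd@109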

Re: [ceph-users] Inconsistent PGs

2016-06-22 Thread 施柏安
Hi, You can use the command 'ceph pg query' to check what's going on with the PGs that have problems, and use "ceph-objectstore-tool" to recover those PGs. 2016-06-21 19:09 GMT+08:00 Paweł Sadowski : > Already restarted those OSD and then whole cluster (rack by rack, > failure domain is rack in this set
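
When the query does return, the useful parts for an incomplete PG are in the peering section; a sketch (the PG id is taken from later messages in this thread):

  # ceph pg 3.2929 query | less

Look at "recovery_state", and in particular "down_osds_we_would_probe" and "peering_blocked_by", to see which OSDs the PG is waiting for. As the follow-up above notes, the query itself can hang when the PG has no usable primary.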

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Thanks for the response. All OSDs seem to be OK; they have been restarted and rejoined the cluster after that, nothing weird in the logs. # ceph pg dump_stuck stale ok # ceph pg dump_stuck inactive ok pg_stat state up up_primary acting acting_primary 3.2929 incomplete [109,272,83] 10

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread M Ranga Swami Reddy
You can use the below commands: == ceph pg dump_stuck stale ceph pg dump_stuck inactive ceph pg dump_stuck unclean === And then query the PGs which are in an unclean or stale state, and check for any issue with a specific OSD. Thanks Swami On Tue, Jun 21, 2016 at 3:02 PM, Paweł Sadowski wrote: > Hello, >

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread Paweł Sadowski
Already restarted those OSDs and then the whole cluster (rack by rack, failure domain is rack in this setup). We would like to try the *ceph-objectstore-tool mark-complete* operation. Is there any way (other than checking mtime on files and querying PGs) to determine which replica has the most up-to-date data?
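
One way to compare replicas without relying on file mtimes is to read the PG metadata directly from each candidate OSD's store while that OSD is stopped; a sketch (the OSD id and FileStore paths are examples):

  # systemctl stop ceph-osd@109
  # ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-109 \
        --journal-path /var/lib/ceph/osd/ceph-109/journal \
        --pgid 3.2929 --op info
  # systemctl start ceph-osd@109

Comparing the reported "last_update" and "last_epoch_started" across the replicas shows which copy is furthest ahead.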

Re: [ceph-users] Inconsistent PGs

2016-06-21 Thread M Ranga Swami Reddy
Try restarting OSDs 109 and 166, and check if that helps. On Tue, Jun 21, 2016 at 4:05 PM, Paweł Sadowski wrote: > Thanks for response. > > All OSDs seems to be ok, they have been restarted, joined cluster after > that, nothing weird in the logs. > > # ceph pg dump_stuck stale > ok > > # ceph pg dump_st
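
For completeness, on a systemd-based install that would be along the lines of (older init-script setups use "service ceph restart osd.109"-style commands instead):

  # systemctl restart ceph-osd@109
  # systemctl restart ceph-osd@166
  # ceph -s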

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-09-08 Thread Shinobu Kinjo
That's good news. Shinobu - Original Message - From: "Sage Weil" To: "Andras Pataki" Cc: ceph-users@lists.ceph.com, ceph-de...@vger.kernel.org Sent: Wednesday, September 9, 2015 3:07:29 AM Subject: Re: [ceph-users] Inconsistent PGs that ceph pg repair does no

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-09-08 Thread Andras Pataki
Cool, thanks! Andras From: Sage Weil Sent: Tuesday, September 8, 2015 2:07 PM To: Andras Pataki Cc: Samuel Just; ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org Subject: Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix On Tue, 8

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-09-08 Thread Sage Weil
pm-centos7-x86_64-basic/ref/hammer/ (or similar, adjust URL for your distro). sage > > Thanks, > > Andras > > > > From: Andras Pataki > Sent: Monday, August 3, 2015 4:09 PM > To: Samuel Just > Cc: ceph-users@lists.ceph.com; ceph-de.

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-09-08 Thread Andras Pataki
"ceph version 0.94.3 (95cefea9fd9ab740263bf8bb4796fd864d9afe2b)" Could you have another look? Thanks, Andras From: Andras Pataki Sent: Monday, August 3, 2015 4:09 PM To: Samuel Just Cc: ceph-users@lists.ceph.com; ceph-de...@vger.kernel.org Subject: Re: [ceph-users]

Re: [ceph-users] inconsistent pgs

2015-08-11 Thread Jan Schermer
Ouch - been there too. Now the question becomes: Which copy is the right one? And a slightly related question - how many of you look at BER rate when selecting drives? Do the math, it's pretty horrible when you know you have one bad sector for every ~11.5TB of data (if you use desktop-class dri
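
The ~11.5TB figure follows from the usual desktop-drive spec of one unrecoverable read error per 10^14 bits read:

  10^14 bits / 8 = 1.25 * 10^13 bytes = 12.5 TB ≈ 11.4 TiB

so statistically a full read of roughly that much data hits one unreadable sector, which is the scale of a deep-scrub over a few large drives.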

Re: [ceph-users] inconsistent pgs

2015-08-10 Thread Константин Сахинов
Igor, Jan, David, thanks for your help. The problem was bad memory chips. Tested with http://www.memtest.org/ and found several red (failing) results. Fri, 7 Aug 2015 at 13:44, Константин Сахинов : > Hi! > > I have a large number of inconsistent pgs, 229 of 656, and it's increasing > every hour. > I

Re: [ceph-users] inconsistent pgs

2015-08-09 Thread Константин Сахинов
Just more info about my config. Maybe I have to change default ruleset from "step chooseleaf firstn 0 type host" to "step chooseleaf firstn 0 type chasis"? # ceph osd tree ID WEIGHT TYPE NAME UP/DOWN REWEIGHT PRIMARY-AFFINITY -3 0.91998 root ssd -6 0.45999 host block1 2 0.

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
When I changed crush from root-host-osd to root-chasis-host-osd, did I have to change the default ruleset? I didn't change it. It looks like this: rule replicated_ruleset { ruleset 0 type replicated min_size 1 max_size 10 step take default step chooseleaf firstn 0 type host step emit } п
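
For other readers: the bucket hierarchy and the rule are independent, so after inserting a chassis level the rule only uses it once its chooseleaf line is changed. The usual edit cycle is (a sketch; the bucket type name is spelled as in this thread):

  # ceph osd getcrushmap -o crush.bin
  # crushtool -d crush.bin -o crush.txt
    (edit crush.txt: "step chooseleaf firstn 0 type host" -> "... type chasis")
  # crushtool -c crush.txt -o crush.new
  # ceph osd setcrushmap -i crush.new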

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
It's hard to say now. I changed my 6 OSDs one-by-one from btrfs to xfs. During the repair process I added 2 more OSDs. Changed the crush map from root-host-osd to root-*chasis*-host-osd structure... There was SSD cache tiering set up when the first inconsistency showed up. Then I removed tiering to confirm t

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
"you use XFS on your OSDs?" This OSD was formatted in BTRFS as a whole block device /dev/sdc (with no partition table). Then I moved from BTRFS to XFS /dev/sdc1 (with partition table), because BTRFS was v-v-very slow. Maybe partprober sees some old signatures from first sectors of that disk... By

Re: [ceph-users] inconsistent pgs

2015-08-07 Thread Константин Сахинов
No, dmesg is clean on both hosts of osd.1 (block0) and osd.7 (block2). There are only boot-time messages (listings below). If there are cable or SATA-controller issues, will they be shown in /var/log/dmesg? block0 dmesg: [5.296302] XFS (sdb1): Mounting V4 Filesystem [5.316487] XFS (sda1): Mo

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Andras Pataki
Done: http://tracker.ceph.com/issues/12577 BTW, I'm using the latest release 0.94.2 on all machines. Andras On 8/3/15, 3:38 PM, "Samuel Just" wrote: >Hrm, that's certainly supposed to work. Can you make a bug? Be sure >to note what version you are running (output of ceph-osd -v). >-Sam > >On

Re: [ceph-users] Inconsistent PGs that ceph pg repair does not fix

2015-08-03 Thread Samuel Just
Hrm, that's certainly supposed to work. Can you make a bug? Be sure to note what version you are running (output of ceph-osd -v). -Sam On Mon, Aug 3, 2015 at 12:34 PM, Andras Pataki wrote: > Summary: I am having problems with inconsistent PG's that the 'ceph pg > repair' command does not fix.

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
On Mon, Jul 7, 2014 at 4:39 PM, James Harper wrote: >> >> You can look at which OSDs the PGs map to. If the PGs have >> insufficient replica counts they'll report as degraded in "ceph -s" or >> "ceph -w". > > I meant in a general sense. If I have a pg that I suspect might be > insufficiently redu

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread James Harper
> > You can look at which OSDs the PGs map to. If the PGs have > insufficient replica counts they'll report as degraded in "ceph -s" or > "ceph -w". I meant in a general sense. If I have a pg that I suspect might be insufficiently redundant I can look that up, but I'd like to know in advance an

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
You can look at which OSDs the PGs map to. If the PGs have insufficient replica counts they'll report as degraded in "ceph -s" or "ceph -w". Software Engineer #42 @ http://inktank.com | http://ceph.com On Mon, Jul 7, 2014 at 4:30 PM, James Harper wrote: >> >> It sounds like maybe you've got a ba
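
Concretely, the mapping of a single PG, or of the object behind a suspect file, can be printed like this (the PG id, pool and object names are placeholders):

  # ceph pg map 3.29
  # ceph osd map rbd some-object
  # ceph pg dump pgs_brief

PGs whose up/acting sets are shorter than the pool's replica size are the ones that show up as degraded.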

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread James Harper
> > It sounds like maybe you've got a bad CRUSH map if you're seeing that. > One of the things the tunables do is make the algorithm handle a > variety of maps better, but if PGs are only mapping to one OSD you > need to fix that. > How can I tell that this is definitely the case (all copies of

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
On Mon, Jul 7, 2014 at 4:21 PM, James Harper wrote: >> >> Okay. Based on your description I think the reason for the tunables >> crashes is that either the "out" OSDs, or possibly one of the >> monitors, never got restarted. You should be able to update the >> tunables now, if you want to. (Or the

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread James Harper
> > Okay. Based on your description I think the reason for the tunables > crashes is that either the "out" OSDs, or possibly one of the > monitors, never got restarted. You should be able to update the > tunables now, if you want to. (Or there's also a config option that > will disable the warning

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
Okay. Based on your description I think the reason for the tunables crashes is that either the "out" OSDs, or possibly one of the monitors, never got restarted. You should be able to update the tunables now, if you want to. (Or there's also a config option that will disable the warning; check the r
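
The config option being referred to is presumably the legacy-tunables warning switch; it can be set in ceph.conf or injected into the running monitors (a sketch):

  [mon]
      mon warn on legacy crush tunables = false

  # ceph tell 'mon.*' injectargs '--mon_warn_on_legacy_crush_tunables=false'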

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread James Harper
> > What was the exact sequence of events > The exact sequence of events was: . set the retiring node's OSDs out . noticed that MDSs were now stuck in 'rejoining' state . messed around with restarting MDSs but couldn't fix it . google told me that upgrading ceph resolved such a problem for them . upgraded

Re: [ceph-users] inconsistent pgs

2014-07-07 Thread Gregory Farnum
What was the exact sequence of events — were you rebalancing when you did the upgrade? Did the marked out OSDs get upgraded? Did you restart all the monitors prior to changing the tunables? (Are you *sure*?) -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Sat, Jul 5, 2014 at

Re: [ceph-users] inconsistent pgs

2014-07-05 Thread James Harper
> > I have 4 physical boxes each running 2 OSDs. I needed to retire one so I set > the 2 OSDs on it to 'out' and everything went as expected. Then I noticed > that 'ceph health' was reporting that my crush map had legacy tunables. The > release notes told me I needed to do 'ceph osd crush tunabl
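
The command cut off above is presumably the CRUSH tunables profile switch; it takes a profile name (legacy, bobtail, firefly, optimal, ...), and applying a newer profile causes data movement, so it is normally run when some rebalancing is acceptable. A sketch:

  # ceph osd crush show-tunables
  # ceph osd crush tunables optimal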

Re: [ceph-users] Inconsistent pgs after update to 0.73 - > 0.74

2014-01-13 Thread Mark Kirkwood
On 10/01/14 17:16, Mark Kirkwood wrote: On 10/01/14 16:18, David Zafman wrote: With pool size of 1 the scrub can still do some consistency checking. These are things like missing attributes, on-disk size doesn’t match attribute size, non-clone without a head, expected clone. You could check

Re: [ceph-users] Inconsistent pgs after update to 0.73 - > 0.74

2014-01-09 Thread Mark Kirkwood
On 10/01/14 16:18, David Zafman wrote: With pool size of 1 the scrub can still do some consistency checking. These are things like missing attributes, on-disk size doesn’t match attribute size, non-clone without a head, expected clone. You could check the osd logs to see what they were. The

Re: [ceph-users] Inconsistent pgs after update to 0.73 - > 0.74

2014-01-09 Thread David Zafman
With pool size of 1 the scrub can still do some consistency checking. These are things like missing attributes, on-disk size doesn’t match attribute size, non-clone without a head, expected clone. You could check the osd logs to see what they were. The pg below only had 1 object in error and