Re: [ceph-users] scrub errors on rgw data pool

2019-11-29 Thread M Ranga Swami Reddy
Primary OSD crashes with the below assert: 12.2.11/src/osd/ReplicatedBackend.cc:1445: assert(peer_missing.count(fromshard)). Here I have 2 OSDs with a bluestore backend and 1 OSD with a filestore backend. On Mon, Nov 25, 2019 at 3:34 PM M Ranga Swami Reddy wrote: > Hello - We are using the ceph 12.2.1
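Since the thread hinges on which OSDs still run filestore, a quick way to check is to parse `ceph osd metadata` output. A minimal sketch, assuming the JSON shape below (the `osd_objectstore` field is real; the sample data here is invented to mirror the 2-bluestore/1-filestore mix described above):

```python
import json

# Stand-in for `ceph osd metadata --format=json` output, trimmed to the
# one field we care about. On a live cluster you would capture the real
# output instead of this hand-written sample.
SAMPLE = json.dumps([
    {"id": 0, "osd_objectstore": "bluestore"},
    {"id": 1, "osd_objectstore": "bluestore"},
    {"id": 2, "osd_objectstore": "filestore"},
])

def backends_by_osd(metadata_json):
    """Map each OSD id to its object-store backend."""
    return {osd["id"]: osd.get("osd_objectstore", "unknown")
            for osd in json.loads(metadata_json)}

backends = backends_by_osd(SAMPLE)
filestore = sorted(i for i, b in backends.items() if b == "filestore")
print(backends)   # {0: 'bluestore', 1: 'bluestore', 2: 'filestore'}
print(filestore)  # [2]
```

Knowing which OSDs are still on filestore narrows down whether the assert correlates with the mixed-backend PG.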

Re: [ceph-users] scrub errors on rgw data pool

2019-11-25 Thread M Ranga Swami Reddy
Thanks for the reply. Have you migrated all OSDs from the filestore backend to the bluestore backend? Or have you upgraded from Luminous 12.2.11 to 14.x? What helped here? On Tue, Nov 26, 2019 at 8:03 AM Fyodor Ustinov wrote: > Hi! > > I had similar errors in pools on SSD until I upgraded to nau

Re: [ceph-users] scrub errors on rgw data pool

2019-11-25 Thread Fyodor Ustinov
Hi! I had similar errors in pools on SSD until I upgraded to nautilus (clean bluestore installation) - Original Message - > From: "M Ranga Swami Reddy" > To: "ceph-users" , "ceph-devel" > > Sent: Monday, 25 November, 2019 12:04:46 > Subject: [ceph-users] scrub errors on rgw data pool

Re: [ceph-users] scrub errors

2019-03-28 Thread Brad Hubbard
On Fri, Mar 29, 2019 at 7:54 AM solarflow99 wrote: > > ok, I tried doing ceph osd out on each of the 4 OSDs 1 by 1. I got it out of > backfill mode but still not sure if it'll fix anything. pg 10.2a still shows > state active+clean+inconsistent. Peer 8 is now > remapped+inconsistent+peering

Re: [ceph-users] scrub errors

2019-03-28 Thread solarflow99
OK, I tried doing ceph osd out on each of the 4 OSDs, one by one. I got it out of backfill mode, but I'm still not sure if it'll fix anything. pg 10.2a still shows state active+clean+inconsistent. Peer 8 is now remapped+inconsistent+peering, and the other peer is active+clean+inconsistent. On Wed, Mar 2

Re: [ceph-users] scrub errors

2019-03-27 Thread Brad Hubbard
On Thu, Mar 28, 2019 at 8:33 AM solarflow99 wrote: > > yes, but nothing seems to happen. I don't understand why it lists OSDs 7 in > the "recovery_state": when i'm only using 3 replicas and it seems to use > 41,38,8 Well, osd 8's state is listed as "active+undersized+degraded+remapped+wait_bac

Re: [ceph-users] scrub errors

2019-03-27 Thread solarflow99
Yes, but nothing seems to happen. I don't understand why it lists OSD 7 in the "recovery_state" when I'm only using 3 replicas and it seems to use 41,38,8. # ceph health detail HEALTH_ERR 1 pgs inconsistent; 47 scrub errors pg 10.2a is active+clean+inconsistent, acting [41,38,8] 47 scrub errors
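When chasing several inconsistent PGs at once, it helps to pull the PG ids and acting sets out of `ceph health detail` mechanically rather than by eye. A small sketch, using the exact health line quoted above as sample input:

```python
import re

# Example `ceph health detail` output, taken from this thread.
HEALTH = """HEALTH_ERR 1 pgs inconsistent; 47 scrub errors
pg 10.2a is active+clean+inconsistent, acting [41,38,8]
47 scrub errors"""

def inconsistent_pgs(health_text):
    """Return (pgid, acting-set) pairs for every inconsistent PG reported."""
    pairs = []
    pattern = r"pg (\S+) is \S*inconsistent\S*, acting \[([\d,]+)\]"
    for m in re.finditer(pattern, health_text):
        pgid = m.group(1)
        acting = [int(x) for x in m.group(2).split(",")]
        pairs.append((pgid, acting))
    return pairs

print(inconsistent_pgs(HEALTH))  # [('10.2a', [41, 38, 8])]
```

The acting set confirms which OSDs actually hold the PG; any OSD id in "recovery_state" but not in the acting set (like osd 7 here) is a past or prospective peer, not a current replica.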

Re: [ceph-users] scrub errors

2019-03-26 Thread Brad Hubbard
http://docs.ceph.com/docs/hammer/rados/troubleshooting/troubleshooting-pg/ Did you try repairing the pg? On Tue, Mar 26, 2019 at 9:08 AM solarflow99 wrote: > > yes, I know its old. I intend to have it replaced but thats a few months > away and was hoping to get past this. the other OSDs appe

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
Yes, I know it's old. I intend to have it replaced, but that's a few months away, and I was hoping to get past this. The other OSDs appear to be OK; I see them up and in. Why, do you see something wrong? On Mon, Mar 25, 2019 at 4:00 PM Brad Hubbard wrote: > Hammer is no longer supported. > > What's t

Re: [ceph-users] scrub errors

2019-03-25 Thread Brad Hubbard
Hammer is no longer supported. What's the status of OSDs 7 and 17? On Tue, Mar 26, 2019 at 8:56 AM solarflow99 wrote: > > hi, thanks. Its still using Hammer. Here's the output from the pg query, > the last command you gave doesn't work at all but be too old. > > > # ceph pg 10.2a query > { >

Re: [ceph-users] scrub errors

2019-03-25 Thread solarflow99
Hi, thanks. It's still using Hammer. Here's the output from the pg query; the last command you gave doesn't work at all, but it may be too old. # ceph pg 10.2a query { "state": "active+clean+inconsistent", "snap_trimq": "[]", "epoch": 23265, "up": [ 41, 38, 8

Re: [ceph-users] scrub errors

2019-03-25 Thread Brad Hubbard
It would help to know what version you are running but, to begin with, could you post the output of the following? $ sudo ceph pg 10.2a query $ sudo rados list-inconsistent-obj 10.2a --format=json-pretty Also, have a read of http://docs.ceph.com/docs/mimic/rados/troubleshooting/troubleshooting-pg
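The `rados list-inconsistent-obj` report Brad asks for is JSON, and on larger PGs it is easier to summarize than to read raw. A minimal sketch of pulling out the object names and error flags; the field names (`inconsistents`, `object`, `errors`) follow the Luminous-era report schema, and the sample payload is invented for illustration:

```python
import json

# Approximate shape of `rados list-inconsistent-obj <pgid> --format=json-pretty`
# output. Treat this as a sample, not a complete schema for your version.
SAMPLE = json.dumps({
    "epoch": 23265,
    "inconsistents": [
        {"object": {"name": "obj1", "snap": "head"},
         "errors": ["data_digest_mismatch"]},
    ],
})

def summarize(report_json):
    """List (object name, error flags) for each inconsistent object."""
    report = json.loads(report_json)
    return [(i["object"]["name"], i["errors"]) for i in report["inconsistents"]]

print(summarize(SAMPLE))  # [('obj1', ['data_digest_mismatch'])]
```

The error flags (digest mismatches vs. missing shards vs. size errors) are what determine whether `ceph pg repair` is safe to run.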

Re: [ceph-users] scrub errors

2018-10-23 Thread Sergey Malinin
There is an osd_scrub_auto_repair setting which defaults to 'false'. > On 23.10.2018, at 12:12, Dominque Roux wrote: > > Hi all, > > We lately faced several scrub errors. > All of them were more or less easily fixed with the ceph pg repair X.Y > command. > > We're using ceph version 12.2.7 an
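For reference, the setting Sergey mentions goes in the `[osd]` section of ceph.conf; a sketch showing it flipped away from its default (whether auto-repair is advisable depends on your release and comfort level):

```ini
[osd]
# When true, a scrub that finds repairable errors triggers a repair
# automatically instead of only flagging the PG inconsistent.
# Defaults to false.
osd scrub auto repair = true
```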

Re: [ceph-users] Scrub Errors

2016-05-06 Thread Blade
Oliver Dzombic writes: > > Hi Blade, > > you can try to set the min_size to 1, to get it back online, and if/when > the error vanish ( maybe after another repair command ) you can set the > min_size again to 2. > > you can try to simply out/down/?remove? the osd where it is on. > Hi Oliver

Re: [ceph-users] Scrub Errors

2016-05-04 Thread Oliver Dzombic
Hi Blade, you can try to set the min_size to 1 to get it back online, and if/when the error vanishes (maybe after another repair command) you can set the min_size back to 2. You can also try to simply out/down/?remove? the OSD it is on. -- Mit freundlichen Gruessen / Best regards Oliver Dz

Re: [ceph-users] Scrub Errors

2016-05-04 Thread Blade Doyle
When I issue the "ceph pg repair 1.32" command I *do* see it reported in the "ceph -w" output, but I *do not* see any new messages about pg 1.32 in the log of osd.6, even if I turn debug messages way up. # ceph pg repair 1.32 instructing pg 1.32 on osd.6 to repair (ceph -w shows) 2016-05-04 11:

Re: [ceph-users] Scrub Errors

2016-05-03 Thread Oliver Dzombic
Hi Blade, if you don't see anything in the logs, then you should raise the debug level/frequency. You must at least see that the repair command has been issued (started). Also, I am wondering about the [6] from your output. That means there is only 1 copy of it (on osd.6). What is yo

Re: [ceph-users] Scrub Errors

2016-05-03 Thread Blade Doyle
Hi Oliver, thanks for your reply. The problem could have been caused by crashing/flapping OSDs. The cluster is stable now, but lots of pg problems remain. $ ceph health HEALTH_ERR 4 pgs degraded; 158 pgs inconsistent; 4 pgs stuck degraded; 1 pgs stuck inactive; 10 pgs stuck unclean; 4 pgs stuck

Re: [ceph-users] Scrub Errors

2016-04-30 Thread Oliver Dzombic
Hi, please check with ceph health which PGs cause trouble. Please try: ceph pg repair 4.97 and look whether it can be resolved. If not, please paste the corresponding log. That repair can take some time... -- Mit freundlichen Gruessen / Best regards Oliver Dzombic IP-Interactive mailto:i...

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Gregory Farnum
It's just because the PG hadn't been scrubbed since the error occurred; then you upgraded, it scrubbed, and the error was found. You can deep-scrub all your PGs to check them if you like, but as I've said elsewhere this issue -- while scary! -- shouldn't actually damage any of your user data, so ju
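Greg's suggestion to deep-scrub all PGs amounts to issuing one `ceph pg deep-scrub <pgid>` per PG. A minimal sketch of generating that command list; the hard-coded PG ids stand in for output you would normally get from `ceph pg ls` or `ceph pg dump`:

```python
# PG ids from this thread, standing in for real `ceph pg ls` output.
pg_ids = ["3.7f", "10.2a", "1.32"]

# Build one deep-scrub command per PG; in practice you would feed these
# to a shell (ideally with some pacing so scrubs don't all start at once).
commands = ["ceph pg deep-scrub {}".format(pgid) for pgid in pg_ids]
for cmd in commands:
    print(cmd)
# ceph pg deep-scrub 3.7f
# ceph pg deep-scrub 10.2a
# ceph pg deep-scrub 1.32
```

Deep-scrubbing everything surfaces any latent errors up front, rather than waiting for the regular scrub schedule to find them one by one as happened here.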

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Randy Smith
Greg, This error occurred AFTER the upgrade. I upgraded to 0.80.4 last night and this error cropped up this afternoon. I ran `ceph pg repair 3.7f` (after I copied the pgs) which returned the cluster to health. However, I'm concerned that this showed up again so soon after I upgraded to 0.80.4. Is

Re: [ceph-users] scrub errors continue with 0.80.4

2014-07-18 Thread Gregory Farnum
The config option change in the upgrade will prevent *new* scrub errors from occurring, but it won't resolve existing ones. You'll need to run a scrub repair to fix those up. -Greg Software Engineer #42 @ http://inktank.com | http://ceph.com On Fri, Jul 18, 2014 at 2:59 PM, Randy Smith wrote: >