Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-21 Thread Shain Miley
Hi, thank you for providing this level of detail. I ended up just failing the drive, since it is still under support and we had in fact received emails about the health of this drive in the past. I will, however, use this in the future if we have an issue with a pg and it is the first time we h

Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-19 Thread Mehmet
Hi Shain, what I would do: take osd.32 out:
# systemctl stop ceph-osd@32
# ceph osd out osd.32
This will cause rebalancing. To repair/reuse the drive you can do:
# smartctl -t long /dev/sdX
This will start a long self-test on the drive and - I bet - abort this after a while with somethin

Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-17 Thread Shain Miley
Brian, never mind... looking back through some older emails I do see an indication of a problem with that drive. I will fail out the osd and replace the drive. Thanks again for the help, Shain On 03/17/2017 03:08 PM, Shain Miley wrote:

Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-17 Thread Shain Miley
Brian, thank you for the detailed information. I was able to compare the 3 hexdump files and it looks like the primary pg is the odd man out. I stopped the OSD and then I attempted to move the object:
root@hqosd3:/var/lib/ceph/osd/ceph-32/current/3.2b8_head/DIR_8/DIR_B/DIR_2/DIR_A/DIR_0# mv
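The three-way comparison described above could be sketched roughly as follows. This is only an illustration, not the procedure actually used in the thread: the function name odd_one_out is hypothetical, and it assumes the three copies (or their hexdumps) have already been gathered on one host under distinct file names. It reads each file through stdin so that backslashes in Ceph object file names do not change the md5sum output format.

```shell
# Hypothetical helper: print the files whose checksum differs from the
# most common checksum among the arguments.
# Usage: odd_one_out <copy1> <copy2> <copy3> ...
odd_one_out() {
    sums=""
    for f in "$@"; do
        [ -r "$f" ] || return 1
        # Hash via stdin so the file name never appears in md5sum's output.
        h=$(md5sum < "$f" | awk '{print $1}')
        sums="${sums}${h} ${f}
"
    done
    # The most frequent hash is treated as the majority copy.
    majority=$(printf '%s' "$sums" | awk '{print $1}' | sort | uniq -c | sort -rn | awk 'NR==1 {print $2}')
    # Print every path whose hash is not the majority hash.
    printf '%s' "$sums" | awk -v m="$majority" '$1 != m { sub(/^[^ ]+ /, ""); print }'
}
```

Note that the majority copy is only presumed good. If all copies differ, the choice of "majority" is arbitrary and the output should not be trusted.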

Re: [ceph-users] active+clean+inconsistent and pg repair

2017-03-17 Thread Brian Andrus
We went through a period of time where we were experiencing these daily... cd to the PG directory on each OSD and do a find for "238e1f29.0076024c" (mentioned in your error message). This will likely return a file that has a backslash in the name, something like rbd\udata.238e1f29.0076024c_he
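A minimal sketch of the find step described above, assuming a FileStore-style PG directory such as the /var/lib/ceph/osd/ceph-32/current/3.2b8_head path that appears elsewhere in this thread. The function name find_object_files is hypothetical, and the exact directory differs per OSD.

```shell
# Hypothetical helper: list regular files under a PG directory whose name
# contains the given object ID fragment.
# Usage: find_object_files <pg_dir> <object_fragment>
find_object_files() {
    # -type f skips the DIR_* hash subdirectories themselves.
    # The fragment is used as a glob, so it should not contain * ? or [.
    find "$1" -type f -name "*$2*" | sort
}

# Example, to be run on each OSD host that holds a replica of the PG:
#   find_object_files /var/lib/ceph/osd/ceph-32/current/3.2b8_head 238e1f29.0076024c
```

Running the same search on every replica yields one path per OSD, which can then be dumped and compared.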

[ceph-users] active+clean+inconsistent and pg repair

2017-03-17 Thread Shain Miley
Hello, Ceph status is showing:
1 pgs inconsistent
1 scrub errors
1 active+clean+inconsistent
I located the error messages in the logfile after querying the pg in question:
root@hqosd3:/var/log/ceph# zgrep -Hn 'ERR' ceph-osd.32.log.1.gz
ceph-osd.32.log.1.gz:846:2017-03-17 02:25:20.281608 7f7