The PG with the unfound object has been in the
active+recovering+degraded state for much longer than usual. Most PGs
spend about 20 minutes in that state, then complete. This one has been
in active+recovering+degraded for about 4 hours now.
11.483 8851 1 8852 1 7974255906 3082 3082
active+recovering+degraded 2014-04-07 10:31:53.146930
13421'1242575 13855:1647415 [3,13] [3,13] 7936'1019031
2014-03-24 00:53:42.265828 7936'1019031 2014-03-24 00:53:42.265828
Is this because it can't find the unfound object? Or is this because I
set the OSD flags noout and nodown?
So far it's not a big deal. There's plenty of other backfilling and
recovery that needs to happen. It just seems strange to me.
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 4/7/14 14:38 , Craig Lewis wrote:
Ceph is telling me that it can't find some data:
2014-04-07 11:15:09.901992 mon.0 [INF] pgmap v5436846: 2592 pgs: 2164
active+clean, 142 active+remapped+wait_backfill, 150
active+degraded+wait_backfill, 1 active+recovering+degraded, 2
active+degraded+backfilling, 133
active+degraded+remapped+wait_backfill; 15094 GB data, 28749 GB used,
30839 GB / 59588 GB avail; 3496837/37879443 objects degraded (9.231%);
*1/18361235 unfound (0.000%)*; 25900 kB/s, 26 objects/s recovering
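As a rough illustration of how the degraded and unfound figures in that status line relate to their percentages, here is a minimal Python sketch. The `ratios` helper and the regular expression are assumptions made for this example, not part of any Ceph tool; the input is just the relevant fragment of the log line above.

```python
import re

# Fragment of the pgmap status line quoted above.
status = ("3496837/37879443 objects degraded (9.231%); "
          "1/18361235 unfound (0.000%)")

def ratios(line):
    """Return {label: (count, total, percent)} for degraded/unfound counters.

    Hypothetical helper: it only recognises the "N/M [objects] label" shape
    seen in this particular log line.
    """
    out = {}
    pattern = r"(\d+)/(\d+) (?:objects )?(degraded|unfound)"
    for num, den, label in re.findall(pattern, line):
        count, total = int(num), int(den)
        out[label] = (count, total, 100.0 * count / total)
    return out

for label, (count, total, pct) in ratios(status).items():
    print(f"{label}: {count} of {total} ({pct:.3f}%)")
```

Note that the two counters use different denominators in the log line, so the percentages are not directly comparable with each other.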
Querying all the PGs tells me that 11.483 has 1 missing object, named
.dir.us-west-1.51941060.1.
pg query says the recovery state is:
"might_have_unfound": [
{ "osd": 11,
"status": "querying"},
{ "osd": 13,
"status": "already probed"}],
Active OSDs for this PG are [3,13], so osd.13 is the secondary for this
PG. osd.11 does not have the data. I recently replaced osd.11, and
this data was unfound before the drive swap. So it looks like I have
no choice but to use mark_unfound_lost.
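For anyone checking the same thing on their own cluster, the might_have_unfound list can be pulled out of saved pg query output with a few lines of Python. This is only a sketch: the `sample` document is a trimmed, hypothetical fragment containing just the fields quoted above (the nesting under "recovery_state" and the "name" field are assumptions about the surrounding output), and `unprobed_osds` is a made-up helper name.

```python
import json

# Trimmed, hypothetical pg query output holding only the fields quoted above.
sample = """
{"recovery_state": [
  {"name": "Started/Primary/Active",
   "might_have_unfound": [
     {"osd": 11, "status": "querying"},
     {"osd": 13, "status": "already probed"}]}]}
"""

def unprobed_osds(pg_query_json):
    """List (osd, status) pairs that have not been fully probed yet."""
    doc = json.loads(pg_query_json)
    pending = []
    for state in doc.get("recovery_state", []):
        for peer in state.get("might_have_unfound", []):
            if peer["status"] != "already probed":
                pending.append((peer["osd"], peer["status"]))
    return pending

print(unprobed_osds(sample))
```

An OSD still listed here is one the PG is waiting on before it gives up looking for the object, which is why osd.11 matters in this case.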
I have some concerns though. Pool 11 is .rgw.buckets. I assume from
the object's name, .dir.us-west-1 is related to replication. us-west-1
is the master zone, and these errors are occurring in the slave zone
(us-central-1).
What are the risks of using ceph pg {pgid} mark_unfound_lost revert on
that particular object? I'm comfortable losing objects in the slave,
I can re-upload them to the master zone. I just want to make sure I'm
not going to render the slave zone unusable.
Thanks for the help.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com