The PG with the unfound object has been in the active+recovering+degraded state for much longer than usual. Most PGs spend about 20 minutes in that state, then complete. This one has been in active+recovering+degraded for about 4 hours now.

11.483 8851 1 8852 1 7974255906 3082 3082 active+recovering+degraded 2014-04-07 10:31:53.146930 13421'1242575 13855:1647415 [3,13] [3,13] 7936'1019031 2014-03-24 00:53:42.265828 7936'1019031 2014-03-24 00:53:42.265828

Is this because it can't find the unfound object? Or is it because I set the noout and nodown OSD flags?

So far it's not a big deal. There's plenty of other backfilling and recovery that needs to happen. It just seems strange to me.


*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>

*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter <http://www.twitter.com/centraldesktop> | Facebook <http://www.facebook.com/CentralDesktop> | LinkedIn <http://www.linkedin.com/groups?gid=147417> | Blog <http://cdblog.centraldesktop.com/>

On 4/7/14 14:38 , Craig Lewis wrote:
Ceph is telling me that it can't find some data:
2014-04-07 11:15:09.901992 mon.0 [INF] pgmap v5436846: 2592 pgs: 2164 active+clean, 142 active+remapped+wait_backfill, 150 active+degraded+wait_backfill, 1 active+recovering+degraded, 2 active+degraded+backfilling, 133 active+degraded+remapped+wait_backfill; 15094 GB data, 28749 GB used, 30839 GB / 59588 GB avail; 3496837/37879443 objects degraded (9.231%); *1/18361235 unfound (0.000%)*; 25900 kB/s, 26 objects/s recovering
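
For anyone who wants to watch these counters over time, here is a minimal sketch that pulls the degraded and unfound counts out of a status line like the one above. The function name `object_counts` is hypothetical, and the regular expression only assumes the "N/M objects degraded" and "N/M unfound" wording seen in this particular log line; other Ceph versions may format it differently.

```python
import re

# Tail of the pgmap line quoted above (emphasis asterisks removed).
status = ("3496837/37879443 objects degraded (9.231%); "
          "1/18361235 unfound (0.000%); 25900 kB/s, 26 objects/s recovering")

def object_counts(line):
    """Return {kind: (count, total)} for the degraded/unfound fields found in line."""
    counts = {}
    pattern = r"(\d+)/(\d+) (?:objects )?(degraded|unfound)"
    for n, total, kind in re.findall(pattern, line):
        counts[kind] = (int(n), int(total))
    return counts

counts = object_counts(status)
print(counts["unfound"])   # (count, total) for unfound objects
print(counts["degraded"])  # (count, total) for degraded objects
```

A field that does not appear in the line is simply absent from the returned dict, so callers can use `counts.get("unfound")` to detect a cluster with nothing unfound.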

Querying all the PGs tells me that 11.483 has 1 missing object, named .dir.us-west-1.51941060.1.

pg query says the recovery state is:
          "might_have_unfound": [
                { "osd": 11,
                  "status": "querying"},
                { "osd": 13,
                  "status": "already probed"}],

Active OSDs for this PG are [3,13], so osd.13 is the secondary for this PG. osd.11 does not have the data. I recently replaced osd.11, and this data was unfound before the drive swap. So it looks like I have no choice but to use mark_unfound_lost.
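
To make the reading of might_have_unfound concrete, here is a minimal sketch that lists the OSDs the PG has not finished probing. It assumes the fragment above is wrapped in braces so that it parses as JSON on its own; the helper name `pending_osds` is hypothetical, and treating every status other than "already probed" as still pending is my assumption rather than something the pg query output states.

```python
import json

# The might_have_unfound fragment from the pg query output above,
# wrapped in braces so it is valid JSON by itself.
fragment = """
{ "might_have_unfound": [
      { "osd": 11, "status": "querying"},
      { "osd": 13, "status": "already probed"}]}
"""

def pending_osds(recovery_state):
    """Return the ids of OSDs whose status is anything other than 'already probed'."""
    entries = recovery_state.get("might_have_unfound", [])
    return [e["osd"] for e in entries if e["status"] != "already probed"]

state = json.loads(fragment)
print(pending_osds(state))  # OSDs the PG is still waiting on
```

With the output above, only osd.11 is still listed as pending, which matches the "querying" status that has not changed since the drive swap.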


I have some concerns, though. Pool 11 is .rgw.buckets. I assume from the object's name that .dir.us-west-1 is related to replication. us-west-1 is the master zone, and these errors are occurring in the slave zone (us-central-1).

What are the risks of using ceph pg {pgid} mark_unfound_lost revert on that particular object? I'm comfortable losing objects in the slave zone; I can re-upload them to the master zone. I just want to make sure I'm not going to render the slave zone unusable.



Thanks for the help.








_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
