On Tue, Feb 4, 2014 at 5:06 PM, Craig Lewis <cle...@centraldesktop.com> wrote:
>
>
> On 2/4/14 14:43 , Yehuda Sadeh wrote:
>
> Now that objects are missing in the slave, how do I fix it?  radosgw-agent
> --sync-scope=full ?
>
> That would do it, yes.
>
>
> I'm hesitant to do this, at least until I understand what's going on better.  
> I know something is wrong, but I don't know what is wrong.
> I want to solve that before using a --sync-scope=full.  Otherwise it'll just 
> happen again next time I start importing data.
>
> I'm going to shut down replication cleanly, and leave it off.  I'll import 
> enough objects that I hit > 1000 entries, then I'll start up replication with 
> --verbose.  Then I'll check if all the imported objects exist in both 
> clusters.  Repeat until I find missing objects in the slave cluster.
>
>
>
>
> A shard was locked by the agent, but the agent never unlocked it
> (maybe because you took it down?).  The lock itself has a timeout, so
> it's supposed to get released after a while, and then processing
> should resume as usual. However, when it happens you can try playing
> with the rados lock commands (rados lock list, rados lock info, rados
> lock break) to release it (as long as there's no agent running that
> has locked the shard).
>
>
> The rados lock command requires an object name.  I'll see if I can figure out 
> how to map "shard 36" to a rados object in the .rgw.buckets pool.
>
> Thanks!
>
> Does it ever catch up? You mentioned before that most of the writes
> went to the same two buckets, so that's probably one of them. Note
> that writes to the same bucket are being handled in-order by the
> agent.
>
> Yehuda
>
>
> ... I think so.  This is what my graph looks like:
>
>
>
> Being able to answer that question is really what this graph is about.  If 
> you know of a generic way to answer it, I'm open to suggestions.  If you'd 
> like to see what I'm doing, take a look at 
> https://github.com/ceph/radosgw-agent/pull/7
>
> Now that I've started seeing missing objects, I'm not able to download 
> objects that should be on the slave if replication is up to date.  Either 
> it's not up to date, or it's skipping objects every pass.
>
> I'm trying to get the radosgw-agent --verbose output I mentioned above, but 
> this question is more fundamental.  If I don't know if it's up to date or 
> not, looking for missing objects isn't going to do me any good.  I'll work on 
> this now, and get back to the other experiment later.


You can run

$ radosgw-admin bilog list --bucket=<bucket> --marker=<id>

E.g.,

$ radosgw-admin bilog list --bucket=live-2 --marker=00000127871.328492.2

The entries there should have timestamp info.
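
If it helps, here is a rough sketch of how those timestamps could be turned
into a "how far behind am I" number. It is not part of radosgw-admin or the
agent. It assumes the command prints a JSON array, that each entry has a
"timestamp" field shaped like "2014-02-04 22:05:30.000000Z" (UTC), and that
the marker you pass is the last position the agent has processed, so
everything listed is still pending. The function name and the sample data in
the usage note are made up for illustration.

```python
import json
from datetime import datetime, timezone

# Assumed timestamp layouts; adjust if your output differs.
_FORMATS = ("%Y-%m-%d %H:%M:%S.%f", "%Y-%m-%d %H:%M:%S")


def _parse_ts(raw):
    """Parse an assumed-UTC timestamp string into an aware datetime."""
    cleaned = raw.strip().rstrip("Z").replace("T", " ")
    for fmt in _FORMATS:
        try:
            return datetime.strptime(cleaned, fmt).replace(tzinfo=timezone.utc)
        except ValueError:
            continue
    raise ValueError("unrecognized timestamp: %r" % raw)


def pending_summary(bilog_json, now=None):
    """Return (pending_count, oldest_timestamp, lag_seconds).

    bilog_json is the text printed by 'radosgw-admin bilog list', assumed
    to be a JSON array of entries. lag_seconds is the age of the oldest
    listed entry, or 0.0 when nothing is listed.
    """
    entries = json.loads(bilog_json)
    stamps = [_parse_ts(e["timestamp"]) for e in entries if e.get("timestamp")]
    if not stamps:
        return len(entries), None, 0.0
    if now is None:
        now = datetime.now(timezone.utc)
    oldest = min(stamps)
    return len(entries), oldest, (now - oldest).total_seconds()
```

To use it, save the output of the bilog list command above to a file, read
the file into a string, and pass that string to pending_summary(). A lag that
keeps growing between runs would suggest the agent is not catching up; a
count that stays at zero would suggest it is up to date for that bucket.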

Yehuda