On Tue, Feb 4, 2014 at 5:06 PM, Craig Lewis <cle...@centraldesktop.com> wrote:
> On 2/4/14 14:43, Yehuda Sadeh wrote:
> > > Now that objects are missing in the slave, how do I fix it?
> > > radosgw-agent --sync-scope=full ?
> >
> > That would do it, yes.
>
> I'm hesitant to do this, at least until I understand what's going on
> better. I know something is wrong, but I don't know what is wrong.
> I want to solve that before using --sync-scope=full. Otherwise it'll
> just happen again next time I start importing data.
>
> I'm going to shut down replication cleanly, and leave it off. I'll
> import enough objects that I hit more than 1000 entries, then I'll
> start up replication with --verbose. Then I'll check if all the
> imported objects exist in both clusters. Repeat until I find missing
> objects in the slave cluster.
>
> > A shard was locked by the agent, but the agent never unlocked it
> > (maybe because you took it down?). The lock itself has a timeout, so
> > it's supposed to get released after a while, and then processing
> > should resume as usual. However, when it happens you can try playing
> > with the rados lock commands (rados lock list, rados lock info, rados
> > lock break) to release it (as long as there's no agent running that
> > has locked the shard).
>
> The rados lock command requires an object name. I'll see if I can
> figure out how to map "shard 36" to a rados object in the .rgw.buckets
> pool.
>
> Thanks!
>
> > Does it ever catch up? You mentioned before that most of the writes
> > went to the same two buckets, so that's probably one of them. Note
> > that writes to the same bucket are being handled in-order by the
> > agent.
> >
> > Yehuda
>
> ... I think so. This is what my graph looks like:
>
> [graph image]
>
> Being able to answer that question is really what this graph is about.
> If you have any suggestions for generic ways to answer that question,
> I'm open to suggestions.
> If you'd like to see what I'm doing, take a look at
> https://github.com/ceph/radosgw-agent/pull/7
>
> Now that I've started seeing missing objects, I'm not able to download
> objects that should be on the slave if replication is up to date.
> Either it's not up to date, or it's skipping objects every pass.
>
> I'm trying to get the radosgw-agent --verbose output I mentioned above,
> but this question is more fundamental. If I don't know if it's up to
> date or not, looking for missing objects isn't going to do me any good.
> I'll work on this now, and get back to the other experiment later.

You can run

$ radosgw-admin bilog list --bucket=<bucket> --marker=<id>

E.g.,

$ radosgw-admin bilog list --bucket=live-2 --marker=00000127871.328492.2

The entries there should have timestamp info.

Yehuda
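
One rough way to turn that timestamp info into an "is it up to date?" answer is sketched below. It is only an illustration, and it rests on several assumptions not confirmed in this thread: that the command prints a JSON array, that each entry carries a field named "timestamp", that the timestamps look like "2014-02-04 22:15:33.000000Z", and that the marker you pass is the position the agent last reported. The function names are made up for the example; check them against the output of your own radosgw-admin before relying on any of it.

```python
import json
import subprocess
from datetime import datetime, timezone

# Assumed timestamp layout; adjust if your radosgw-admin prints something else.
TS_FORMAT = "%Y-%m-%d %H:%M:%S.%fZ"


def fetch_bilog(bucket, marker):
    """Run the command from this thread and return its stdout (assumed to be JSON)."""
    cmd = ["radosgw-admin", "bilog", "list",
           "--bucket=" + bucket, "--marker=" + marker]
    return subprocess.check_output(cmd).decode("utf-8")


def entry_times(bilog_json):
    """Parse the 'timestamp' field (assumed name) of every listed entry as UTC."""
    entries = json.loads(bilog_json)
    return [
        datetime.strptime(e["timestamp"], TS_FORMAT).replace(tzinfo=timezone.utc)
        for e in entries
    ]


def pending_summary(bilog_json, now):
    """Return (number of listed entries, age of the oldest one).

    If the marker was the agent's last position, these are the entries it
    has not handled yet, and the age is a rough estimate of replication lag.
    """
    times = entry_times(bilog_json)
    if not times:
        return 0, None
    return len(times), now - min(times)
```

For the bucket in the example above, usage would look something like `pending_summary(fetch_bilog("live-2", "00000127871.328492.2"), datetime.now(timezone.utc))`. An empty list would suggest the agent has caught up on that bucket; a count or age that keeps growing would suggest it is falling behind.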