For the record, I have one bucket in my slave zone that caught up to the
master zone. I stopped adding new data to my first bucket, and
replication stopped. I started tickling the bucket by uploading and
deleting a 0 byte file every 5 minutes. Now the slave has all of the
files in that bucket.
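For anyone who wants to script the tickle: a minimal sketch, assuming a boto3-style S3 client pointed at the master zone's radosgw endpoint. The key name and defaults are just placeholders, not anything radosgw requires.

```python
import time

def tickle(s3, bucket, key="replication-tickle", interval=300, passes=None):
    """Upload and delete a zero-byte object so the bucket gets fresh
    log entries, prompting radosgw-agent to take another pass at it."""
    done = 0
    while passes is None or done < passes:
        s3.put_object(Bucket=bucket, Key=key, Body=b"")  # 0-byte upload
        s3.delete_object(Bucket=bucket, Key=key)         # immediate delete
        done += 1
        if passes is None or done < passes:
            time.sleep(interval)                         # default: every 5 minutes
```

Running it forever (passes=None) from a screen session, or calling it once per bucket from cron, both work; all it has to do is keep bumping the bucket's log.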
I didn't need to use --sync-scope=full.
I'm still importing faster than I can replicate, but I know how to deal
with it now. The master zone has nearly completed its import. Once
that happens, replication should be able to catch up in a couple of
weeks, and stay caught up.
Thanks for all the help!
*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com
*Central Desktop. Work together in ways you never thought possible.*
Connect with us Website <http://www.centraldesktop.com/> | Twitter
<http://www.twitter.com/centraldesktop> | Facebook
<http://www.facebook.com/CentralDesktop> | LinkedIn
<http://www.linkedin.com/groups?gid=147417> | Blog
<http://cdblog.centraldesktop.com/>
On 2/7/14 11:38, Craig Lewis wrote:
I have confirmed this in production, with the default max-entries.
I have a bucket that I'm no longer writing to. Radosgw-agent had
stopped replicating this bucket. radosgw-admin bucket stats shows
that the slave is missing ~600k objects.
I uploaded a 1 byte file to the bucket. On the next pass,
radosgw-agent replicated 1000 entries.
I'm uploading and deleting the same file every 5 minutes. I'm using
more inter-colo bandwidth now. This bucket is catching up, slowly.
For now, I'm going to graph the delta of the total number of objects
in both clusters. If the slave is higher, it's catching up. If it's
lower, it's falling behind.
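The comparison I'm graphing is just this (plain Python, no Ceph calls; the totals would come from radosgw-admin bucket stats on each cluster):

```python
def replication_trend(master_total, slave_total, prev_master, prev_slave):
    """Compare per-interval object-count deltas between the two clusters.
    If the slave added more objects than the master did over the same
    interval, the replication backlog is shrinking."""
    master_delta = master_total - prev_master
    slave_delta = slave_total - prev_slave
    if slave_delta > master_delta:
        return "catching up"
    if slave_delta < master_delta:
        return "falling behind"
    return "holding steady"
```

Note this only tracks the trend, not the absolute backlog; a bucket can be permanently stuck and still read as "holding steady" once the master stops writing to it.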
On 2/6/14 18:32, Craig Lewis wrote:
On 2/4/14 17:06, Craig Lewis wrote:
Now that I've started seeing missing objects, I'm not able to
download objects that should be on the slave if replication is up to
date. Either it's not up to date, or it's skipping objects every pass.
Using my --max-entries fix
(https://github.com/ceph/radosgw-agent/pull/8), I think I see what's
happening.
Shut down replication
Upload 6 objects to an empty bucket on the master:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg
None show on the slave, because replication is down.
Start radosgw-agent --max-entries=2 (1 doesn't seem to replicate
anything)
Check contents of slave after pass #1:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
Check contents of slave after pass #10:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
Leave replication running
Upload 1 object, test6.jpg, to the master. Check the master:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg
2014-02-07 02:06  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test6.jpg
Check contents of slave after next pass:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
Upload another file, test7.jpg, to the master:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg
2014-02-07 02:06  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test6.jpg
2014-02-07 02:08  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test7.jpg
The slave doesn't get it this time:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
Upload another file, test8.jpg, to the master:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test3.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test4.jpg
2014-02-07 02:03  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test5.jpg
2014-02-07 02:06  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test6.jpg
2014-02-07 02:08  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test7.jpg
2014-02-07 02:10  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test8.jpg
The slave gets the 3rd file:
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test0.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test1.jpg
2014-02-07 02:02  10k  dc5674336e2212a0819b7abcb811e323  s3://bucket1/test2.jpg
So I think the problem is that the shard marker is set to the current marker after every pass, even when the bucket sync is capped by max-entries.
Uploading a file updates the shard marker, which triggers another pass on the bucket; the bucket marker itself is being tracked correctly.
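Here's a toy model of what I think is happening. This is my reading of the agent's behavior, not its actual code, and it ignores the apparent off-by-one in max-entries; markers are list indexes, with -1 meaning nothing handled yet.

```python
def sync_pass(log, bucket_marker, shard_marker, max_entries):
    """One agent pass over a single bucket. `log` is the bucket's ordered
    change log; each marker is the index of the last entry handled."""
    newest = len(log) - 1
    if newest <= shard_marker:
        return [], bucket_marker, shard_marker  # shard looks caught up; bucket skipped
    replicated = log[bucket_marker + 1 : bucket_marker + 1 + max_entries]
    bucket_marker += len(replicated)
    # The suspected bug: the shard marker jumps to the newest entry even
    # though the bucket sync was capped, so the leftover entries are stranded
    # until new activity bumps the shard log again.
    shard_marker = newest
    return replicated, bucket_marker, shard_marker

log = [f"test{i}.jpg" for i in range(6)]
done, bm, sm = sync_pass(log, -1, -1, max_entries=2)  # pass 1: capped batch
idle, bm, sm = sync_pass(log, bm, sm, max_entries=2)  # pass 2: nothing, 4 pending
log.append("test6.jpg")                               # an upload bumps the shard log
more, bm, sm = sync_pass(log, bm, sm, max_entries=2)  # one more capped batch
```

In this model the second pass replicates nothing even though four objects are pending, and each upload buys exactly one more capped batch, which matches what I saw in the test above.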
I would prefer to track the shard marker better, but I don't see any
way to get the last shard marker given the last bucket entry. If I
track the shard marker correctly, then the stats I'm generating are
still somewhat useful (if incomplete). I'll be able to see when
replication falls behind because the graphs keep growing.
The alternative is to change the bucket sync so that it loops until
it's replicated everything up to the shard marker. In this case,
I'll be able to see that replication is falling behind because each
pass takes longer and longer to complete.
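A sketch of that second option, again just a model of the agent rather than its real code (markers are list indexes, -1 meaning nothing handled yet):

```python
def sync_bucket_fully(log, bucket_marker, max_entries):
    """Alternative fix: loop in max_entries batches until the bucket marker
    reaches the shard marker, so a capped batch can't strand entries."""
    shard_marker = len(log) - 1          # newest entry at the start of the pass
    replicated, batches = [], 0
    while bucket_marker < shard_marker:
        batch = log[bucket_marker + 1 : bucket_marker + 1 + max_entries]
        replicated.extend(batch)
        bucket_marker += len(batch)
        batches += 1
    return replicated, bucket_marker, batches

log = [f"test{i}.jpg" for i in range(6)]
done, marker, batches = sync_bucket_fully(log, -1, max_entries=2)
```

Here a single pass drains the whole backlog in three batches, so only the pass duration (not the object count) tells you replication is behind.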
What do you guys think?
Either way, I believe all my data is waiting to be replicated. I just
need to fix this issue, and upload another object to every bucket
that's behind.
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com