On Tue, Feb 4, 2014 at 10:07 AM, Craig Lewis <cle...@centraldesktop.com> wrote:
> On 2/3/14 14:34, Craig Lewis wrote:
>> On 2/3/14 10:51, Gregory Farnum wrote:
>>> On Mon, Feb 3, 2014 at 10:43 AM, Craig Lewis <cle...@centraldesktop.com> wrote:
>>>> I've been noticing something strange with my RGW federation. I added some statistics to radosgw-agent to try and get some insight (https://github.com/ceph/radosgw-agent/pull/7), but that just showed me that I don't understand how replication works.
>>>>
>>>> When PUT traffic to the master zone was relatively slow, replication had no issues keeping up. Now I'm trying to cause replication to fall behind, by deliberately exceeding the amount of bandwidth between the two zones (they're in different datacenters). Instead of falling behind, both the radosgw-agent logs and the stats I added say that the slave zone is keeping up.
>>>>
>>>> But some of the numbers don't add up. I'm not using enough bandwidth between the two facilities, and I'm not using enough disk space in the slave zone. The disk usage in the slave zone continues to fall further and further behind the master. Despite this, I'm always able to download objects from both zones.
>>>>
>>>> How does radosgw-agent actually replicate metadata and data? Does radosgw-agent actually copy all the bytes, or does it create placeholders in the slave zone? If radosgw-agent is creating placeholders in the slave zone, and radosgw populates the placeholders in the background, that would explain the behavior I'm seeing. If so, how can I tell whether replication is keeping up?
>>>
>>> Are you overwriting the same objects? Replication copies over the "present" version of an object, not all the versions which have ever existed. Similarly, the slave zone doesn't keep all the (garbage-collected) logs that the master zone has to, so those factors would be one way to get differing disk counts.
>>> -Greg
>>> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>
>> Before I started this import, the master zone was using 3.54TB (raw) and the slave zone was using 3.42TB (raw). I did overwrite some objects, and the 120GB difference is plausible for overwrites.
>>
>> I haven't deleted anything yet, so the only garbage collection would be from overwritten objects. Right?
>>
>> I imported 1.93TB of data. Replication is currently 2x, so that's 3.86TB (raw). Now the master is using 7.48TB (raw) and the slave is using 4.89TB (raw). The master zone looks correct, but the slave zone is missing 2.59TB (raw). That's 66% of my imported data.
>>
>> The 33% of the data the slave does have is in line with the amount of bandwidth I see between the two facilities: an increase of ~150 Mbps when the import is running on the master, and ~50 Mbps on the slave.
>>
>> Just to verify that I'm not overwriting objects, I checked the apache logs. Since I started the import, there have been 1328542 PUTs (including normal site traffic); 1301511 of those are unique. I'll investigate the 27031 duplicates, but the dups are only 34GB, not nearly enough to account for the discrepancy.
>>
>> From your answer, I'll assume there are no placeholders involved. If radosgw-agent says we're up to date, the data should exist in the slave zone.
>>
>> Now I'm really confused.
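[A minimal sketch (not from the thread) of the duplicate-PUT count described above, assuming the gateways log in Apache's common/combined format with the request line as the first double-quoted field; pass the access log paths on the command line.]

    # Sketch: count PUT requests and unique PUT paths in Apache access logs,
    # assuming the common/combined log format where the request line
    # ("PUT /bucket/object HTTP/1.1") is the first double-quoted field.
    import sys

    def count_puts(log_paths):
        total = 0
        unique = set()
        for path in log_paths:
            with open(path) as log:
                for line in log:
                    parts = line.split('"')
                    if len(parts) < 2:
                        continue                    # no quoted request line
                    request = parts[1].split()
                    if len(request) < 2 or request[0] != 'PUT':
                        continue
                    total += 1
                    unique.add(request[1])          # the object URI
        return total, len(unique)

    if __name__ == '__main__':
        total, unique = count_puts(sys.argv[1:])
        print('%d PUTs, %d unique, %d overwrites' % (total, unique, total - unique))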
> Here's another example. At 23:40 PST, radosgw-agent stalled. This is something that's been happening, but I haven't dug into it yet; ignore that for now. The point is that replication now has a backlog of 106000 PUT requests, roughly 125GB.
>
> A graph using the --stats patch I submitted to radosgw-agent:
>
> The X axis is time in UTC. The left axis is (to use pseudocode) sum(length(DataSyncer.get_log_entries(*allshards))); this is the scale for the green line. The right axis is the delta between now and the oldest entry's timestamp across all shards; this is the scale for the blue area.
>
> I only have 2 buckets that are actively being written to, and replication log entries cap at 1000 per shard, so that's why the green line levels off at 2000.
>
> I restarted replication at 08:15 PDT:
>
> Both this graph and the radosgw-agent logs say that it took about 1h15m to catch up. If true, that would require a transfer rate (uncompressed) of about 225 Mbps. My inter-datacenter bandwidth graphs show a peak of 50 Mbps and an average of 20 Mbps, and the link is capped at 100 Mbps.
>
> During the 8h30m that replication was stalled, the master zone's raw cluster storage went from 7.99TB to 8.34TB. The import accounts for 250GB of that 350GB; the remaining 100GB is site traffic. During the stall, the slave's raw cluster storage was steady at 5.07TB. After the replication restart, the slave's cluster storage went from 5.07TB to 5.09TB. That 20GB increase is consistent with the amount of inter-datacenter bandwidth used (20 Mbps * 1h15m = ~10GB transferred, which is ~20GB raw at 2x replication).
>
> I'm going to stop replication, and verify checksums on those 106000 objects in the slave. I've been spot checking, but I've never validated all of the objects.
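[For the full verification pass described above, a sketch along these lines could compare ETags zone against zone; the endpoints, credentials, and bucket name are placeholders, and multipart-upload ETags aren't plain MD5 sums, so a mismatch there only means "look closer".]

    # Sketch: walk a bucket in the master zone and compare each object's ETag
    # against the slave zone.  Hostnames, keys, and the bucket name below are
    # placeholders; multipart-upload ETags are not plain MD5s, so treat those
    # mismatches as "needs a closer look" rather than proof of corruption.
    import boto
    import boto.s3.connection

    def connect(host, access_key, secret_key):
        return boto.connect_s3(
            aws_access_key_id=access_key,
            aws_secret_access_key=secret_key,
            host=host,
            is_secure=False,  # set True if the gateways are behind SSL
            calling_format=boto.s3.connection.OrdinaryCallingFormat())

    master = connect('rgw-master.example.com', 'MASTER_ACCESS', 'MASTER_SECRET')
    slave = connect('rgw-slave.example.com', 'SLAVE_ACCESS', 'SLAVE_SECRET')

    master_bucket = master.get_bucket('my-bucket')
    slave_bucket = slave.get_bucket('my-bucket')

    for key in master_bucket.list():
        slave_key = slave_bucket.get_key(key.name)
        if slave_key is None:
            print('MISSING       %s' % key.name)
        elif slave_key.etag != key.etag:
            print('ETAG MISMATCH %s' % key.name)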
Also, verify whether any objects are missing. Start with just counting the total number of objects in the buckets (radosgw-admin bucket stats can give you that info).

Yehuda
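[A rough sketch of that count: radosgw-admin bucket stats --bucket=<name> is the documented command, but the JSON key path used below (usage / rgw.main / num_objects) and the bucket names are assumptions, so check one bucket's output by hand and adjust.]

    # Sketch: total the object counts reported by "radosgw-admin bucket stats"
    # for a list of buckets.  Run it on a host in each zone and compare the
    # two totals.  The JSON key path (usage -> rgw.main -> num_objects) is
    # assumed and may differ between releases.
    import json
    import subprocess

    def bucket_object_count(bucket):
        out = subprocess.check_output(
            ['radosgw-admin', 'bucket', 'stats', '--bucket=%s' % bucket])
        stats = json.loads(out)
        return stats.get('usage', {}).get('rgw.main', {}).get('num_objects', 0)

    if __name__ == '__main__':
        buckets = ['bucket-one', 'bucket-two']   # the buckets being replicated
        total = sum(bucket_object_count(b) for b in buckets)
        print('total objects across %d buckets: %d' % (len(buckets), total))

Comparing the two zone totals gives a quick missing-object count before going on to per-object checksum verification.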