On 2018-09-10 11:04:20-07:00 Jason Dillaman wrote:

> On Mon, Sep 10, 2018 at 1:35 PM <patrick.mcl...@sony.com> wrote:
>> We utilize Ceph RBDs for our users' storage and need to keep data
>> synchronized across data centres. For this we rely on 'rbd export-diff /
>> import-diff'. Lately we have been noticing cases in which the file system on
>> the 'destination RBD' is corrupt. We have been trying to isolate the issue,
>> which may or may not be due to Ceph. We suspect the problem could be in 'rbd
>> export-diff / import-diff' and are wondering if people have been seeing
>> issues with these tools. Let me explain our use case and issue in more detail.
>> We have a number of data centres, each with a Ceph cluster storing tens of
>> thousands of RBDs. We maintain extra copies of each RBD in other data
>> centres. After we are 'done' using an RBD, we create a snapshot and use 'rbd
>> export-diff' to create a diff from the most recent snapshot that is 'common'
>> with the other data centre. We send the data over the network, and use 'rbd
>> import-diff' on the destination. When we apply a diff to a destination RBD we
>> can guarantee its 'HEAD' is clean. Of course, we guarantee that an RBD is only
>> used in one data centre at a time.
>> We noticed corruption on the destination RBD based on fsck failures; further
>> investigation showed that checksums on the RBD mismatch as well. Somehow the
>> data is sometimes getting corrupted, either by our software or by 'rbd
>> export-diff / import-diff'. Our investigation suggests that the problem
>> is in 'rbd export-diff / import-diff'. The main evidence of this is that
>> occasionally we sync an RBD to multiple data centres. Each sync is a
>> separate job with its own 'rbd export-diff'. We noticed that both destination
>> locations have the same corruption (and the same checksum) and the source is
>> healthy.
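
For reference, the per-image sync flow described above boils down to something
like the following (pool, image, and snapshot names are illustrative, and the
pipe into ssh just stands in for our own transport tooling):

  # source cluster: take the new snapshot, then diff it against the most
  # recent snapshot that already exists at the destination
  rbd snap create pool/image@snapN
  rbd export-diff --from-snap snapN-1 pool/image@snapN - |
      ssh dest-dc rbd import-diff - pool/image

  # spot-check: export the same snapshot on both clusters and compare digests
  rbd export pool/image@snapN - | sha256sum

The diff produced by 'rbd export-diff' is a self-contained stream, so in
practice it can be written to a file and shipped separately rather than piped
directly. It is full-image comparisons of this sort, plus fsck inside the
guest, that flag the corrupted destinations.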

> Any chance you are using OSD tiering on your RBD pool? The
> export-diffs from a cache tier pool are almost guaranteed to be
> corrupt if that's the case since the cache tier provides incorrect
> object diff stats [1].

No, we are not using any OSD tiering in our pools.
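
(For anyone wanting to rule this out on their own clusters, checking the pool
details should be enough; the field names in the grep below are those printed
for tiered pools, so no output should mean no cache tiering anywhere:)

  ceph osd pool ls detail | grep -E 'tier_of|read_tier|write_tier|cache_mode'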

>> In addition to this, we are seeing a similar type of corruption in another
>> use case, when we migrate RBDs and snapshots across pools. In this case we
>> clone a version of an RBD (e.g. HEAD-3) to a new pool and rely on 'rbd
>> export-diff / import-diff' to restore the last 3 snapshots on top. Here too
>> we see cases of fsck and RBD checksum failures.
>> We maintain various metrics and logs. Looking back at our data, we have seen
>> the issue at a small scale for a while on Jewel, but the frequency increased
>> recently. The timing may have coincided with a move to Luminous, but this may
>> be a coincidence. We are currently on Ceph 12.2.5.
>> We are wondering if people are experiencing similar issues with 'rbd
>> export-diff / import-diff'. I'm sure many people use it to keep backups in
>> sync, and since these are backups, many people may not inspect the data
>> often. In our use case, we use this mechanism to keep data in sync and
>> actually need the data in the other location often. We are wondering if
>> anyone else has encountered issues; it's quite possible that many people
>> have this problem but simply don't realize it. We are likely hitting it much
>> more frequently due to the scale of our operation (tens of thousands of
>> syncs a day).
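
To make the pool-migration case quoted above concrete, the per-image flow is
roughly the following (pool and snapshot names are again illustrative):

  # clone the older snapshot into the new pool (the snapshot must be protected)
  rbd snap protect srcpool/image@snapN-3
  rbd clone srcpool/image@snapN-3 dstpool/image

  # recreate the baseline snapshot on the clone so import-diff has a starting
  # point, then replay the newer snapshots one at a time
  rbd snap create dstpool/image@snapN-3
  rbd export-diff --from-snap snapN-3 srcpool/image@snapN-2 - | rbd import-diff - dstpool/image
  rbd export-diff --from-snap snapN-2 srcpool/image@snapN-1 - | rbd import-diff - dstpool/image
  rbd export-diff --from-snap snapN-1 srcpool/image@snapN - | rbd import-diff - dstpool/image

After the last step the clone in dstpool should carry the same three snapshots
and the same data as the source, and this is where the fsck and checksum
failures show up.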

> If you are able to recreate this reliably without tiering, it would
> assist in debugging if you could capture RBD debug logs during the
> export along w/ the LBA of the filesystem corruption to compare
> against.


We haven't been able to reproduce this reliably yet; we still haven't figured
out the exact conditions that trigger it, we just see it happen on some
percentage of export-diff / import-diff operations.

We will look into capturing RBD debug logs for the export operations, along
with the LBAs of the filesystem corruption when it occurs.
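
A rough sketch of what we have in mind (paths and snapshot names below are
just placeholders):

  # run the export with librbd/messenger debug logging sent to a dedicated file
  rbd --debug-rbd=20 --debug-ms=1 --log-file=/var/log/ceph/export-diff.$RBD.log \
      export-diff --from-snap snapN-1 pool/$RBD@snapN /tmp/$RBD.snapN.diff

  # e2fsck reports filesystem block numbers; with a 4 KiB block size the byte
  # offset into the RBD is roughly block_number * 4096 (plus any partition
  # offset), which can then be compared against the extent offsets recorded
  # in the diff stream and the debug log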