On 2018-09-12 17:35:16-07:00 Jason Dillaman wrote:

> Any chance you know the LBA or byte offset of the corruption so I can
> compare it against the log?

The LBAs of the corruption are 0xA74F000 through 175435776
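
For comparing against the RBD debug logs, it may help to translate an LBA into a byte offset and an RBD object index. The following is a minimal sketch, not taken from the attached logs. It assumes the LBAs count 512-byte sectors and that the image uses the default 4 MiB object size; the helper name `locate` is hypothetical.

```python
SECTOR_SIZE = 512              # assumption: LBAs count 512-byte sectors
OBJECT_SIZE = 4 * 1024 * 1024  # assumption: default 4 MiB RBD object size


def locate(lba, sector_size=SECTOR_SIZE, object_size=OBJECT_SIZE):
    """Map an LBA to (byte offset, object index, offset within that object)."""
    byte_offset = lba * sector_size
    object_index, offset_in_object = divmod(byte_offset, object_size)
    return byte_offset, object_index, offset_in_object


if __name__ == "__main__":
    # The two endpoints quoted above, one in hex and one in decimal.
    for lba in (0xA74F000, 175435776):
        byte_offset, obj, within = locate(lba)
        print(f"LBA {lba:#x}: byte offset {byte_offset}, "
              f"object {obj:#x}, offset in object {within}")
```

Note that 0xA74F000 is 175435776 in decimal, so both endpoints as written map to the same location under these assumptions: byte offset 89823117312, which is 2 MiB into object 0x53a7.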



On Wed, Sep 12, 2018 at 8:32 PM <patrick.mcl...@sony.com> wrote:
>
> Hi Jason,
>
> On 2018-09-10 11:15:45-07:00 ceph-users wrote:
>
> On 2018-09-10 11:04:20-07:00 Jason Dillaman wrote:
>
>
> > In addition to this, we are seeing a similar type of corruption in
> > another use case when we migrate RBDs and snapshots across pools. In
> > this case we clone a version of an RBD (e.g. HEAD-3) to a new pool and
> > rely on 'rbd export-diff/import-diff' to restore the last 3 snapshots
> > on top. Here too we see cases of fsck and RBD checksum failures.
> > We maintain various metrics and logs. Looking back at our data we
> > have seen the issue at a small scale for a while on Jewel, but the
> > frequency increased recently. The timing may have coincided with a
> > move to Luminous, but this may be coincidence. We are currently on
> > Ceph 12.2.5.
> > We are wondering if people are experiencing similar issues with
> > 'rbd export-diff / import-diff'. I'm sure many people use it to keep
> > backups in sync. Since these are backups, many people may not inspect
> > the data often. In our use case, we use this mechanism to keep data in
> > sync and actually need the data in the other location often. We are
> > wondering if anyone else has encountered any issues. It's quite
> > possible that many people have this issue but simply don't realize
> > it. We are likely hitting it much more frequently due to the scale of
> > our operation (tens of thousands of syncs a day).
>
> If you are able to recreate this reliably without tiering, it would
> assist in debugging if you could capture RBD debug logs during the
> export along w/ the LBA of the filesystem corruption to compare
> against.
>
> We haven't been able to reproduce this reliably yet, and we haven't
> figured out the exact conditions that cause it. We have just been
> seeing it happen on some percentage of export-diff/import-diff
> operations.
>
>
> Logs from both export-diff and import-diff in a case where the result
> gets corrupted are attached. Please let me know if you need any more
> information.
>



--
Jason
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
