We identified a potential RGW data loss situation on versioned buckets in
multisite setups. Please see the tracker for details:
https://tracker.ceph.com/issues/68466.
This affects Reef, Squid and main. Earlier versions have not been tested,
though.
DoutPrefixProvider*, rgw::cls::fifo::Completion::Ptr&&, int):1858 trim failed: r=-5 tid=14844
...
2) Secondary site
...
2022-10-02T23:15:50.279-0400 7f679a2ce700 1 req 16201632253829371026 0.00102s op->ERRORHANDLER: err_no=-2002 new_err_no=-2002
...
We did a bucket sync run on a broken bucket, but nothing happened and the
bucket still didn't sync.
$ sudo radosgw-admin bucket sync run \
    --bucket=jjm-4hr-test-1k-thisisbcstestload0011007 \
    --source-zone=dev-zone-bcc-secondary
From: jane.dev.
We have encountered replication issues in our multisite setup with
Quincy v17.2.3.
Our Ceph clusters are brand new. We tore down our clusters and re-deployed
fresh Quincy ones before running our test.
In our environment, we have 3 RGW nodes per site; each node runs 2 instances
for client traffic