Hi folks,

We are trying out the multisite sync policy feature in the Quincy release and
have run into something strange. Maybe our understanding of sync policy is
incorrect. I hope the community can help us solve the mystery.

Our test setup is very simple. I used mstart.sh to spin up 3 clusters and
configured them with a single realm "world", a single zonegroup "zg" and 3
zones (z0, z1 and z2), with z0 being the master. I created a zonegroup-level
sync policy with status “allowed”, a symmetrical flow among all 3 zones and a
pipe allowing all zones to sync to all zones. I then created a single bucket
“test-bucket” at z0 and uploaded a single object to it. At this point there
should be no sync, since the policy status is only “allowed”, and indeed the
object exists only in z0 and “bucket sync status” shows that sync is
disabled. Finally, I created a bucket-level sync policy with status “enabled”
and a pipe between z0 and z1 only. I expected sync to kick off between z0 and
z1, and “sync info” did show z0 and z1 as sources/dests. “bucket sync status”
also shows the source zone and source bucket. At z0 it shows everything is
caught up, but at z1 it shows one shard is behind for data sync, which is
expected since the only object exists in z0 but not in z1.
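
For reference, the policy setup described above corresponds roughly to the
radosgw-admin calls below. This is a sketch, not a transcript of what was
run: the group, flow and pipe IDs are placeholders, the zone lists on the
bucket-level pipe are an assumed rendering of "between z0 and z1 only", and
the per-cluster config option is omitted. The Python wrapper only prints the
commands; it does not execute anything.

```python
import shlex

GROUP_ZG = "group-all"         # placeholder ID for the zonegroup-level group
GROUP_BUCKET = "group-bucket"  # placeholder ID for the bucket-level group
BUCKET = "test-bucket"


def policy_commands():
    """Return the radosgw-admin invocations as argument lists (nothing is run)."""
    admin = ["radosgw-admin"]
    return [
        # Zonegroup-level group: sync is permitted but not active by itself.
        admin + ["sync", "group", "create",
                 f"--group-id={GROUP_ZG}", "--status=allowed"],
        # Symmetrical flow among all three zones.
        admin + ["sync", "group", "flow", "create",
                 f"--group-id={GROUP_ZG}", "--flow-id=flow-all",
                 "--flow-type=symmetrical", "--zones=z0,z1,z2"],
        # Pipe from all zones to all zones, for all buckets.
        admin + ["sync", "group", "pipe", "create",
                 f"--group-id={GROUP_ZG}", "--pipe-id=pipe-all",
                 "--source-zones=*", "--source-bucket=*",
                 "--dest-zones=*", "--dest-bucket=*"],
        # Zonegroup-level policy changes need a period commit.
        admin + ["period", "update", "--commit"],
        # Bucket-level group that actually enables sync for the bucket.
        admin + ["sync", "group", "create", f"--bucket={BUCKET}",
                 f"--group-id={GROUP_BUCKET}", "--status=enabled"],
        # Bucket-level pipe restricted to z0 and z1 (assumed zone lists).
        admin + ["sync", "group", "pipe", "create", f"--bucket={BUCKET}",
                 f"--group-id={GROUP_BUCKET}", "--pipe-id=pipe-z0-z1",
                 "--source-zones=z0,z1", "--dest-zones=z0,z1"],
    ]


for cmd in policy_commands():
    # shlex.join quotes the '*' arguments so the lines are safe to paste.
    print(shlex.join(cmd))
```

The object upload between the zonegroup-level and bucket-level steps is not
shown, since it goes through an S3 client rather than radosgw-admin.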
 
Now, here comes the strange part. Although z1 shows one shard behind, it
doesn’t seem to make any progress on syncing it. It doesn’t seem to do any
full sync at all, since “bucket sync status” shows “full sync: 0/11 shards”.
No full sync can have completed, because otherwise z1 would have that
object. It stays stuck in this condition until I upload the same object
again. I suspect that updating the object generates a new datalog entry,
which triggers the sync. My questions are:

1. With a bucket-specific sync policy, why wasn’t there a full sync, and how
can one force a full sync?
2. Does the sync policy act only on datalog entries, taking no action without
them? I wonder whether that would explain why it didn't sync even though it
knows it is behind.

BTW, I also ran “sync error list” and the results are all empty. I also tried
applying the fix from https://tracker.ceph.com/issues/57853, although I am not
sure whether it is relevant. The fix didn’t change the behavior we observed.
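
In case it helps anyone reproduce this, the checks mentioned in this post can
be sketched as below. The config paths are hypothetical stand-ins for whatever
mstart.sh created for each cluster, and the script only prints the commands
rather than running them.

```python
import shlex


def status_commands(bucket, conf=None):
    """Build the read-only radosgw-admin checks for one cluster (nothing is run)."""
    # -c selects the cluster's ceph.conf; it is left out when conf is None.
    base = ["radosgw-admin"] + (["-c", conf] if conf else [])
    return [
        base + ["sync", "info", f"--bucket={bucket}"],
        base + ["bucket", "sync", "status", f"--bucket={bucket}"],
        base + ["sync", "error", "list"],
    ]


# Hypothetical per-cluster config paths; substitute the real ones.
for zone, conf in [("z0", "run/c1/ceph.conf"), ("z1", "run/c2/ceph.conf")]:
    print(f"# {zone}")
    for cmd in status_commands("test-bucket", conf):
        print(shlex.join(cmd))
```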

Thanks in advance,
Yixin