*Craig Lewis*
Senior Systems Engineer
Office +1.714.602.1309
Email cle...@centraldesktop.com <mailto:cle...@centraldesktop.com>
On 2/4/14 11:36, Yehuda Sadeh wrote:
> Also, verify whether any objects are missing. Start with just counting
> the total number of objects in the buckets (radosgw-admin bucket stats
> can give you that info).
>
> Yehuda
Thanks, I didn't know about bucket stats.
bucket stats reports that the slave has fewer objects and fewer kB than
the master.
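For anyone wanting to repeat the comparison, here is a rough Python sketch
of diffing the two bucket stats outputs. It assumes the JSON printed by
radosgw-admin bucket stats has a "usage" map whose entries carry
"num_objects" and "size_kb" fields; that layout is my assumption and may
differ between versions. The function names are made up for illustration.

```python
import json


def usage_totals(stats):
    """Sum num_objects and size_kb over every usage category of one bucket.

    Assumes `stats` is the parsed JSON of `radosgw-admin bucket stats` and
    that it contains a "usage" map (an assumption, check your version).
    """
    objects = 0
    size_kb = 0
    for category in stats.get("usage", {}).values():
        objects += category.get("num_objects", 0)
        size_kb += category.get("size_kb", 0)
    return objects, size_kb


def compare(master_json, slave_json):
    """Return how many objects and kB the slave is short of the master."""
    m_obj, m_kb = usage_totals(json.loads(master_json))
    s_obj, s_kb = usage_totals(json.loads(slave_json))
    return {"missing_objects": m_obj - s_obj, "missing_kb": m_kb - s_kb}
```

To use it, save the output of radosgw-admin bucket stats --bucket=live-2
from a node in each zone to a file and pass the two file contents to
compare(). A positive number means the slave is behind.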
Now that objects are missing in the slave, how do I fix it? Would
running radosgw-agent with --sync-scope=full do it?
I figured out why replication went so quickly after the restart. I
missed an error in the radosgw-agent logs:
2014-02-04T08:16:28.936 14145:WARNING:radosgw_agent.worker:error locking
shard 36 log, skipping for now. Traceback:
Traceback (most recent call last):
File "/usr/lib/python2.7/dist-packages/radosgw_agent/worker.py", line
58, in lock_shard
self.lock.acquire()
File "/usr/lib/python2.7/dist-packages/radosgw_agent/lock.py", line
65, in acquire
self.zone_id, self.timeout, self.locker_id)
File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line
241, in lock_shard
expect_json=False)
File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line
155, in request
check_result_status(result)
File "/usr/lib/python2.7/dist-packages/radosgw_agent/client.py", line
116, in check_result_status
HttpError)(result.status_code, result.content)
HttpError: Http error code 423 content {"Code":"Locked"}
2014-02-04T08:16:28.939 12730:ERROR:radosgw_agent.sync:error syncing
shard 36
Full radosgw-agent.log, starting at restart:
https://cd.centraldesktop.com/p/eAAAAAAAC60_AAAAAAia_J0
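Since I missed this warning the first time, here is a small sketch for
scanning the agent log for lock failures and counting them per shard. It
assumes the log file has each message on a single line (the lines above
are wrapped by my mail client), and the function name is just something I
made up.

```python
import re
from collections import Counter

# Matches the worker warning shown above, e.g.
# "...WARNING:radosgw_agent.worker:error locking shard 36 log, skipping for now."
LOCK_ERROR = re.compile(
    r"WARNING:radosgw_agent\.worker:error locking shard (\d+) log"
)


def lock_errors_by_shard(lines):
    """Count 'error locking shard N log' warnings per shard number."""
    counts = Counter()
    for line in lines:
        match = LOCK_ERROR.search(line)
        if match:
            counts[int(match.group(1))] += 1
    return counts
```

Calling lock_errors_by_shard(open(path)) on the agent log returns a
Counter keyed by shard number, so any shard that was skipped because of a
lock error shows up right away.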
I shut down radosgw-agent and restarted all radosgw daemons in the slave
cluster. Replication is proceeding again on shard 36, but I'm seeing
the same behavior. The slave is catching up much too quickly.
Before the stall:
root@ceph1c:/var/log/ceph# zegrep '(live-2:us-west-1|shard 36)'
radosgw-agent.us-west-1.us-central-1.log.1.gz | grep -v
'WARNING:radosgw_agent.sync:shard 36 log has fallen behind' | tail
2014-02-03T23:19:11.434 11783:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000115883.315938.2"
2014-02-03T23:24:51.246 11783:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:25:30.185 6419:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:25:46.826 6468:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000116882.316964.3"
2014-02-03T23:30:13.648 6468:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:30:50.132 29240:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:31:06.808 29390:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000117881.317984.2"
2014-02-03T23:38:56.830 29390:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:39:58.408 3744:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-03T23:40:15.049 3837:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000118880.319057.3"
After the radosgw and radosgw-agent restart (contained in the full logs
linked above):
root@ceph1c:/var/log/ceph# egrep '(live-2:us-west-1|shard 36)'
radosgw-agent.us-west-1.us-central-1.log | grep -v
'WARNING:radosgw_agent.sync:shard 36 log has fallen behind'
2014-02-04T08:15:58.966 14045:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T08:16:28.936 14145:WARNING:radosgw_agent.worker:error locking
shard 36 log, skipping for now. Traceback:
2014-02-04T08:16:28.939 12730:ERROR:radosgw_agent.sync:error syncing
shard 36
2014-02-04T08:23:50.318 15231:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T08:24:05.970 15288:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000118880.319057.3"
2014-02-04T08:42:20.351 15288:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T08:48:36.509 24250:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T08:48:53.145 24280:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000119879.320127.2"
2014-02-04T08:57:22.429 24280:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:03:35.292 23586:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:03:53.561 23744:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000120878.321183.3"
2014-02-04T09:14:36.249 23744:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:20:15.250 30093:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:20:31.925 30330:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000121877.322255.2"
2014-02-04T09:26:46.652 30330:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:32:57.308 20145:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:33:13.897 20215:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000122876.323275.3"
2014-02-04T09:43:05.327 20215:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:49:20.255 25443:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T09:49:35.869 25479:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000123875.324352.2"
2014-02-04T09:57:12.177 25479:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T10:03:55.676 23373:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T10:04:11.318 23450:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000124874.325371.3"
2014-02-04T10:10:00.548 23450:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:29:05.528 28131:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:29:36.329 28219:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000125873.326393.2"
2014-02-04T13:35:25.659 28219:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:40:56.360 14609:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:41:12.087 14679:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000126872.327440.3"
2014-02-04T13:48:23.826 14679:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:56:18.406 15364:INFO:radosgw_agent.worker:finished
processing shard 36
2014-02-04T13:56:34.125 15578:INFO:radosgw_agent.worker:bucket instance
"live-2:us-west-1.35026898.2" has 1000 entries after "00000127871.328492.2"
2014-02-04T14:05:30.358 15578:INFO:radosgw_agent.worker:finished
processing shard 36
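To put numbers on "too quickly", the gaps between successive 1000-entry
batches can be pulled out of the log with something like the sketch
below. It assumes unwrapped log lines in the format shown above, already
filtered to a single bucket instance (as my grep does); the function name
is hypothetical.

```python
import re
from datetime import datetime

# Matches lines such as:
# 2014-02-03T23:19:11.434 11783:INFO:radosgw_agent.worker:bucket instance
#   "live-2:us-west-1.35026898.2" has 1000 entries after "00000115883.315938.2"
# (shown wrapped here; the pattern expects it on one line)
BATCH = re.compile(
    r'^(\S+) \d+:INFO:radosgw_agent\.worker:bucket instance "([^"]+)" '
    r'has (\d+) entries after "([^"]+)"'
)


def batch_intervals(lines):
    """Return (marker, seconds until the next batch started) pairs."""
    batches = []
    for line in lines:
        m = BATCH.match(line)
        if m:
            ts = datetime.strptime(m.group(1), "%Y-%m-%dT%H:%M:%S.%f")
            batches.append((ts, m.group(4)))
    return [
        (marker, (later - earlier).total_seconds())
        for (earlier, marker), (later, _) in zip(batches, batches[1:])
    ]
```

Each pair gives the marker a batch started after and how long the agent
took before it asked for the next batch, which makes it easy to compare
the pace before and after the restart.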