Hello, thanks for the reply.
We have around 200k objects in the bucket. It is not automatic resharded (is that even supported in multisite?) What i see when i run a complete data sync with the debug logs after a while i see alot of informations that it is unable to perform some log and also some device or resource busy (also with alot of different osds, restarting the osds also doesnt make this error going away): 018-06-29 15:18:30.391085 7f38bf882cc0 20 cr:s=0x55de55700b20:op=0x55de55717010:20RGWContinuousLeaseCR: couldn't lock amsterdam.rgw.log:datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.59:sync_lock: retcode=-16 2018-06-29 15:18:30.391094 7f38bf882cc0 20 cr:s=0x55de55732750:op=0x55de5572d970:20RGWContinuousLeaseCR: couldn't lock amsterdam.rgw.log:datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.10:sync_lock: retcode=-16 2018-06-29 15:22:01.618744 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 <== osd.43 10.30.3.44:6800/29982 13272 ==== osd_op_reply(258628 datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.52 [call] v14448'24265315 uv24265266 ondisk = -16 ((16) Device or resource busy)) v8 ==== 209+0+0 (2379682838 0 0) 0x7f38a8005110 con 0x7f3868003380 2018-06-29 15:22:01.618829 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 <== osd.43 10.30.3.44:6800/29982 13273 ==== osd_op_reply(258629 datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.105 [call] v14448'24265316 uv24265256 ondisk = -16 ((16) Device or resource busy)) v8 ==== 210+0+0 (4086289880 0 0) 0x7f38a8005110 con 0x7f3868003380 There are no issues with the OSDs all other stuff in the cluster works (rbd, images to openstack etc.) Also that command with appending debug never finishes. On Tue, Jun 26, 2018 at 5:45 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com> wrote: > > > On Sun, Jun 24, 2018 at 12:59 AM, Enrico Kern <enrico.k...@glispamedia.com > > wrote: > >> Hello, >> >> We have two ceph luminous clusters (12.2.5). >> >> recently one of our big buckets stopped syncing properly. 
>> We have one specific bucket which is around 30 TB in size, consisting of a lot of directories, each containing files of 10-20 MB.
>>
>> The secondary zone is often completely missing multiple days of data in this bucket, while all other smaller buckets sync just fine.
>>
>> Even with the data completely missing, radosgw-admin sync status always says everything is fine.
>>
>> The sync error log doesn't show anything for those days.
>>
>> Running radosgw-admin metadata sync and data sync also doesn't solve the issue. The only way to make it sync again is to disable and re-enable the sync. That needs to be done as often as ten times an hour to keep it syncing properly:
>>
>> radosgw-admin bucket sync disable
>> radosgw-admin bucket sync enable
>>
>> When I run data sync init, I sometimes get this:
>>
>> radosgw-admin data sync init --source-zone berlin
>> 2018-06-24 07:55:46.337858 7fe7557fa700 0 ERROR: failed to distribute cache for amsterdam.rgw.log:datalog.sync-status.6a9448d2-bdba-4bec-aad6-aba72cd8eac6
>>
>> Sometimes, when a really large amount of data is missing (yesterday it was more than one month), running this on the secondary zone helps get things back in sync:
>>
>> radosgw-admin bucket check --fix --check-objects
>>
>> How can I debug this problem further? We have so many requests on the cluster that it is hard to dig anything out of the log files.
>>
>> Given that all the smaller buckets are perfectly in sync, I suspect some problem related to the size of the bucket.
>
> How many objects in the bucket? Is it getting automatically resharded?
>
>> Any pointers in the right direction are greatly appreciated.
>
> A few things to look at that might help identify the issue.
> What does this show (I think the luminous command is as follows):
>
> $ radosgw-admin bucket sync status --source-zone=<zone>
>
> You can try manually syncing the bucket, and get specific logs for that operation:
>
> $ radosgw-admin bucket sync run --source-zone=<zone> --debug-rgw=20 --debug-ms=1
>
> And you can try getting more info from the sync trace module:
>
> $ ceph --admin-daemon <path to radosgw admin socket> sync trace history <bucket name>
>
> You can also try the 'sync trace show' command.
>
> Yehuda
>
>> Regards,
>>
>> Enrico
>>
>> --
>>
>> *Enrico Kern*
>> VP IT Operations
>>
>> enrico.k...@glispa.com
>> +49 (0) 30 555713017 / +49 (0) 152 26814501
>> skype: flyersa
>> LinkedIn Profile <https://www.linkedin.com/in/enricokern>
>>
>> <https://www.glispa.com/>
>>
>> *Glispa GmbH* | Berlin Office
>> Sonnenburger Straße 73
>> 10437 Berlin | Germany
>>
>> Managing Director Din Karol-Gavish
>> Registered in Berlin
>> AG Charlottenburg |
>> HRB 114678B
>> –––––––––––––––––––––––––––––
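P.S. In case it helps anyone else hitting this: the "couldn't lock ... retcode=-16" lines can be tallied per datalog shard with a short script like the sketch below. This is not an official radosgw tool, just something assuming only the log format visible in the excerpts above, to show whether the busy locks always hit the same shards or move around:

```python
# Sketch: tally lease-lock failures (retcode=-16, i.e. EBUSY) per datalog
# shard from radosgw debug output. Assumes the RGWContinuousLeaseCR log
# format shown in the excerpts above; `failed_shards` is a made-up name.
import re
from collections import Counter

# Matches: "couldn't lock <pool>:datalog.sync-status.shard.<zone-id>.<N>:sync_lock: retcode=-16"
LOCK_RE = re.compile(
    r"couldn't lock [^:]+:datalog\.sync-status\.shard\.[0-9a-f-]+\.(\d+)"
    r":sync_lock: retcode=-16"
)

def failed_shards(lines):
    """Map datalog shard id -> number of failed sync-lock attempts."""
    counts = Counter()
    for line in lines:
        m = LOCK_RE.search(line)
        if m:
            counts[int(m.group(1))] += 1
    return counts
```

Feeding it the stderr of the debug sync run (e.g. via sys.stdin) should show at a glance whether the same shards are stuck on every attempt.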
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com