Hi Enrico,

On Fri, Jun 29, 2018 at 7:50 PM Enrico Kern <enrico.k...@glispa.com> wrote:
> hmm, that also pops up right away when I restart all radosgw instances. But
> I will check further and see if I can find something. Maybe doing the
> upgrade to Mimic too.
>
> That bucket is basically under load on the master zone all the time, as we
> use it as historical storage for Druid, so data is constantly being written
> to it. I just don't get why disabling/enabling sync on the bucket flawlessly
> syncs everything, while if I just keep it enabled it stops syncing
> altogether. For the last few days I was just running disable/enable for the
> bucket in a while loop with a 30 minute interval, but that's no persistent
> fix ;)
>

Are you using HAProxy? We have seen sync stalls with it.
The simplest workaround is to configure the radosgw addresses as the sync
endpoints, not the HAProxy's.

Regards,
Orit

> On Fri, Jun 29, 2018 at 6:15 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com>
> wrote:
>
>> On Fri, Jun 29, 2018 at 8:48 AM, Enrico Kern <enrico.k...@glispa.com>
>> wrote:
>>
>>> Also, when I try to sync the bucket manually I get this error:
>>>
>>> ERROR: sync.run() returned ret=-16
>>> 2018-06-29 15:47:50.137268 7f54b7e4ecc0 0 data sync: ERROR: failed to read sync status for bucketname:6a9448d2-bdba-4bec-aad6-aba72cd8eac6.27150814.1
>>>
>>> It works flawlessly with all other buckets.
>>>
>>
>> Error 16 is EBUSY, meaning it can't take a lease to do work on the
>> bucket. This usually happens when another entity (e.g., a running radosgw
>> process) is working on it at the same time. Either something took the lease
>> and never gave it back (leases shouldn't be indefinite; they are usually
>> taken for a short period and renewed periodically), or there might be
>> some other bug related to the lease itself. I would start by figuring
>> out whether it's the first case or the second one.
>> In the messenger log there should be a message just before that one that
>> shows the operation that got the -16 response (it should have something
>> like "...=-16 (Device or resource busy)" in it). The same line also
>> contains the name of the rados object that is used to manage the lease.
>> Look at the running radosgw log at the time this happens, and check
>> whether there are other operations on that object.
>> One thing to note is that if you run a sync on a bucket and stop it
>> uncleanly in the middle (e.g., by killing the process), the lease will
>> stay locked for a period of time (on the order of 1 to 2 minutes).
>>
>> Yehuda
>>
>>> On Fri, Jun 29, 2018 at 5:39 PM Enrico Kern <enrico.k...@glispa.com>
>>> wrote:
>>>
>>>> Hello,
>>>>
>>>> thanks for the reply.
>>>>
>>>> We have around 200k objects in the bucket. It is not automatically
>>>> resharded (is that even supported in multisite?)
>>>>
>>>> When I run a complete data sync with the debug logs, after a while I
>>>> see a lot of messages saying it is unable to take some lock, and also
>>>> some "device or resource busy" errors (from a lot of different OSDs;
>>>> restarting the OSDs doesn't make this error go away either):
>>>>
>>>> 018-06-29 15:18:30.391085 7f38bf882cc0 20 cr:s=0x55de55700b20:op=0x55de55717010:20RGWContinuousLeaseCR: couldn't lock amsterdam.rgw.log:datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.59:sync_lock: retcode=-16
>>>>
>>>> 2018-06-29 15:18:30.391094 7f38bf882cc0 20 cr:s=0x55de55732750:op=0x55de5572d970:20RGWContinuousLeaseCR: couldn't lock amsterdam.rgw.log:datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.10:sync_lock: retcode=-16
>>>>
>>>> 2018-06-29 15:22:01.618744 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 <== osd.43 10.30.3.44:6800/29982 13272 ==== osd_op_reply(258628 datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.52 [call] v14448'24265315 uv24265266 ondisk = -16 ((16) Device or resource busy)) v8 ==== 209+0+0 (2379682838 0 0) 0x7f38a8005110 con 0x7f3868003380
>>>>
>>>> 2018-06-29 15:22:01.618829 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 <== osd.43 10.30.3.44:6800/29982 13273 ==== osd_op_reply(258629 datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.105 [call] v14448'24265316 uv24265256 ondisk = -16 ((16) Device or resource busy)) v8 ==== 210+0+0 (4086289880 0 0) 0x7f38a8005110 con 0x7f3868003380
>>>>
>>>> There are no issues with the OSDs; everything else in the cluster
>>>> works (rbd, images to openstack etc.)
>>>>
>>>> Also, that command with the debug options appended never finishes.
>>>>
>>>> On Tue, Jun 26, 2018 at 5:45 PM Yehuda Sadeh-Weinraub <yeh...@redhat.com>
>>>> wrote:
>>>>
>>>>> On Sun, Jun 24, 2018 at 12:59 AM, Enrico Kern
>>>>> <enrico.k...@glispamedia.com> wrote:
>>>>>
>>>>>> Hello,
>>>>>>
>>>>>> We have two ceph luminous clusters (12.2.5).
>>>>>>
>>>>>> Recently one of our big buckets stopped syncing properly. We have
>>>>>> one specific bucket which is around 30TB in size, consisting of a lot
>>>>>> of directories, each one holding files of 10-20MB.
>>>>>>
>>>>>> The secondary zone is often completely missing multiple days of data
>>>>>> in this bucket, while all other smaller buckets sync just fine.
>>>>>>
>>>>>> Even with all that data missing, radosgw-admin sync status always
>>>>>> says everything is fine.
>>>>>>
>>>>>> The sync error log doesn't show anything for those days.
>>>>>>
>>>>>> Running radosgw-admin metadata sync and data sync also doesn't solve
>>>>>> the issue. The only way of making it sync again is to disable and
>>>>>> re-enable the sync. That needs to be done as often as 10 times in an
>>>>>> hour to make it sync properly.
>>>>>>
>>>>>> radosgw-admin bucket sync disable
>>>>>> radosgw-admin bucket sync enable
>>>>>>
>>>>>> When I run data sync init I sometimes get this:
>>>>>>
>>>>>> radosgw-admin data sync init --source-zone berlin
>>>>>> 2018-06-24 07:55:46.337858 7fe7557fa700 0 ERROR: failed to distribute cache for amsterdam.rgw.log:datalog.sync-status.6a9448d2-bdba-4bec-aad6-aba72cd8eac6
>>>>>>
>>>>>> Sometimes, when really a lot of data is missing (yesterday it was more
>>>>>> than 1 month), this helps to get them in sync again when run on the
>>>>>> secondary zone:
>>>>>>
>>>>>> radosgw-admin bucket check --fix --check-objects
>>>>>>
>>>>>> How can I debug that problem further? We have so many requests on the
>>>>>> cluster that it is hard to dig something out of the log files.
>>>>>>
>>>>>> Given that all the smaller buckets are perfectly in sync, I suspect
>>>>>> some problem caused by the size of the bucket.
>>>>>>
>>>>>
>>>>> How many objects in the bucket? Is it getting automatically resharded?
>>>>>
>>>>>>
>>>>>> Any pointers in the right direction are greatly appreciated.
>>>>>>
>>>>>
>>>>> A few things to look at that might help identify the issue.
>>>>>
>>>>> What does this show (I think the luminous command is as follows):
>>>>>
>>>>> $ radosgw-admin bucket sync status --source-zone=<zone>
>>>>>
>>>>> You can try manually syncing the bucket, and get specific logs for
>>>>> that operation:
>>>>>
>>>>> $ radosgw-admin bucket sync run --source-zone=<zone> --debug-rgw=20 --debug-ms=1
>>>>>
>>>>> And you can try getting more info from the sync trace module:
>>>>>
>>>>> $ ceph --admin-daemon <path to radosgw admin socket> sync trace history <bucket name>
>>>>>
>>>>> You can also try the 'sync trace show' command.
>>>>>
>>>>> Yehuda
>>>>>
>>>>>>
>>>>>> Regards,
>>>>>>
>>>>>> Enrico
>>>>>>
>>>>>> --
>>>>>>
>>>>>> *Enrico Kern*
>>>>>> VP IT Operations
>>>>>>
>>>>>> enrico.k...@glispa.com
>>>>>> +49 (0) 30 555713017 / +49 (0) 152 26814501
>>>>>> skype: flyersa
>>>>>> LinkedIn Profile <https://www.linkedin.com/in/enricokern>
>>>>>>
>>>>>> <https://www.glispa.com/>
>>>>>>
>>>>>> *Glispa GmbH* | Berlin Office
>>>>>> Sonnenburger Straße 73, 10437 Berlin | Germany
>>>>>>
>>>>>> Managing Director Din Karol-Gavish
>>>>>> Registered in Berlin
>>>>>> AG Charlottenburg | HRB 114678B
>>>>>> –––––––––––––––––––––––––––––
>>>>>>
>>>>>> _______________________________________________
>>>>>> ceph-users mailing list
>>>>>> ceph-users@lists.ceph.com
>>>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
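
For anyone following along: with this many requests on the cluster it is hard to spot the relevant lines by eye, so here is a rough sketch of how the -16 (EBUSY) replies could be pulled out of a debug log and grouped by rados object, as Yehuda suggested. It is plain Python and only parses text. The two regular expressions are assumptions modelled on the four log lines quoted in this thread (other Ceph versions may format these lines differently), and the function and variable names are made up for illustration.

```python
import re
from collections import Counter

# Assumed format, taken from the messenger lines quoted in this thread:
#   osd_op_reply(<tid> <object> [call] ... ondisk = -16 ((16) Device or resource busy))
OSD_REPLY_RE = re.compile(
    r"osd_op_reply\(\d+\s+(?P<obj>\S+)\s+\[.*?=\s*-16\s+"
    r"\(\(16\) Device or resource busy\)\)"
)
# Assumed format, taken from the RGWContinuousLeaseCR lines quoted in this thread:
#   couldn't lock <pool>:<object>:sync_lock: retcode=-16
LEASE_RE = re.compile(r"couldn't lock (?P<obj>\S+?):sync_lock: retcode=-16")


def busy_objects(lines):
    """Count EBUSY (-16) hits per rados object name found in the given log lines."""
    counts = Counter()
    for line in lines:
        match = OSD_REPLY_RE.search(line) or LEASE_RE.search(line)
        if match:
            # Drop a leading "<pool>:" prefix so both log formats share one key.
            counts[match.group("obj").split(":", 1)[-1]] += 1
    return counts


# Three of the lines posted earlier in the thread, plus one unrelated line.
SAMPLE_LINES = [
    "2018-06-29 15:18:30.391094 7f38bf882cc0 20 "
    "cr:s=0x55de55732750:op=0x55de5572d970:20RGWContinuousLeaseCR: couldn't lock "
    "amsterdam.rgw.log:datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.10:sync_lock: "
    "retcode=-16",
    "2018-06-29 15:22:01.618744 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 "
    "<== osd.43 10.30.3.44:6800/29982 13272 ==== osd_op_reply(258628 "
    "datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.52 [call] "
    "v14448'24265315 uv24265266 ondisk = -16 ((16) Device or resource busy)) v8 "
    "==== 209+0+0 (2379682838 0 0) 0x7f38a8005110 con 0x7f3868003380",
    "2018-06-29 15:22:01.618829 7f38ad4c7700 1 -- 10.30.3.67:0/3390890604 "
    "<== osd.43 10.30.3.44:6800/29982 13273 ==== osd_op_reply(258629 "
    "datalog.sync-status.shard.6a9448d2-bdba-4bec-aad6-aba72cd8eac6.105 [call] "
    "v14448'24265316 uv24265256 ondisk = -16 ((16) Device or resource busy)) v8 "
    "==== 210+0+0 (4086289880 0 0) 0x7f38a8005110 con 0x7f3868003380",
    "a line that has nothing to do with leases",
]

if __name__ == "__main__":
    # Print the busiest objects first.
    for obj, hits in busy_objects(SAMPLE_LINES).most_common():
        print(hits, obj)
```

To run it against a real log, pass an open file instead of the sample list, e.g. busy_objects(open(path)) where path points at whatever file the sync run's debug output was captured in. The object names that come out on top are the ones to search for in the running radosgw log at the same timestamps.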
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com