[ceph-users] RGW access logs with bucket name

2023-01-03 Thread Boris Behrens
Hi, I am looking to move our logs from /var/log/ceph/ceph-client...log to our log aggregator. Is there a way to have the bucket name in the log file? Or can I write the rgw_enable_ops_log output into a file? Maybe I could work with that. Cheers and happy new year, Boris
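A minimal sketch of one way to get per-request log records that include the bucket name via the RGW ops log, assuming a socket-based consumer; the option names are standard RGW settings, while the socket path, the reader command and the target file are illustrative:

```
# Enable the ops log and send it to a Unix socket instead of the RADOS-backed log;
# each record includes the bucket name.
ceph config set client.rgw rgw_enable_ops_log true
ceph config set client.rgw rgw_ops_log_rados false
ceph config set client.rgw rgw_ops_log_socket_path /var/run/ceph/rgw-ops.sock

# After restarting the RGWs, drain the socket into a file for the log aggregator
# (newer releases also offer rgw_ops_log_file_path; check your version).
nc -U /var/run/ceph/rgw-ops.sock >> /var/log/ceph/rgw-ops.log
```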

[ceph-users] Re: [EXTERNAL] Re: S3 Deletes in Multisite Sometimes Not Syncing

2023-01-03 Thread Alex Hussein-Kershaw (HE/HIM)
Hi Matthew, That's interesting to hear - especially that you are not using bucket versioning and are seeing the same issue. I was hoping this might go away if I turned off versioning, but if that's not the case this gets a bit more worrying for us! Thanks, Alex -Original Message- Fro

[ceph-users] Re: [ext] Copying large file stuck, two cephfs-2 mounts on two cluster

2023-01-03 Thread Kuhring, Mathias
Trying to rule out clusters and/or clients might have gotten me on the right track. It might have been a client issue or actually a snapshot retention issue. As it turned out, when I tried other routes for the data using a different client, the data was not available anymore since the snapshot had

[ceph-users] increasing number of (deep) scrubs

2023-01-03 Thread Frank Schilder
Hi all, we are using 16T and 18T spinning drives as OSDs and I'm observing that they are not scrubbed as often as I would like. It looks like too few scrubs are scheduled for these large OSDs. My estimate is as follows: we have 852 spinning OSDs backing an 8+2 pool with 2024 and an 8+3 pool with
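A sketch of the knobs that usually limit how many (deep) scrubs get scheduled, with illustrative values rather than recommendations; the option names are standard OSD settings, the numbers are assumptions to adapt to the cluster:

```
# Allow more than one concurrent scrub per OSD (default is 1 on older releases).
ceph config set osd osd_max_scrubs 2

# Stretch the deep-scrub interval for very large HDDs (seconds; 14 days here).
ceph config set osd osd_deep_scrub_interval 1209600

# Let scrubs start under moderate load (default threshold is 0.5).
ceph config set osd osd_scrub_load_threshold 2.0

# Verify the effective values on one OSD.
ceph config show osd.0 | grep scrub
```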

[ceph-users] mon scrub error (scrub mismatch)

2023-01-03 Thread Frank Schilder
Hi all, we have these messages in our logs daily:
1/3/23 12:20:00 PM [INF] overall HEALTH_OK
1/3/23 12:19:46 PM [ERR] mon.2 ScrubResult(keys {auth=77,config=2,health=11,logm=10} crc {auth=688385498,config=4279003239,health=3522308637,logm=132403602})
1/3/23 12:19:46 PM [ERR] mon.0 ScrubResult(keys

[ceph-users] rgw - unable to remove some orphans

2023-01-03 Thread Andrei Mikhailovsky
Happy New Year everyone! I have a bit of an issue with removing some of the orphan objects that were identified with the rgw-orphan-list tool. Over the years rgw generated over 14 million orphans, wasting over 100TB of space, considering the overall data stored in rgw was well un

[ceph-users] Re: rgw - unable to remove some orphans

2023-01-03 Thread Boris Behrens
Hi Andrei, happy new year to you too. The file might already be removed. You can check if the rados object is there with `rados -p ls ...`. You can also check if the file is still in the bucket with `radosgw-admin bucket radoslist --bucket BUCKET`. Cheers Boris. On Tue., 3 Jan. 2023 at 13:47
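Spelled out, the checks Boris describes might look like the following; the pool name, bucket name and object name are placeholders:

```
# Is the RADOS object still in the data pool?
rados -p default.rgw.buckets.data ls | grep '<object-name>'

# Is the object still referenced by the bucket index?
radosgw-admin bucket radoslist --bucket=<BUCKET> | grep '<object-name>'

# If it is in the pool but referenced by no bucket, it can be removed
# directly (destructive, double-check first).
rados -p default.rgw.buckets.data rm '<object-name>'
```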

[ceph-users] Re: rgw - unable to remove some orphans

2023-01-03 Thread Manuel Rios - EDH
The object index database got corrupted and no one could fix it. We wiped a 500TB cluster years ago and moved off Ceph due to these orphan bugs. After moving all our data, we saw more than 100TB of data on disk that Ceph was unable to delete, also known as orphans... it makes no sense. We spent thousands of hours with this b

[ceph-users] Re: mon scrub error (scrub mismatch)

2023-01-03 Thread Eugen Block
Hi Frank, I had this a few years back and ended up recreating the MON with the scrub mismatch, so in your case it would probably be mon.0. To test if the problem still exists you can trigger a mon scrub manually: `ceph mon scrub`. Are all MONs on the rocksdb backend in this cluster? I didn't che
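A minimal sketch of the manual check Eugen describes; `ceph mon scrub` is the trigger, and pulling the result out of the recent cluster log with `ceph log last` is one way to see it, assuming the entries are still in the in-memory log:

```
# Trigger an on-demand scrub of the monitor stores.
ceph mon scrub

# Look for ScrubResult / "scrub mismatch" lines in the recent cluster log.
ceph log last 200 | grep -i scrub
```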

[ceph-users] Re: pg deep scrubbing issue

2023-01-03 Thread Jeffrey Turmelle
Thank you Anthony. I did have an empty pool that I had provisioned for developers that was never used. I’ve removed that pool and the 0 object PGs are gone. I don’t know why I didn’t realize that. Removing that pool halved the # of PGs not scrubbed in time. This is entirely an HDD cluster.
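For anyone chasing the same symptom, a rough sketch of how to spot unused pools whose PGs still count against the scrub schedule; pool removal is destructive and gated by mon_allow_pool_delete, and the pool name is a placeholder:

```
# Pools with zero stored data/objects are candidates for removal.
ceph df

# Remove the unused pool (requires mon_allow_pool_delete=true).
ceph osd pool rm <pool> <pool> --yes-i-really-really-mean-it
```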

[ceph-users] Re: rgw - unable to remove some orphans

2023-01-03 Thread Andrei Mikhailovsky
Hi Boris, The objects do exist and I can see them with ls. I can also verify that the total number of objects in the pool is over 2M more than the number of files. The total used space of all the buckets is about 10TB less than the total space used up by the .rgw.buckets pool. My colleague has s
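A rough sketch of one way to reproduce that comparison, assuming the jq expression matches the JSON layout of `radosgw-admin bucket stats` on your release and that the data pool is named .rgw.buckets as above:

```
# Object count and usage of the data pool.
rados df | grep '.rgw.buckets'

# Sum of objects accounted to all buckets by the bucket index.
radosgw-admin bucket stats | \
  jq '[.[].usage | to_entries[].value.num_objects] | add'
```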

[ceph-users] Re: rgw - unable to remove some orphans

2023-01-03 Thread Andrei Mikhailovsky
Manuel, Wow, I am pretty surprised to hear that the Ceph developers haven't addressed this issue already. It looks like a big issue, and leaving this orphan data unresolved is costing a lot of money. Could someone from the developers comment on the issue and let us know if there is a wo

[ceph-users] RGW - Keyring Storage Cluster Users ceph for secondary RGW multisite

2023-01-03 Thread Guillaume Morin
Hello, I need help configuring a Storage Cluster user for a secondary RADOS Gateway. My multisite RGW configuration & sync works with broad capabilities (osd 'allow rwx', mon 'allow profile simple-rados-client', mgr 'allow profile rbd'), but I would like to avoid using osd 'allow rwx'. Act
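One possible direction, sketched under assumptions: restrict the OSD cap to the zone's own pools instead of a blanket 'allow rwx'. The pool names below are the usual defaults for a zone called "zone2" and are placeholders (check `ceph osd pool ls`); whether this is sufficient for multisite sync still needs testing:

```
# Caps kept on one line; a line break inside the quoted cap string would be literal.
ceph auth get-or-create client.rgw.zone2 \
  mon 'allow rw' \
  osd 'allow rwx pool=.rgw.root, allow rwx pool=zone2.rgw.control, allow rwx pool=zone2.rgw.meta, allow rwx pool=zone2.rgw.log, allow rwx pool=zone2.rgw.buckets.index, allow rwx pool=zone2.rgw.buckets.data, allow rwx pool=zone2.rgw.buckets.non-ec'
```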

[ceph-users] Re: mon scrub error (scrub mismatch)

2023-01-03 Thread Frank Schilder
Hi Eugen, thanks for your answer. All our mons use rocksdb. I found some old threads, but they never really explained anything. What irritates me is that this is a silent corruption. If you don't read the logs every day, you will not see it; ceph status reports HEALTH_OK. That's also why I'm w

[ceph-users] Re: mon scrub error (scrub mismatch)

2023-01-03 Thread Dan van der Ster
Hi Frank, Can you work backwards in the logs to when this first appeared? The scrub error is showing that mon.0 has 78 auth keys and the other two have 77. So you'd have to query the auth keys of each mon to see if you get a different response each time (e.g. ceph auth list), and compare with what yo
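A crude sketch of the comparison Dan suggests: dump the auth database a few times, diff the dumps, and count the entries. The 'auth_dump' key is an assumption about the JSON layout and should be verified locally:

```
# Dump the auth database repeatedly and compare.
for i in 1 2 3; do ceph auth ls > /tmp/auth.$i; sleep 2; done
diff /tmp/auth.1 /tmp/auth.2 && diff /tmp/auth.2 /tmp/auth.3

# Count the entries ('auth_dump' is assumed, verify on your release).
ceph auth ls -f json | jq '.auth_dump | length'
```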

[ceph-users] Telemetry service is temporarily down

2023-01-03 Thread Yaarit Hatuka
Hi everyone, We are having some infrastructure issues with our telemetry backend, and we are working on fixing it. Thanks Jan Horacek for opening this issue [1]. We will update once the service is back up. We are sorry for any inconvenience you may be experi

[ceph-users] Does Raid Controller p420i in HBA mode become Bottleneck?

2023-01-03 Thread hosseinz8...@yahoo.com
Hi Experts, In my new cluster, each of my storage nodes has 6x Samsung PM1643 SSDs with a P420i RAID controller in HBA mode. My main concern is whether the P420i working in HBA mode becomes a bottleneck for IOPS & throughput. Each PM1643 supports 30k write IOPS, so 6 PM1643s should give 180k IOPS (30k * 6). I
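One way to check is to benchmark a single SSD and then all six in parallel and compare how the numbers scale; a sketch with fio, destructive to the target devices, with device names as placeholders:

```
# Single-drive baseline (DESTRUCTIVE: writes raw 4k random I/O to the device).
fio --name=single --filename=/dev/sdb --ioengine=libaio --direct=1 \
    --rw=randwrite --bs=4k --iodepth=32 --numjobs=4 --runtime=60 \
    --time_based --group_reporting

# All six drives at once; if the aggregate is far below 6x the single-drive
# result, the controller path (or its HBA-mode firmware) is the limit.
fio --name=all --filename=/dev/sdb:/dev/sdc:/dev/sdd:/dev/sde:/dev/sdf:/dev/sdg \
    --ioengine=libaio --direct=1 --rw=randwrite --bs=4k --iodepth=32 \
    --numjobs=4 --runtime=60 --time_based --group_reporting
```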

[ceph-users] Re: rgw - unable to remove some orphans

2023-01-03 Thread Fabio Pasetti
Hi everyone, we’ve got the same issue with our Ceph cluster (Pacific release), and we saw it for the first time when we started to use it as offload storage for Veeam Backup. In fact Veeam, at the end of the offload job, when it tries to delete the oldest files, gave us the “unknown error” wh