[ceph-users] Re: MDS_CLIENT_LATE_RELEASE: 1 clients failing to respond to capability release

2022-07-06 Thread Alexander Sporleder
Hello Kotresh, We have seen the same problem quite frequently for a few months now with Ceph 16.2.7. For us the only thing that helps is a restart of the MDS/client, or the warning may disappear by itself after a few days. It's an Ubuntu kernel (5.13) client. Best,  Alex On Wednesday, 06.07.
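
For reference, a minimal sketch of how the offending client can be identified before resorting to a reboot (the MDS name and client ID below are placeholders, not values from the original report):

# Show which client the MDS is complaining about
ceph health detail
# List sessions on the affected MDS and inspect num_caps / client metadata
ceph tell mds.<mds-name> session ls
# As a last resort, evict the offending session instead of rebooting
ceph tell mds.<mds-name> session evict id=<client-id>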

[ceph-users] Possible customer impact on resharding radosgw bucket indexes?

2022-07-06 Thread Boris Behrens
Hello everybody, since auto-sharding does not work on replicated clusters (we only share the user accounts and metadata, not the actual data) I would like to implement it on my own. But when I reshard a bucket from 53 to 101 shards (yep, we have two buckets with around 8M objects in them) it takes a long
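
A hedged sketch of the manual reshard workflow being described (the bucket name is a placeholder; note that a manual reshard blocks writes to the bucket while it runs):

# Check the current shard count and object count
radosgw-admin bucket stats --bucket=<bucket-name>
# Manually reshard to 101 shards
radosgw-admin bucket reshard --bucket=<bucket-name> --num-shards=101
# List pending reshard entries and any stale instances left behind
radosgw-admin reshard list
radosgw-admin reshard stale-instances list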

[ceph-users] Re: CephPGImbalance: deviates by more than 30%

2022-07-06 Thread Joffrey
OK, but I don't really have SSDs. My SSDs are only for the DB, not for data. Jof On Tue, Jul 5, 2022 at 18:01, Tatjana Dehler wrote: > Hi, > > On 7/5/22 13:17, Joffrey wrote: > > Hi, > > > > I upgraded from 16.2.4 to 17.2.0 > > > > Now, I have a CephImbalance alert with many errors on my OSD "deviate
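
As a rough sketch, the usual way to verify what the alert is reporting and let Ceph even out the distribution (assuming the balancer module is available on this release):

# Show per-OSD utilization and PG counts to confirm the deviation
ceph osd df tree
# Let the balancer redistribute PGs in upmap mode
ceph balancer mode upmap
ceph balancer on
ceph balancer status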

[ceph-users] Quincy recovery load

2022-07-06 Thread Jimmy Spets
Hi all, I have a 10-node cluster with fairly modest hardware (6 HDDs and 1 shared NVMe for DB on each node) that I use for archival. After upgrading to Quincy I noticed that the load average on my servers is very high during recovery or rebalance. Changing the OSD recovery priority does not work, I assume

[ceph-users] Re: RFC: (deep-)scrub manager module

2022-07-06 Thread Rasha Shoaib
Hi all, Thanks for opening this discussion; let me share some thoughts. We discussed this in the PetaSAN project a while ago, after getting complaints about PGs not being deep-scrubbed in time. The main question was whether Ceph should be responsible for finishing scrubbing in the specified
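
For context, the knobs this discussion revolves around are the existing scrub scheduling options; a few commonly tuned ones are sketched below (the values shown are only illustrative):

# Deep scrub each PG at least once per interval (seconds; 7 days shown)
ceph config set osd osd_deep_scrub_interval 604800
# Restrict scrubbing to off-peak hours
ceph config set osd osd_scrub_begin_hour 22
ceph config set osd osd_scrub_end_hour 6
# Allow more concurrent scrubs per OSD if scrubs are falling behind
ceph config set osd osd_max_scrubs 2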

[ceph-users] Re: Quincy recovery load

2022-07-06 Thread Sridhar Seshasayee
Hi Jimmy, As you rightly pointed out, the OSD recovery priority does not work because of the change to mClock. By default, the "high_client_ops" profile is enabled, which prioritizes client ops over recovery ops. Recovery ops will take the longest time to complete with this profile and
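
A minimal sketch of switching the mClock profile to favour recovery, assuming Quincy defaults (the profile can be reverted once recovery has caught up):

# Check the active mClock profile
ceph config get osd osd_mclock_profile
# Temporarily favour recovery/backfill over client I/O
ceph config set osd osd_mclock_profile high_recovery_ops
# Revert to the default once recovery is done
ceph config set osd osd_mclock_profile high_client_ops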

[ceph-users] Re: Quincy recovery load

2022-07-06 Thread Anthony D'Atri
Do you mean load average as reported by `top` or `uptime`? That figure can be misleading on multi-core systems. What CPU are you using? For context, when I ran systems with 32C/64T and 24x SATA SSD, the load average could easily hit 40-60 without anything being wrong. What CPU percentages in
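
To illustrate the point, a quick way to put a load-average figure in relation to the core count (mpstat and iostat assume the sysstat package is installed):

# Load average is only meaningful relative to the number of cores
nproc
uptime
# Per-core utilization and iowait give a better picture than load average
mpstat -P ALL 5 1
iostat -x 5 1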

[ceph-users] Re: Quincy recovery load

2022-07-06 Thread Jimmy Spets
Thanks for your reply. What I meant by high load was the load as seen by the top command; all the servers have a load average over 10. I added one more node to add more space. This is what I get from ceph status: cluster: id: health: HEALTH_WARN 2 failed cephadm daemon(s

[ceph-users] Get filename from oid?

2022-07-06 Thread Toby Darling
Hi Cephers, I've got a missing object. Can anyone point me to a simple method of turning the oid into a /path/filename that I could then recover from backup? root@ceph-s1 15:52 [~]: ceph pg 2.fff list_unfound { "num_missing": 1, "num_unfound": 1, "objects": [ {
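
One possible approach, sketched under the assumption that this is a CephFS data pool: the object name has the form <inode-in-hex>.<block>, so the hex prefix can be converted to a decimal inode number and looked up on a mounted filesystem (the oid and mount point below are placeholders):

# Example oid 10000000abc.00000000 -> inode is 0x10000000abc
printf '%d\n' 0x10000000abc
# Find the path belonging to that inode on a mounted CephFS
find /mnt/cephfs -inum <decimal-inode> 2>/dev/null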

[ceph-users] Re: Quincy recovery load

2022-07-06 Thread Jimmy Spets
> Do you mean load average as reported by `top` or `uptime`? yes > That figure can be misleading on multi-core systems. What CPU are you using? It's a 4c/4t low-end CPU /Jimmy On Wed, Jul 6, 2022 at 4:52 PM Anthony D'Atri wrote: > Do you mean load average as reported by `top` or `uptime`? > >

[ceph-users] Re: [ext] Re: snap_schedule MGR module not available after upgrade to Quincy

2022-07-06 Thread Kuhring, Mathias
Hey Andreas, thanks for the info. We also had our MGR reporting crashes related to the module. We have a second cluster as a mirror which we also updated to Quincy. But there the MGR is able to use the snap_schedule module (so "ceph fs snap-schedule status" etc. are not complaining). And I'm able to schedu
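
A rough sketch of the checks being compared between the two clusters (standard module and crash commands; the crash ID is a placeholder):

# Confirm the module is enabled and not reported as failed
ceph mgr module ls | grep -A2 snap_schedule
# Inspect the recent MGR crashes mentioned above
ceph crash ls-new
ceph crash info <crash-id>
# Verify the scheduler itself
ceph fs snap-schedule status /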

[ceph-users] rados df vs ls

2022-07-06 Thread stuart.anderson
I am wondering if it is safe to delete the following pool, which rados ls reports as empty but rados df indicates has a few thousand objects? [root@ceph-admin ~]# rados -p fs.data.user.hdd.ec ls | wc -l 0 [root@ceph-admin ~]# rados df | egrep -e 'POOL|fs.data.user.hdd.ec' POOL_NAME USED OBJECTS C
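
One thing worth checking before deleting anything: plain rados ls only lists the default namespace, so a sketch like the following (pool name taken from the message) may reveal where the objects actually live:

# List objects across all namespaces, not just the default one
rados -p fs.data.user.hdd.ec ls --all | wc -l
rados -p fs.data.user.hdd.ec ls --all | head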

[ceph-users] Re: Performance in Proof-of-Concept cluster

2022-07-06 Thread Marc
This is from an rbd HDD pool with 3x replication (not really fast drives, 2.2 GHz CPUs with the power profile set to balanced rather than optimized, Nautilus) [@~]# rados bench -p rbd 60 write hints = 1 Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects Object prefix: benchmar

[ceph-users] Re: Performance in Proof-of-Concept cluster

2022-07-06 Thread Marc
This is from an rbd SSD pool with 3x replication (SATA SSD drives, 2.2 GHz CPUs with the power profile set to balanced rather than optimized, Nautilus) [@~]# rados bench -p rbd.ssd 60 write hints = 1 Maintaining 16 concurrent writes of 4194304 bytes to objects of size 4194304 for up to 60 seconds or 0 objects Object prefix: benchmark_d
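
For anyone reproducing these numbers, a hedged sketch of the full bench cycle (pool name as in the message above; --no-cleanup keeps the objects so the read passes have something to read):

# Write test, keeping the benchmark objects for the read tests
rados bench -p rbd.ssd 60 write --no-cleanup
# Sequential and random read tests against those objects
rados bench -p rbd.ssd 60 seq
rados bench -p rbd.ssd 60 rand
# Remove the benchmark objects afterwards
rados -p rbd.ssd cleanup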

[ceph-users] Ceph Leadership Team Meeting Minutes (2022-07-06)

2022-07-06 Thread David Orman
Here are the main topics of discussion during the CLT meeting today:
- make-check/API tests
  - Ignoring the doc/ directory would skip an expensive git checkout operation and save time
- Stale PRs
  - Currently an issue with stalebot which is being investigated
- Cephalocon

[ceph-users] Poor I/O performance on OpenStack block device (OpenStack Centos8:Ussuri)

2022-07-06 Thread Vinh Nguyen Duc
I have a problem with I/O performance on an OpenStack block device.
*Environment:*
*OpenStack version: Ussuri*
- OS: CentOS 8
- Kernel: 4.18.0-240.15.1.el8_3.x86_64
- KVM: qemu-kvm-5.1.0-20.el8
*Ceph version: Octopus*
- OS: CentOS 8
- Kernel: 4.18.0-240.15.1.el8_3.x86_64
In the Ceph cluster we have 2
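
A minimal sketch of how in-guest numbers are usually gathered for comparison against the backing RBD image (device path, pool, and image names are placeholders, and fio must be installed in the guest; the fio run is destructive to the target device):

# Inside the VM: 4k random write against the attached volume
fio --name=randwrite --ioengine=libaio --direct=1 --rw=randwrite \
    --bs=4k --iodepth=32 --numjobs=1 --runtime=60 --time_based \
    --filename=/dev/vdb
# On a Ceph client: a comparable workload directly against the image
rbd bench --io-type write --io-size 4K --io-pattern rand \
    --io-threads 32 --io-total 1G <pool>/<image>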

[ceph-users] Re: snap_schedule MGR module not available after upgrade to Quincy

2022-07-06 Thread Venky Shankar
Hi Andreas, On Wed, Jul 6, 2022 at 8:36 PM Andreas Teuchert wrote: > > Hello Mathias and others, > > I also ran into this problem after upgrading from 16.2.9 to 17.2.1. > > Additionally I observed a health warning: "3 mgr modules have recently > crashed". > > Those are actually two distinct crash
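
For completeness, a short sketch of how the reported crashes are typically inspected and cleared once understood (the crash ID is a placeholder):

# List crashes that have not been acknowledged yet
ceph crash ls-new
# Show the backtrace and metadata for a specific crash
ceph crash info <crash-id>
# Acknowledge them so the health warning clears
ceph crash archive-all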

[ceph-users] which tools can test compression performance

2022-07-06 Thread Feng, Hualong
Hi, I want to test compression performance in Ceph on my cluster, but I cannot find a tool to set the compression ratio or generate compressible data. Currently I use warp (https://github.com/minio/warp) to test compression performance, but its data is random and therefore cannot be compressed. So who knows which tools can test s
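
One way to approach this, sketched with placeholder pool names and illustrative settings: enable compression on a test pool, write data through rados bench or the S3 frontend, and read the compression counters back from ceph df.

# Enable compression on a test pool (algorithm/mode values are examples)
ceph osd pool set <pool> compression_algorithm snappy
ceph osd pool set <pool> compression_mode aggressive
# Write some data, then check how much of it was actually compressed
rados bench -p <pool> 60 write --no-cleanup
ceph df detail   # see the USED COMPR / UNDER COMPR columns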