[ceph-users] Re: CephFS thrashing through the page cache

2023-04-03 Thread Ashu Pachauri
Hi Xiubo, Did you get a chance to work on this? I am curious to test out the improvements. Thanks and Regards, Ashu Pachauri On Fri, Mar 17, 2023 at 3:33 PM Frank Schilder wrote: > Hi Ashu, > > thanks for the clarification. That's not an option that is easy to change. > I hope that the modifi

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Anthony D'Atri
Any chance you ran `rados bench` but didn’t fully clean up afterward? > On Apr 3, 2023, at 9:25 PM, Work Ceph wrote: > > Hello guys! > > > We noticed an unexpected situation. In a recently deployed Ceph cluster we > are seeing raw usage that is a bit odd. We have the following setup: >
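In case it helps, a quick way to check for and remove leftover bench objects; this is a sketch, and the pool name is a placeholder, not taken from the thread:

```
# rados bench leaves objects named benchmark_data_* behind when run with --no-cleanup
rados -p <pool> ls | grep benchmark_data | head

# remove any leftover bench objects in that pool
rados -p <pool> cleanup --prefix benchmark_data
```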

[ceph-users] Re: Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Work Ceph
To add more information, in case that helps:
```
# ceph -s
  cluster:
    id:
    health: HEALTH_OK

  task status:

  data:
    pools:   6 pools, 161 pgs
    objects: 223 objects, 7.0 KiB
    usage:   9.3 TiB used, 364 TiB / 373 TiB avail
    pgs:     161 active+clean

# ceph df
--- RA
```
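Not from the original message, but two standard commands that show where that raw usage is being attributed, assuming nothing unusual about the deployment:

```
# Per-OSD view: raw use, weight, and variance across the 373 TiB
ceph osd df tree

# Per-pool breakdown of stored vs. raw space
ceph df detail
```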

[ceph-users] Recently deployed cluster showing 9Tb of raw usage without any load deployed

2023-04-03 Thread Work Ceph
Hello guys! We noticed an unexpected situation: in a recently deployed Ceph cluster we are seeing raw usage that is a bit odd. We have a new cluster with 5 nodes with the following setup: - 128 GB of RAM - 2 Intel(R) Xeon Silver 4210R CPUs - 1 NVM

[ceph-users] Read and write performance on distributed filesystem

2023-04-03 Thread David Cunningham
Hello, We are considering CephFS as an alternative to GlusterFS, and have some questions about performance. Is anyone able to advise us please? This would be for file systems between 100GB and 2TB in size, average file size around 5MB, and a mixture of reads and writes. I may not be using the cor

[ceph-users] Re: quincy v17.2.6 QE Validation status

2023-04-03 Thread Yuri Weinstein
Josh, the release is ready for your review and approval. Adam, can you please update the LRC upgrade to 17.2.6 RC? Thx On Wed, Mar 29, 2023 at 3:07 PM Yuri Weinstein wrote: > The release has been approved. > > And the gibba cluster upgraded. > > We are awaiting the LRC upgrade and then/or in

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Anthony D'Atri
Mark Nelson's space amp sheet visualizes this really well. A nuance here is that Ceph always writes a full stripe, so with a 9,6 profile, on conventional media, a minimum of 15x4KB=60KB of underlying storage will be consumed, even for a 1KB object. A 22 KB object would similarly tie up ~18KB of
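Spelling out that arithmetic (my numbers, assuming bluestore_min_alloc_size = 4 KiB):

```
# k=9, m=6 -> a full stripe is 15 chunks, each allocated in 4 KiB units
#  1 KiB object: every chunk still costs one 4 KiB allocation
#                15 * 4 KiB = 60 KiB consumed on disk
# 22 KiB object: 22 KiB / 9 data chunks ~ 2.5 KiB per chunk, rounded up to 4 KiB
#                15 * 4 KiB = 60 KiB consumed for 22 KiB of payload
```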

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Michel Jouvin
Hi Frank, Thanks for this detailed answer. About your point of 4+2 or similar schemes defeating the purpose of a 3-datacenter configuration, you're right in principle. In our case, the goal is to avoid any impact for replicated pools (in particular RBD for the cloud) but it may be acceptable f

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Frank Schilder
Hi Michel, failure domain = datacenter doesn't work, because crush wants to put 1 shard per failure domain and you have 3 data centers and not 6. The modified crush rule you wrote should work. I believe equally well with x=0 or 2 -- but try it out before doing anything to your cluster. The eas
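For anyone following along, a minimal sketch of the kind of modified rule being discussed, assuming a default root and host-level leaves; the rule name, id, and exact counts are illustrative, not taken from Michel's map:

```
rule ec42_multidc {
        id 42
        type erasure
        step set_chooseleaf_tries 5
        step set_choose_tries 100
        step take default
        # pick 3 datacenters (using 0 here instead would mean "as many as needed")
        step choose indep 3 type datacenter
        # then 2 OSDs on distinct hosts within each chosen datacenter
        step chooseleaf indep 2 type host
        step emit
}
```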

[ceph-users] Re: Misplaced objects greater than 100%

2023-04-03 Thread Johan Hattne
Thanks Mehmet; I took a closer look at what I sent you and the problem appears to be in the CRUSH map. At some point since anything was last rebooted, I created rack buckets and moved the OSD nodes in under them: # ceph osd crush add-bucket rack-0 rack # ceph osd crush add-bucket rack-1 ra
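If the new rack buckets were left detached from the CRUSH root, the usual follow-up looks roughly like this; bucket and host names are placeholders based on the snippet above:

```
# Attach the rack buckets under the crush root
ceph osd crush move rack-0 root=default
ceph osd crush move rack-1 root=default

# Move each OSD host under its rack (host name is hypothetical)
ceph osd crush move node-a rack=rack-0
```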

[ceph-users] Re: compiling Nautilus for el9

2023-04-03 Thread Marc
I am currently building with a CentOS 9 Stream container. I have been adding some rpms that were missing and not in the dependencies. With the cmake options below, these binaries are not built. Does anyone have an idea what could cause this? cmake .. -DCMAKE_INSTALL_PREFIX=/usr -DCMAKE_INSTALL_LIBDIR=

[ceph-users] Crushmap rule for multi-datacenter erasure coding

2023-04-03 Thread Michel Jouvin
Hi, We have a 3-site Ceph cluster and would like to create a 4+2 EC pool with 2 chunks per datacenter, to maximise the resilience in case of 1 datacenter being down. I have not found a way to create an EC profile with this 2-level allocation strategy. I created an EC profile with a failure do
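Not from the original message, but the single-level profile being described would look roughly like this; the profile name is a placeholder:

```
# Single-level placement: one chunk per failure domain, so k=4, m=2 would need 6 datacenters
ceph osd erasure-code-profile set ec42dc k=4 m=2 crush-failure-domain=datacenter

# Inspect what was generated
ceph osd erasure-code-profile get ec42dc
```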

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Sridhar Seshasayee
Responses inline. > I have a last question: why is the bench performed using writes of 4 KiB? > Is there any reason to choose that over another value? > > Yes, the mClock scheduler considers this as a baseline in order to estimate costs for operations involving other block sizes. This is again an
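For anyone wanting to reproduce the numbers by hand, a rough sketch of the relevant commands; the OSD id and the override value are placeholders, not from this thread:

```
# Run the built-in OSD bench manually: ~12 MB total written in 4 KiB blocks
# (small-block benches are capped via osd_bench_small_size_max_iops)
ceph tell osd.0 bench 12288000 4096

# Override the capacity mClock derived from its own 4 KiB bench, if desired
ceph config set osd.0 osd_mclock_max_capacity_iops_hdd 450
```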

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Luis Domingues
Hi, Thanks a lot for the information. I have a last question: why is the bench performed using writes of 4 KiB? Is there any reason to choose that over another value? In my lab, I tested with various values, and I have mainly two types of disks, some Seagates and some Toshibas. If I do bench with

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Sridhar Seshasayee
Why was it done that way? I do not understand the reason for distributing > the IOPS across different disks, when the measurement we have is for one > disk alone. This means with default parameters we will always be far from > reaching the OSD limit, right? > > It's not on different disks. We distribut

[ceph-users] Re: How mClock profile calculation works, and IOPS

2023-04-03 Thread Luis Domingues
Hi Sridhar, thanks for the information. > > The above values are a result of distributing the IOPS across all the OSD > shards as defined by the > osd_op_num_shards_[hdd|ssd] option. For HDDs, this is set to 5 and > therefore the IOPS will be > distributed across the 5 shards (i.e. for e.g., 675/
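Spelling out the per-shard arithmetic quoted above; 675 IOPS is the measured figure from this thread, and osd.0 is a placeholder:

```
# osd_op_num_shards_hdd defaults to 5, so the measured capacity is split per shard:
#   675 IOPS / 5 shards = 135 IOPS per shard
# confirm the shard count in effect on a given OSD:
ceph config get osd.0 osd_op_num_shards_hdd
```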