[ceph-users] Re: How to modify the destination pool name in the rbd-mirror configuration?

2024-11-26 Thread David Yang
Thanks, Eugen. I'll look into how to do this. Eugen Block wrote on Wed, Nov 27, 2024 at 15:21: > > Sure, I haven't done that with rbd-mirror but I don't see a reason why > it shouldn't work. You can create different clusters within the same > subnet, you just have to pay attention to the correct configs etc.

[ceph-users] Re: How to modify the destination pool name in the rbd-mirror configuration?

2024-11-26 Thread Eugen Block
Sure, I haven't done that with rbd-mirror but I don't see a reason why it shouldn't work. You can create different clusters within the same subnet, you just have to pay attention to the correct configs etc. Each cluster has its own UUID and separate MONs, it should just work. If not, let us

[ceph-users] Re: How to modify the destination pool name in the rbd-mirror configuration?

2024-11-26 Thread David Yang
Hi Eugen, do you mean that it is possible to create multiple clusters on one infrastructure and then perform backups in each cluster? Eugen Block wrote on Wed, Nov 27, 2024 at 14:48: > > Hi, > > I don't think there's a way to achieve that. You could create two > separate backup clusters and configure mirrorin

[ceph-users] Re: How to modify the destination pool name in the rbd-mirror configuration?

2024-11-26 Thread Eugen Block
Hi, I don't think there's a way to achieve that. You could create two separate backup clusters and configure mirroring on both of them. You should already have enough space (enough OSDs) to mirror both pools, so that would work. You can colocate MONs with OSDs so you don't need additional

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Szabo, Istvan (Agoda)
Hi, I'd like to understand how free space allocation is calculated when an OSD crash happens and it reports no free space on the device (maybe due to fragmentation or an allocation issue). I checked all the graphs back to September, when we had multiple OSD failures on Octopus 15.2.17 co-located wal+db

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Szabo, Istvan (Agoda)
Got it, the perf dump can give information: ceph daemon osd.x perf dump|jq .bluefs From: Szabo, Istvan (Agoda) Sent: Wednesday, November 27, 2024 9:20 AM To: Frédéric Nass ; John Jasen ; Igor Fedotov Cc: ceph-users Subject: [ceph-users] Re: down OSDs, Bluesto
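A minimal sketch of that check (osd.12 is a placeholder ID, jq must be installed on the host, and the exact counter names below are an assumption that may vary between releases):

  ceph daemon osd.12 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, slow_total_bytes, slow_used_bytes}'

If slow_used_bytes is non-zero, BlueFS has already spilled over onto the main device; comparing db_used_bytes to db_total_bytes gives a rough idea of how full the DB is.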

[ceph-users] How to modify the destination pool name in the rbd-mirror configuration?

2024-11-26 Thread David Yang
Hello, everyone. Under normal circumstances, we synchronize from PoolA of ClusterA to PoolA of ClusterB (same pool name), which is also easy to configure. Now the requirements are as follows: ClusterA/Pool synchronizes to BackCluster/PoolA; ClusterB/Pool synchronizes to BackCluster/PoolB. After re
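For reference, the usual rbd-mirror bootstrap flow pairs pools of the same name on both sides, which is why a rename like this isn't straightforward; a hedged sketch of that standard flow, with the site names and poolA as placeholders:

  # on the source cluster
  rbd mirror pool enable poolA image
  rbd mirror pool peer bootstrap create --site-name site-a poolA > /tmp/bootstrap_token
  # on the backup cluster, against a pool of the same name
  rbd mirror pool enable poolA image
  rbd mirror pool peer bootstrap import --site-name backup --direction rx-only poolA /tmp/bootstrap_token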

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Szabo, Istvan (Agoda)
Hi, Is there a way to check, without shutting down an OSD, how much free space is left for the DB on a co-located OSD, or how full it is? From: Frédéric Nass Sent: Wednesday, November 27, 2024 6:11 AM To: John Jasen Cc: Igor Fedotov ; ceph-users Sub

[ceph-users] Re: CephFS empty files in a Frankenstein system

2024-11-26 Thread Gregory Farnum
I haven’t gone through all the details since you seem to know you’ve done some odd stuff, but the “.snap” issue is because you’ve run into a CephFS feature which I recently discovered is embarrassingly under-documented: https://docs.ceph.com/en/reef/dev/cephfs-snapshots So that’s a special fake di
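For anyone else landing here: those snapshots live under a hidden .snap directory in every CephFS directory and are driven by plain mkdir/rmdir; a quick sketch, assuming a CephFS mount at /mnt/cephfs, snapshots enabled on the filesystem, and a client with the snapshot capability:

  mkdir /mnt/cephfs/somedir/.snap/before-cleanup    # create a snapshot of somedir
  ls /mnt/cephfs/somedir/.snap                      # list existing snapshots
  rmdir /mnt/cephfs/somedir/.snap/before-cleanup    # delete that snapshot again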

[ceph-users] CephFS empty files in a Frankenstein system

2024-11-26 Thread Linas Vepstas
Don't laugh. I am experimenting with Ceph in an enthusiast, small-office, home-office setting. Yes, this is not the conventional use case, but I think Ceph almost is, almost could be used for this. Do I need to explain why? These kinds of people (i.e. me) already run RAID. And maybe CIFS/Samba or

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Szabo, Istvan (Agoda)
Hi, This issue should not happen anymore from 17.2.8 on, am I correct? In this version all the fragmentation issues should be gone, even with co-located wal+db+block. From: Frédéric Nass Sent: Wednesday, November 27, 2024 6:12:46 AM To: John Jasen Cc: Igor Fedotov

[ceph-users] Squid: deep scrub issues

2024-11-26 Thread Laimis Juzeliūnas
Hello Ceph community, Wanted to highlight one observation and gather any Squid users having similar experiences. Since upgrading to 19.2.0 (from 18.4.0) we have observed that pg deep scrubbing times have drastically increased. Some pgs take 2-5 days to complete deep scrubbing while others incre

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Frédéric Nass
Hi John, That's about right. Two potential solutions exist:  1. Adding a new drive to the server and sharing it for RocksDB metadata, or  2. Repurposing one of the failed OSDs for the same purpose (if adding more drives isn't feasible). Igor's post #6 [1] explains the challenges with co-located

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Sergio Rabellino
May not apply, but usually when I have strange (and bad) behaviours like these, I double-check name resolution/DNS configuration on all hosts involved. On 26/11/2024 11:19, Martin Gerhard Loschwitz wrote: Hi Alex, thank you for the reply. Here are all the steps we’ve done in the last

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Maged Mokhtar
On 26/11/2024 15:09, Peter Grandi wrote: Regardless of the specifics: 4KiB write IOPS is definitely not what Ceph was designed for. Yet so many people know better and use Ceph for VM disk images, even with logs and databases on them. It depends...for a single thread/queue depth Ceph will give

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Linder
That is indeed a lot nicer hardware and 1804 iops is faster, but still lower than a USB thumb drive. The thing with ceph is that it scales out really really well, but scaling up is harder. That is, if you run like 500 of these tests at the same time, then you can see what it can do. Some guy

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Linder
Not really. I'm assuming that they have been working hard at it, and I remember hearing something about a more recent rocksdb version shaving off significant time. It would also depend on your CPU and memory speed. I wouldn't be at all surprised if latency is lower today, but I haven't really measu

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Frédéric Nass
Hi Martin, I think what Peter suggests is that you should try with --numjobs=128 and --iodepth=16 to see what your hardware is really capable of with this very small I/O workload. Regards, Frédéric. From: Martin Gerhard Loschwitz Sent: Tuesday, November 26, 202
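Something along these lines, reusing the flags from the earlier run but with the parallelism Peter suggested (device path, runtime and job name are placeholders; like the original run it writes directly to the device, so it is destructive):

  fio --ioengine=libaio --filename=/dev/sdb --direct=1 --sync=1 --rw=write --bs=4K \
      --numjobs=128 --iodepth=16 --runtime=60 --time_based --group_reporting --name=4k-parallel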

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Maged Mokhtar
Can you check if you have any power saving settings? Make sure the CPU is set to max performance; use the cpupower tool to check and disable all C-states, and run at max frequency. For hdd qd=1, 60 iops is ok. For ssd qd=1, you should get roughly 3-5k iops read, 1k iops write, but if your cpu is pow
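A sketch of those checks with the cpupower tool (assuming the package is installed; the available governors and idle states differ per platform):

  cpupower frequency-info                  # show current governor and frequency limits
  cpupower frequency-set -g performance    # switch all cores to the performance governor
  cpupower idle-info                       # list available C-states
  cpupower idle-set -D 0                   # disable every idle state with latency above 0, i.e. all deep C-states
  cpupower monitor                         # verify cores stay in C0 at high frequency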

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Martin Gerhard Loschwitz
Here’s a benchmark of another setup I did a few months back, with NVME flash drives and a Mellanox EVPN fabric (Spectrum ASIC) between the nodes (no RDMA). 3 hosts and 24 drives in total. root@test01:~# fio --ioengine=libaio --filename=/dev/sdb --direct=1 --sync=1 --rw=write --bs=4K --numjobs=1

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Marc
> > In my experience, ceph will add around 1ms even if only on localhost. If > this is in the client code or on the OSD's, I dont really know. I don't > even know the precise reason, but the latency is there nevertheless. > Perhaps you can find the reason here among the tradeoffs ceph and > simila

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Linder
In my experience, ceph will add around 1ms even if only on localhost. If this is in the client code or on the OSDs, I don't really know. I don't even know the precise reason, but the latency is there nevertheless. Perhaps you can find the reason here among the tradeoffs ceph and similar systems

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Linder
With qd=1 (queue depth?) and a single thread, this isn't totally unreasonable. Ceph will have an internal latency of around 1ms or so, add some network to that and an operation can take 2-3ms. With a single operation in flight all the time, this means 333-500 operations per second. With hdds,
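As a quick sanity check of those numbers: with exactly one operation in flight, throughput is simply 1/latency, so 1 op / 2 ms = 500 IOPS and 1 op / 3 ms ≈ 333 IOPS, which is where the 333-500 range comes from.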

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Alex Gorbachev
Martin, are MONs set up on the same hosts, or is there latency to them by any chance? -- Alex Gorbachev https://alextelescope.blogspot.com On Tue, Nov 26, 2024 at 5:20 AM Martin Gerhard Loschwitz < martin.loschw...@true-west.com> wrote: > Hi Alex, > > thank you for the reply. Here are all the s

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread John Jasen
Let me see if I have the approach right'ish: scrounge some more disk for the servers with full/down OSDs. partition the new disks into LVs for each downed OSD. Attach as a lvm new-db to the downed OSDs. Restart the OSDs. Profit. Is that about right? On Tue, Nov 26, 2024 at 11:28 AM Igor Fedotov
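Roughly that, yes; as a hedged sketch of what the middle steps can look like with ceph-volume (VG/LV names, the size and the OSD id/fsid are placeholders; the OSD has to be stopped first, and on cephadm deployments ceph-volume runs inside the OSD container shell):

  lvcreate -L 50G -n osd-7-db new-db-vg                                      # carve an LV on the scrounged disk
  ceph-volume lvm new-db --osd-id 7 --osd-fsid <osd-fsid> --target new-db-vg/osd-7-db
  systemctl start ceph-osd@7                                                 # or restart the OSD service/container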

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Igor Fedotov
Well, so there is a single shared volume (disk) per OSD, right? If so, one can add a dedicated DB volume to such an OSD - once done, the OSD will have two underlying devices: main (which is the original shared disk) and the new dedicated DB one. And hence this will effectively provide additional space for Blu

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread John Jasen
They're all bluefs_single_shared_device, if I understand your question. There's no room left on the devices to expand. We started at quincy with this cluster, and didn't vary too much from the Redhat Ceph storage 6 documentation for setting it up. On Tue, Nov 26, 2024 at 4:48 AM Igor Fedotov wr

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Anthony D'Atri
Wait … 1 gigabit?? That sure isn’t doing you any favors. Remember that RADOS sends replication sub-ops over that, though you mentioned a size=1 pool. You’ll have mon <-> OSD traffic and OSD <-> OSD heartbeats going over that link as well. > On Nov 26, 2024, at 5:22 AM, Martin Gerhard Los

[ceph-users] Re: v17.2.8 Quincy released - failed on Debian 11

2024-11-26 Thread Konstantin Shalygin
Hi, > On 26 Nov 2024, at 16:10, Matthew Darwin wrote: > > I guess there is a missing dependency (which really should be > auto-installed), which is also not documented in the release notes as a new > requirement. This seems to fix it: This is caused by [1]; the fix was not backported to quincy,

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Grandi
> [...] All-SSD cluster I will get roughly 400 IOPS over more > than 250 devices. I know SAS-SSDs are not ideal, but 250 > looks a bit on the low side of things to me. In the second > cluster, also All-SSD based, I get roughly 120 4k IOPS. And > the HDD-only cluster delivers 60 4k IOPS. Regardl

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Peter Grandi
>>> On Mon, 25 Nov 2024 15:22:32 +0100, Martin Gerhard Loschwitz >>> said: > [...] abysmal 4k IOPS performance [... Also: https://www.google.com/search?q=ceph+bench+small+blocks https://www.reddit.com/r/ceph/comments/10b2846/low_iops_with_all_ssd_cluster_for_4k_writes/ https://www.r

[ceph-users] Re: v17.2.8 Quincy released - failed on Debian 11

2024-11-26 Thread Matthew Darwin
I guess there is a missing dependency (which really should be auto-installed), which is also not documented in the release notes as a new requirement. This seems to fix it: $ apt install --no-install-recommends python3-packaging On 2024-11-26 08:03, Matthew Darwin wrote: I have upgraded from

[ceph-users] Re: v17.2.8 Quincy released - failed on Debian 11

2024-11-26 Thread Matthew Darwin
I have upgraded from 17.2.7 to 17.2.8 on debian 11 and the OSDs fail to start. Advice on how to proceed would be welcome. This is my log from ceph-volume-systemd.log [2024-11-26 12:51:30,251][systemd][INFO  ] raw systemd input received: lvm-1-1c136e54-6f58-4f36-af10-d47d215b991b [2024-11-26 12:5

[ceph-users] Re: CephFS 16.2.10 problem

2024-11-26 Thread Alexey.Tsivinsky
Good morning! There is a new development in our situation. The list of cluster locks now contains entries for the addresses of the servers on which the MDS is running. The lifetime of the records is one day. For now, we decided to wait until the records are deleted and see how the cluster behaves. Best

[ceph-users] RGW Daemons Crash After Adding Secondary Zone with Archive module

2024-11-26 Thread mahnoosh shahidi
Hi everyone, I encountered an issue while setting up a multisite configuration in Ceph. After adding the second secondary zone and enabling the archive module, the RGW daemons started crashing repeatedly after running the period update command. The crash log shows the following error: Caught

[ceph-users] Multisite RGW-SYNC error: failed to remove omap key from error repo

2024-11-26 Thread Szabo, Istvan (Agoda)
Hi, I continuously see these kinds of messages in our logs: RGW-SYNC:data:sync:shard[7]:entry: ERROR: failed to remove omap key from error repo (hkg.rgw.log:datalog.sync-status.shard.61c9d940-fde4-4bed-9389-edc8d7741817.7.retry retcode=-1 And we started to receive large omaps in the log pool
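In case it helps anyone searching for this later: the error repo those messages refer to can at least be inspected, and stale entries trimmed, with radosgw-admin; a sketch, hedged because the supported flags (e.g. shard selection) vary a bit between releases:

  radosgw-admin sync error list    # show entries currently sitting in the sync error repo
  radosgw-admin sync error trim    # trim old/processed entries from the error repo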

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Martin Gerhard Loschwitz
Hi Anthony, I think problems have always been like this, albeit these setups are a bit older already. We’ve specifically set the MTU to 9000 on both switches and all affected machines, but MTU 1500 or MTU 9000 literally doesn’t make a difference. Network is non-LACP on one of the test clusters

[ceph-users] Re: 4k IOPS: miserable performance in All-SSD cluster

2024-11-26 Thread Martin Gerhard Loschwitz
Hi Alex, thank you for the reply. Here are all the steps we’ve done in the last weeks to reduce complexity (we’re focussing on the HDD cluster for now in which we are seeing the worst results in relation — but it also happens to be the easiest setup network-wise, despite only having a 1G link b

[ceph-users] Re: down OSDs, Bluestore out of space, unable to restart

2024-11-26 Thread Igor Fedotov
Hi John, you haven't described your OSD volume configuration but you might want to try adding standalone DB volume if OSD uses LVM and has single main device only. 'ceph-volume lvm new-db' command is the preferred way of doing that, see https://docs.ceph.com/en/quincy/ceph-volume/lvm/newdb/

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-26 Thread Rongqi Sun
On Tue, Nov 26, 2024 at 4:38 PM Gregory Orange wrote: > I simply registered for the dev event, and put a note in there that I > was only going for the 3pm session. > If this is a common case, does it mean there are still some spots available at the Dev Summit? Hope the info can be updated. Best

[ceph-users] How to synchronize pools with the same name in multiple clusters to multiple pools in one cluster

2024-11-26 Thread David Yang
We have ceph clusters in multiple regions to provide rbd services. We are currently preparing a remote backup plan, which is to synchronize pools with the same name in each region to different pools in one cluster. For example: Cluster A Pool synchronized to backup cluster poolA Cluster B Pool sy

[ceph-users] Re: [CephFS] Completely exclude some MDS rank from directory processing

2024-11-26 Thread Александр Руденко
And, Eugen, try looking at ceph fs status during writes. I can see the following INOS, DNS and Reqs distribution:

RANK  STATE   MDS  ACTIVITY       DNS    INOS   DIRS  CAPS
 0    active   c   Reqs: 127 /s   12.6k  12.5k  333   505
 1    active   b   Reqs: 11 /s    21     24     19    1

I mean that e
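Related to steering load between ranks: the documented knob closest to this is the export pin xattr, which pins a directory subtree to a specific rank (and thereby keeps it off the others); a sketch, assuming a CephFS mount at /mnt/cephfs and the attr tools installed:

  setfattr -n ceph.dir.pin -v 1 /mnt/cephfs/project-a     # pin this subtree to rank 1
  setfattr -n ceph.dir.pin -v -1 /mnt/cephfs/project-a    # a value of -1 removes the pin again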

[ceph-users] Re: [CephFS] Completely exclude some MDS rank from directory processing

2024-11-26 Thread Александр Руденко
> > Hm, the same test worked for me with version 16.2.13... I mean, I only > do a few writes from a single client, so this may be an invalid test, > but I don't see any interruption. I tried many times and I'm sure that my test is correct. Yes, write can be active for some time after rank 1 went

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-26 Thread Gregory Orange
On 26/11/24 09:47, Stefan Kooman wrote: > The dev event is full. So unable to register anymore and leave a note > like you did. Hence my question. I guess the contact email address on that page is worth a shot: ceph-devsummit-2...@cern.ch

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-26 Thread Stefan Kooman
On 26-11-2024 09:37, Gregory Orange wrote: On 25/11/24 15:57, Stefan Kooman wrote: Update: The Ceph Developer Summit is nearing capacity for "Developers". There is still room for "Power Users" to register for the afternoon session. See below for details... However, it's unclear to me if you nee

[ceph-users] Re: Cephalocon Update - New Users Workshop and Power Users Session

2024-11-26 Thread Gregory Orange
On 25/11/24 15:57, Stefan Kooman wrote: > Update: The Ceph Developer Summit is nearing capacity for "Developers". > There is still room for "Power Users" to register for the afternoon > session. See below for details... > > However, it's unclear to me if you need to register for the "Power > Users