[ceph-users] Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
Hi, We're running latest Pacific on our production cluster and we've been seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 15.00954s' error. We have reasons to believe this happens each time the RocksDB compaction process is launched on an OSD. My question is,
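
A quick way to confirm which timeout is firing and whether compaction alone reproduces it (a sketch; option names are as in Pacific, verify with `ceph config help`, and <id> is a placeholder):

    ceph config get osd osd_op_thread_timeout          # default 15s, the value in the warning
    ceph config get osd osd_op_thread_suicide_timeout  # the hard limit before the OSD aborts
    ceph tell osd.<id> compact                         # trigger a manual compaction and watch the logs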

[ceph-users] Re: Is it possible (or meaningful) to revive old OSDs?

2023-09-07 Thread Frank Schilder
Hi, I did something like that in the past. If you have a sufficient amount of cold data in general and you can bring the OSDs back with their original IDs, recovery was significantly faster than rebalancing. It really depends how trivial the version update per object is. In my case it could re-u
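
For reference, a minimal sketch of bringing LVM-based OSDs back under their original IDs, assuming the data LVs on the revived node are intact:

    ceph-volume lvm list             # shows the osd id and osd fsid stored on each LV
    ceph-volume lvm activate --all   # recreates the systemd units and starts the OSDs
    ceph osd tree                    # the old IDs should come back "up" under this host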

[ceph-users] Awful new dashboard in Reef

2023-09-07 Thread Nicola Mori
Dear Ceph users, I just upgraded my cluster to Reef, and with the new version came also a revamped dashboard. Unfortunately the new dashboard is really awful to me: 1) it's no longer possible to see the status of the PGs: in the old dashboard it was very easy to see e.g. how many PGs were rec
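
Until the dashboard regains a per-state PG breakdown, the same information is still available from the CLI (a workaround, not a fix for the dashboard itself):

    ceph pg stat             # one-line summary of PG states
    ceph status              # includes recovering/backfilling counts and rates
    ceph pg dump pgs_brief   # per-PG state, if you need the detail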

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Konstantin Shalygin
Hi, > On 7 Sep 2023, at 10:05, J-P Methot wrote: > > We're running latest Pacific on our production cluster and we've been seeing > the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after > 15.00954s' error. We have reasons to believe this happens each time the > RocksDB co

[ceph-users] Re: Awful new dashboard in Reef

2023-09-07 Thread Nigel Williams
On Thu, 7 Sept 2023 at 18:05, Nicola Mori wrote: > Is it just me or maybe my impressions are shared by someone else? Is > there anything that can be done to improve the situation? > I wonder about the implementation choice for this dashboard. I find with our Reef cluster it seems to get stuck du

[ceph-users] Re: Awful new dashboard in Reef

2023-09-07 Thread Nicola Mori
My cluster has 104 OSDs, so I don't think this can be a factor for the malfunctioning.

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
We're talking about automatic online compaction here, not running the command. On 9/7/23 04:04, Konstantin Shalygin wrote: Hi, On 7 Sep 2023, at 10:05, J-P Methot wrote: We're running latest Pacific on our production cluster and we've been seeing the dreaded 'OSD::osd_op_tp thread 0x7f346a

[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-07 Thread Rok Jaklič
Hi, we have also experienced several ceph-mgr oom kills on ceph v16.2.13 on 120T/200T data. Is there any tracker about the problem? Does upgrade to 17.x "solves" the problem? Kind regards, Rok On Wed, Sep 6, 2023 at 9:36 PM Ernesto Puerta wrote: > Dear Cephers, > > Today brought us an even

[ceph-users] Re: Debian/bullseye build for reef

2023-09-07 Thread Matthew Vernon
Hi, On 21/08/2023 17:16, Josh Durgin wrote: We weren't targeting bullseye once we discovered the compiler version problem, the focus shifted to bookworm. If anyone would like to help maintaining debian builds, or looking into these issues, it would be welcome: https://bugs.debian.org/cgi-bin

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Alexander E. Patrakov
On an HDD-based Quincy 17.2.5 cluster (with DB/WAL on datacenter-class NVMe with enhanced power loss protection), I sometimes (once or twice per week) see log entries similar to what I reproduced below (a bit trimmed): Wed 2023-09-06 22:41:54 UTC ceph-osd09 ceph-osd@39.service[5574]: 2023-09-06T22

[ceph-users] Re: Is it possible (or meaningful) to revive old OSDs?

2023-09-07 Thread ceph-mail
Thanks all for the advice, very helpful! The node also had a mon, which happily slotted right back into the cluster. The node's been up and running for a number of days now, but the systemd OSD processes don't seem to be trying continuously; they're never progressing or getting a newer map. As
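
A minimal sketch for checking whether such an OSD is actually catching up on osdmaps (<id> is a placeholder):

    ceph osd dump | head -1        # current cluster osdmap epoch
    ceph daemon osd.<id> status    # on the node: state plus oldest_map/newest_map
    # if newest_map never advances, the OSD log usually shows why (auth, network, or a full disk)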

[ceph-users] Re: Permissions of the .snap directory do not inherit ACLs in 17.2.6

2023-09-07 Thread Eugen Block
Your description seems to match my observations trying to create cephfs snapshots via dashboard. In latest Octopus it works, in Pacific 16.2.13 and Quincy 17.2.6 it doesn't, in Reef 18.2.0 it works again. Zitat von MARTEL Arnaud : Hi Eugen, We have a lot of shared directories in cephfs
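
For comparison, snapshots can still be created directly on a mounted filesystem, independent of the dashboard (a sketch; the path is an example):

    mkdir /mnt/cephfs/shared_dir/.snap/backup-2023-09-07   # create a snapshot of shared_dir
    getfacl /mnt/cephfs/shared_dir/.snap                   # compare ACLs with the parent directory
    getfacl /mnt/cephfs/shared_dir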

[ceph-users] failure domain and rack awareness

2023-09-07 Thread Reza Bakhshayeshi
Hello, What is the best strategy regarding failure domain and rack awareness when there are only 2 physical racks and we need 3 replicas of data? In this scenario what is your point of view if we create 4 artificial racks at least to be able to manage deliberate node maintenance in a more efficie
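
One way to express the "artificial racks" idea is two logical racks per physical rack and a rule with failure domain "rack" (a sketch; bucket and host names are examples):

    ceph osd crush add-bucket rack1a rack
    ceph osd crush add-bucket rack1b rack
    ceph osd crush add-bucket rack2a rack
    ceph osd crush add-bucket rack2b rack
    ceph osd crush move rack1a root=default      # repeat for the other three buckets
    ceph osd crush move host-01 rack=rack1a      # repeat for every host
    ceph osd crush rule create-replicated rep-by-rack default rack
    ceph osd pool set <pool> crush_rule rep-by-rack

Note that with only two physical racks this still leaves two of the three replicas in one physical rack, so losing a whole rack can drop a PG below the default min_size of 2; that limitation is inherent to the hardware layout, not to the rule.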

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Stefan Kooman
On 07-09-2023 09:05, J-P Methot wrote: Hi, We're running latest Pacific on our production cluster and we've been seeing the dreaded 'OSD::osd_op_tp thread 0x7f346aa64700' had timed out after 15.00954s' error. We have reasons to believe this happens each time the RocksDB compaction process

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
Hi, Since my post, we've been speaking with a member of the Ceph dev team. He did, at first, believe it was an issue linked to the common performance degradation after huge delete operations. So we did do offline compactions on all our OSDs. It fixed nothing and we are going through the logs
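
For reference, the offline compaction mentioned above is typically done with the OSD stopped (a sketch; the path assumes a default non-containerized layout and <id> is a placeholder):

    systemctl stop ceph-osd@<id>
    ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
    systemctl start ceph-osd@<id>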

[ceph-users] Upgrading OS [and ceph release] nondestructively for oldish Ceph cluster

2023-09-07 Thread Sam Skipsey
Hello all, We've had a Nautilus [latest releases] cluster for some years now, and are planning the upgrade process - both moving off CentOS 7 [ideally to a RHEL9-compatible spin like Alma 9 or Rocky 9] and also moving to a newer Ceph release [ideally Pacific or higher to avoid too many later upg
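
A rough per-node sketch of the OS-reinstall part, assuming the OSD data devices survive the reinstall untouched:

    ceph osd set noout                  # avoid rebalancing while the node is down
    # ...reinstall the OS, install matching Ceph packages, restore ceph.conf and keyrings...
    ceph-volume lvm activate --all      # re-detect and start the existing OSDs
    ceph osd unset noout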

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Konstantin Shalygin
Hi, > On 7 Sep 2023, at 18:21, J-P Methot wrote: > > Since my post, we've been speaking with a member of the Ceph dev team. He > did, at first, believe it was an issue linked to the common performance > degradation after huge deletes operation. So we did do offline compactions on > all our OS

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Mark Nelson
Hello, There are two things that might help you here.  One is to try the new "rocksdb_cf_compaction_on_deletion" feature that I added in Reef and we backported to Pacific in 16.2.13.  So far this appears to be a huge win for avoiding tombstone accumulation during iteration which is often the
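
If you want to try it, enabling the feature is a one-line config change (a sketch; the option name is as given above -- verify the exact spelling and availability on your 16.2.13+ build with `ceph config help`):

    ceph config set osd rocksdb_cf_compaction_on_deletion true
    ceph config get osd rocksdb_cf_compaction_on_deletion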

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
We went from 16.2.13 to 16.2.14. Also, the timeout is 15 seconds because that's the default in Ceph: basically, 15 seconds before Ceph shows a warning that the OSD is timing out. We may have found the solution, but it would be, in fact, related to bluestore_allocator and not the compaction process. I'll
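
A quick way to see which allocator the OSDs are actually running with (a sketch; <id> is a placeholder, and switching allocators requires an OSD restart):

    ceph config get osd bluestore_allocator              # cluster-wide setting (hybrid by default on recent releases)
    ceph daemon osd.<id> config get bluestore_allocator  # what a running OSD is using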

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
Hi, By this point, we're 95% sure that, contrary to our previous beliefs, it's an issue with changes to the bluestore_allocator and not the compaction process. That said, I will keep this email in mind as we will want to test optimizations to compaction on our test environment. On 9/7/23 12:

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Mark Nelson
Ok, good to know.  Please feel free to update us here with what you are seeing in the allocator.  It might also be worth opening a tracker ticket as well.  I did some work in the AVL allocator a while back where we were repeating the linear search from the same offset every allocation, getting

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread J-P Methot
To be quite honest, I will not pretend I have a low level understanding of what was going on. There is very little documentation as to what the bluestore allocator actually does and we had to rely on Igor's help to find the solution, so my understanding of the situation is limited. What I under

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Mark Nelson
Oh that's very good to know.  I'm sure Igor will respond here, but do you know which PR this was related to? (possibly https://github.com/ceph/ceph/pull/50321) If we think there's a regression here we should get it into the tracker ASAP. Mark On 9/7/23 13:45, J-P Methot wrote: To be quite

[ceph-users] Re: ceph_leadership_team_meeting_s18e06.mkv

2023-09-07 Thread Mark Nelson
Hi Rok, We're still trying to catch what's causing the memory growth, so it's hard to guess at which releases are affected. We know it's happening intermittently on a live Pacific cluster at least. If you have the ability to catch it while it's happening, there are several approaches/tools tha
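
A starting point, assuming a tcmalloc build where the heap hooks are exposed on the mgr admin socket (<name> is the mgr instance name, usually the short hostname):

    ceph daemon mgr.<name> dump_mempools         # mgr memory pool breakdown
    ceph daemon mgr.<name> heap stats            # tcmalloc heap statistics
    ceph daemon mgr.<name> heap start_profiler   # heavier: profile the heap while it grows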

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread xiaowenhao111
I also see the dreaded error. I find this is a bcache problem; you can use the blktrace tools to capture I/O data for analysis. Sent from my Xiaomi. On 7 Sep 2023 at 10:52 PM, Stefan Kooman wrote: On 07-09-2023 09:05, J-P Methot wrote: > Hi, > > We're running latest Pacific on our production cluster and we've been > seeing the dreaded 'O
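
A sketch of the suggested blktrace capture against a bcache-backed OSD device (the device name is an example):

    blktrace -d /dev/bcache0 -w 60 -o osd_bcache_trace   # capture 60 seconds of block-layer I/O
    blkparse -i osd_bcache_trace | less                  # inspect per-request latencies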

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Stefan Kooman
On 07-09-2023 19:20, J-P Methot wrote: We went from 16.2.13 to 16.2.14 Also, timeout is 15 seconds because it's the default in Ceph. Basically, 15 seconds before Ceph shows a warning that OSD is timing out. We may have found the solution, but it would be, in fact, related to bluestore_alloca

[ceph-users] Re: Rocksdb compaction and OSD timeout

2023-09-07 Thread Konstantin Shalygin
Does this cluster use the default settings, or was something changed for Bluestore? You can check this via `ceph config diff`. As Mark said, it would be nice to have a tracker, if this really is a release problem. Thanks, k Sent from my iPhone > On 7 Sep 2023, at 20:22, J-P Methot wrote: > > We went from