[ceph-users] Rack outage test failing when nodes get integrated again

2024-01-10 Thread Steve Baker
Hi, we're currently testing a Ceph (v16.2.14) cluster: 3 mon nodes and 6 OSD nodes with 8 NVMe SSD OSDs each, distributed over 3 racks. Daemons are deployed in containers with cephadm / podman. We have 2 pools on it, one with 3x replication and min_size=2, and one with an EC profile (k=3, m=3). With 1 mon node and 2 OSD nodes
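For reference, a rough sketch of how the failure domain and pool settings could be double-checked before such a test (pool names here are placeholders):

    # CRUSH rules: check that the failure domain is "rack" for both pools
    ceph osd crush rule dump
    # replication and min_size of the replicated pool
    ceph osd pool get replicated_pool size
    ceph osd pool get replicated_pool min_size
    # min_size of the EC pool (defaults to k+1, i.e. 4 for k=3/m=3)
    ceph osd pool get ec_pool min_size
    # confirm the rack buckets contain the expected hosts and OSDs
    ceph osd tree

Note that with only 3 racks, a k=3/m=3 pool has to place two shards per rack, so losing one rack leaves 4 of 6 shards, which is right at the default min_size of k+1 = 4.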

[ceph-users] Re: Ceph Nautilus 14.2.22 slow OSD memory leak?

2024-01-10 Thread Janne Johansson
On Wed 10 Jan 2024 at 19:20, huxia...@horebdata.cn wrote: > Dear Ceph folks, > > I am responsible for two Ceph clusters, running Nautilus version 14.2.22, > one with replication 3 and the other with EC 4+2. After around 400 days > running quietly and smoothly, the two clusters recently encountered

[ceph-users] Re: Pacific bluestore_volume_selection_policy

2024-01-10 Thread Reed Dier
Hi Igor, That’s correct (shown below). Would it be helpful for me to add logs/uploaded crash UUIDs to 53906, 53907, 54209, 62928, 631
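For reference, a sketch of how the crash IDs and their metadata can be pulled from the crash module before attaching them to the tracker issues (the crash ID is a placeholder):

    # list all recorded crashes with their UUIDs and timestamps
    ceph crash ls
    # full metadata and backtrace for a single crash
    ceph crash info <crash-id>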

[ceph-users] Re: Ceph Nautilus 14.2.22 slow OSD memory leak?

2024-01-10 Thread Dan van der Ster
Hi Samuel, It can be a few things. A good place to start is to dump the mempools of one of those bloated OSDs: `ceph daemon osd.123 dump_mempools` Cheers, Dan -- Dan van der Ster CTO Clyso GmbH p: +49 89 215252722 | a: Vancouver, Canada w: https://clyso.com | e: dan.vanders...@clyso.com We are h
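A rough sketch of pulling the largest consumers out of that dump (the JSON layout is quoted from memory, so the jq path may need adjusting):

    # top 5 mempools by bytes for the bloated OSD
    ceph daemon osd.123 dump_mempools | \
      jq '.mempool.by_pool | to_entries | sort_by(.value.bytes) | reverse | .[0:5]'

Whether the bytes sit in osd_pglog, buffer_anon or the bluestore caches usually points to quite different causes.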

[ceph-users] Ceph Nautilus 14.2.22 slow OSD memory leak?

2024-01-10 Thread huxia...@horebdata.cn
Dear Ceph folks, I am responsible for two Ceph clusters, running Nautilus version 14.2.22, one with replication 3 and the other with EC 4+2. After around 400 days running quietly and smoothly, the two clusters recently encountered similar problems: some OSDs consume ca. 18 GB while the memo
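For reference, a first check along these lines might be comparing each OSD's actual usage with its memory target (the OSD id is a placeholder; the heap commands assume a tcmalloc build):

    # the target the OSD tries to stay under (default 4 GiB)
    ceph config get osd.0 osd_memory_target
    # tcmalloc's view of the heap, including freed-but-unreleased pages
    ceph tell osd.0 heap stats
    # ask tcmalloc to return unused pages to the kernel
    ceph tell osd.0 heap release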

[ceph-users] Re: Pacific bluestore_volume_selection_policy

2024-01-10 Thread Igor Fedotov
Hi Reed, it looks to me like your settings aren't effective. You might want to check the OSD log rather than the crash info and look at the assertion's backtrace. Does it mention RocksDBBlueFSVolumeSelector like the one in https://tracker.ceph.com/issues/53906: ceph version 17.0.0-10229-g7e035110 (7e0351
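A sketch of how that check might look on the affected OSD (log path and unit name are placeholders and depend on how logging is set up):

    # file-based logging
    grep -B 5 -A 40 'RocksDBBlueFSVolumeSelector' /var/log/ceph/ceph-osd.12.log
    # journald / containerised setups
    journalctl -u ceph-osd@12 | grep -A 40 'RocksDBBlueFSVolumeSelector'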

[ceph-users] Re: Pacific bluestore_volume_selection_policy

2024-01-10 Thread Reed Dier
Well, sadly, that setting doesn’t seem to resolve the issue. I set the value in ceph.conf for the OSDs with small WAL/DB devices that keep running into the issue: > $ ceph tell osd.12 config show | grep bluestore_volume_selection_policy > "bluestore_volume_selection_policy": "rocksdb_origin
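For what it's worth, two more places to cross-check the value, on the assumption that the option is only read when the OSD starts, so a ceph.conf edit alone would not affect a running daemon (the OSD id is a placeholder):

    # what the running daemon itself parsed (via the admin socket)
    ceph daemon osd.12 config get bluestore_volume_selection_policy
    # set it in the central config DB instead of ceph.conf, then restart the OSD
    ceph config set osd.12 bluestore_volume_selection_policy rocksdb_original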

[ceph-users] Re: Join us for the User + Dev Monthly Meetup - January 18th!

2024-01-10 Thread Laura Flores
> You are invited to join us at the User + Dev meeting this week Thursday, January 18th at 10:00 AM Eastern Time! See below for more meeting details. Correction: the meeting will take place **next** week Thursday, January 18th, not this week. Thanks, Laura On Tue, Jan 9, 2024 at 4:44 PM La

[ceph-users] physical vs osd performance

2024-01-10 Thread Curt
Hello all, Looking at the Grafana reports, can anyone point me to documentation that outlines physical vs OSD latency? https://docs.ceph.com/en/latest/monitoring/ gives some basic info, but I'm trying to get a better understanding. For instance, physical latency is 20 ms and OSD latency is 200 ms; these are just made
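For reference, a couple of CLI starting points for comparing the two layers outside of Grafana (just a sketch):

    # per-OSD commit/apply latency as reported by the OSDs themselves
    ceph osd perf
    # raw block-device latency on the OSD host, for comparison
    iostat -x 1

Roughly speaking, the physical numbers reflect the block device alone, while the OSD figures cover the whole op path (queueing, replication and the BlueStore write path), which is why they can be an order of magnitude higher.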

[ceph-users] Re: Stuck in upgrade process to reef

2024-01-10 Thread Igor Fedotov
Hi Jan, indeed this looks like some memory allocation problem - maybe the OSD's RAM usage threshold was reached or something? Curious whether you have any custom OSD settings or maybe memory caps for the Ceph containers? Could you please set debug_bluestore to 5/20 and debug_prioritycache to 10 and t
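For reference, a sketch of bumping those debug levels at runtime and checking for memory caps (the OSD id is a placeholder):

    # raise the debug levels on the affected OSD without a restart
    ceph tell osd.12 config set debug_bluestore 5/20
    ceph tell osd.12 config set debug_prioritycache 10
    # what the OSD believes its memory target is
    ceph daemon osd.12 config get osd_memory_target
    # whether the containers themselves are capped
    podman stats --no-stream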