On 1/17/24 15:57, Eugen Block wrote:
Hi,
this is not an easy topic and there is no formula that can be applied
to all clusters. From my experience, it is exactly as the discussion
went in the thread you mentioned: trial & error.
Looking at your session ls output, this reminds me of a debug sess
On 1/13/24 07:02, Özkan Göksu wrote:
Hello.
I have a 5-node Ceph cluster and I'm constantly getting the "clients failing to
respond to cache pressure" warning.
I have 84 CephFS kernel clients (servers), and my users access their
personal subvolumes located on one pool.
My users are software d
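For context, a minimal sketch of the kind of commands usually involved in chasing this down (the MDS name and the value below are placeholders/examples, not recommendations):

    # list client sessions on the active MDS to see which clients hold many caps
    ceph tell mds.<mds-name> session ls
    # example only: raise the MDS cache memory limit (in bytes) if the MDS is undersized
    ceph config set mds mds_cache_memory_limit 8589934592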
I have compiled Nautilus for el9 and am going to test adding an el9 OSD node to
the existing el7 cluster. If that works, I will first upgrade all nodes to el9.
> -----Original Message-----
> From: Szabo, Istvan (Agoda)
> Sent: Wednesday, 17 January 2024 08:09
> To: balli...@45drives.com; Eugen B
Hi,
We went the "long" way.
- first emptied the OSDs node by node (for each pool) and purged all OSDs
- moved the OS from CentOS 7 to Ubuntu 20 (reinstalled every node)
- removed the cache pool and cleaned up some config
- installed all OSDs again and moved the data back
- upgraded ceph nautilus to octopus (co
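As a rough sketch of the "empty a node" step (OSD id 12 is only an example; the idea is to drain one OSD at a time and wait for backfill in between):

    # mark the OSD out and let backfill move its data elsewhere
    ceph osd out 12
    # once `ceph -s` reports all PGs active+clean, remove it completely
    ceph osd purge 12 --yes-i-really-mean-it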
Hi folks.
I had a quick search but found nothing concrete on this, so I thought I would ask.
We currently have a 4-host Ceph cluster with an NVMe pool (1 OSD per host) and
an HDD pool (1 OSD per host). Both OSDs use a separate NVMe for DB/WAL. These
machines are identical (homogeneous) and are Ry
Hi Igor,
many thanks for the advice!
I tried to start osd.1 and it started successfully; it is now
resynchronizing data.
I will start the daemons one by one.
What do you mean about osd.0, which has a problem with
BlueStore fsck? Is there a way to repair it?
Sincerely
Jan
On Tue, Jan 16, 2024 at 08:15:
Hi Jan,
w.r.t. osd.0 - if this is the only occurrence, then I'd propose simply
redeploying the OSD. This looks like some BlueStore metadata inconsistency
which could have occurred long before the upgrade; the upgrade likely just
revealed the issue. And honestly, I can hardly imagine how to
investigate it
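A hedged sketch of both options discussed here (osd.0 and the device path are placeholders; whether an offline repair actually helps depends on the nature of the inconsistency):

    # with the OSD stopped, an offline repair attempt is cheap to try
    ceph-bluestore-tool repair --path /var/lib/ceph/osd/ceph-0
    # if fsck still complains, redeploy: drain, purge and recreate the OSD
    ceph osd out 0                           # wait for backfill to finish
    ceph osd purge 0 --yes-i-really-mean-it
    ceph-volume lvm create --data /dev/sdX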
On 16/1/24 11:39, Anthony D'Atri wrote:
by “RBD for cloud”, do you mean VM / container general-purpose volumes
on which a filesystem is usually built? Or large archive / backup
volumes that are read and written sequentially without much concern for
latency or throughput?
General purpose vol
> Also in our favour is that the users of the cluster we are currently
> intending for this have established a practice of storing large objects.
That definitely is in your favor.
> but it remains to be seen how 60x 22TB behaves in practice.
Be sure you don't get SMR drives.
> and it's har
I'm following the guide at https://docs.ceph.com/en/latest/rbd/rados-rbd-cmds/
but I'm not following why an `mgr` permission would be required to have a
functioning RBD client?
Thanks.
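For what it's worth, the caps form shown in that guide looks roughly like the sketch below (client name and pool are placeholders). As far as I know, the `mgr` cap is only needed for features that go through the rbd_support mgr module (e.g. `rbd perf image iostat` or trash purge / mirror snapshot schedules); a plain map/read/write client works with just the mon and osd profiles:

    ceph auth get-or-create client.rbduser \
        mon 'profile rbd' \
        osd 'profile rbd pool=mypool' \
        mgr 'profile rbd pool=mypool'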
Hi
We have a cluster which currently looks like this:
  services:
    mon: 5 daemons, quorum lazy,jolly,happy,dopey,sleepy (age 13d)
    mgr: jolly.tpgixt(active, since 25h), standbys: dopey.lxajvk, lazy.xuhetq
    mds: 1/1 daemons up, 2 standby
    osd: 449 osds: 425 up (since 15m)
On Tue, Jan 16, 2024 at 12:11 AM Chris Palmer
wrote:
> Updates on both problems:
>
> Problem 1
> --
>
> The bookworm/reef cephadm package needs updating to accommodate the last
> change in /usr/share/doc/adduser/NEWS.Debian.gz:
>
>System user home defaults to /nonexistent if --hom
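Until the package is fixed, one possible workaround (a sketch only, assuming the affected account is the package's `cephadm` system user and that /home/cephadm is an acceptable home) would be to give the user a real home directory after installation:

    # hypothetical workaround: replace the /nonexistent home with a real directory
    mkdir -p /home/cephadm
    chown cephadm: /home/cephadm
    usermod --home /home/cephadm cephadm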
On 17.01.24 11:13, Tino Todino wrote:
> Hi folks.
>
> I had a quick search but found nothing concrete on this so thought I would
> ask.
>
> We currently have a 4-host Ceph cluster with an NVMe pool (1 OSD per host)
> and an HDD pool (1 OSD per host). Both OSDs use a separate NVMe for DB/WAL.
Conventional wisdom is that with recent Ceph releases there is no longer a
clear advantage to this.
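For reference, the layout described in the question (HDD data OSDs with DB/WAL on a shared NVMe) is typically expressed along these lines; the device paths are examples only, not a recommendation:

    # put data on the HDDs and carve block.db/WAL out of the NVMe
    ceph-volume lvm batch /dev/sda /dev/sdb --db-devices /dev/nvme0n1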
> On Jan 17, 2024, at 11:56, Peter Sabaini wrote:
>
> One thing that I've heard people do but haven't done personally with fast
> NVMes (not familiar with the IronWolf so not sure if they qualif
On 17/01/2024 16:11, kefu chai wrote:
On Tue, Jan 16, 2024 at 12:11 AM Chris Palmer
wrote:
Updates on both problems:
Problem 1
--
The bookworm/reef cephadm package needs updating to accommodate
the last
change in /usr/share/doc/adduser/NEWS.Debian.gz:
It's a little tricky. In the upstream lab we don't strictly see an IOPS
or average latency advantage with heavy parallelism by running multiple
OSDs per NVMe drive until per-OSD core counts get very high. There does
seem to be a fairly consistent tail latency advantage even at moderately
low c
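For anyone wanting to experiment with this, splitting an NVMe into multiple OSDs can be done with ceph-volume's batch mode (a sketch; the device path and the count of 2 are just examples):

    # create two OSDs on a single NVMe device
    ceph-volume lvm batch --osds-per-device 2 /dev/nvme0n1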
Very informative article, Mark.
IMHO, if you find yourself with a very high per-OSD core count, it may be
logical to just pack/add more NVMes per host; you'd be getting the best
price per performance and capacity.
/Maged
On 17/01/2024 22:00, Mark Nelson wrote:
It's a little tricky. In
Hi,
this sounds a bit like a customer issue we had almost two years ago.
Basically, it was about mon_max_pg_per_osd (default 250) which was
exceeded during the first activating OSD (and the last remaining
stopping OSD). You can read all the details in the lengthy thread [1].
But if this i
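If it is the same issue, the knobs involved were roughly these (the values are examples, not recommendations):

    # allow more PGs per OSD before the mons complain / block activation
    ceph config set global mon_max_pg_per_osd 500
    # relax the hard ratio the OSD itself enforces (default 3)
    ceph config set osd osd_max_pg_per_osd_hard_ratio 5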
On 17-01-2024 22:20, Eugen Block wrote:
Hi,
Hi
this sounds a bit like a customer issue we had almost two years ago.
Basically, it was about mon_max_pg_per_osd (default 250) which was
exceeded during the first activating OSD (and the last remaining
stopping OSD). You can read all the details
+1 to this, great article and great research. Something we've been keeping a
very close eye on ourselves.
Overall we've mostly settled on the old keep-it-simple-stupid methodology, with
good results, especially as the benefits have become smaller the more recent
your Ceph version is, and h
Thanks kindly Maged/Bailey! As always it's a bit of a moving target.
New hardware comes out that reveals bottlenecks in our code. Doubling
up the OSDs sometimes improves things. We figure out how to make the
OSDs faster and the old assumptions stop being correct. Even newer
hardware comes
Hi,
-3281> 2024-01-17T14:57:54.611+ 7f2c6f7ef540 0 osd.431 2154828
load_pgs opened 750 pgs <---
I'd say that's close enough to what I suspected. ;-) Not sure why the
"maybe_wait_for_max_pg" message isn't there but I'd give it a try with
a higher osd_max_pg_per_osd_hard_ratio.
Zit
On 18/01/2024 07:48, Eugen Block wrote:
Hi,
-3281> 2024-01-17T14:57:54.611+ 7f2c6f7ef540 0 osd.431 2154828
load_pgs opened 750 pgs <---
I'd say that's close enough to what I suspected. ;-) Not sure why the
"maybe_wait_for_max_pg" message isn't there but I'd give it a try with a
high