> Hi Anthony.
>
>> ... Bump up pg_num on pools and see how the average / P90 ceph-osd process
>> size changes?
>> Grafana FTW. osd_map_cache_size I think defaults to 50 now; I want to say
>> it used to be much higher.
>
> That's not an option. What would help is a-priori information based on
> [...] but simply on the physical parameter of IOPS-per-TB (a "figure of merit"
> that is widely underestimated or ignored)
hear hear!
> of HDDs, and having enough IOPS-per-TB to sustain both user and admin
> workload.
Even with SATA SSDs I twice had to expand a cluster to meet SLO long before it [...]
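To put rough numbers on that figure of merit, here is a back-of-the-envelope sketch. The ~120 random IOPS per 7200 rpm spindle is an assumed round number for illustration, not a measurement:

# Back-of-the-envelope IOPS-per-TB for a few HDD sizes.
# ~120 random 4K IOPS per 7200 rpm spindle is an assumed round number.
HDD_RANDOM_IOPS = 120.0

def iops_per_tb(capacity_tb: float, drive_iops: float = HDD_RANDOM_IOPS) -> float:
    """IOPS available per TB stored, for a single drive."""
    return drive_iops / capacity_tb

for size_tb in (1, 4, 8, 16, 20):
    print(f"{size_tb:>2} TB HDD: {iops_per_tb(size_tb):6.1f} IOPS/TB")

# The same ~120 IOPS has to cover user IO *and* admin IO (scrub, backfill,
# recovery), so the per-TB budget shrinks linearly as drives grow.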
> [... number of PGs per OSD ...]
> So it is mainly related to PG size.
Indeed, and secondarily the number of objects: more objects per PG
means lower metadata overhead, but bigger PGs mean higher admin
workload latency.
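To make "PG size" concrete, a quick sketch with made-up pool numbers (on a real cluster they would come from ceph df detail):

# Average PG size and objects per PG for a hypothetical pool.
# The pool numbers below are made up for illustration.
pool_bytes   = 500 * 1024**4      # 500 TiB stored in the pool
pool_objects = 800_000_000        # e.g. lots of small files via CephFS
pg_num       = 2048

print(f"average PG size: {pool_bytes / pg_num / 1024**3:.1f} GiB")
print(f"objects per PG:  {pool_objects / pg_num:,.0f}")
# Doubling pg_num halves both numbers: smaller PGs scrub and backfill
# faster, at the cost of more PGs (and PG metadata) per OSD.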
>> Note: HDDs larger than 1TB are not really suitable for
>> significant parallel
> The main problem was the increase in ram use scaling with PGs, which in
> normal operation is often fine but as we all know balloons in failure
> conditions.
Less so with BlueStore in my experience. I think in part this surfaces a bit of
Filestore legacy that we might re-examine with Filestore [...]
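If someone does want to watch the RAM side while changing pg_num, a minimal per-host sketch, assuming the OSDs run locally and a plain ps snapshot is good enough (aggregating over time and hosts is what Grafana is for):

#!/usr/bin/env python3
# P90 resident set size of the ceph-osd processes on this host.
import statistics
import subprocess

# `ps -C ceph-osd -o rss=` prints one RSS value (KiB) per ceph-osd process.
out = subprocess.run(["ps", "-C", "ceph-osd", "-o", "rss="],
                     capture_output=True, text=True).stdout

rss_kib = sorted(int(v) for v in out.split())
if len(rss_kib) >= 2:
    p90 = statistics.quantiles(rss_kib, n=10)[-1]   # 90th percentile
    print(f"{len(rss_kib)} OSDs, P90 RSS: {p90 / 1024**2:.1f} GiB")
elif rss_kib:
    print(f"1 OSD, RSS: {rss_kib[0] / 1024**2:.1f} GiB")
else:
    print("no ceph-osd processes found")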
> But not, I suspect, nearly as many tentacles.
No, that's the really annoying part. It just works.
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Anthony D'Atri
Sent: Thursday, October 10, 2024 2:13 PM
To: Frank Schilder
Cc:
> I'm afraid nobody will build a 100PB cluster with 1T drives. That's just
> absurd
Check the archives for the panoply of absurdity that I’ve encountered ;)
> So, the sharp increase of per-device capacity has to be taken into account.
> Specifically as the same development is happening with SSDs.
==
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Peter Grandi
Sent: Thursday, October 10, 2024 1:01 PM
To: list Linux fs Ceph
Subject: [ceph-users] Re: What is the problem with many PGs per OSD
>>> On Thu, 10 Oct 2024 08:53:08, Frank Schilder said:
> The guidelines are *not* good enough for EC pools on large
> HDDs that store a high percentage of small objects, in our
> case, files.
Arguably *nothing* is good enough for that, because it is the
worst possible case scenario (A Ceph [...])
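To put a number on "worst possible case": with EC each object is striped into k data chunks plus m coding chunks, and each chunk is rounded up to the BlueStore allocation unit, so small objects amplify badly. A simplified sketch with assumed parameters (k=8, m=3, and the 64 KiB min_alloc_size of older HDD OSDs; it ignores stripe_unit padding and metadata):

import math

# Assumed EC profile and BlueStore allocation unit, illustration only.
K, M = 8, 3                  # EC k+m
MIN_ALLOC = 64 * 1024        # 64 KiB, the old HDD default (4 KiB since Pacific)

def raw_bytes(object_size: int) -> int:
    """Simplified raw space for one object on a k+m EC pool."""
    chunk = math.ceil(object_size / K)                 # data striped over k chunks
    chunk = math.ceil(chunk / MIN_ALLOC) * MIN_ALLOC   # each chunk rounded up
    return chunk * (K + M)                             # k data + m coding chunks

for size in (4 * 1024, 64 * 1024, 4 * 1024**2):
    raw = raw_bytes(size)
    print(f"{size / 1024:6.0f} KiB object -> {raw / 1024:6.0f} KiB raw "
          f"({raw / size:5.1f}x vs the nominal {(K + M) / K:.2f}x)")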
Sent: Thursday, October 10, 2024 10:19 AM
To: Frank Schilder
Cc: Janne Johansson; Anthony D'Atri; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: What is the problem with many PGs per OSD
Yes, this was an old lesson and AFAIK nobody has intentionally pushed the
bounds in a long time because it was a very painful lesson.
> From: Janne Johansson
> Sent: Thursday, October 10, 2024 8:51 AM
> To: Frank Schilder
> Cc: Anthony D'Atri; ceph-users@ceph.io
> Subject: Re: [ceph-users] Re: What is the problem with many PGs per OSD
>
> On Wed, 9 Oct 2024 at 20:48, Frank Schilder wrote:
[...] any more.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Janne Johansson
Sent: Thursday, October 10, 2024 8:51 AM
To: Frank Schilder
Cc: Anthony D'Atri; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: What is the problem with many PGs per OSD
On Wed, 9 Oct 2024 at 20:48, Frank Schilder wrote:
> The PG count per OSD is a striking exception. It's just a number (well, a range
> with 100 recommended and 200 as a max:
> https://docs.ceph.com/en/latest/rados/operations/pgcalc/#keyDL). It just is.
> And this doesn't make any sense unless the [...]
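For what it's worth, the arithmetic behind that 100-per-OSD guideline is roughly the following. This is a sketch of the usual rule of thumb with illustrative numbers, not the pgcalc tool itself:

import math

def suggested_pg_num(num_osds: int, pool_share: float, replicas: int,
                     target_pgs_per_osd: int = 100) -> int:
    """Usual rule of thumb: aim for ~target PG replicas per OSD, weighted by
    this pool's share of the data, rounded to a power of two."""
    raw = num_osds * target_pgs_per_osd * pool_share / replicas
    return 2 ** max(1, round(math.log2(raw)))

# Example: 500 OSDs, one pool holding ~80% of the data, 3x replication.
print(suggested_pg_num(500, 0.8, 3))   # 16384
# Resulting PG replicas per OSD for that pool:
print(16384 * 3 / 500)                 # ~98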
Hi Anthony,
replying here to points that were somewhat outside the scope of my original
question:
> > That's why deploying multiple OSDs per SSD is such a great way to
> > improve performance on devices where 4K random IO throughput scales with
> > iodepth.
>
> Mark's testing has shown this to [...]
Hi Anthony.
> ... Bump up pg_num on pools and see how the average / P90 ceph-osd process
> size changes?
> Grafana FTW. osd_map_cache_size I think defaults to 50 now; I want to say it
> used to be much higher.
That's not an option. What would help is a-priori information based on the
implementation [...]
> Unfortunately, it doesn't really help answering my questions either.
Sometimes the best we can do is grunt and shrug :-/. Before Nautilus we
couldn’t merge PGs, so we could raise pg_num for a pool but not decrease it, so
a certain fear of overshooting was established. Mark is the go-to here.
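Before bumping pg_num it at least helps to know where a cluster already sits. A small sketch, assuming ceph osd df --format json exposes a per-OSD "pgs" field (as its plain-text PGS column suggests):

#!/usr/bin/env python3
# Summarize PG replicas per OSD from `ceph osd df --format json`.
import json
import statistics
import subprocess

data = json.loads(subprocess.run(
    ["ceph", "osd", "df", "--format", "json"],
    capture_output=True, text=True, check=True).stdout)

pgs = [node["pgs"] for node in data["nodes"]]
print(f"{len(pgs)} OSDs, PGs per OSD min/avg/max: "
      f"{min(pgs)} / {statistics.mean(pgs):.0f} / {max(pgs)}")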
On Wed, 9 Oct 2024 at 11:34, Frank Schilder wrote:
> Hi Janne,
> thanks for looking at this. I'm afraid I have to flag this as a rumor as well;
> you are basically stating it yourself:
> It is a good idea to collect such hypotheses, assuming that a dev drops by
> and can comment on that with background [...]
From: Janne Johansson
Sent: Wednesday, October 9, 2024 11:20 AM
To: Frank Schilder
Cc: Anthony D'Atri; ceph-users@ceph.io
Subject: Re: [ceph-users] Re: What is the problem with many PGs per OSD
> Thanks for chiming in. Unfortunately, it doesn't really help answering my
> questions either.
>
> Concurrency: A system like Ceph that hashes data into PGs translates any IO
> into random IO anyways. So it's irrelevant for spinners; they have to seek
> anyways, and the degree of parallelism does [...]
From: Eugen Block
Sent: Wednesday, October 9, 2024 9:24 AM
To: ceph-users@ceph.io
Subject: [ceph-users] Re: What is the problem with many PGs per OSD
Hi,
half a year ago I asked a related question
(https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/I3TQC42KN2FCKYV774VWJ7AVAWTTXEAA/#GLALD3DSTO6NSM2DY2PH4UCE4UBME3HM), when we needed to split huge PGs on a customer cluster. I wasn't sure either how far we could go with the ratio of PGs per OSD [...]
Hi Anthony,
Thanks for chiming in. Unfortunately, it doesn't really help answering my
questions either.
Concurrency: A system like Ceph that hashes data into PGs translates any IO
into random IO anyways. So it's irrelevant for spinners; they have to seek
anyways, and the degree of parallelism does [...]
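As a toy illustration of that point (this is not Ceph's actual placement, which uses rjenkins hashing plus CRUSH; just the general idea that object names hash to effectively random PGs):

import hashlib

PG_NUM = 256

def pg_of(object_name: str) -> int:
    """Toy stand-in for hashing an object name to a PG."""
    digest = hashlib.sha1(object_name.encode()).digest()
    return int.from_bytes(digest[:4], "little") % PG_NUM

# Objects that a client writes sequentially land on scattered PGs, i.e.
# what looks sequential at the client becomes spread-out IO on the OSDs.
for name in (f"file.{i:04d}" for i in range(5)):
    print(f"{name} -> pg {pg_of(name)}")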
I’ve sprinkled minimizers below. Free advice and worth every penny. ymmv. Do
not taunt Happy Fun Ball.
> during a lot of discussions in the past the comment that having "many PGs per
> OSD can lead to issues" came up without ever explaining what these issues
> will (not might!) be or how on