[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-13 Thread Mark Nelson
On 10/10/24 08:01, Anthony D'Atri wrote: >> The main problem was the increase in ram use scaling with PGs, which in normal operation is often fine but as we all know balloons in failure conditions. > Less so with BlueStore in my experience. I think in part this surfaces a bit of Filestore legacy
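
A minimal sketch of how to check this on a live OSD, assuming the dump_mempools admin-socket command and its usual JSON layout (pool key names such as osd_pglog vary somewhat between releases):

    # Sketch: show where an OSD's memory goes (e.g. how much the PG log holds).
    # Assumes the "ceph daemon osd.N dump_mempools" admin-socket command and that
    # its JSON contains "mempool" -> "by_pool"; run on the host hosting that OSD.
    import json
    import subprocess

    def mempools(osd_id: int) -> dict:
        out = subprocess.check_output(
            ["ceph", "daemon", f"osd.{osd_id}", "dump_mempools"])
        return json.loads(out)["mempool"]["by_pool"]

    if __name__ == "__main__":
        for name, stats in sorted(mempools(0).items(),
                                  key=lambda kv: kv[1].get("bytes", 0),
                                  reverse=True):
            print(f"{name:32s} {stats.get('bytes', 0) / 2**20:10.1f} MiB")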

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Anthony D'Atri
> Hi Anthony. >> ... Bump up pg_num on pools and see how the average / P90 ceph-osd process size changes? Grafana FTW. osd_map_cache_size I think defaults to 50 now; I want to say it used to be much higher. > That's not an option. What would help is a-priori information based on
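
A minimal sketch of that measurement, assuming Linux /proc and ceph-osd as the process name: sample every ceph-osd resident set size on a host and print the average and P90, suitable for graphing while pg_num is raised.

    # Sketch: per-host ceph-osd RSS average and P90, read straight from /proc.
    import glob
    import statistics
    from pathlib import Path

    def ceph_osd_rss_bytes():
        page = 4096  # assumes 4 KiB pages
        sizes = []
        for comm in glob.glob("/proc/[0-9]*/comm"):
            try:
                if Path(comm).read_text().strip() != "ceph-osd":
                    continue
                statm = Path(comm).with_name("statm").read_text().split()
                sizes.append(int(statm[1]) * page)  # field 2 = resident pages
            except OSError:
                continue  # process exited while we were scanning
        return sizes

    if __name__ == "__main__":
        rss = sorted(ceph_osd_rss_bytes())
        if rss:
            p90 = rss[int(0.9 * (len(rss) - 1))]
            print(f"osds={len(rss)} avg={statistics.mean(rss)/2**30:.2f} GiB "
                  f"p90={p90/2**30:.2f} GiB")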

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Anthony D'Atri
> but simply on the physical parameter of IOPS-per-TB (a "figure of merit" that is widely underestimated or ignored) hear hear! > of HDDs, and having enough IOPS-per-TB to sustain both user and admin workload. Even with SATA SSDs I twice had to expand a cluster to meet SLO long before it
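
A hedged back-of-the-envelope on that figure of merit, using assumed spec-sheet numbers rather than anything measured in this thread:

    # Assumed, typical spec-sheet numbers; not measurements from this thread.
    drives = {
        "20 TB 7.2k HDD": (20_000, 150),      # (capacity GB, ~random 4K IOPS)
        "4 TB 7.2k HDD":  (4_000,  150),
        "4 TB SATA SSD":  (4_000,  75_000),
        "8 TB NVMe SSD":  (8_000,  500_000),
    }
    for name, (gb, iops) in drives.items():
        print(f"{name:16s} ~{iops / (gb / 1000):10.1f} IOPS per TB")
    # The HDD rows show why growing capacity without adding spindles shrinks
    # the IOPS available per stored TB.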

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Peter Grandi
> [... number of PGs per OSD ...] > So it is mainly related to PG size. Indeed, and secondarily the number of objects: many objects per PG mean lower metadata overhead, but bigger PGs mean higher admin workload latency. >> Note: HDDs larger than 1TB are not really suitable for significant parallel
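
A small sketch for putting numbers on PG size and objects per PG per pool; the JSON field names (stored, objects) are assumptions based on recent ceph df output and may differ on older releases:

    # Sketch: average PG size and objects per PG for each pool.
    import json
    import subprocess

    def ceph(*args):
        return json.loads(subprocess.check_output(["ceph", *args, "-f", "json"]))

    for pool in ceph("df")["pools"]:
        name = pool["name"]
        stats = pool["stats"]
        pg_num = ceph("osd", "pool", "get", name, "pg_num")["pg_num"]
        print(f"{name:24s} pg_num={pg_num:5d} "
              f"avg_pg={stats['stored'] / pg_num / 2**30:8.2f} GiB "
              f"objs_per_pg={stats['objects'] / pg_num:10.0f}")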

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Anthony D'Atri
> The main problem was the increase in ram use scaling with PGs, which in normal operation is often fine but as we all know balloons in failure conditions. Less so with BlueStore in my experience. I think in part this surfaces a bit of Filestore legacy that we might re-examine with Filestor

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
> But not, I suspect, nearly as many tentacles. No, that's the really annoying part. It just works.

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Anthony D'Atri
> I'm afraid nobody will build a 100PB cluster with 1T drives. That's just absurd. Check the archives for the panoply of absurdity that I’ve encountered ;) > So, the sharp increase of per-device capacity has to be taken into account. Specifically as the same development is happening with S

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Peter Grandi
>>> On Thu, 10 Oct 2024 08:53:08 +, Frank Schilder said: > The guidelines are *not* good enough for EC pools on large HDDs that store a high percentage of small objects, in our case, files. Arguably *nothing* is good enough for that, because it is the worst possible case scenario (A Ceph
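
A hedged illustration of why that combination is so bad: with EC on BlueStore, each of the k+m chunks of a small object is padded up to the allocation unit. The profile and allocation size below are assumptions chosen for the example (k=8, m=3, 64 KiB min_alloc_size as on pre-Pacific HDD OSDs):

    k, m = 8, 3
    min_alloc = 64 * 1024          # bytes per allocation unit on the HDD OSDs
    obj = 16 * 1024                # a 16 KiB file stored as one object

    ideal_raw = obj * (k + m) / k                         # nominal EC overhead
    actual_raw = (k + m) * max(min_alloc, -(-obj // k))   # each chunk padded up
    print(f"ideal raw usage : {ideal_raw / 1024:.0f} KiB")
    print(f"actual raw usage: {actual_raw / 1024:.0f} KiB "
          f"({actual_raw / obj:.0f}x the logical size)")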

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
Yes, this was an old lesson and AFAIK nobody has intentionally pushed the bounds in a long time because it was a very painful les

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Gregory Farnum

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-10 Thread Frank Schilder
any more. Best regards,

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Janne Johansson
On Wed, 9 Oct 2024 at 20:48, Frank Schilder wrote: > The PG count per OSD is a striking exception. It's just a number (well, a range with 100 recommended and 200 as a max: https://docs.ceph.com/en/latest/rados/operations/pgcalc/#keyDL). It just is. > And this doesn't make any sense unless th
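
A minimal sketch of the arithmetic behind that pgcalc page, as I read it (the page's exact rounding rule may differ):

    import math

    def suggested_pg_num(osds: int, pool_size: int, data_share: float,
                         target_per_osd: int = 100) -> int:
        # target PGs per OSD times OSD count, weighted by the pool's share of
        # the data, divided by replica count (or k+m for EC), rounded to a
        # nearby power of two.
        raw = osds * target_per_osd * data_share / pool_size
        return 2 ** max(0, round(math.log2(raw)))

    # e.g. 60 OSDs, a 3-replica pool expected to hold ~80% of the data:
    print(suggested_pg_num(osds=60, pool_size=3, data_share=0.8))  # -> 2048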

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
Hi Anthony, replying here to points that were somewhat outside the scope of my original question: > > That's why deploying multiple OSDs per SSD is such a great way to improve performance on devices where 4K random IO throughput scales with iodepth. > > Mark’s testing has shown this to
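
The kind of measurement behind that claim can be sketched as an fio queue-depth sweep; the device path is an assumption and should point at an idle test device (the job is read-only):

    # Sketch: 4K random-read IOPS at increasing queue depths via fio.
    import json
    import subprocess

    DEV = "/dev/nvme0n1"  # assumption: an idle device or LV set aside for testing

    for qd in (1, 2, 4, 8, 16, 32, 64):
        out = subprocess.check_output([
            "fio", "--name=qd-sweep", f"--filename={DEV}",
            "--rw=randread", "--bs=4k", f"--iodepth={qd}",
            "--ioengine=libaio", "--direct=1",
            "--runtime=15", "--time_based", "--output-format=json",
        ])
        iops = json.loads(out)["jobs"][0]["read"]["iops"]
        print(f"iodepth={qd:3d}  ~{iops:,.0f} IOPS")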

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
Hi Anthony. > ... Bump up pg_num on pools and see how the average / P90 ceph-osd process size changes? > Grafana FTW. osd_map_cache_size I think defaults to 50 now; I want to say it used to be much higher. That's not an option. What would help is a-priori information based on the implemen

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Anthony D'Atri
> Unfortunately, it doesn't really help answering my questions either. Sometimes the best we can do is grunt and shrug :-/. Before Nautilus we couldn’t merge PGs, so we could raise pg_num for a pool but not decrease it, so a certain fear of overshooting was established. Mark is the go-to here

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Janne Johansson
On Wed, 9 Oct 2024 at 11:34, Frank Schilder wrote: > Hi Janne, thanks for looking at this. I'm afraid I have to flag this as rumor as well, you are basically stating it yourself: It is a good idea to collect such hypotheses, assuming that a dev drops by and can comment on that with backg

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Janne Johansson
> Thanks for chiming in. Unfortunately, it doesn't really help answering my questions either. > Concurrency: A system like ceph that hashes data into PGs translates any IO into random IO anyways. So it's irrelevant for spinners, they have to seek anyways and the degree of parallelism doe

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Eugen Block

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder
Hi Anthony, Thanks for chiming in. Unfortunately, it doesn't really help answering my questions either. Concurrency: A system like ceph that hashes data into PGs translates any IO into random IO anyways. So it's irrelevant for spinners, they have to seek anyways and the degree of parallelism d
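
A toy illustration of that point, deliberately simplified (Ceph uses its own rjenkins-based hash plus CRUSH rather than md5): consecutive chunks of a sequentially written image land in effectively random PGs, so the OSDs see random I/O however sequential the client workload looks.

    import hashlib

    PG_NUM = 256

    def pg_of(object_name: str) -> int:
        # stand-in hash for illustration only; not Ceph's actual placement
        digest = hashlib.md5(object_name.encode()).digest()
        return int.from_bytes(digest[:4], "little") % PG_NUM

    # RBD-style object names for one image, chunk after chunk:
    for i in range(8):
        name = f"rbd_data.abc123.{i:016x}"
        print(name, "-> pg", pg_of(name))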

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Frank Schilder

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-09 Thread Eugen Block
Hi, half a year ago I asked a related question (https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/I3TQC42KN2FCKYV774VWJ7AVAWTTXEAA/#GLALD3DSTO6NSM2DY2PH4UCE4UBME3HM), when we needed to split huge PGs on a customer cluster. I wasn't sure either how far we could go with the ratio PG

[ceph-users] Re: What is the problem with many PGs per OSD

2024-10-08 Thread Anthony D'Atri
I’ve sprinkled minimizers below. Free advice and worth every penny. ymmv. Do not taunt Happy Fun Ball. > during a lot of discussions in the past the comment that having "many PGs per OSD can lead to issues" came up without ever explaining what these issues will (not might!) be or how on
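
Before arguing about the limit it helps to know where a cluster actually sits; a minimal sketch, assuming the JSON field names recent releases emit from ceph osd df and the mon_max_pg_per_osd option:

    # Sketch: distribution of PGs per OSD versus the configured cap that blocks
    # further PG creation. Field names are assumptions based on recent releases.
    import json
    import subprocess

    def ceph_json(*args):
        return json.loads(subprocess.check_output(["ceph", *args, "-f", "json"]))

    nodes = ceph_json("osd", "df")["nodes"]
    pgs = sorted(n["pgs"] for n in nodes)
    limit = int(subprocess.check_output(
        ["ceph", "config", "get", "mon", "mon_max_pg_per_osd"]))
    print(f"osds={len(pgs)} min={pgs[0]} median={pgs[len(pgs)//2]} "
          f"max={pgs[-1]} mon_max_pg_per_osd={limit}")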