[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Frank Schilder
Hi, there seem to be replies missing from this list. For example, I can't find any messages that contain information that could lead to this conclusion:

> * pg_num too low (defaults are too low)
> * pg_num not a power of 2
> * pg_num != number of OSDs in the pool
> * balancer not enabled

It is hor

[ceph-users] Re: How radosgw considers that the file upload is done?

2024-06-12 Thread Daniel Gryniewicz
On 6/12/24 5:43 AM, Szabo, Istvan (Agoda) wrote: Hi, I wonder how radosgw knows that a transaction is done and that the connection between the user interface and the gateway didn't break? Let's see, this is one request: 2024-06-12T16:26:03.386+0700 7fa34c7f0700 1 beast: 0x7fa5bc776750: 1.1.1.1 - - [202

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Michael Worsham
Interesting. How do you set this "maintenance mode"? If you have a series of documented steps that you have to do and could provide as an example, that would be beneficial for my efforts. We are in the process of standing up both a dev-test environment consisting of 3 Ceph servers (strictly for

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Eugen Block
Which version did you upgrade to 18.2.2 from? I can’t pin it down to a specific issue, but somewhere in the back of my mind is something related to a new omap format. But I’m really not sure at all. Quoting Lars Köppel: I am happy to help you with as much information as pos

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Anthony D'Atri
That's just setting noout, norebalance, etc.

> On Jun 12, 2024, at 11:28, Michael Worsham wrote:
>
> Interesting. How do you set this "maintenance mode"? If you have a series of
> documented steps that you have to do and could provide as an example, that
> would be beneficial for my efforts
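For reference, a minimal sketch of the flags Anthony mentions, set before taking a node down and unset once it is back and healthy:

    # Keep the cluster from marking down OSDs "out" and from rebalancing
    ceph osd set noout
    ceph osd set norebalance
    # ... patch and reboot the node ...
    ceph osd unset norebalance
    ceph osd unset noout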

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Eugen Block
There’s also a maintenance mode available for the orchestrator:
https://docs.ceph.com/en/reef/cephadm/host-management/#maintenance-mode

There’s some more information about that in the dev section:
https://docs.ceph.com/en/reef/dev/cephadm/host-maintenance/

Quoting Anthony D'Atri: That's ju
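The orchestrator flow from the linked docs boils down to two commands; the hostname below is a placeholder:

    # Stop the cephadm-managed daemons on the host and flag its OSDs noout
    ceph orch host maintenance enter <hostname>
    # ... patch and reboot ...
    ceph orch host maintenance exit <hostname>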

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Daniel Brown
I have two Ansible roles, one for enter, one for exit. There are likely better ways to do this — and I’ll not be surprised if someone here lets me know. They’re using orch commands via the cephadm shell. I’m using Ansible for other configuration management in my environment as well, including s

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Marc
On 12 June 2024 13:19:10 UTC, "Lars Köppel" wrote:
>I am happy to help you with as much information as possible. I probably
>just don't know where to look for it.
>Below is the requested information. The cluster is rebuilding the
>zapped OSD at the moment. This will probably take the next

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Janne Johansson
> We made a mistake when we moved the servers physically so while the
> replica 3 is intact the crush tree is not accurate.
>
> If we just remedy the situation with "ceph osd crush move ceph-flashX
> datacenter=Y" we will just end up with a lot of misplaced data and some
> churn, right? Or will the
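A hypothetical version of the remedy under discussion, using the placeholder names from the thread:

    # Move the host's bucket under the datacenter it physically sits in;
    # CRUSH recalculates placements and the affected data becomes misplaced
    ceph osd crush move ceph-flashX datacenter=Y
    # Objects should show up as misplaced (not degraded) in the status output
    ceph status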

[ceph-users] Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Torkil Svensgaard
Hi

We have 3 servers for replica 3 with failure domain datacenter:

 -1  4437.29248  root default
-33  1467.84814      datacenter 714
-69    69.86389          host ceph-flash1
-34  1511.25378      datacenter HX1
-73    69.86389          host ceph-fl

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Matthias Grandl
Correct, this should only result in misplaced objects.

> We made a mistake when we moved the servers physically so while the replica 3
> is intact the crush tree is not accurate.

Can you elaborate on that? Does this mean after the move, multiple hosts are inside the same physical datacenter? I

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Torkil Svensgaard
On 12/06/2024 10:22, Matthias Grandl wrote:
Correct, this should only result in misplaced objects.

> We made a mistake when we moved the servers physically so while the replica 3 is intact the crush tree is not accurate.

Can you elaborate on that? Does this mean after the move, multiple ho

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Matthias Grandl
Yeah, that should work, no problem. In this case I would even recommend setting `norebalance` and using the trusty old upmap-remapped script (credits to CERN) to avoid unnecessary data movement: https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py Cheers! -- Matt
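A sketch of the workflow Matthias suggests, assuming the script has been fetched locally; it is prudent to review its output before piping it to a shell:

    ceph osd set norebalance
    # ... fix the crush tree with "ceph osd crush move" ...
    # The script emits "ceph osd pg-upmap-items" commands that pin misplaced
    # PGs back onto their current OSDs, so no data moves right away
    ./upmap-remapped.py | sh
    ceph osd unset norebalance

From there the balancer can remove the upmap entries gradually instead of shifting everything at once.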

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Lars Köppel
Since my last update the size of the largest OSD increased by 0.4 TiB while the smallest one only increased by 0.1 TiB. How is this possible? Because the metadata pool reported only 900 MB of space left, I stopped the hot-standby MDS. This gave me 8 GB back, but that filled up again in the last 2h. I

[ceph-users] How radosgw considers that the file upload is done?

2024-06-12 Thread Szabo, Istvan (Agoda)
Hi, I wonder how radosgw knows that a transaction is done and that the connection between the user interface and the gateway didn't break? Let's see, this is one request: 2024-06-12T16:26:03.386+0700 7fa34c7f0700 1 beast: 0x7fa5bc776750: 1.1.1.1 - - [2024-06-12T16:26:03.386063+0700] "PUT /bucket/0/2/9663
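One way a client can confirm the gateway stored the complete object, as an illustration with placeholder endpoint and bucket names (for a single-part, unencrypted PUT the S3 ETag is the object's MD5):

    md5sum ./object.bin
    aws --endpoint-url http://rgw.example.com:8080 s3api put-object \
        --bucket mybucket --key object.bin --body ./object.bin
    # On success radosgw returns HTTP 200 plus an ETag; for a non-multipart
    # upload that ETag should match the local MD5 computed above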

[ceph-users] Re: Safe to move misplaced hosts between failure domains in the crush tree?

2024-06-12 Thread Torkil Svensgaard
On 12/06/2024 11:20, Matthias Grandl wrote:
Yeah, that should work, no problem. In this case I would even recommend setting `norebalance` and using the trusty old upmap-remapped script (credits to CERN) to avoid unnecessary data movement: https://github.com/cernceph/ceph-scripts/blob/master

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Eugen Block
I don't have any good explanation at this point. Can you share some more information like:

ceph pg ls-by-pool
ceph osd df (for the relevant OSDs)
ceph df

Thanks,
Eugen

Quoting Lars Köppel: Since my last update the size of the largest OSD increased by 0.4 TiB while the smallest one only

[ceph-users] Patching Ceph cluster

2024-06-12 Thread Michael Worsham
What is the proper way to patch a Ceph cluster and reboot the servers in said cluster if a reboot is necessary for said updates? And is it possible to automate it via Ansible?

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Anthony D'Atri
Do you mean patching the OS? If so, easy -- one node at a time, then after it comes back up, wait until all PGs are active+clean and the mon quorum is complete before proceeding.

> On Jun 12, 2024, at 07:56, Michael Worsham wrote:
>
> What is the proper way to patch a Ceph cluster and reb
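A sketch of that gate in shell, assuming jq is installed and that the JSON field names match your Ceph version; verify before relying on it, and also confirm mon quorum (ceph quorum_status) between nodes:

    for host in node1 node2 node3; do
        ssh "$host" 'sudo apt-get update && sudo apt-get -y upgrade && sudo reboot'
        sleep 120   # give the node time to go down and come back
        # Block until every PG is active+clean again
        until ceph status -f json | jq -e '
              .pgmap.num_pgs == ([.pgmap.pgs_by_state[]
                                  | select(.state_name == "active+clean")
                                  | .count] | add)' >/dev/null; do
            sleep 30
        done
    done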

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Anthony D'Atri
If you have:

* pg_num too low (defaults are too low)
* pg_num not a power of 2
* pg_num != number of OSDs in the pool
* balancer not enabled

any of those might result in imbalance.

> On Jun 12, 2024, at 07:33, Eugen Block wrote:
>
> I don't have any good explanation at this point. Can you shar
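Each of those is quick to check from the CLI; "metadata" below is the pool name from this thread, substitute your own:

    ceph osd pool get metadata pg_num   # compare against OSD count; prefer a power of 2
    ceph balancer status                # shows mode and whether it is active
    ceph balancer on                    # enable it if it is off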

[ceph-users] Re: Patching Ceph cluster

2024-06-12 Thread Daniel Brown
There’s also a Maintenance mode that you can set for each server as you’re doing updates, so that the cluster doesn’t try to move data from affected OSDs while the server being updated is offline or down. I’ve worked some on automating this with Ansible, but have found my process (and/or my

[ceph-users] Re: CephFS metadata pool size

2024-06-12 Thread Lars Köppel
I am happy to help you with as much information as possible. I probably just don't know where to look for it. Below is the requested information. The cluster is rebuilding the zapped OSD at the moment. This will probably take the next few days.

sudo ceph pg ls-by-pool metadata
PG OBJECTS DE