Hi, there seem to be replies missing from this list. For example, I can't find
any messages containing information that could lead to this conclusion:
> * pg_num too low (defaults are too low)
> * pg_num not a power of 2
> * pg_num != number of OSDs in the pool
> * balancer not enabled
It is hor
On 6/12/24 5:43 AM, Szabo, Istvan (Agoda) wrote:
Hi,
I wonder how radosgw knows that a transaction is done and that the connection
between the client and the gateway didn't break?
Let's say this is one request:
2024-06-12T16:26:03.386+0700 7fa34c7f0700 1 beast: 0x7fa5bc776750: 1.1.1.1 - -
[202
Interesting. How do you set this "maintenance mode"? If you have a series of
documented steps that you have to do and could provide as an example, that
would be beneficial for my efforts.
We are in the process of standing up both a dev-test environment consisting of
3 Ceph servers (strictly for
From which version did you upgrade to 18.2.2?
I can't pin it down to a specific issue, but somewhere in the back of my mind
there's something about a new omap format. I'm really not sure at all, though.
Quoting Lars Köppel:
I am happy to help you with as much information as pos
That's just setting noout, norebalance, etc.
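For example, a minimal sketch of the before/after steps (standard flag names; add
others such as nobackfill if you use them):

ceph osd set noout
ceph osd set norebalance

# ...do the maintenance and reboot...

# once the node is back and its OSDs have rejoined
ceph osd unset norebalance
ceph osd unset noout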
> On Jun 12, 2024, at 11:28, Michael Worsham
> wrote:
>
> Interesting. How do you set this "maintenance mode"? If you have a series of
> documented steps that you have to do and could provide as an example, that
> would be beneficial for my efforts
There’s also a maintenance mode available for the orchestrator:
https://docs.ceph.com/en/reef/cephadm/host-management/#maintenance-mode
There’s some more information about that in the dev section:
https://docs.ceph.com/en/reef/dev/cephadm/host-maintenance/
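Per host it boils down to something like this (host name is a placeholder, see the
docs above for the prerequisites; entering maintenance stops the Ceph daemons on
that host):

ceph orch host maintenance enter <hostname>

# ...patch and/or reboot the host...

ceph orch host maintenance exit <hostname>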
Quoting Anthony D'Atri:
That's ju
I have two Ansible roles, one for enter and one for exit. There are likely
better ways to do this, and I won't be surprised if someone here lets me know.
They use orch commands via the cephadm shell. I'm using Ansible for other
configuration management in my environment as well, including s
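A minimal sketch of what the enter role ends up running (assuming the orchestrator
maintenance mode from the docs linked earlier; the short-hostname lookup is just an
illustration and may not match your orch host names):

# enter role, targeting the host about to be serviced
cephadm shell -- ceph orch host maintenance enter "$(hostname -s)"

# exit role, after patching and reboot
cephadm shell -- ceph orch host maintenance exit "$(hostname -s)"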
On 12 June 2024 13:19:10 UTC, "Lars Köppel" wrote:
>I am happy to help you with as much information as possible. I probably
>just don't know where to look for it.
>Below is the requested information. The cluster is rebuilding the
>zapped OSD at the moment. This will probably take the next
> We made a mistake when we moved the servers physically so while the
> replica 3 is intact the crush tree is not accurate.
>
> If we just remedy the situation with "ceph osd crush move ceph-flashX
> datacenter=Y" we will just end up with a lot of misplaced data and some
> churn, right? Or will the
Hi
We have 3 servers for replica 3 with failure domain datacenter:
 -1  4437.29248  root default
-33  1467.84814      datacenter 714
-69    69.86389          host ceph-flash1
-34  1511.25378      datacenter HX1
-73    69.86389          host ceph-fl
Correct, this should only result in misplaced objects.
> We made a mistake when we moved the servers physically so while the replica 3
> is intact the crush tree is not accurate.
Can you elaborate on that? Does this mean after the move, multiple hosts are
inside the same physical datacenter? I
On 12/06/2024 10:22, Matthias Grandl wrote:
Correct, this should only result in misplaced objects.
> We made a mistake when we moved the servers physically so while the
replica 3 is intact the crush tree is not accurate.
Can you elaborate on that? Does this mean after the move, multiple ho
Yeah that should work no problem.
In this case I would even recommend setting `norebalance` and using the trusty
old upmap-remapped script (credits to Cern), to avoid unnecessary data
movements:
https://github.com/cernceph/ceph-scripts/blob/master/tools/upmap/upmap-remapped.py
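Roughly, the sequence would look like this (a sketch: the script prints
`ceph osd pg-upmap-items` commands that you pipe into a shell; bucket names are the
ones from your CRUSH tree):

ceph osd set norebalance

# fix the CRUSH locations
ceph osd crush move ceph-flashX datacenter=Y

# pin the now-misplaced PGs to the OSDs that currently hold the data
./upmap-remapped.py | sh

ceph osd unset norebalance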
Cheers!
--
Matt
Since my last update, the size of the largest OSD increased by 0.4 TiB while
the smallest one increased by only 0.1 TiB. How is this possible?
Because the metadata pool reported only 900 MB of space left, I stopped the
hot-standby MDS. This gave me 8 GB back, but that filled up again within the
last 2 hours.
I
Hi,
I wonder how radosgw knows that a transaction is done and that the connection
between the client and the gateway didn't break?
Let's say this is one request:
2024-06-12T16:26:03.386+0700 7fa34c7f0700 1 beast: 0x7fa5bc776750: 1.1.1.1 - -
[2024-06-12T16:26:03.386063+0700] "PUT /bucket/0/2/9663
On 12/06/2024 11:20, Matthias Grandl wrote:
Yeah that should work no problem.
In this case I would even recommend setting `norebalance` and using the
trusty old upmap-remapped script (credits to Cern), to avoid unnecessary
data movements:
https://github.com/cernceph/ceph-scripts/blob/master
I don't have any good explanation at this point. Can you share some more
information, like the output of:
ceph pg ls-by-pool
ceph osd df (for the relevant OSDs)
ceph df
Thanks,
Eugen
Quoting Lars Köppel:
Since my last update the size of the largest OSD increased by 0.4 TiB while
the smallest one only
What is the proper way to patch a Ceph cluster and reboot the servers in said
cluster if a reboot is necessary for said updates? And is it possible to
automate it via Ansible?
Do you mean patching the OS?
If so, easy -- one node at a time, then after it comes back up, wait until all
PGs are active+clean and the mon quorum is complete before proceeding.
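A rough per-node sketch of that wait (it assumes jq is installed, and the JSON
field names are my assumption about the `ceph status -f json` output, so adjust as
needed):

ceph osd set noout

# ...patch the OS and reboot the node...

# wait until every PG is active+clean again (health stays WARN while noout is set,
# so check PG states rather than HEALTH_OK)
until [ "$(ceph status -f json | jq '[.pgmap.pgs_by_state[] | select(.state_name == "active+clean") | .count] | add')" \
      = "$(ceph status -f json | jq '.pgmap.num_pgs')" ]; do
    sleep 30
done

# confirm all monitors are back in quorum before moving to the next node
ceph quorum_status -f json | jq -r '.quorum_names[]'

ceph osd unset noout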
> On Jun 12, 2024, at 07:56, Michael Worsham
> wrote:
>
> What is the proper way to patch a Ceph cluster and reb
If you have:
* pg_num too low (defaults are too low)
* pg_num not a power of 2
* pg_num != number of OSDs in the pool
* balancer not enabled
any of those might result in imbalance.
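A quick way to check each of those (pool name is a placeholder):

# pg_num of the pool vs. the number of OSDs
ceph osd pool get <pool> pg_num
ceph osd ls | wc -l

# per-OSD utilisation spread (look at the %USE and VAR columns)
ceph osd df

# balancer state and mode
ceph balancer status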
> On Jun 12, 2024, at 07:33, Eugen Block wrote:
>
> I don't have any good explanation at this point. Can you shar
There's also a maintenance mode that you can set for each server as you're
doing updates, so that the cluster doesn't try to move data off the affected
OSDs while the server being updated is offline or down. I've done some work on
automating this with Ansible, but have found my process (and/or my
I am happy to help you with as much information as possible. I probably
just don't know where to look for it.
Below is the requested information. The cluster is rebuilding the
zapped OSD at the moment. This will probably take the next few days.
sudo ceph pg ls-by-pool metadata
PG OBJECTS DE