That's an idea; moreover, I've discovered my two clusters can't be named the same (i.e.
ceph), so I have to change the cluster name with an environment variable in
/etc/default/ceph (I've deployed Ceph via Proxmox 6.x and that's the name
it gives my cluster by default).
This is kind of an issue cause i
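For reference, the override is just an environment-file entry (a sketch; "backup" is a placeholder cluster name):
```
# /etc/default/ceph  (sketch)
# The systemd units read this file and pass --cluster ${CLUSTER} to the daemons,
# so with this set they look for /etc/ceph/backup.conf instead of /etc/ceph/ceph.conf.
CLUSTER=backup
```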
Hi,
I have a healthy (test) cluster running 17.2.5:
root@cephtest20:~# ceph status
  cluster:
    id:     ba37db20-2b13-11eb-b8a9-871ba11409f6
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum cephtest31,cephtest41,cephtest21 (age 2d)
    mgr: cephtest22.lqzdnk(acti
Hi,
I only have one remark on your assumption regarding maintenance with
your current setup. With your profile k=4 m=2 you'd have a min_size of 5
(k + 1, which is recommended); taking one host down would still result
in an IO pause because min_size is not met. To allow IO you'd need to
reduce m
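For reference, a sketch of what temporarily lowering min_size for maintenance could look like (the pool name is a placeholder; remember to set it back afterwards):
```
# Allow IO with only k shards available (k=4 for this profile)
ceph osd pool set my_ec_pool min_size 4
# ... perform the maintenance ...
ceph osd pool set my_ec_pool min_size 5
```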
> As services grew, we relied
> more and more on its legacy storage solution, which was never migrated to
> Ceph. Over the last few months, this legacy storage solution had several
> instances of silent data corruption, rendering the VMs unbootable, taking
> down various services, and requiring res
Good morning ceph community,
for quite some time I have been wondering whether it would make sense to add an
iftop-like interface to Ceph that shows network traffic / IOPS on a per-IP
basis.
I am aware of "rbd perf image iotop"; however, I am much more interested
in a combined metric featuring 1) Wh
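For reference, the existing per-image view looks roughly like this (the pool name is a placeholder):
```
# Live per-RBD-image IOPS/throughput, sorted interactively like iotop
rbd perf image iotop --pool my_pool
# Non-interactive, periodically printed variant
rbd perf image iostat --pool my_pool
```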
Hi Len,
Indeed, this is not possible with ceph-ansible.
One option would be to do it manually with `ceph-volume lvm migrate`:
(Note that it can be tedious given that it requires a lot of manual
operations, especially for clusters with a large number of OSDs.)
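The core per-OSD step would look roughly like this (a sketch; the osd id, fsid and target VG/LV are placeholders, and the exact --from list depends on what you are moving):
```
# Sketch: move osd.12's BlueFS data onto a new DB LV (non-cephadm / ceph-ansible style units)
ceph osd set noout
systemctl stop ceph-osd@12
ceph-volume lvm migrate --osd-id 12 --osd-fsid <osd-fsid> --from data db --target vg_nvme/db-osd12
systemctl start ceph-osd@12
ceph osd unset noout
```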
Initial setup:
```
# cat group_vars/
Hi Thomas,
This looks like it requires more investigation than I expected. What's the
current status?
Did the crashed mds come back and become active?
Increase the debug log level to 20 and share the mds logs. I will create a
tracker and share it here.
You can upload the mds logs there.
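If it helps, one way to bump the level (a sketch; adjust subsystems as needed and revert once the logs are captured):
```
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# revert after capturing the logs
ceph config rm mds debug_mds
ceph config rm mds debug_ms
```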
Thanks
Hi Thomas,
I have created the tracker https://tracker.ceph.com/issues/58489 to track
this. Please upload the debug mds logs here.
Thanks,
Kotresh H R
On Wed, Jan 18, 2023 at 4:56 PM Kotresh Hiremath Ravishankar <
khire...@redhat.com> wrote:
> Hi Thomas,
>
> This looks like it requires more inve
Thank you. I'm setting the debug level and awaiting authorization for the Tracker.
I'll upload the logs as soon as I can collect them.
Thank you so much for your help.
On 18.01.23 12:26, Kotresh Hiremath Ravishankar wrote:
Hi Thomas,
This looks like it requires more investigation than I expected. Wha
Hi all,
we are observing a problem on a libvirt virtualisation cluster that might come
from ceph rbd clients. Something went wrong during execution of a
live-migration operation and as a result we have two instances of the same VM
running on 2 different hosts, the source- and the destination ho
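For reference, a quick way to see both clients attached to the image (pool/image names are placeholders):
```
# Lists the watchers (client addresses) currently attached to the image;
# with two VM instances running, both hosts should show up here.
rbd status libvirt-pool/vm-disk-1
```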
Hi,
We have a full SSD production cluster running on Pacific 16.2.10 and
deployed with cephadm that is experiencing OSD flapping issues.
Essentially, random OSDs will get kicked out of the cluster and then
automatically brought back in a few times a day. As an example, let's
take the case of
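For reference, a rough way to spot which OSDs flapped and why (osd.12 is a placeholder):
```
# Current health warnings and which OSDs are marked down
ceph health detail
ceph osd tree down
# On the OSD's host (cephadm deployment): look for "wrongly marked me down" in its log
cephadm logs --name osd.12 | grep -i "wrongly marked"
```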
On Wed, Jan 18, 2023 at 1:19 PM Frank Schilder wrote:
>
> Hi all,
>
> we are observing a problem on a libvirt virtualisation cluster that might
> come from ceph rbd clients. Something went wrong during execution of a
> live-migration operation and as a result we have two instances of the same VM
Do you have any network congestion or packet loss on the replication network?
Are you sharing NICs between public / replication? That is another metric that
needs looking into.
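For reference, a quick first check on the OSD hosts (interface names are placeholders):
```
# Per-interface RX/TX error and drop counters
ip -s link show bond1
# Low-level NIC counters (discards, pause frames, CRC errors, ...)
ethtool -S ens2f0 | grep -iE 'err|drop|pause'
```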
From: J-P Methot
Sent: 18 January 2023 12:42
To: ceph-users
Subject: [ceph-users] F
This was my first thought as well, especially if the OSDs log something like
“wrongly marked down”. It’s one of the reasons why I favor not having a
replication network.
> On Jan 18, 2023, at 8:28 AM, Danny Webb wrote:
>
> Do you have any network congestion or packet loss on the replication n
At the network level we're using bonds (802.3ad). There are 2 NICs, each
with two 25 Gbps ports. One port per NIC is used for the public network, the
other for the replication network. That suggests a network bandwidth of
50 Gbps (in theory) for each network load. The network graph is showing
me loa
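For what it's worth, a quick way to sanity-check the LACP bonds (bond/interface names are placeholders):
```
# LACP state, active slaves, and per-slave link failure counts
cat /proc/net/bonding/bond0
# Confirm each slave negotiated 25Gb/s full duplex
ethtool ens1f0 | grep -E 'Speed|Duplex'
```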
Hi Ilya,
thanks a lot for the information. Yes, I was talking about the exclusive lock
feature and was under the impression that only one rbd client can get write
access on connect and will keep it until disconnect. The problem we are facing
with multi-VM write access is that this will inevita
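For reference, a way to check the feature and the current lock holder on an image (pool/image names are placeholders):
```
# Confirm the image actually has the exclusive-lock feature enabled
rbd info libvirt-pool/vm-disk-1 | grep features
# Show the current lock holder (if any)
rbd lock ls libvirt-pool/vm-disk-1
```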
Hi Guillaume,
thank you very much for the quick clarification and elaborate workaround.
We’ll check if manual migration is feasible with our setup with respect to the
time needed. Alternatively, we’re looking into completely redeploying all
affected OSDs (i.e. shrinking the cluster with ceph-an
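For reference, the redeploy route would presumably start with the stock shrink-osd playbook (a sketch; inventory path and OSD ids are placeholders):
```
# Remove the affected OSDs via ceph-ansible, then re-add them through the usual site playbook
ansible-playbook -i hosts infrastructure-playbooks/shrink-osd.yml -e osd_to_kill=12,13,14
```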
In case anyone was wondering, I figured out the problem...
It's this nasty bug in Pacific 16.2.10: https://tracker.ceph.com/issues/56031 - I
think it is fixed in the upcoming .11 release and in Quincy.
This bug causes the computed bluestore DB partition size to be much
smaller than it shoul
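For anyone hitting the same thing, a rough way to compare the DB volume an OSD actually got against what you expected (osd id is a placeholder; metadata field names may vary slightly between releases):
```
# DB device size (bytes) as the OSD reports it
ceph osd metadata 12 | grep -E 'bluefs_db_size|bluefs_db_partition_path'
# Cross-check the LV sizes on the OSD host
lvs -o lv_name,lv_size
```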
Do you have CPU soft lock-ups around these times? We had these timeouts due to
using the cfq/bfq disk schedulers with SSDs. The osd_op_tp thread timeout is
typical when CPU lockups happen. Could be a sporadic problem with the disk IO
path.
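Checking which scheduler is active is quick (sda is a placeholder device):
```
# The active scheduler is shown in brackets
cat /sys/block/sda/queue/scheduler
# Switch at runtime, e.g. to none (make it persistent via a udev rule if it helps)
echo none | tee /sys/block/sda/queue/scheduler
```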
Best regards,
=
Frank Schilder
AIT Risø
There's nothing in the CPU graph that suggests soft lock-ups at these
times. However, thank you for pointing out that the disk I/O scheduler
could have an impact. Ubuntu seems to be on mq-deadline by default, so
we just switched to none, as I believe that fits our workload best. I
don't know if th
I'm not sure what you look for in the CPU graph. If it's load or a similar
metric, you will not see these lock-ups. You need to look into the syslog and
search for them. If these warnings are there, it might give a clue as to
which hardware component is causing it. They look something like "BUG:
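A quick way to search for them (a sketch):
```
# Kernel messages since boot; soft lockups appear as "BUG: soft lockup - CPU#N stuck for Xs!"
journalctl -k | grep -i "soft lockup"
# Or directly in the syslog files
grep -i "soft lockup" /var/log/syslog*
```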
On Wed, Jan 18, 2023 at 3:25 PM Frank Schilder wrote:
>
> Hi Ilya,
>
> thanks a lot for the information. Yes, I was talking about the exclusive lock
> feature and was under the impression that only one rbd client can get write
> access on connect and will keep it until disconnect. The problem we
Hey all! I’ve run into an MDS crash on a cluster recently upgraded from Ceph
16.2.7 to 16.2.10. I’m hitting an assert nearly identical to this one gathered
by the telemetry module:
https://tracker.ceph.com/issues/54747
I have a new build compiling to test whether
https://github.com/ce
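For reference, the recorded assert/backtrace can be pulled from the cluster's crash reports to compare against that tracker (the crash id below is a placeholder):
```
# List crashes the cluster has recorded (the same reports the telemetry module gathers)
ceph crash ls
# Full metadata and backtrace for one crash
ceph crash info <crash-id>
```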
Hi,
How can I set values for rgw_keystone_url and other related fields that
are not possible to change via the GUI under cluster configuration?
Ceph Quincy is deployed using cephadm.
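For reference, the CLI route would presumably be something like this (a sketch; the client.rgw scope and the URL are placeholders, and you may need to target a specific RGW daemon instead):
```
# Set the Keystone options in the cluster config database (a daemon restart may be needed)
ceph config set client.rgw rgw_keystone_url https://keystone.example.com:5000
ceph config set client.rgw rgw_keystone_api_version 3
# Verify what is stored
ceph config dump | grep rgw_keystone
```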
--
Cheers,
Shashi
On 18.01.23 at 10:12, Robert Sander wrote:
root@cephtest20:~# ceph fs status
Error EINVAL: Traceback (most recent call last):
File "/usr/share/ceph/mgr/mgr_module.py", line 1757, in _handle_command
return CLICommand.COMMANDS[cmd['prefix']].call(self, cmd, inbuf)
File "/usr/share/ceph