The switch needs an update and has to be restarted (expected downtime: about 2 minutes).
Can I just leave the cluster as it is and trust that Ceph will handle this correctly?
Or should I, for example, pause some of the VMs I am running, or even stop them?
What happens to the monitors? Can they handle this, or mayb
Hi!
I have a Ceph cluster on version 16.2.7 with this error:
root@s-26-9-19-mon-m1:~# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.91 on s-26-8-2-1 is in error state
But I don't have that osd anymore. I deleted it.
r
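In case it helps anyone hitting the same warning, a minimal sketch of how the stale record can be inspected and cleared with cephadm; osd.91 and the host name are taken from the output above, and the forced removal assumes the daemon entry really is just a leftover:
# list what cephadm still thinks runs on that host
ceph orch ps s-26-8-2-1 --daemon-type osd
# if osd.91 still shows up in error state, drop the leftover daemon record
ceph orch daemon rm osd.91 --force
# re-check afterwards
ceph health detail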
Hello,
I've just heard about storage classes and imagined how we could use them
to migrate all S3 objects within a placement pool from an ec pool to a
replicated pool (or vice-versa) for data resiliency reasons, not to save
space.
It looks possible since:
1. data pools are associated with st
If you can stop the VMs, it will help. Even if the cluster recovers
quickly, VMs take great offense if a write does not finish within
120s, and many will put their filesystems into read-only mode if writes are
delayed for that long. So if there is a 120s outage of I/O, the VMs will
be stuck/useless anyhow, so you mi
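As a guest-side sketch of the behaviour described above, these are the kinds of checks one might run inside a VM after such an outage (the 120 s figure is the kernel's default hung-task warning threshold; the remount example assumes an ext4 root that went read-only):
# look for hung-task warnings caused by the stalled writes
dmesg | grep -i "blocked for more than 120 seconds"
# check whether any filesystem was remounted read-only
grep " ro," /proc/mounts
# once the cluster is healthy again, a read-only root can often be remounted
mount -o remount,rw /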
Hi Benjamin,
Apologies that I can't help for the bluestore issue.
But that huge 100GB OSD consumption could be related to similar
reports linked here: https://tracker.ceph.com/issues/53729
Does your cluster have the pglog_hardlimit set?
# ceph osd dump | grep pglog
flags sortbitwise,recovery_de
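For reference, a sketch of what to look for and how the flag is usually enabled, per the upstream release notes (only set it once all OSDs run a release that supports it):
# the flag should appear in the osdmap flags line, e.g.
#   flags sortbitwise,recovery_deletes,purged_snapdirs,pglog_hardlimit
ceph osd dump | grep pglog
# if it is missing, it can be set cluster-wide
ceph osd set pglog_hardlimit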
Hello Michal,
With cephfs and a single filesystem shared across multiple k8s clusters,
you should use subvolumegroups to limit data exposure. You'll find an
example of how to use subvolumegroups in the ceph-csi-cephfs helm chart
[1]. Essentially you just have to set the subvolumeGroup to whatever
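A small sketch of the Ceph-side part, assuming a filesystem named cephfs and one group per k8s cluster (the group names are made up; the CSI side is then pointed at the group via the subvolumeGroup value in the helm chart from [1]):
# create one subvolume group per kubernetes cluster
ceph fs subvolumegroup create cephfs csi-cluster-a
ceph fs subvolumegroup create cephfs csi-cluster-b
# verify
ceph fs subvolumegroup ls cephfs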
On 25/01/2022 at 12:09, Frédéric Nass wrote:
Hello Michal,
With cephfs and a single filesystem shared across multiple k8s
clusters, you should use subvolumegroups to limit data exposure. You'll
find an example of how to use subvolumegroups in the ceph-csi-cephfs
helm chart [1]. Essentially yo
I would still set noout on the relevant parts of the cluster in case something
goes south and it does take longer than 2 minutes. Otherwise OSDs will
start getting marked out after 10 minutes or so by default, and then you have
a lot of churn going on.
The monitors will be fine unless you lose
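A sketch of the flag handling around such a maintenance window (host names are examples; plain noout on the whole cluster is often enough for a short outage):
# before the switch reboot: keep OSDs from being marked out
ceph osd set noout
# or limit it to the hosts behind that switch
ceph osd set-group noout host1 host2
# after the network is back and the cluster is healthy again
ceph osd unset noout
ceph osd unset-group noout host1 host2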
Thanks,
I had another review of the configuration and it appears that the
configuration *is* properly propagated to the daemon (also visible in
my second link).
I traced down my issues further and it looks like I have first tripped
over the following issue again...
https://tracker.ceph.com/issue
On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass
wrote:
>
> Hello,
>
> I've just heard about storage classes and imagined how we could use them
> to migrate all S3 objects within a placement pool from an ec pool to a
> replicated pool (or vice-versa) for data resiliency reasons, not to save
> space.
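For context, defining an additional storage class under an existing placement target looks roughly like this; a sketch assuming the default zonegroup/zone and a hypothetical replicated data pool, with objects then directed to it via the S3 x-amz-storage-class header or a lifecycle transition:
# declare the new storage class in the zonegroup placement target
radosgw-admin zonegroup placement add \
    --rgw-zonegroup default \
    --placement-id default-placement \
    --storage-class REPLICATED
# point it at the replicated data pool in the zone
radosgw-admin zone placement add \
    --rgw-zone default \
    --placement-id default-placement \
    --storage-class REPLICATED \
    --data-pool default.rgw.buckets.replicated
# in multisite setups, commit the period so the gateways pick it up
radosgw-admin period update --commit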
Hey all,
Sorry for the late notice. We will be having a Ceph science/research/big
cluster call on Wednesday January 26th. If anyone wants to discuss
something specific they can add it to the pad linked below. If you have
questions or comments you can contact me.
This is an informal open call
Hello team,
I would like to monitor my Ceph cluster using one of the
monitoring tools; does anyone have advice on that?
Michel
On Tue, Jan 25, 2022 at 4:07 PM Frank Schilder wrote:
>
> Hi Dan,
>
> in several threads I have now seen statements like "Does your cluster have
> the pglog_hardlimit set?". In this context, I would be grateful if you could
> shed some light on the following:
>
> 1) How do I check that?
>
> Ther
On 25/01/2022 at 14:48, Casey Bodley wrote:
On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass
wrote:
Hello,
I've just heard about storage classes and imagined how we could use them
to migrate all S3 objects within a placement pool from an ec pool to a
replicated pool (or vice-versa) for data re
On Tue, Jan 25, 2022 at 11:59 AM Frédéric Nass
wrote:
>
>
> > On 25/01/2022 at 14:48, Casey Bodley wrote:
> > On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass
> > wrote:
> >> Hello,
> >>
> >> I've just heard about storage classes and imagined how we could use them
> >> to migrate all S3 objects with
I would like to know that as well.
I have the same setup - cephadm, Pacific, CentOS 8, and a host with a number of
HDDs which are all connected by two paths.
There is no way to use these without multipath:
> ceph orch daemon add osd serverX:/dev/sdax
> Cannot update volume group ceph-51f8b9b0-2917-431d-8a6d-8f
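Not a definitive answer, but one workaround sometimes suggested is to build LVM on the multipath device by hand and pass the logical volume to the orchestrator instead of a raw /dev/sdX path. A sketch only: device, VG and LV names are made up, and it assumes ceph-volume accepts a pre-built LV here:
# on the host: PV/VG/LV on top of the multipath device
pvcreate /dev/mapper/mpatha
vgcreate ceph-mpatha /dev/mapper/mpatha
lvcreate -l 100%FREE -n osd-block ceph-mpatha
# then hand the LV to the orchestrator
ceph orch daemon add osd serverX:ceph-mpatha/osd-block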
On 25/01/2022 at 18:28, Casey Bodley wrote:
On Tue, Jan 25, 2022 at 11:59 AM Frédéric Nass
wrote:
On 25/01/2022 at 14:48, Casey Bodley wrote:
On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass
wrote:
Hello,
I've just heard about storage classes and imagined how we could use them
to migrate
Hi Jake,
Many thanks for contributing the data.
Indeed, our data scientists use the data from Backblaze too.
Have you found strong correlations between device health metrics (such as
reallocated sector count, or any combination of attributes) and read/write
errors in /var/log/messages from what
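In case it is useful for this kind of analysis: Ceph can collect SMART data per device itself, so those attributes can also be pulled straight from the cluster. A sketch, with <devid> as a placeholder:
# list devices known to the cluster and the daemons using them
ceph device ls
# dump the collected SMART/health metrics for one device
ceph device get-health-metrics <devid>
# make sure periodic scraping is enabled
ceph config get mgr mgr/devicehealth/enable_monitoring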
Thank you for your responses!
Since yesterday we found that several OSD pods still had memory limits set,
and in fact some of them (but far from all) were getting OOM killed, so we
have fully removed those limits again. Unfortunately this hasn't helped
much and there are still 50ish OSDs down. W
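One thing that may be worth cross-checking alongside the pod limits is the OSDs' own memory target; a sketch, where the 4 GiB value is just the usual default rather than a recommendation:
# what the OSDs currently aim for
ceph config get osd osd_memory_target
# optionally adjust it cluster-wide, e.g. to 4 GiB
ceph config set osd osd_memory_target 4294967296
# list only the OSDs that are currently down
ceph osd tree down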
Is there also (going to be) something available that works 'offline'?
Hey Igor,
thank you for your response!
>>
>> Do you suggest disabling the HDD write caching and/or the
>> bluefs_buffered_io for production clusters?
>>
> Generally the upstream recommendation is to disable disk write caching; there
> were multiple complaints that it might negatively impact the perf
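For reference, a sketch of how both settings mentioned above can be inspected and changed (sdX is a placeholder; whether doing so is a good idea is exactly what is being discussed here):
# check / disable the volatile write cache on a HDD
smartctl -g wcache /dev/sdX
hdparm -W 0 /dev/sdX
# check / change bluefs_buffered_io on the OSDs
ceph config get osd bluefs_buffered_io
ceph config set osd bluefs_buffered_io false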
Hello Ceph users,
I have a problem with scheduled snapshots on ceph 16.2.7 (in a Proxmox install).
While trying to understand how snap schedules work, I created more schedules
than I needed to:
root@vis-mgmt:~# ceph fs snap-schedule list /backups/nassie/NAS
/backups/nassie/NAS 1h 24h7d8w12m
/b
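For anyone in the same situation, surplus schedules can be listed and removed per path and interval. A sketch using the path from above; the 1h argument is just an example of one schedule to drop:
# show what is currently scheduled for the path
ceph fs snap-schedule list /backups/nassie/NAS
ceph fs snap-schedule status /backups/nassie/NAS
# remove one specific schedule, giving its repeat interval explicitly
ceph fs snap-schedule remove /backups/nassie/NAS 1h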
Thank you for your email, Szabo; these can be helpful. Can you provide
links so that I can start working on it?
Michel.
On Tue, 25 Jan 2022, 18:51 Szabo, Istvan (Agoda),
wrote:
> Which monitoring tool? Like prometheus or nagios style thing?
> We use sensu for keepalive and ceph health reporting + prom
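As a concrete starting point for the Prometheus route mentioned above, the built-in mgr module can be enabled and scraped directly; a sketch, where 9283 is the module's default port and the host name is a placeholder:
# enable the prometheus exporter in the manager
ceph mgr module enable prometheus
# the active mgr then serves metrics, by default on port 9283
curl http://<active-mgr-host>:9283/metrics | head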