[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities

2024-06-13 Thread Frédéric Nass
Hello, 'ceph osd deep-scrub 5' deep-scrubs all PGs for which osd.5 is primary (and only those). You can check that from ceph-osd.5.log by running: for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}') ; do echo "pg: $pg, primary osd is osd.$(ceph pg $pg query -
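A minimal sketch of how that check can be completed, assuming jq is available and that the primary OSD is the first entry of the PG's acting set (the exact end of the pipeline is truncated above):

  $ for pg in $(grep 'deep-scrub starts' /var/log/ceph/*/ceph-osd.5.log | awk '{print $8}'); do echo "pg: $pg, primary: osd.$(ceph pg $pg query | jq -r '.acting[0]')"; done

Alternatively, 'ceph pg ls-by-primary osd.5' lists the PGs for which osd.5 is currently the primary.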

[ceph-users] Re: Incomplete PGs. Ceph Consultant Wanted

2024-06-25 Thread Frédéric Nass
Hello Wesley, I couldn't find any tracker related to this and since min_size=1 has been involved in many critical situations with data loss, I created this one: https://tracker.ceph.com/issues/66641 Regards, Frédéric. - Le 17 Juin 24, à 19:14, Wesley Dillingham w...@wesdillingham.com a écr

[ceph-users] Re: OSD service specs in mixed environment

2024-06-27 Thread Frédéric Nass
Hi Torkil, Ruben, I see two theoretical ways to do this without additional OSD service. One that probably doesn't work :-) and another one that could work depending on how the orchestrator prioritizes its actions based on service criteria. The one that probably doesn't work is by specifying mul

[ceph-users] Re: OSD service specs in mixed environment

2024-06-28 Thread Frédéric Nass
com/show_bug.cgi?id=2219373 [2] https://github.com/ceph/ceph/pull/53803 - Le 28 Juin 24, à 10:34, Torkil Svensgaard tor...@drcmr.dk a écrit : > On 27-06-2024 10:56, Frédéric Nass wrote: >> Hi Torkil, Ruben, > > Hi Frédéric > >> I see two theoretical ways to do this wi

[ceph-users] Re: OSD service specs in mixed environment

2024-06-28 Thread Frédéric Nass
- Le 26 Juin 24, à 10:50, Torkil Svensgaard tor...@drcmr.dk a écrit : > On 26/06/2024 08:48, Torkil Svensgaard wrote: >> Hi >> >> We have a bunch of HDD OSD hosts with DB/WAL on PCI NVMe, either 2 x >> 3.2TB or 1 x 6.4TB. We used to have 4 SSDs pr node for journals before >> bluestore and t

[ceph-users] Re: OSD service specs in mixed environment

2024-06-28 Thread Frédéric Nass
- Le 28 Juin 24, à 15:27, Anthony D'Atri anthony.da...@gmail.com a écrit : >>> >>> But this in a spec doesn't match it: >>> >>> size: '7000G:' >>> >>> This does: >>> >>> size: '6950G:' > > There definitely is some rounding within Ceph, and base 2 vs base 10 > shenanigans. > >> >> $ ce

[ceph-users] Re: Viability of NVMeOF/TCP for VMWare

2024-06-28 Thread Frédéric Nass
We came to the same conclusions as Alexander when we studied replacing Ceph's iSCSI implementation with Ceph's NFS-Ganesha implementation: HA was not working. During failovers, vmkernel would fail with messages like this: 2023-01-14T09:39:27.200Z Wa(180) vmkwarning: cpu18:2098740)WARNING: NFS41:

[ceph-users] Re: Pacific 16.2.15 `osd noin`

2024-07-06 Thread Frédéric Nass
Hi, Another way to prevent data movement at OSD creation time (apart from using norebalance and nobackfill cluster flags) is to pre-create the host buckets in another root, named for example "closet", let the orchestrator create the OSDs and move these host buckets to their final bucket locati
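A hedged sketch of that approach; the 'closet' root comes from the post, while the host name and final location are placeholders:

  $ ceph osd crush add-bucket closet root
  $ ceph osd crush add-bucket newhost1 host
  $ ceph osd crush move newhost1 root=closet
  # let the orchestrator create the OSDs under newhost1, then:
  $ ceph osd crush move newhost1 root=default rack=rack1

As long as the temporary root is not referenced by any CRUSH rule, no data is mapped to the new OSDs until their host bucket is moved to its final location.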

[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2024-07-08 Thread Frédéric Nass
Hello, I just wanted to share that the following command also helped us move slow used bytes back to the fast device (without using bluefs-bdev-expand), when several compactions couldn't: $ cephadm shell --fsid $cid --name osd.${osd} -- ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/c

[ceph-users] Re: Fixing BlueFS spillover (pacific 16.2.14)

2024-07-09 Thread Frédéric Nass
/ceph-${osd}/block --dev-target /var/lib/ceph/osd/ceph-${osd}/block.db 3/ ceph orch daemon start osd.${osd} 4/ ceph tell osd.${osd} compact Regards, Frédéric. - Le 8 Juil 24, à 17:39, Frédéric Nass frederic.n...@univ-lorraine.fr a écrit : > Hello, > > I just wanted to share
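Pieced together from this and the previous message, the whole sequence looks roughly like the following sketch (step 1 is assumed, and the device paths are the usual containerized OSD paths):

  1/ ceph orch daemon stop osd.${osd}
  2/ cephadm shell --fsid $cid --name osd.${osd} -- ceph-bluestore-tool bluefs-bdev-migrate --path /var/lib/ceph/osd/ceph-${osd} --devs-source /var/lib/ceph/osd/ceph-${osd}/block --dev-target /var/lib/ceph/osd/ceph-${osd}/block.db
  3/ ceph orch daemon start osd.${osd}
  4/ ceph tell osd.${osd} compact

This moves RocksDB data that spilled over to the slow (block) device back onto the faster block.db device.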

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-12 Thread Frédéric Nass
- Le 11 Juil 24, à 0:23, Richard Bade hitr...@gmail.com a écrit : > Hi Casey, > Thanks for that info on the bilog. I'm in a similar situation with > large omap objects and we have also had to reshard buckets on > multisite losing the index on the secondary. > We also now have a lot of bucket

[ceph-users] Re: Help with Mirroring

2024-07-12 Thread Frédéric Nass
- Le 11 Juil 24, à 20:50, Dave Hall kdh...@binghamton.edu a écrit : > Hello. > > I would like to use mirroring to facilitate migrating from an existing > Nautilus cluster to a new cluster running Reef. RIght now I'm looking at > RBD mirroring. I have studied the RBD Mirroring section of th

[ceph-users] Re: Large omap in index pool even if properly sharded and not "OVER"

2024-07-15 Thread Frédéric Nass
-- > Agoda Services Co., Ltd. > e: [ mailto:istvan.sz...@agoda.com | istvan.sz...@agoda.com ] > ------- > From: Frédéric Nass > Sent: Friday, July 12, 2024 6:52 PM > To: Richard Bade ; Szabo, Istvan (Agoda) > > Cc: Cas

[ceph-users] Re: How to detect condition for offline compaction of RocksDB?

2024-07-16 Thread Frédéric Nass
Hi Rudenko, There's been this bug [1] in the past preventing BlueFS alert from popping up on ceph -s due to some code refactoring. You might just be facing over spilling without noticing. I'm saying this because you're running v16.2.13 and this bug was fixed in v16.2.14 (by [3], based on Pacifi

[ceph-users] Re: Unable to mount with 18.2.2

2024-07-17 Thread Frédéric Nass
Hi David, Redeploying 2 out of 3 MONs a few weeks back (to have them using RocksDB to be ready for Quincy) prevented some clients from connecting to the cluster and mounting cephfs volumes. Before the redeploy, these clients were using port 6789 (v1) explicitly as connections wouldn't work wit

[ceph-users] Re: Unable to mount with 18.2.2

2024-07-17 Thread Frédéric Nass
kend. >> >> But v2 is absent on the public OSD and MDS network >> >> The specific point is that the public network has been changed. >> >> At first, I thought it was the order of declaration of my_host (v1 before v2) >> but apparently, that's

[ceph-users] Re: Unable to mount with 18.2.2

2024-07-18 Thread Frédéric Nass
ces, came back later and succeeded. Maybe that explains it. Cheers, Frédéric. - Le 17 Juil 24, à 16:22, Frédéric Nass frederic.n...@univ-lorraine.fr a écrit : > - Le 17 Juil 24, à 15:53, Albert Shih albert.s...@obspm.fr a écrit : > >> Le 17/07/2024 à 09:40:59+0200,

[ceph-users] Re: How to detect condition for offline compaction of RocksDB?

2024-07-19 Thread Frédéric Nass
Hi Josh, Thank you for sharing this information. Can I ask what symptoms made you interested in tombstones? For the past few months, we've been observing successive waves of a large number of OSDs overspilling. When the phenomenon occurs, we automatically compact the OSDs (on the fly, one a

[ceph-users] Re: [RGW] Setup 2 zones within a cluster does not sync data

2024-07-29 Thread Frédéric Nass
Hi Huy, The sync result you posted earlier appears to be from master zone. Have you checked the secondary zone with 'radosgw-admin sync status --rgw-zone=hn2'? Can you check that: - sync user exists in the realm with 'radosgw-admin user list --rgw-realm=multi-region' - sync user's access_key an
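A short sketch of those checks; the zone and realm names follow the thread, and the 'period get' step is an extra sanity check:

  $ radosgw-admin sync status --rgw-zone=hn2
  $ radosgw-admin user list --rgw-realm=multi-region
  $ radosgw-admin zone get --rgw-zone=hn2      # sync user's access_key/secret_key should be set here
  $ radosgw-admin period get                   # both zones should appear in the current period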

[ceph-users] Re: Can you return orphaned objects to a bucket?

2024-08-02 Thread Frédéric Nass
Hello, Not sure this exactly matches your case but you could try to reindex those orphan objects with 'radosgw-admin object reindex --bucket {bucket_name}'. See [1] for command arguments, like realm, zonegroup, zone, etc. This command scans the data pool for objects that belong to a given bucket
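A hedged example of the command form, with placeholder names and the standard radosgw-admin site options assumed for realm/zonegroup/zone selection:

  $ radosgw-admin object reindex --bucket mybucket --rgw-realm myrealm --rgw-zonegroup myzonegroup --rgw-zone myzone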

[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-03 Thread Frédéric Nass
Hi, First thing that comes to mind when it comes to data unavailability or inconsistencies after a power outage is that some dirty data may have been lost along the IO path before reaching persistent storage. This can happen with non enterprise grade SSDs using non-persistent cache or with HDDs

[ceph-users] Re: Can you return orphaned objects to a bucket?

2024-08-07 Thread Frédéric Nass
Hi, You're right. The object reindex subcommand backport was rejected for P and is still pending for Q and R. [1] Use rgw-restore-bucket-index script instead. Regards, Frédéric. [1] https://tracker.ceph.com/issues/61405 De : vuphun...@gmail.com Envoyé : mercre

[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-07 Thread Frédéric Nass
déric. De : Best Regards Envoyé : jeudi 8 août 2024 08:10 À : Frédéric Nass Cc: ceph-users Objet : Re:Re: Re:Re: Re:Re: Re:Re: [ceph-users] Please guide us inidentifying thecause ofthedata miss in EC pool Hi, Frédéric Nass Thank you for your continued attention and guidance. Let's a

[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-08 Thread Frédéric Nass
crashed. Your thoughts? Frédéric. De : Best Regards Envoyé : jeudi 8 août 2024 09:16 À : Frédéric Nass Cc: ceph-users Objet : Re:[ceph-users] Re: Please guide us inidentifying thecause ofthedata miss in EC pool Hi, Frédéric Nass Yes. I checked the host running

[ceph-users] Re: Please guide us in identifying the cause of the data miss in EC pool

2024-08-09 Thread Frédéric Nass
sure you don't run out of disk space. Best regards, Frédéric. De : Best Regards Envoyé : jeudi 8 août 2024 11:32 À : Frédéric Nass Cc: ceph-users Objet : Re:Re: Re:[ceph-users] Re: Please guide us inidentifying thecauseofthedata miss in EC pool Hi,Fr

[ceph-users] Re: memory leak in mds?

2024-08-18 Thread Frédéric Nass
Hi Dario, A workaround may be to downgrade client's kernel or ceph-fuse version to a lower version than those listed in Enrico's comment #22, I believe. Can't say for sure though since I couldn't verify it myself. Cheers, Frédéric. De : Dario Graña Envoyé : ven

[ceph-users] Re: Pull failed on cluster upgrade

2024-08-21 Thread Frédéric Nass
Hi Nicola, You might want to post in the ceph-dev list about this or discuss it with devs in the ceph-devel slack channel for quicker help. Bests, Frédéric. De : Nicola Mori Envoyé : mercredi 21 août 2024 15:52 À : ceph-users@ceph.io Objet : [ceph-users] Re: Pu

[ceph-users] Re: ceph orch host drain daemon type

2024-08-29 Thread Frédéric Nass
Hello Eugen, A month back, while playing with a lab cluster, I drained a multi-service host (OSDs, MGR, MON, etc.) in order to recreate all of its OSDs. During this operation, all cephadm containers were removed as expected, including the MGR. As a result, I got into a situation where the orche

[ceph-users] Re: ceph orch host drain daemon type

2024-08-29 Thread Frédéric Nass
when a > label > is removed from the host the services eventually drain. > > > > -Original Message- > From: Frédéric Nass > Sent: Thursday, August 29, 2024 11:30 AM > To: Eugen Block > Cc: ceph-users ; dev > Subject: [ceph-users] Re: ceph orch host drain

[ceph-users] Re: squid release codename

2024-08-29 Thread Frédéric Nass
- Le 19 Aoû 24, à 15:45, Yehuda Sadeh-Weinraub yeh...@redhat.com a écrit : > On Sat, Aug 17, 2024 at 9:12 AM Anthony D'Atri wrote: >> >> > It's going to wreak havoc on search engines that can't tell when >> > someone's looking up Ceph versus the long-establish Squid Proxy. >> >> Search engin

[ceph-users] Re: ceph orch host drain daemon type

2024-08-30 Thread Frédéric Nass
hich is already > present in 'ceph orch ps --daemon-type' command. You could either > drain a specific daemon-type or drain the entire host (can be the > default with the same behaviour as it currently works). That would > allow more control about non-osd daemons. > > Zitat

[ceph-users] Re: The journey to CephFS metadata pool’s recovery

2024-09-03 Thread Frédéric Nass
Hi Marco, Have you checked the output of: dd if=/dev/ceph-xxx/osd-block-x of=/tmp/foo bs=4K count=2 hexdump -C /tmp/foo and: /usr/bin/ceph-bluestore-tool show-label --log-level=30 --dev /dev/nvmexxx -l /var/log/ceph/ceph-volume.log to see if it's aligned with the OSD's metadata. You

[ceph-users] Re: The journey to CephFS metadata pool’s recovery

2024-09-03 Thread Frédéric Nass
it be that the LV > is > not correctly mapped? > Basically here the question is: is there a way to recover the data of an OSD > in > an LV, if it was ceph osd purge before the cluster had a chance to replicate > it > (after ceph osd out )? > Thanks for your time! > fm

[ceph-users] Re: Somehow throotle recovery even further than basic options?

2024-09-09 Thread Frédéric Nass
Hi Istvan, This can only ease when adding new storage capacity to the cluster (and maybe when data migration is involved like when changing cluster's topology or crush rules?). When adding new nodes, PGs will be remapped to make use of the new OSDs, which will trigger some data migration. The

[ceph-users] Re: Somehow throotle recovery even further than basic options?

2024-09-10 Thread Frédéric Nass
c. - Le 9 Sep 24, à 17:15, Frédéric Nass frederic.n...@univ-lorraine.fr a écrit : > Hi Istvan, > > This can only ease when adding new storage capacity to the cluster (and maybe > when data migration is involved like when changing cluster's topology or crush > rules?). >

[ceph-users] Re: Ceph RBD w/erasure coding

2024-09-16 Thread Frédéric Nass
As a reminder, there's this one waiting ;-) https://tracker.ceph.com/issues/66641 Frédéric. PS: For the record, Andre's problem was related to the 'caps' (https://www.reddit.com/r/ceph/comments/1ffzfjc/ceph_rbd_werasure_coding/) - Le 15 Sep 24, à 18:02, Anthony D'Atri anthony.da...@gmail.c

[ceph-users] Re: Metric or any information about disk (block) fragmentation

2024-09-16 Thread Frédéric Nass
Hey, Yes, you can use either of these commands depending on whether or not you are using containers to get live OSDs' bluestore fragmentation: ceph daemon osd.0 bluestore allocator score block or cephadm shell ceph daemon osd.0 bluestore allocator score block ... { "fragmentation_rating":

[ceph-users] OverlayFS with Cephfs to mount a snapshot read/write

2020-11-09 Thread Frédéric Nass
Hello, I would like to use a cephfs snapshot as a read/write volume without having to clone it first as the cloning operation is - if I'm not mistaken - still inefficient as of now. This is for a data restore use case with Moodle application needing a writable data directory to start. The id

[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-09 Thread Frédéric Nass
ile: upperdir user.name="upperdir" Are you able to modify the content of a snapshot directory using overlayfs on your side? Frédéric. Le 09/11/2020 à 12:39, Luis Henriques a écrit : Frédéric Nass writes: Hello, I would like to use a cephfs snapshot as a read/write volume without havin

[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-09 Thread Frédéric Nass
Luis, I gave RHEL 8 and kernel 4.18 a try and it's working perfectly! \o/ Same commands, same mount options. Does anyone know why and if there's any chance I can have this working with CentOS/RHEL 7 and 3.10 kernel? Best regards, Frédéric. Le 09/11/2020 à 15:04, Frédéric Na

[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-09 Thread Frédéric Nass
I feel lucky to have you on this one. ;-) Do you mean applying a specific patch on 3.10 kernel? Or is this one too old to have it working anyways. Frédéric. Le 09/11/2020 à 19:07, Luis Henriques a écrit : Frédéric Nass writes: Hi Luis, Thanks for your help. Sorry I forgot about the

[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-11 Thread Frédéric Nass
rself. No RHEL7 kernels have that patch (so far). Newer RHEL8 kernels _do_ if that's an option for you. -- Jeff On Mon, 2020-11-09 at 19:21 +0100, Frédéric Nass wrote: I feel lucky to have you on this one. ;-) Do you mean applying a specific patch on 3.10 kernel? Or is this one too old to hav

[ceph-users] Re: OverlayFS with Cephfs to mount a snapshot read/write

2020-11-13 Thread Frédéric Nass
RHEL8 kernels _do_ if that's an > option for you. > -- Jeff > > On Mon, 2020-11-09 at 19:21 +0100, Frédéric Nass wrote: >> I feel lucky to have you on this one. ;-) Do you mean applying a >> specific patch on 3.10 kernel? Or is this one too old to have it working >>

[ceph-users] Re: NoSuchKey on key that is visible in s3 list/radosgw bk

2020-11-23 Thread Frédéric Nass
Hi Denis, You might want to look at rgw_gc_obj_min_wait from [1] and try increasing the default value of 7200s (2 hours) to whatever suits your need < 2^64. Just remember that at some point you'll have to get these objects processed by the gc. Or manually through the API [2]. One thing that co
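For reference, a hedged example of raising that setting through the config database (the value is illustrative):

  $ ceph config set client.rgw rgw_gc_obj_min_wait 86400    # 24 hours instead of the default 2 hours

As noted above, the garbage collector will still have to process these objects eventually.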

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Frédéric Nass
Hi Stefan, Initial data removal could also have resulted from a snapshot removal leading to OSDs OOMing and then pg remappings leading to more removals after OOMed OSDs rejoined the cluster and so on. As mentioned by Igor : "Additionally there are users' reports that recent default value's m

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-14 Thread Frédéric Nass
I forgot to mention "If with bluefs_buffered_io=false, the %util is over 75% most of the time ** during data removal (like snapshot removal) **, then you'd better change it to true." Regards, Frédéric. Le 14/12/2020 à 21:35, Frédéric Nass a écrit : Hi Stefan, Initial dat

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-16 Thread Frédéric Nass
a healthy state. Thanks, Stefan On 12/14/20, 3:35 PM, "Frédéric Nass" wrote: Hi Stefan, Initial data removal could also have resulted from a snapshot removal leading to OSDs OOMing and then pg remappings leading to more removals after OOMed OSDs rejoined the cluster

[ceph-users] Re: OSD reboot loop after running out of memory

2020-12-16 Thread Frédéric Nass
, Frédéric Nass a écrit : Hi Stefan, This has me thinking that the issue your cluster may be facing is probably with bluefs_buffered_io set to true, as this has been reported to induce excessive swap usage (and OSDs flapping or OOMing as consequences) in some versions starting from Nautilus I

[ceph-users] Re: Ceph Outage (Nautilus) - 14.2.11

2020-12-16 Thread Frédéric Nass
Hi Suresh, 24 HDDs backed by only 2 NVMes looks like a high ratio. What rings a bell in your post is "upgraded from Luminous to Nautilus" and "Elasticsearch" which mainly reads to index data and also "memory leak". You might want to take a look at the current value of bluefs_buffered_i

[ceph-users] Re: CephFS max_file_size

2021-03-25 Thread Frédéric Nass
o the file size) really existed. The max_file_size setting prevents users from creating files that appear to be eg. exabytes in size, causing load on the MDS as it tries to enumerate the objects during operations like stats or deletes." Thought it might help. -- Cordialement, Fré

[ceph-users] Re: Cephfs metadata and MDS on same node

2021-03-26 Thread Frédéric Nass
rimary OSD is local to the MDS the client's talking to, which in real life is impossible to achieve as you cannot pin cephfs trees and their related metadata objects to specific PGs. Best regards, Frédéric. -- Cordialement, Frédéric Nass Direction du Numérique Sous-Direction Infrastructu

[ceph-users] Re: osd_memory_target=level0 ?

2021-09-30 Thread Frédéric Nass
to 4k. If your OSDs were created with 32k alloc size then it might explain the unexpected overspilling with a lot of objects in the cluster. Hope that helps, Regards, Frédéric. -- Cordialement, Frédéric Nass Direction du Numérique Sous-direction Infrastructures et Services Tél : 03.72.74.11.

[ceph-users] How does mclock work?

2024-01-09 Thread Frédéric Nass
reads.   With HDD only setups (RocksDB+WAL+Data on HDD), if mclock only considers write performance, the OSD may not take advantage of higher read performance.   Can someone please shed some light on this?   Best regards, Frédéric Nass Sous-direction Infrastructures et Services

[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass
Hello,   We've had a similar situation recently where OSDs would use way more memory than osd_memory_target and get OOM killed by the kernel. It was due to a kernel bug related to cgroups [1].   If num_cgroups below keeps increasing then you may hit this bug.   $ cat /proc/cgroups | grep
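A sketch of the check implied here; the third column of /proc/cgroups is num_cgroups and the figures below are purely illustrative:

  $ grep memory /proc/cgroups
  memory  11  45862  1    # columns: subsys_name, hierarchy, num_cgroups, enabled

If that number keeps climbing as containers are restarted, the cgroup leak described in [1] may be at play.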

[ceph-users] Re: 3 DC with 4+5 EC not quite working

2024-01-12 Thread Frédéric Nass
Hello Torkil,   We're using the same EC scheme as yours with k=5 and m=4 over 3 DCs with the below rule:   rule ec54 {         id 3         type erasure         min_size 3         max_size 9         step set_chooseleaf_tries 5         step set_choose_tries 100         step take def
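The rule is cut off above; a plausible completion, assuming the usual pattern of selecting all three datacenters and three hosts in each (9 shards for k=5, m=4), would be:

          step take default
          step choose indep 0 type datacenter
          step chooseleaf indep 3 type host
          step emit
  }

The 'default' root is an assumption; adjust it to the actual CRUSH root of the cluster.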

[ceph-users] Re: Ceph Nautilous 14.2.22 slow OSD memory leak?

2024-01-12 Thread Frédéric Nass
the suggestions. We are using the valilla Linux 4.19 LTS version. Do you think we may be suffering from the same bug?   best regards,   Samuel   huxia...@horebdata.cn   From: Frédéric Nass Date: 2024-01-12 09:19 To:  huxiaoyu CC: ceph-users Subject: Re: [ceph-users] Ceph Nautilous 14.2

[ceph-users] Re: How does mclock work?

2024-01-16 Thread Frédéric Nass
Sridhar,   Thanks a lot for this explanation. It's clearer now.   So at the end of the day (at least with balanced profile) it's a lower bound and no upper limit and a balanced distribution between client and cluster IOPS.   Regards, Frédéric. -Message original- De: Sr

[ceph-users] Re: [Urgent] Ceph system Down, Ceph FS volume in recovering

2024-03-16 Thread Frédéric Nass
  Hello Van Diep,   I read this after you got out of trouble.   According to your ceph osd tree, it looks like your problems started when the ceph orchestrator created osd.29 on node 'cephgw03' because it looks very unlikely that you created a 100MB OSD on a node that's named after "GW".

[ceph-users] Leaked clone objects

2024-03-19 Thread Frédéric Nass
  Hello,   Over the last few weeks, we have observed an abnormal increase of a pool's data usage (by a factor of 2). It turns out that we are hit by this bug [1].   In short, if you happened to take pool snapshots and removed them by using the following command   'ceph osd pool rmsnap

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
Hello Michel, Pierre also suggested checking the performance of this OSD's device(s) which can be done by running a ceph tell osd.x bench. One thing I can think of is how the scrubbing speed of this very OSD could be influenced by mclock scheduling, would the max iops capacity calculated by thi

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
others have a value ~5 which I find also very low (all OSDs are using the same recent HW/HDD). Thanks for these informations. I'll follow your suggestions to rerun the benchmark and report if it improved the situation. Best regards, Michel Le 22/03/2024 à 12:18, Frédéric Nass a éc

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
I find also very low (all > OSDs are using the same recent HW/HDD). > > Thanks for these informations. I'll follow your suggestions to rerun > the benchmark and report if it improved the situation. > > Best regards, > > Michel > > Le 22/03/2024 à 12:18, Frédéric

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-22 Thread Frédéric Nass
nal- De: Kai à: Frédéric Cc: Michel ; Pierre ; ceph-users Envoyé: vendredi 22 mars 2024 18:32 CET Sujet : Re: [ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month On Fri, Mar 22, 2024 at 04:29:21PM +0100, Frédéric Nass wrote: >A/ these incredibly low values were

[ceph-users] Re: Reef (18.2): Some PG not scrubbed/deep scrubbed for 1 month

2024-03-23 Thread Frédéric Nass
not be set seems relevant. CC'ing Sridhar to have his thoughts. Cheers, Frédéric. - Le 22 Mar 24, à 19:37, Kai Stian Olstad ceph+l...@olstad.com a écrit : > On Fri, Mar 22, 2024 at 06:51:44PM +0100, Frédéric Nass wrote: >> >>> The OSD run bench and update osd_mclock_

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
Hello Eugen, Is this cluster using WPQ or mClock scheduler? (cephadm shell ceph daemon osd.0 config show | grep osd_op_queue) If WPQ, you might want to tune osd_recovery_sleep* values as they do have a real impact on the recovery/backfilling speed. Just lower osd_max_backfills to 1 before doi
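A hedged example of that kind of tuning on a WPQ cluster (values are illustrative; larger sleep values slow recovery further, and these sleep options are reportedly ignored by the mClock scheduler):

  $ cephadm shell ceph daemon osd.0 config show | grep osd_op_queue
  $ ceph config set osd osd_max_backfills 1
  $ ceph config set osd osd_recovery_sleep_hdd 0.2
  $ ceph config set osd osd_recovery_sleep_hybrid 0.1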

[ceph-users] Re: Impact of large PG splits

2024-04-12 Thread Frédéric Nass
ell enough or not) after adding new OSDs. BTW, what ceph version is this? You should make sure you're running v16.2.11+ or v17.2.4+ before splitting PGs to avoid this nasty bug: https://tracker.ceph.com/issues/53729 Cheers, Frédéric. - Le 12 Avr 24, à 10:41, Frédéric Nass fre

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass
Hello Albert, Have you checked the hardware status of the involved drives other than with smartctl? Like with the manufacturer's tools / WebUI (iDrac / perccli for DELL hardware for example). If these tools don't report any media error (that is bad blocks on disks) then you might just be facing t

[ceph-users] Re: PG inconsistent

2024-04-12 Thread Frédéric Nass
- Le 12 Avr 24, à 15:17, Albert Shih albert.s...@obspm.fr a écrit : > Le 12/04/2024 à 12:56:12+0200, Frédéric Nass a écrit >> > Hi, > >> >> Have you check the hardware status of the involved drives other than with >> smartctl? Like with the manufacturer

[ceph-users] Re: cephadm custom jinja2 service templates

2024-04-17 Thread Frédéric Nass
Hello Felix, You can download haproxy.cfg.j2 and keepalived.conf.j2 from here [1], tweak them to your needs and set them via: ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i haproxy.cfg.j2 ceph config-key set mgr/cephadm/services/ingress/keepalived.conf -i keepalived.conf.j2
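A sketch of the full workflow; the ingress service name is a placeholder and the final redeploy, presumably needed for the new templates to be picked up, is an assumption:

  $ ceph config-key set mgr/cephadm/services/ingress/haproxy.cfg -i haproxy.cfg.j2
  $ ceph config-key set mgr/cephadm/services/ingress/keepalived.conf -i keepalived.conf.j2
  $ ceph orch redeploy ingress.rgw.myingress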

[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Frédéric Nass
Hello, My turn ;-) Ceph is strongly consistent. Either you read/write objects/blocks/files with an ensured strong consistency OR you don't. Worst thing you can expect from Ceph, as long as it's been properly designed, configured and operated, is a temporary loss of access to the data. There are

[ceph-users] Re: Why CEPH is better than other storage solutions?

2024-04-23 Thread Frédéric Nass
. Regards, Frédéric. - Le 23 Avr 24, à 13:04, Janne Johansson icepic...@gmail.com a écrit : > Den tis 23 apr. 2024 kl 11:32 skrev Frédéric Nass > : >> Ceph is strongly consistent. Either you read/write objects/blocs/files with >> an >> insured strong consistency OR yo

[ceph-users] Re: Orchestrator not automating services / OSD issue

2024-04-23 Thread Frédéric Nass
Hello Michael, You can try this: 1/ check that the host shows up on ceph orch ls with the right label 'osds' 2/ check that the host is OK with ceph cephadm check-host . It should look like: (None) ok podman (/usr/bin/podman) version 4.6.1 is present systemctl is present lvcreate is present Unit

[ceph-users] Re: Impact of large PG splits

2024-04-25 Thread Frédéric Nass
s all too easy to forget to >> reduce them later, or think that it's okay to run all the time with >> reduced headroom. >> >> Until a host blows up and you don't have enough space to recover into. >> >>> On Apr 12, 2024, at 05:01, Frédéric Nass >>

[ceph-users] Re: MDS crash

2024-04-26 Thread Frédéric Nass
Hello, 'almost all diagnostic ceph subcommands hang!' -> this rang a bell. We've had a similar issue with many ceph commands hanging due to a missing L3 ACL between MGRs and a new MDS machine that we added to the cluster. I second Eugen's analysis: network issue, whatever the OSI layer. Re

[ceph-users] Re: Problem with take-over-existing-cluster.yml playbook

2024-05-14 Thread Frédéric Nass
Hello Vlad, We've seen this before a while back. Not sure to recall how we got around this but you might want to try setting 'ip_version: ipv4' in your all.yaml file since this seems to be a condition to the facts setting. - name: Set_fact _monitor_addresses - ipv4 ansible.builtin.set_fact:
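A minimal sketch of the suggested change, assuming the standard ceph-ansible group_vars layout:

  # group_vars/all.yml
  ip_version: ipv4

This lets the 'Set_fact _monitor_addresses - ipv4' task quoted above match its condition and populate the monitor addresses.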

[ceph-users] Re: Problem with take-over-existing-cluster.yml playbook

2024-05-14 Thread Frédéric Nass
't work either. > Regards, > [ https://about.me/vblando | Vlad Blando ] > On Tue, May 14, 2024 at 4:10 PM Frédéric Nass < [ > mailto:frederic.n...@univ-lorraine.fr | frederic.n...@univ-lorraine.fr ] > > wrote: >> Hello Vlad, >> We've seen this before a

[ceph-users] Re: Problem with take-over-existing-cluster.yml playbook

2024-05-14 Thread Frédéric Nass
d/or group_vars/*.yaml files. You can also try adding multiple - on the ansible-playbook command and see if you get something useful. Regards, Frédéric. De : vladimir franciz blando Envoyé : mardi 14 mai 2024 21:23 À : Frédéric Nass Cc: Eugen Block; ceph-us

[ceph-users] Re: User + Dev Meetup Tomorrow!

2024-05-24 Thread Frédéric Nass
Hello everyone, Nice talk yesterday. :-) Regarding containers vs RPMs and orchestration, and the related discussion from yesterday, I wanted to share a few things (which I wasn't able to share yesterday on the call due to a headset/bluetooth stack issue) to explain why we use cephadm and ceph

[ceph-users] Re: User + Dev Meetup Tomorrow!

2024-05-24 Thread Frédéric Nass
containers or distribution packages? > * Do you run bare-metal or virtualized? > > Best, > Sebastian > > Am 24.05.24 um 12:28 schrieb Frédéric Nass: >> Hello everyone, >> >> Nice talk yesterday. :-) >> >> Regarding containers vs RPMs and orchestrat

[ceph-users] Re: How to setup NVMeoF?

2024-05-30 Thread Frédéric Nass
Hello Robert, You could try: ceph config set mgr mgr/cephadm/container_image_nvmeof "quay.io/ceph/nvmeof:1.2.13" or whatever image tag you need (1.2.13 is current latest). Another way to run the image is by editing the unit.run file of the service or by directly running the container with pod

[ceph-users] Re: Excessively Chatty Daemons RHCS v5

2024-06-07 Thread Frédéric Nass
Hi Joshua, These messages actually deserve more attention than you think, I believe. You may hit this one [1] that Mark (comment #4) also hit with 16.2.10 (RHCS 5). PR's here: https://github.com/ceph/ceph/pull/51669 Could you try raising osd_max_scrubs to 2 or 3 (now defaults to 3 in quincy and

[ceph-users] Re: Testing CEPH scrubbing / self-healing capabilities

2024-06-07 Thread Frédéric Nass
Hello Petr, - Le 4 Juin 24, à 12:13, Petr Bena petr@bena.rocks a écrit : > Hello, > > I wanted to try out (lab ceph setup) what exactly is going to happen > when parts of data on OSD disk gets corrupted. I created a simple test > where I was going through the block device data until I found

[ceph-users] Re: Increase the recovery throughput

2022-12-26 Thread Frédéric Nass
Hi Monish, You might also want to check the values of osd_recovery_sleep_* if they are not the default ones. Regards, Frédéric. - Le 12 Déc 22, à 11:32, Monish Selvaraj mon...@xaasability.com a écrit : > Hi Eugen, > > We tried that already. the osd_max_backfills is in 24 and the > osd_rec

[ceph-users] Re: iscsi target lun error

2023-01-12 Thread Frédéric Nass
Hi Xiubo, Randy, This is due to 'host.containers.internal' being added to the container's /etc/hosts since Podman 4.1+. The workaround consists of either downgrading the Podman package to v4.0 (on RHEL8, dnf downgrade podman-4.0.2-6.module+el8.6.0+14877+f643d2d6) or adding the --no-hosts option t

[ceph-users] Re: Crushmap rule for multi-datacenter erasure coding

2023-04-04 Thread Frédéric Nass
Hello Michel, What you need is: step choose indep 0 type datacenter step chooseleaf indep 2 type host step emit I think you're right about the need to tweak the crush rule by editing the crushmap directly. Regards Frédéric. - Le 3 Avr 23, à 18:34, Michel Jouvin mic
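Wrapped in a complete rule for illustration (the rule name, id and root are assumptions; the three steps are the ones given above):

  rule ec_multi_dc {
          id 3
          type erasure
          step set_chooseleaf_tries 5
          step set_choose_tries 100
          step take default
          step choose indep 0 type datacenter
          step chooseleaf indep 2 type host
          step emit
  }

To apply it, dump the CRUSH map with 'ceph osd getcrushmap', decompile it with 'crushtool -d', edit the rule, recompile with 'crushtool -c' and inject it back with 'ceph osd setcrushmap -i'.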

[ceph-users] Critical Information: DELL/Toshiba SSDs dying after 70,000 hours of operation

2023-06-19 Thread Frédéric Nass
Hello, This message does not concern Ceph itself but a hardware vulnerability which can lead to permanent loss of data on a Ceph cluster equipped with the same hardware in separate fault domains. The DELL / Toshiba PX02SMF020, PX02SMF040, PX02SMF080 and PX02SMB160 SSD drives of the 13G gene

[ceph-users] Re: Critical Information: DELL/Toshiba SSDs dying after 70,000 hours of operation

2023-09-01 Thread Frédéric Nass
SSD drives back to life with their data (after the upgrade, you may need to import foreign config by pressing 'F' key on the next start) Many thanks to DELL French TAMs and DELL engineering for providing this firmware in a short time. Best regards, Frédéric. - Le 19 Juin 23

[ceph-users] Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
ic. -- Cordialement, Frédéric Nass Direction du Numérique Sous-direction Infrastructures et Services Tél : 03.72.74.11.35 ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: CephFS keyrings for K8s

2022-01-25 Thread Frédéric Nass
arted. Let me know if you see any clever/safer caps to use. Regards, Frédéric. [1] https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-cephfs/values.yaml#L20 [2] https://github.com/ceph/ceph-csi/blob/devel/charts/ceph-csi-rbd/values.yaml#L20 [3] https://github.com/ceph/ceph-csi/blob/de

[ceph-users] Re: CephFS keyrings for K8s

2022-01-25 Thread Frédéric Nass
Le 25/01/2022 à 12:09, Frédéric Nass a écrit : Hello Michal, With cephfs and a single filesystem shared across multiple k8s clusters, you should use subvolumegroups to limit data exposure. You'll find an example of how to use subvolumegroups in the ceph-csi-cephfs helm chart [1]. Essent

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
Le 25/01/2022 à 14:48, Casey Bodley a écrit : On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass wrote: Hello, I've just heard about storage classes and imagined how we could use them to migrate all S3 objects within a placement pool from an ec pool to a replicated pool (or vice-versa) for

[ceph-users] Re: Moving all s3 objects from an ec pool to a replicated pool using storage classes.

2022-01-25 Thread Frédéric Nass
Le 25/01/2022 à 18:28, Casey Bodley a écrit : On Tue, Jan 25, 2022 at 11:59 AM Frédéric Nass wrote: Le 25/01/2022 à 14:48, Casey Bodley a écrit : On Tue, Jan 25, 2022 at 4:49 AM Frédéric Nass wrote: Hello, I've just heard about storage classes and imagined how we could use th

[ceph-users] Do not use VMware Storage I/O Control with Ceph iSCSI GWs!

2022-01-26 Thread Frédéric Nass
e I/O Control **and** statistics collection" on each Datastore. Regards, Frédéric. -- Cordialement, Frédéric Nass Direction du Numérique Sous-direction Infrastructures et Services Tél : 03.72.74.11.35 ___ ceph-users mailing list -- ceph-users@

[ceph-users] Re: Forced upgrade OSD from Luminous to Pacific

2024-10-09 Thread Frédéric Nass
root@helper:~# ceph osd require-osd-release mimic > Error EPERM: not all up OSDs have CEPH_FEATURE_SERVER_MIMIC feature > > On 09.10.24 15:18, Frédéric Nass wrote: >> Here's an example of what a Pacific cluster upgraded from Hammer shows: >> >> $ ceph osd dump | head -13

[ceph-users] Re: Forced upgrade OSD from Luminous to Pacific

2024-10-09 Thread Frédéric Nass
- Le 8 Oct 24, à 15:24, Alex Rydzewski rydzewski...@gmail.com a écrit : > Hello, dear community! > > I kindly ask for your help in resolving my issue. > > I have a server with a single-node CEPH setup with 5 OSDs. This server > has been powered off for about two years, and when I needed the

[ceph-users] Re: Forced upgrade OSD from Luminous to Pacific

2024-10-09 Thread Frédéric Nass
a7d43a51b03) pacific > (stable) > 3. > Yes, now MON started and OSDs started, but they cannot connect to MON. At the > same time, the MON journal has a message: > disallowing boot of octopus+ OSD osd.xx > And I tried rebuild the MON with this ceph (Pacific) version and it is runn

[ceph-users] Re: Forced upgrade OSD from Luminous to Pacific

2024-10-09 Thread Frédéric Nass
luminous > stretch_mode_enabled false > > You are right, I did't run such commands. This is because I have two > other clusters that I have gradulally upgraded to Quincy from Luminous, > but following the proxmox instructions, and I don't see there any such. > > On 09.10

[ceph-users] Re: cephadm crush_device_class not applied

2024-10-04 Thread Frédéric Nass
Hey Eugen, Check this one here: https://github.com/ceph/ceph/pull/55534 It's fixed in 18.2.4 and should be in upcoming 17.2.8. Cheers, Frédéric. De : Eugen Block Envoyé : jeudi 3 octobre 2024 23:21 À : ceph-users@ceph.io Objet : [ceph-users] Re: cephadm crush_d

[ceph-users] Re: v19 & IPv6: unable to convert chosen address to string

2024-10-02 Thread Frédéric Nass
BTW, running the first Squid stable release (v19.2.0) in production seems a bit audacious at this time. :-) Frédéric. - Le 2 Oct 24, à 9:03, Frédéric Nass frederic.n...@univ-lorraine.fr a écrit : > Hi, > > Probably. This one [1] was posted 2 months ago. No investigations yet. >

[ceph-users] Re: Help with "27 osd(s) are not reachable" when also "27 osds: 27 up.. 27 in"

2024-10-16 Thread Frédéric Nass
Hi Harry, Do you have a 'cluster_network' set to the same subnet as the 'public_network' like in the issue [1]? It doesn't make much sense setting up a cluster_network when it's not different from the public_network. Maybe that's what triggers the OSD_UNREACHABLE recently coded here [2] (even thoug
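If that is the case, a hedged way to check and drop the redundant setting, assuming it was set in the config database rather than in ceph.conf:

  $ ceph config get osd cluster_network
  $ ceph config rm global cluster_network

The OSDs would need a restart to pick up the change.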
