[ceph-users] Re: Deep scrub debug option

2023-02-09 Thread Frank Schilder
Hi Bob, deep scrub on HDDs has, at least in newer versions, a negligible effect on performance even with default settings (with op_queue wpq and cut-off high). You might be affected by a combination of two issues: a change of OSD meta that happens with bcache devices on reboot and cache promoti
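A minimal sketch of checking the queue settings referred to above; these are standard OSD options readable through the config subsystem:
  ceph config get osd osd_op_queue                 # "wpq" is the scheduler the post refers to
  ceph config get osd osd_op_queue_cut_off         # "high" is the recommended value with wpq
  ceph config set osd osd_op_queue_cut_off high    # changing osd_op_queue itself requires an OSD restart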

[ceph-users] Re: OSD logs missing from Centralised Logging

2023-02-09 Thread Tarrago, Eli (RIS-BCT)
Please include your promtail logs, loki logs, promtail configuration, and your loki configuration. From: Peter van Heusden Date: Wednesday, February 8, 2023 at 7:45 AM To: ceph-users@ceph.io Subject: [ceph-users] OSD logs missing from Centralised Logging
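If the monitoring stack was deployed with cephadm, a hedged way to gather the requested logs (daemon names below are placeholders; `cephadm ls` prints the real ones, and the rendered configs live under each daemon's data dir in /var/lib/ceph/<fsid>/):
  cephadm ls | grep -Ei 'promtail|loki'
  cephadm logs --name promtail.<hostname>
  cephadm logs --name loki.<hostname>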

[ceph-users] Re: OSD logs missing from Centralised Logging

2023-02-09 Thread Peter van Heusden
Thanks Eli, I found the problem - the old logs were owned by the user ceph (uid 1001), and since the move to cephadm the daemons run as uid/gid 167, so they didn't have permission to write to the existing logs / log directories. Chown-ing the log directories to be owned by 167:167 and restarting t
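A minimal sketch of the fix described above, assuming the default log location:
  chown -R 167:167 /var/log/ceph
  ceph orch restart osd    # or restart only the affected daemons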

[ceph-users] Throttle down rebalance with Quincy

2023-02-09 Thread Victor Rodriguez
Hello, I'm adding OSDs to a 5 node cluster using Quincy 17.2.5. The network is a bonded 2x10G link. The issue I'm having is that the rebalance operation seems to impact client I/O and running VMs do not . OSDs are big 6.4 TB NVMe drives, so there will be a lot of data to move. With previous v
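In Quincy the mClock scheduler decides how much bandwidth recovery/backfill may take from clients; a sketch of the knobs that usually matter here (profile names are the stock ones):
  ceph config get osd osd_mclock_profile                   # high_client_ops is the default
  ceph config set osd osd_mclock_profile high_client_ops   # favour client I/O over rebalance
  # note: while mClock is active, the classic osd_max_backfills / osd_recovery_* limits are largely overridden by the profile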

[ceph-users] Frequent calling monitor election

2023-02-09 Thread Frank Schilder
Hi all, our monitors have enjoyed democracy since the beginning. However, I don't share a sudden excitement about voting: 2/9/23 4:42:30 PM [INF] overall HEALTH_OK 2/9/23 4:42:30 PM [INF] mon.ceph-01 is new leader, mons ceph-01,ceph-02,ceph-03,ceph-25,ceph-26 in quorum (ranks 0,1,2,3,4) 2/9/23 4:42
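A sketch of a first diagnostic step: raise the monitor debug level briefly, watch who leads, then revert:
  ceph config set mon debug_mon 10
  ceph quorum_status                  # shows the current leader and quorum ranks
  ceph config set mon debug_mon 1/5   # default level, restore when done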

[ceph-users] No such file or directory when issuing "rbd du"

2023-02-09 Thread Mehmet
Hello Friends, I have a strange output when issuing the following command: root@node35:~# rbd du -p cephhdd-001-mypool NAME PROVISIONED USED ... vm-99936587-disk-0@H202302091535 400 GiB 5.2 GiB vm-99936587-disk-0@H202302091635 400 GiB 1.2 GiB vm-99936587-disk
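One hedged way to narrow this down, using the pool and image names from the output above: check whether the snapshot the error refers to still exists, since a snapshot removed while the scan runs is one possible cause of ENOENT here:
  rbd du -p cephhdd-001-mypool
  rbd snap ls cephhdd-001-mypool/vm-99936587-disk-0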

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Dan van der Ster
Hi Frank, Check the mon logs with some increased debug levels to find out what the leader is busy with. We have a similar issue (though, daily) and it turned out to be related to the mon leader timing out doing a SMART check. See https://tracker.ceph.com/issues/54313 for how I debugged that. Chee
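If the leader does turn out to be blocked on device health scraping as in the tracker above, the scraping can be switched off temporarily to test; these are standard commands:
  ceph device monitoring off
  ceph device monitoring on    # re-enable once the test is done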

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Gregory Farnum
Also, that the current leader (ceph-01) is one of the monitors proposing an election each time suggests the problem is with getting commit acks back from one of its followers. On Thu, Feb 9, 2023 at 8:09 AM Dan van der Ster wrote: > > Hi Frank, > > Check the mon logs with some increased debug lev
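One way to see whether the leader is waiting on a follower's commit ack is to raise the paxos debug level on the monitors for a short window (a sketch, revert afterwards):
  ceph config set mon debug_paxos 10
  ceph config set mon debug_paxos 1/5   # default level, restore when done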

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Frank Schilder
Hi Dan and Gregory, thanks! These are good pointers. Will look into that tomorrow. Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Gregory Farnum Sent: 09 February 2023 17:12:23 To: Dan van der Ster Cc: Frank Sch

[ceph-users] Re: Corrupt bluestore after sudden reboot (17.2.5)

2023-02-09 Thread Peter van Heusden
I am trying to do this, but the log file is 26 GB and growing. Is there perhaps a subset of the logs that would be useful? Peter On Mon, 16 Jan 2023 at 18:42, wrote: > Hi Peter, > > Could you add debug_bluestore = 20 to your ceph.conf and restart the OSD, > then send the log after it crashes? >
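To keep the log manageable, the debug level can be raised on the single failing OSD only, or the device can be checked offline; a sketch with a placeholder id and the default (non-cephadm) data path:
  ceph config set osd.<id> debug_bluestore 20/20
  # offline consistency check with the OSD stopped:
  ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-<id>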

[ceph-users] OSD fail to authenticate after node outage

2023-02-09 Thread tsmgeek
Release: 16.2.7 (pacific) Infra: 4 x Nodes (4xOSD HDD), 3 x Nodes (mon/mds, 1 x OSD NVMe) We recently had a couple of nodes which went offline unexpectedly, triggering a rebalance which is still ongoing. The OSDs on the restarted node are marked as down and they keep showing in the log `authentica
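Two things worth ruling out first, as a hedged starting point (placeholder OSD id, non-cephadm keyring path): a mismatch between the key the monitors expect and the one the daemon presents, and clock skew, which also breaks cephx:
  ceph auth get osd.<id>                    # key the monitors expect
  cat /var/lib/ceph/osd/ceph-<id>/keyring   # key the daemon actually presents
  timedatectl status                        # check time sync on the affected node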

[ceph-users] Rotate lockbox keyring

2023-02-09 Thread Zhongzhou Cai
Hi, I'm on Ceph 16.2.10, and I'm trying to rotate the ceph lockbox keyring. I used ceph-authtool to create a new keyring, and used `ceph auth import -i ` to update the lockbox keyring. I also updated the keyring file, which is /var/lib/ceph/osd/ceph-/lockbox.keyring. Then I ran `systemctl restart
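A sketch of the rotation flow described above, with placeholder names (the exact lockbox entity name can be read from `ceph auth ls`):
  ceph-authtool --create-keyring /tmp/lockbox.keyring --gen-key -n client.<lockbox-entity>
  ceph auth import -i /tmp/lockbox.keyring
  cp /tmp/lockbox.keyring /var/lib/ceph/osd/ceph-<id>/lockbox.keyring
  systemctl restart ceph-osd@<id>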

[ceph-users] [Quincy] Module 'devicehealth' has failed: disk I/O error

2023-02-09 Thread Satish Patel
Folks, any idea what is going on? I am running a 3-node Quincy cluster with OpenStack and today I suddenly noticed the following error. I found a reference link but I'm not sure if that is my issue or not: https://tracker.ceph.com/issues/51974 root@ceph1:~# ceph -s cluster: id: cd748128-a3ea-11ed
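Hedged first steps before assuming it is the tracker issue: confirm what the module actually failed on, and bounce the active mgr, which often clears a transient module error:
  ceph health detail
  ceph crash ls
  ceph mgr fail    # fails over to a standby mgr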

[ceph-users] RGW archive zone lifecycle

2023-02-09 Thread ondrej
Hi, I have two Ceph clusters in a multi-zone setup. The first one (master zone) would be accessible to users for their interaction using RGW. The second one is set to sync from the master zone with the tier type of the zone set as an archive (to version all files). My question here is: is there
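For reference, the archive tier type and any lifecycle configuration known to a zone can be inspected like this (run against the archive side):
  radosgw-admin zonegroup get   # the zones[] entries carry the tier_type, "archive" for the archive zone
  radosgw-admin lc list         # lifecycle configurations known to this zone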

[ceph-users] Re: Nautilus to Octopus when RGW already on Octopus

2023-02-09 Thread r . burrowes
We are standing up a small Nautilus cluster using VMs on an old storage node, and will go through the upgrade on this test environment, just to make sure. We did have a nasty experience with our last upgrade from Luminous to Nautilus. We had installed 2 Mimic MDS daemons, with the intention of t

[ceph-users] Generated signurl is accessible from restricted IPs in bucket policy

2023-02-09 Thread Aggelos Toumasis
Hi there, We noticed after creating a signurl that the bucket resources were accessible from IPs that were originally restricted from accessing them (using a bucket policy). Using the s3cmd utility we confirmed that the Policy is correctly applied and you can access it only for the allowed IPs.
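For context, s3cmd generates such URLs with its signurl command; a minimal example with a one-hour expiry and placeholder names:
  s3cmd signurl s3://<bucket>/<object> +3600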

[ceph-users] Re: Permanently ignore some warning classes

2023-02-09 Thread Nicola Mori
I finally found the (hard) way to avoid receiving unwanted email alerts: I modified the alerts module in order to be able to specify the set of alert codes for which no notification is sent. If someone is interested I can share it, just let me know.

[ceph-users] Re: Exit yolo mode by increasing size/min_size does not (really) work

2023-02-09 Thread stefan . pinter
Hi, thank you Eugen for being interested in solving this ;) certainly, here is some more info: ceph osd tree https://privatebin.net/?db7b93a623095879#AKJNy6pKNxa5XssjUpxxjMnggc3d4PirTH1pwHQFF3Qk ceph osd df https://privatebin.net/?0f7c3b091b683d65#8K4KQW5a2G2mFgcnTdUjQXJvcZCpAJGxcPRc1nUYiLXz
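For reference, the change being attempted is the usual pair of pool settings (pool name is a placeholder):
  ceph osd pool set <pool> size 3
  ceph osd pool set <pool> min_size 2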

[ceph-users] Re: Frequent calling monitor election

2023-02-09 Thread Frank Schilder
Hi Dan and Gregory, the MONs went a bit too wild and started voting every 1-2 minutes. The logs were clean, so I started with the usual method of restarting one by one. Seems they got the message and the frenzy has stopped. We won't be able to diagnose this further, I'm afraid. Thanks for your
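For the record, a rolling monitor restart can be done either through the orchestrator or directly on each host (daemon names are placeholders):
  ceph orch daemon restart mon.ceph-01
  systemctl restart ceph-mon@ceph-01    # non-cephadm deployments, directly on the host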

[ceph-users] Is ceph bootstrap keyrings in use after bootstrap?

2023-02-09 Thread Zhongzhou Cai
Hi, I'm on Ceph version 16.2.10, and I found there are a bunch of bootstrap keyrings (i.e., client.bootstrap-) located at /var/lib/ceph/bootstrap-/ceph.keyring after bootstrap. Are they still in use after bootstrap? Is it safe to remove them from host and even from ceph monitor? Thanks, Zhongzhou
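They can be listed to see what is still registered; as a hedged observation, the bootstrap keys are what tools like ceph-volume and cephadm use to create keys for new daemons, so removing them mainly affects adding daemons of that type later:
  ceph auth ls | grep bootstrap
  ceph auth get client.bootstrap-osd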

[ceph-users] Re: Generated signurl is accessible from restricted IPs in bucket policy

2023-02-09 Thread Robin H. Johnson
On Wed, Feb 08, 2023 at 03:07:20PM -, Aggelos Toumasis wrote: > Hi there, > > We noticed after creating a signurl that the bucket resources were > accessible from IPs that were originally restricted from accessing > them (using a bucket policy). Using the s3cmd utility we confirmed > that the

[ceph-users] Ceph Quincy On Rocky 8.x - Upgrade To Rocky 9.1

2023-02-09 Thread duluxoz
Hi All, Sorry if this was mentioned previously (I obviously missed it if it was) but can we upgrade a Ceph Quincy Host/Cluster from Rocky Linux (RHEL) v8.6/8.7 to v9.1 (yet), and if so, what is / where can I find the procedure to do this - ie is there anything "special" that needs to be done
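Whatever the supported path turns out to be, a per-host OS reinstall is usually wrapped in noout so the cluster does not start rebalancing in the meantime (a generic sketch, not a Rocky-specific procedure):
  ceph osd set noout
  # upgrade / reinstall the host, bring its daemons back, then:
  ceph osd unset noout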

[ceph-users] mds damage cannot repair

2023-02-09 Thread Andrej Filipcic
Hi, there is mds damage on our cluster, version 17.2.5: [ { "damage_type": "backtrace", "id": 2287166658, "ino": 3298564401782, "path": "/hpc/home/euliz/.Xauthority" } ] The recursive repair does not fix it, ...ceph tell mds.0 scrub start /hpc/home/euliz for
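A sketch of the usual sequence for a backtrace damage entry, using the id from the output above (damage ls/rm and the scrub flags are standard mds admin commands):
  ceph tell mds.0 damage ls
  ceph tell mds.0 scrub start /hpc/home/euliz recursive,repair,force
  ceph tell mds.0 damage rm 2287166658   # only once the backtrace has been rewritten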

[ceph-users] Re: Exit yolo mode by increasing size/min_size does not (really) work

2023-02-09 Thread Eugen Block
Could you also share more details about the pools: ceph osd pool ls detail ceph osd crush rule dump (for each pool if they use different rules) Thanks Eugen Zitat von stefan.pin...@bearingpoint.com: hi, thank you Eugen for being interested in solving this ;) certainly, here are some more