[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Venky Shankar
On Tue, Nov 7, 2023 at 9:46 AM Venky Shankar wrote: > > Hi Yuri, > > On Tue, Nov 7, 2023 at 3:01 AM Yuri Weinstein wrote: > > > > Details of this release are summarized here: > > > > https://tracker.ceph.com/issues/63443#note-1 > > > > Seeking approvals/reviews for: > > > > smoke - Laura, Radek,

[ceph-users] Ceph dashboard reports CephNodeNetworkPacketErrors

2023-11-07 Thread Dominique Ramaekers
Hi, I'm using Ceph on a 4-host cluster for a year now. I recently discovered the Ceph Dashboard :-) Now I see that the Dashboard reports CephNodeNetworkPacketErrors >0.01% or >10 packets/s... Although all systems work great, I'm worried. 'ip -s link show eno5' results: 2: eno5: mtu 1500 qdisc
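
A quick way to see whether those counters are actually still growing (a minimal sketch; the interface name eno5 is taken from the message above):

  # show RX/TX error and drop counters for the interface
  ip -s link show eno5
  # re-run every few seconds and watch whether the error columns increase
  watch -n 5 'ip -s link show eno5'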

[ceph-users] Re: Ceph dashboard reports CephNodeNetworkPacketErrors

2023-11-07 Thread David C.
Hi Dominique, The consistency of the data should not be at risk with such a problem. But on the other hand, it's better to solve the network problem. Perhaps look at the state of bond0 (cat /proc/net/bonding/bond0), as well as the usual network checks.
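
A minimal sketch of those checks, assuming bond0 is the bond interface and eno5 is one of its slaves:

  # per-slave link state, failure counts and (for LACP) partner status
  cat /proc/net/bonding/bond0
  # driver-level error counters for the NIC itself
  ethtool -S eno5 | grep -iE 'err|drop'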

[ceph-users] Redeploy ceph orch OSDs after reboot, but don't mark as 'unmanaged'

2023-11-07 Thread Janek Bevendorff
Hi, We have our cluster RAM-booted, so we start from a clean slate after every reboot. That means I need to redeploy all OSD daemons as well. At the moment, I run cephadm deploy via Salt on the rebooted node, which brings the deployed OSDs back up, but the problem with this is that the deploy
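
For reference, cephadm ships an activate command intended for exactly this situation (recreating the daemons for OSDs whose data already exists on disk); a sketch, with HOSTNAME standing for the rebooted node. Whether it actually fits a RAM-booted setup is what the follow-up below discusses:

  # ask the orchestrator to recreate containers/units for existing OSDs on the host
  ceph cephadm osd activate HOSTNAME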

[ceph-users] Re: Redeploy ceph orch OSDs after reboot, but don't mark as 'unmanaged'

2023-11-07 Thread Janek Bevendorff
Actually, ceph cephadm osd activate doesn't do what I expected it to do. It seems to be looking for new OSDs to create instead of looking for existing OSDs to activate. Hence, it does nothing on my hosts and only prints 'Created no osd(s) on host XXX; already created?' So this wouldn't be an o

[ceph-users] Re: Ceph dashboard reports CephNodeNetworkPacketErrors

2023-11-07 Thread Dominique Ramaekers
Hi David, Thanks for the quick response! The bond reports not a single link failure. Nor do I register packet losses with ping. The network cards in the server are already replaced. Cables are new. With my setup I easily reach 2KIOPS over the cluster. So I do not assume network congestion when

[ceph-users] pool(s) do not have an application enabled after upgrade to 17.2.7

2023-11-07 Thread Dmitry Melekhov
Hello! I'm very new to ceph, sorry I'm asking extremely basic questions. I just upgraded from 17.2.6 to 17.2.7 and got a warning: 2 pool(s) do not have an application enabled. These pools are 5 cephfs.cephfs.meta 6 cephfs.cephfs.data I don't remember why and how I created them, I just followed
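
If these really are the metadata and data pools of an existing CephFS, the warning can usually be cleared by tagging them with the cephfs application instead of deleting them; a sketch, assuming the pool names from the output above:

  # check which file systems exist and which pools they use
  ceph fs ls
  # tag the pools to clear POOL_APP_NOT_ENABLED
  ceph osd pool application enable cephfs.cephfs.meta cephfs
  ceph osd pool application enable cephfs.cephfs.data cephfs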

[ceph-users] Found unknown daemon type ceph-exporter on host after upgrade to 17.2.7

2023-11-07 Thread Dmitry Melekhov
Hello! I see [WRN] Found unknown daemon type ceph-exporter on host in the logs for all 3 ceph servers, after upgrading from 17.2.6 to 17.2.7, in the dashboard and cephadm ['--image', 'quay.io/ceph/ceph@sha256:1fcdbead4709a7182047f8ff9726e0f17b0b209aaa6656c5c8b2339b818e70bb', '--timeout', '895', 'ls'
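
Since the log itself suggests ceph-exporter only appeared with the 17.2.7 upgrade, one way to confirm that it is the orchestrator deploying it, and to compare against the host-side cephadm binary, is the following (a sketch):

  # list the ceph-exporter daemons the orchestrator knows about
  ceph orch ps --daemon-type ceph-exporter
  # version of the cephadm binary installed on the host itself
  cephadm version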

[ceph-users] Seagate Exos power settings - any experiences at your sites?

2023-11-07 Thread Alex Gorbachev
We have been seeing some odd behavior with scrubbing (very slow) and OSD warnings on a couple of new clusters. A bit of research turned up this: https://www.reddit.com/r/truenas/comments/p1ebnf/seagate_exos_load_cyclingidling_info_solution/ We've installed the tool from https://github.com/Seagat
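
A low-risk way to check whether drives are affected before changing any power settings is the SMART load-cycle counter (a sketch; /dev/sdX stands for one of the Exos drives):

  # a Load_Cycle_Count that grows by thousands per day points at aggressive head parking
  smartctl -A /dev/sdX | grep -i load_cycle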

[ceph-users] OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
Firstly I'm rolling out a rook update from v1.12.2 to v1.12.7 (latest stable) and ceph from 17.2.6 to 17.2.7 at the same time. I mention this in case the problem is actually caused by rook rather than ceph. It looks like ceph to my uninitiated eyes, though. The update just started bumping my OSDs

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
FYI I left rook as is and reverted to ceph 17.2.6 and the issue is resolved. The code change was added by commit 2e52c029bc2b052bb96f4731c6bb00e30ed209be: ceph-volume: fix broken workaround for atari partitions broken by bea9f4b643ce32268ad79c0fc257b25ff2f8333c This commit fixes that
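
One way to narrow the regression down to ceph-volume rather than rook is to run the same inventory step with both container images on a test host; a sketch, assuming cephadm is available there (i.e. outside Kubernetes):

  # compare ceph-volume's view of the disks between the two releases
  cephadm --image quay.io/ceph/ceph:v17.2.6 ceph-volume -- inventory
  cephadm --image quay.io/ceph/ceph:v17.2.7 ceph-volume -- inventory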

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Casey Bodley
On Mon, Nov 6, 2023 at 4:31 PM Yuri Weinstein wrote: > > Details of this release are summarized here: > > https://tracker.ceph.com/issues/63443#note-1 > > Seeking approvals/reviews for: > > smoke - Laura, Radek, Prashant, Venky (POOL_APP_NOT_ENABLE failures) > rados - Neha, Radek, Travis, Ernesto,

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Jayanth Reddy
Hello Wesley and Casey, We've ended up with the same issue and here it appears that even the user with "--admin" isn't able to do anything. We're now unable to figure out if it is due to bucket policies, ACLs or IAM of some sort. I'm seeing these IAM errors in the logs ``` Nov 7 00:02:00 ceph-0

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > FYI I left rook as is and reverted to ceph 17.2.6 and the issue is > resolved. > > The code change was added by > commit 2e52c029bc2b052bb96f4731c6bb00e30ed209be: > ceph-volume: fix broken workaround for atari partitions > > broken by be

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Casey Bodley
On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy wrote: > > Hello Wesley and Casey, > > We've ended up with the same issue and here it appears that even the user > with "--admin" isn't able to do anything. We're now unable to figure out if > it is due to bucket policies, ACLs or IAM of some sort. I

[ceph-users] Re: OSD fails to start after 17.2.6 to 17.2.7 update

2023-11-07 Thread Matthew Booth
I just discovered that rook is tracking this here: https://github.com/rook/rook/issues/13136 On Tue, 7 Nov 2023 at 18:09, Matthew Booth wrote: > On Tue, 7 Nov 2023 at 16:26, Matthew Booth wrote: > >> FYI I left rook as is and reverted to ceph 17.2.6 and the issue is >> resolved. >> >> The code

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Jayanth Reddy
Hello Casey, Thank you for the quick response. I see `rgw_policy_reject_invalid_principals` is not present in v17.2.7. Please let me know. Regards Jayanth On Tue, Nov 7, 2023 at 11:50 PM Casey Bodley wrote: > On Tue, Nov 7, 2023 at 12:41 PM Jayanth Reddy > wrote: > > > > Hello Wesley and Case
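
To check whether a given rgw option exists in the release that is actually running, the cluster can be asked directly (a sketch):

  # prints the option's description if this build knows it, an error otherwise
  ceph config help rgw_policy_reject_invalid_principals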

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Yuri Weinstein
The 3 PRs mentioned above were merged and I am rerunning some tests: https://pulpito.ceph.com/?sha1=55e3239498650453ff76a9b06a37f1a6f488c8fd Still seeking approvals: smoke - Laura, Radek, Prashant, Venky in progress rados - Neha, Radek, Travis, Ernesto, Adam King rgw - Casey in progress fs - Venky orch

[ceph-users] Re: reef 18.2.1 QE Validation status

2023-11-07 Thread Adam King
I think the orch code itself is doing fine, but a bunch of tests are failing due to https://tracker.ceph.com/issues/63151. I think that's likely related to the ganesha build we have included in the container and if we want nfs over rgw to work properly in this release I think we'll have to update i

[ceph-users] Re: MDS stuck in rejoin

2023-11-07 Thread Xiubo Li
Hi Frank, Recently I found a new possible case that could cause this, please see https://github.com/ceph/ceph/pull/54259. This is just a ceph-side fix; after this we need to fix it in kclient too, which hasn't been done yet. Thanks - Xiubo On 8/8/23 17:44, Frank Schilder wrote: Dear Xiubo, the near

[ceph-users] how to disable ceph version check?

2023-11-07 Thread zxcs
Hi, Experts, we have a ceph cluster reporting HEALTH_ERR due to multiple old versions. health: HEALTH_ERR There are daemons running multiple old versions of ceph After running `ceph version`, we see three ceph versions in {16.2.*}; these daemons are ceph osd. Our question is: how to

[ceph-users] Permanent KeyError: 'TYPE' ->17.2.7: return self.blkid_api['TYPE'] == 'part'

2023-11-07 Thread Harry G Coin
These repeat for every host, but only after upgrading from the previous Quincy release to 17.2.7. As a result, the cluster is permanently warned and never indicates healthy. root@noc1:~# ceph health detail HEALTH_WARN failed to probe daemons or devices [WRN] CEPHADM_REFRESH_FAILED: failed to probe daemons or de
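
The failing expression asks blkid for a device's TYPE, and blkid simply omits that field when it finds no recognizable signature on the device. To see which device triggers it, a low-level probe can be run per device (a sketch; replace /dev/sdX with each disk cephadm scans):

  # TYPE= is only printed when a filesystem or other signature is detected
  blkid -p /dev/sdX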

[ceph-users] Re: owner locked out of bucket via bucket policy

2023-11-07 Thread Jayanth Reddy
Hello Casey, And on further inspection, we identified that there were bucket policies set from the initial days; we were on v16.2.12. We upgraded the cluster to v17.2.7 two days ago, and it seems the IAM error logs started being generated the minute the rgw daemon was upgraded from v16.2.12 to v
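
For context, once a principal that the policy does not block is available (the --admin bypass this thread is trying to get working, or a relaxed policy check), removing the offending policy is a plain S3 API call; a sketch using the AWS CLI, where BUCKET, the endpoint URL and the configured credentials are placeholders:

  aws --endpoint-url http://rgw.example.com:8080 s3api get-bucket-policy --bucket BUCKET
  aws --endpoint-url http://rgw.example.com:8080 s3api delete-bucket-policy --bucket BUCKET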

[ceph-users] Re: pool(s) do not have an application enabled after upgrade to 17.2.7

2023-11-07 Thread Dmitry Melekhov
08.11.2023 00:15, Eugen Block wrote: Hi, I think I need to remove the pools cephfs.cephfs.meta and cephfs.cephfs.data using ceph osd pool delete {pool-name} [{pool-name} --yes-i-really-really-mean-it] By the way, as far as I know, deleting pools is not allowed by default, I have to allow it first. c
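
For completeness, a sketch of that sequence, assuming the pools really are unused by any file system (ceph fs ls should not reference them):

  # temporarily allow pool deletion, remove the pools, then lock it down again
  ceph config set mon mon_allow_pool_delete true
  ceph osd pool delete cephfs.cephfs.meta cephfs.cephfs.meta --yes-i-really-really-mean-it
  ceph osd pool delete cephfs.cephfs.data cephfs.cephfs.data --yes-i-really-really-mean-it
  ceph config set mon mon_allow_pool_delete false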

[ceph-users] Re: how to disable ceph version check?

2023-11-07 Thread Boris
You can mute it with "ceph health mute ALERT", where ALERT is the all-caps keyword from "ceph health detail". But I would update asap. Cheers Boris > On 08.11.2023 at 02:02, zxcs wrote: > > Hi, Experts, > > we have a ceph cluster report HEALTH_ERR due to multiple old versions. > > health:
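
Concretely, the code shown by "ceph health detail" for this warning should be DAEMON_OLD_VERSION, so a time-limited mute while the upgrade is rolled out could look like this (a sketch):

  # confirm the exact health code first, then mute it for one week
  ceph health detail
  ceph health mute DAEMON_OLD_VERSION 1w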