[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-04-12 Thread Jan-Tristan Kruse
Hi, we solved our latency problems in two clusters by redeploying all OSDs. Since then, we have not re-encountered the problems described. Greetings, Jan
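
For anyone following along, a minimal sketch of redeploying OSDs one at a time under cephadm, assuming an active OSD service spec recreates each daemon after removal; the OSD id 12 is a placeholder:
```
# Remove one OSD, keep its id reserved for the replacement, and wipe the device
ceph orch osd rm 12 --replace --zap
# Watch drain/removal progress
ceph orch osd rm status
# Once the device is clean the OSD spec redeploys it; wait for HEALTH_OK before the next OSD
ceph -s
```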

[ceph-users] Re: Module 'cephadm' has failed: invalid literal for int() with base 10:

2023-04-12 Thread Eugen Block
Hi, have you tried a mgr failover? Quoting Duncan M Tooke : Hi, Our Ceph cluster is in an error state with the message: # ceph status cluster: id: 58140ed2-4ed4-11ed-b4db-5c6f69756a60 health: HEALTH_ERR Module 'cephadm' has failed: invalid literal for int() with
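
A minimal sketch of the suggested failover, assuming at least one standby mgr is available:
```
# Hand control to a standby mgr so the cephadm module restarts cleanly
ceph mgr fail
# Confirm the error has cleared
ceph status
ceph health detail
```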

[ceph-users] Re: How can I use not-replicated pool (replication 1 or raid-0)

2023-04-12 Thread Janne Johansson
On Mon 10 Apr 2023 at 22:31, mhnx wrote: > Hello. > I have a 10 node cluster. I want to create a non-replicated pool > (replication 1) and I want to ask some questions about it: > > Let me tell you my use case: > - I don't care about losing data, > - All of my data is JUNK and these junk files ar
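
A rough sketch of how a single-replica pool can be created on recent releases, assuming lifting the size-1 guardrail is acceptable for this junk-data use case; the pool name 'junk' and PG count are placeholders:
```
# Recent releases refuse size 1 unless this monitor guardrail is lifted
ceph config set global mon_allow_pool_size_one true
# Create the pool and drop it to a single replica
ceph osd pool create junk 128
ceph osd pool set junk size 1 --yes-i-really-mean-it
ceph osd pool set junk min_size 1
```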

[ceph-users] Re: Module 'cephadm' has failed: invalid literal for int() with base 10:

2023-04-12 Thread Duncan M Tooke
Hi, Sorry, I didn't update this discussion yesterday. That was indeed exactly what was required, and it immediately recovered. Good shout 😊 Best wishes, Duncan -- Dr Duncan Tooke | Research Cluster Administrator Centre for Computational Biology, Weatherall Institute of Molecular Medicine, Uni

[ceph-users] Live migrate RBD image with a client using it

2023-04-12 Thread Work Ceph
Hello guys, We have been reading the docs, and trying to reproduce that process in our Ceph cluster. However, we always receive the following message: ``` librbd::Migration: prepare: image has watchers - not migrating rbd: preparing migration failed: (16) Device or resource busy ``` We test
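
A small sketch of how the blocking watchers can be identified before retrying the prepare step; 'mypool/myimage' is a placeholder image spec:
```
# List active watchers; live-migration prepare refuses to run while any client has the image open
rbd status mypool/myimage
# After stopping or unmapping those clients, prepare should succeed
rbd migration prepare mypool/myimage mypool/myimage-new
```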

[ceph-users] Re: Nearly 1 exabyte of Ceph storage

2023-04-12 Thread Marc
> > We are excited to share with you the latest statistics from our Ceph > public telemetry dashboards. :) > One of the things telemetry helps us to understand is version adoption > rate. See, for example, the trend of Quincy public.ce

[ceph-users] Re: Pacific dashboard: unable to get RGW information

2023-04-12 Thread Eugen Block
It feels like removing the option to set the rgw api host (in previous releases 'ceph dashboard set-rgw-api-host') is a regression. Apparently, there are many use-cases not covered by the automatic setting. I don't know the reasons behind the decision to implement it that way, but maybe it'
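
For context, the pre-Pacific commands referred to above looked roughly like this (host and port are placeholders); in Pacific the dashboard is meant to discover the RGW endpoint automatically instead:
```
# Removed in later releases: explicitly pin the RGW endpoint the dashboard queries
ceph dashboard set-rgw-api-host rgw.example.com
ceph dashboard set-rgw-api-port 8080
```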

[ceph-users] Re: Nearly 1 exabyte of Ceph storage

2023-04-12 Thread Yaarit Hatuka
On Wed, Apr 12, 2023 at 12:32 PM Marc wrote: > > > > We are excited to share with you the latest statistics from our Ceph > > public telemetry dashboards. > > :) > > > One of the things telemetry helps us to understand is version adoption > > rate. See, for e

[ceph-users] Re: Live migrate RBD image with a client using it

2023-04-12 Thread Eugen Block
Hi, the docs you mentioned also state: All clients using the source image must be stopped prior to preparing a live-migration. The prepare step will fail if it finds any running clients with the image open in read/write mode. Once the prepare step is complete, the clients can be restarted
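
A hedged outline of the full sequence from the docs; the image specs are placeholders:
```
# 1. Stop/unmap every client of the source image, then:
rbd migration prepare mypool/myimage mypool/myimage-new
# 2. Clients may be restarted against the target image while the data is copied
rbd migration execute mypool/myimage-new
# 3. When execution completes, remove the source and finalize
rbd migration commit mypool/myimage-new
```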

[ceph-users] [RGW] Rebuilding a non master zone

2023-04-12 Thread Gilles Mocellin
Hello cephers! As I was asking in another thread ([RGW] Rebuilding a non master zone), I'm trying to find the best way to rebuild a zone in a multisite config. The goal is to get rid of remaining Large OMAP objects. The simplest way, as I can rely only on the primary zone, is to: - remove the zone
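
A rough sketch of that sequence with radosgw-admin, assuming the secondary zone's gateways are stopped first; zone, zonegroup, endpoints and keys are placeholders:
```
# Drop the secondary zone from the zonegroup and publish a new period
radosgw-admin zonegroup remove --rgw-zonegroup=default --rgw-zone=secondary
radosgw-admin period update --commit
radosgw-admin zone delete --rgw-zone=secondary
# (Optionally delete the old zone's pools to really start from scratch.)
# Recreate the zone against the realm pulled from the primary and let it do a full sync
radosgw-admin realm pull --url=http://primary-rgw:8080 --access-key=<key> --secret=<secret>
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=secondary \
  --endpoints=http://secondary-rgw:8080 --access-key=<key> --secret=<secret>
radosgw-admin period update --commit
```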

[ceph-users] ceph pg stuck - missing on 1 osd how to proceed

2023-04-12 Thread xadhoom76
Hi to all. Using ceph 17.2.5 I have 3 PGs in a stuck state: ceph pg map 8.2a6 osdmap e32862 pg 8.2a6 (8.2a6) -> up [88,100,59] acting [59,100] Looking at OSDs 88, 100 and 59 I got that: ceph pg ls-by-osd osd.100 | grep 8.2a6 8.2a6 211004209089 00 1747979252050
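
Some hedged next steps for digging into a PG that is missing from one of its up OSDs:
```
# Full peering/recovery state, including which OSDs the PG is waiting for
ceph pg 8.2a6 query
# Check whether osd.88 knows about the PG at all
ceph pg ls-by-osd osd.88 | grep 8.2a6
# Briefly marking an involved OSD down forces re-peering, which sometimes unsticks a PG
ceph osd down 88
```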

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-04-12 Thread Reza Bakhshayeshi
Thank you Adam for your response. I tried all your suggestions and the troubleshooting link you sent. The Quincy mgr containers can ssh into all other Pacific nodes successfully by running the exact command from the log output, and vice versa. Here are some debug logs from the cephadm while
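
In case it helps gather more detail, the cephadm troubleshooting docs describe raising the module's log level roughly like this:
```
# Turn on cephadm debug logging in the active mgr
ceph config set mgr mgr/cephadm/log_to_cluster_level debug
# Follow the module's log stream while re-triggering the failing upgrade step
ceph -W cephadm --watch-debug
# Revert when finished
ceph config set mgr mgr/cephadm/log_to_cluster_level info
```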

[ceph-users] Re: Upgrading from Pacific to Quincy fails with "Unexpected error"

2023-04-12 Thread Adam King
Ah, okay. Someone else had opened an issue about the same thing after the 17.2.5 release I believe. It's changed in 17.2.6 at least to only use sudo for non-root users https://github.com/ceph/ceph/blob/v17.2.6/src/pybind/mgr/cephadm/ssh.py#L148-L153. But it looks like you're also using a non-root u
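
Related, a hedged way to check and adjust which SSH user cephadm connects as (the user name 'deployer' and host 'host01' are placeholders; a non-root user needs passwordless sudo on every host):
```
# Switch cephadm to a dedicated non-root SSH user
ceph cephadm set-user deployer
# Verify the active mgr can reach a given host with the current user/key
ceph cephadm check-host host01
```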

[ceph-users] pacific v16.2.1 (hot-fix) QE Validation status

2023-04-12 Thread Yuri Weinstein
Details of this release are summarized here: https://tracker.ceph.com/issues/59426#note-3 Release Notes - TBD Seeking approvals/reviews for: smoke - Josh approved? orch - Adam King approved? (there are infrastructure issues in the runs, but we want to release this ASAP) Thx YuriW

[ceph-users] Re: avg apply latency went up after update from octopus to pacific

2023-04-12 Thread Reed Dier
Hi Jan, As someone who has been watching this thread in anticipation of planning an Octopus to Pacific upgrade, and who is not all that interested in repaving all OSDs: which release(s) were the OSDs originally deployed with? Just trying to get a basic estimate of how recent or not these

[ceph-users] Re: pacific v16.2.1 (hot-fix) QE Validation status

2023-04-12 Thread Adam King
Obviously the issue installing EPEL makes the runs look pretty bad. But, given the ubuntu based tests look alright, the EPEL stuff is likely not on our side (so who knows when it will be resolved), and this is only 16.2.11 + a handful of ceph-volume patches, I'm willing to approve in the interest o

[ceph-users] Ceph Leadership Team Meeting, 2023-04-12 Minutes

2023-04-12 Thread Patrick Donnelly
Hi folks, Today we discussed: - Just short of 1 exabyte of Ceph storage reported to Telemetry. Telemetry's data is public and viewable at: https://telemetry-public.ceph.com/d/ZFYuv1qWz/telemetry?orgId=1 If your cluster is not reporting to Telemetry, please consider it! :) - A request from the
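
For clusters that want to opt in, the telemetry module can be enabled along these lines:
```
# Review exactly what would be sent before opting in
ceph telemetry show
# Opt in (the module asks to re-opt-in if the license or collections change)
ceph telemetry on --license sharing-1-0
ceph telemetry status
```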

[ceph-users] RBD snapshot mirror syncs all snapshots

2023-04-12 Thread Andreas Teuchert
Hello, I set up two-way snapshot-based RBD mirroring between two Ceph clusters. After enabling mirroring for an image that already had regular snapshots independently of RBD mirroring on the source cluster, the image and all snapshots were synced to the destination cluster. Is there a way to
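
For reference, a minimal sketch of per-image snapshot-based mirroring; only mirror snapshots created after enabling (manually or via a schedule) are replicated going forward, and 'mypool/myimage' is a placeholder:
```
# Enable snapshot-based mirroring for one image
rbd mirror image enable mypool/myimage snapshot
# Create a mirror snapshot on demand...
rbd mirror image snapshot mypool/myimage
# ...or on a schedule
rbd mirror snapshot schedule add --pool mypool --image myimage 1h
```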

[ceph-users] Re: pacific v16.2.1 (hot-fix) QE Validation status

2023-04-12 Thread Josh Durgin
Smoke is all green on ubuntu as well; given that the ceph-volume tests passed too, it looks good to go. On Wed, Apr 12, 2023 at 8:41 AM Adam King wrote: > Obviously the issue installing EPEL makes the runs look pretty bad. But, > given the ubuntu based tests look alright, the EPEL stuff is likely

[ceph-users] Re: Live migrate RBD image with a client using it

2023-04-12 Thread Work Ceph
Exactly, I have seen that. However, that also means that it is not a "live" process then, right? Am I missing something? If we need a live process, where the clients cannot unmap the volumes, what do you guys recommend? On Wed, Apr 12, 2023 at 10:01 AM Eugen Block wrote: > Hi, > > the docs you menti

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
Thanks for your detailed explanations! That helped a lot. All MDS are still in status error. "ceph orch device ls" showed that some hosts seem to not have enough space on devices. I wonder why I didn't see that in monitoring. Anyway, I'll fix that and then try to proceed. When the backport i

[ceph-users] Re: Upgrade from 17.2.5 to 17.2.6 stuck at MDS

2023-04-12 Thread Thomas Widhalm
Sorry - the info about the insufficient space seems like it referred to why the devices are not available. So that's just as it should be. All MDS are still in error state and were refreshed 2d ago, even right after a mgr failover. So it seems there's something else going on. One thing that
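
A few hedged commands for narrowing this down; the daemon name below is a made-up example:
```
# Which MDS daemons cephadm thinks exist, and their current state
ceph orch ps --daemon-type mds
ceph fs status
# After fixing the underlying host problem, redeploy one failed daemon
ceph orch daemon redeploy mds.cephfs.host1.abcdef
# A mgr failover forces cephadm to refresh stale inventory/state
ceph mgr fail
```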