Hi Guys
I am busy removing an OSD from my rook-ceph cluster. I did 'ceph osd out
osd.7' and the re-balancing process started. Now it has stalled with one
pg on "active+undersized+degraded". I have done this before and it has
worked fine.
# ceph health detail
HEALTH_WARN Degraded data redundancy:
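For anyone hitting the same state, a couple of read-only checks (a sketch, not
specific to this cluster) to locate the degraded PG and see where the remaining
copies sit:
# ceph pg ls degraded
# ceph pg dump_stuck undersized
# ceph osd tree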
Sure. Tx.
# ceph pg 3.1f query
{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+undersized+degraded",
"epoch": 2477,
"up": [
0,
2
],
"acting": [
0,
2
],
"acting_recovery_backfill": [
"0",
"2"
],
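Since "up" and "acting" only list OSDs 0 and 2 while the pool presumably has
size 3, two more checks worth running (a sketch; pool id 3 is inferred from the
PG id 3.1f):
# ceph osd pool ls detail      # confirm size/min_size of pool id 3
# ceph osd crush rule dump     # check whether CRUSH can still place a 3rd copy with osd.7 out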
I just grepped all the OSD pod logs for error and warn and nothing comes up:
# k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk | grep -i warn
etc
I assume that would have turned up something if any of them were unhappy.
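For completeness, a quick way to sweep every OSD pod instead of one at a time
(a sketch, assuming the standard rook-ceph label app=rook-ceph-osd):
# for p in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do kubectl -n rook-ceph logs $p | grep -iE 'warn|error'; done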
On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman wrote:
> On 11/18/
On 17.11.21 at 20:14, Marc wrote:
a good choice. It lacks RBD encryption and read leases. But for us
upgrading from N to O or P is currently not
What about just using OSD encryption with N?
That would be Data at Rest encryption only. The keys for the OSDs are stored on
the mons. Data is tr
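For reference, data-at-rest encryption on N is just a flag at OSD creation time
(a sketch; /dev/sdX is a placeholder device):
# ceph-volume lvm create --bluestore --data /dev/sdX --dmcrypt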
Hi,
We sometimes have similar stuck client recall warnings.
To debug you can try:
(1) ceph health detail
that will show you the client IDs which are generating the
warning (e.g. 1234).
(2) ceph tell mds.* client ls id=1234
this will show lots of client statistics for the session. Notabl
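A possible follow-up once you have the session dump, to pull out just the cap
counts (a sketch; the id 1234 is the example from step 1):
# ceph tell mds.* client ls id=1234 | grep -E 'num_caps|num_leases'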
Hello Cephers,
I too am for LTS releases, or for some kind of middle ground like a longer
release cycle and/or having even-numbered releases designated for
production like before. We all use LTS releases for the base OS when
running Ceph, yet in reality we depend much more on the Ceph code than
th
If I ignore the dire warnings about losing data and do:
ceph osd purge 7
will I lose data? There are still 2 copies of everything, right?
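Before purging there are non-destructive checks that report whether the data is
still fully replicated elsewhere (a sketch, using osd.7 from my case):
# ceph osd safe-to-destroy osd.7
# ceph osd ok-to-stop osd.7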
I need to remove the node with the OSD from the k8s cluster, reinstall it
and have it re-join the cluster. This will bring in some new OSDs and maybe
Ceph w
Tx.
# ceph version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman wrote:
> On 11/18/21 13:20, David Tinker wrote:
> > I just grepped all the OSD pod logs for error and warn and nothing comes
> up:
> >
> > # k logs -
Perhaps I missed something, but does the survey conclude that users don't
value reliability improvements at all? This would explain why the developer
team wants to concentrate on performance and ease of management.
On Thu, Nov 18, 2021, 07:23 Stefan Kooman wrote:
> On 11/18/21 14:09, Maged Mokht
The weighted category prioritization clearly identifies reliability as the top
priority.
Daniel
> Am 18.11.2021 um 15:32 schrieb Sasha Litvak :
>
> Perhaps I missed something, but does the survey concludes that users don't
> value reliability improvements at all? This would explain why devel
Hi all,
Just to close the loop on this one - we ultimately found that there was an MTU
misconfiguration between the hosts that was causing Ceph and other things to
fail in strange ways. After fixing the MTU, cephadm etc immediately started
working.
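For anyone debugging something similar, a quick way to confirm whether
jumbo-sized frames actually pass between two hosts (a sketch; eth0, the 9000-byte
MTU and the hostname are placeholders):
# ip link show eth0 | grep mtu
# ping -M do -s 8972 other-host    # 9000 minus 28 bytes of IP/ICMP headers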
Cheers,
Lincoln
Would it be worth setting the OSD I removed back to "in" (or whatever the
opposite of "out" is) and seeing if things recover?
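If it helps, the opposite of "out" should just be "ceph osd in", e.g.:
# ceph osd in osd.7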
On Thu, Nov 18, 2021 at 3:44 PM David Tinker wrote:
> Tx. # ceph version
> ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> (stable)
>
>
>
> On
> On 17.11.21 at 20:14, Marc wrote:
> >> a good choice. It lacks RBD encryption and read leases. But for us
> >> upgrading from N to O or P is currently not
> >>
> > what about just using osd encryption with N?
>
>
> That would be Data at Rest encryption only. The keys for the OSDs are
> stored
>
> docker itself is not the problem,
I would even argue the opposite. If the docker daemon crashes it takes down all
of its containers. Sorry, but these days that is really not necessary given the
other alternatives.
> We also use containers for ceph and love it. If for some reason we
> couldn't run ceph this way any longer, we would probably migrate
> everything to a different solution. We are absolutely committed to
> containerization.
I wonder if you are really using containers. Are you not just using ceph-
>
> Please remember, free software comes still with a price. You can not
> expect someone to work on your individual problem while being cheap on
> your highly critical data. If your data has value, then you should
> invest in ensuring data safety. There are companies out, paying Ceph
> developers
>
> If your building a ceph cluster, the state of a single node shouldn't
> matter. Docker crashing should not be a show stopper.
>
You remind me of this senior software engineer at Red Hat who told me it was
not that big of a deal that ceph.conf got deleted and the root fs was mounted via
a bin
I sense the concern about Ceph distributions via containers generally
has to do with what you might call a feeling of 'opaqueness'. The
feeling is amplified as most folks who choose open source solutions
prize being able to promptly address the particular concerns affecting
them without havin
That response is typically indicative of a pg whose OSD set has changed
since it was last scrubbed (typically from a disk failing).
Are you sure it's actually getting scrubbed when you issue the scrub? For
example you can issue "ceph pg query" and look for
"last_deep_scrub_stamp", which will tel
Okay, good news: on the osd start side, I identified the bug (and easily
reproduced locally). The tracker and fix are:
https://tracker.ceph.com/issues/53326
https://github.com/ceph/ceph/pull/44015
These will take a while to work through QA and get backported.
Also, to reiterate what I said on
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
November 24th. If anyone wants to discuss something specific they can
add it to the pad linked below. If you have questions or comments you
can contact me.
This is an informal open call of community members most
May I ask which versions are affected by this bug, and which versions are
going to receive backports?
best regards,
samuel
huxia...@horebdata.cn
From: Sage Weil
Date: 2021-11-18 22:02
To: Manuel Lausch; ceph-users
Subject: [ceph-users] Re: OSD spend too much time on "waiting for readable"
It looks like the bug has been there since the read leases were introduced,
which I believe was octopus (15.2.z)
s
On Thu, Nov 18, 2021 at 3:55 PM huxia...@horebdata.cn
wrote:
> May i ask, which versions are affected by this bug? and which versions are
> going to receive backports?
>
> best reg
Hello!
Our test cluster is a few months old; it was
initially set up from scratch with Pacific and has since had two
separate small patches applied, 16.2.5 and then, a couple of weeks ago,
16.2.6. The issue I'm describing has been present
sin
Hi,
do you use more nodes than deployed mgrs with cephadm?
If so, it might be that the node you are connecting to no longer has an
instance of the mgr running, and you are only getting some leftovers from the
browser cache?
At least this was happening in my test cluster, but I was always able to
fi
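Two commands that show where the active mgr (and hence the dashboard) actually
lives (a sketch, assuming the cephadm orchestrator module is enabled):
# ceph mgr stat
# ceph orch ps --daemon-type mgr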
Hello,
I don't think the meeting was recorded but there are detailed notes in
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. The next meeting
is scheduled for December 16; feel free to add your discussion
topic to the agenda.
Thanks,
Neha
On Thu, Nov 18, 2021 at 11:04 AM Szabo, Istvan
On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it
> seems harmless, but we cannot get HEALTH_OK, which is annoying.
>
> The clients that are reported failing to respond to cache pressure are
> constantly chang
How does one read/set that from the command line?
Thanks,
Lindsay
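(In case it helps: if the setting in question is an MDS config option, such as
the mds_min_caps_working_set mentioned elsewhere in the thread, the generic
read/set pattern would be the following; the value below is only a placeholder.)
# ceph config get mds mds_min_caps_working_set
# ceph config set mds mds_min_caps_working_set 20000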
Instead of complaining, taking some time to learn more about containers would help.
Tony
From: Marc
Sent: November 18, 2021 10:50 AM
To: Pickett, Neale T; Hans van den Bogert; ceph-users@ceph.io
Subject: [ceph-users] Re: [EXTERNAL] Re: Why you might want pack
I think Marc uses containers - but they've chosen Apache Mesos as
orchestrator and cephadm doesn't work with that.
Currently essentially two ceph container orchestrators exist - rook, which
is a ceph orchestrator for kubernetes, and cephadm, which is an orchestrator
expecting docker or podman.
Admittedly I do
Thanks Dan,
I chose one of the stuck clients to investigate; as shown below, it currently
holds ~269700 caps, which is pretty high with no obvious reason. I cannot
understand most of the output, and I failed to find any documentation about it.
# ceph tell mds.cephfs.gpu018.ovxvoz client ls id=7915658
Hi Patrick,
One of the stuck clients has num_caps at around 269700, well above the
number of files open on the client (about 9k). See my reply to Dan for
details. So I don't think this warning is simply caused by
"mds_min_caps_working_set" being set too low.
> -----Original Message-----
> From: Patric
> In this context, I find it quite disturbing that nobody is willing even to
> discuss an increase of the release cycle from say 2 to 4 years. What is so
> important about pumping out one version after the other that real issues
> caused by this speed are ignored?
One factor I think is that