Hi Guys
I am busy removing an OSD from my rook-ceph cluster. I did 'ceph osd out
osd.7' and the re-balancing process started. Now it has stalled with one
pg on "active+undersized+degraded". I have done this before and it has
worked fine.
# ceph health detail
HEALTH_WARN Degraded data redundancy:
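For anyone hitting the same state, a couple of read-only checks (a sketch, not
specific to this cluster) to locate the degraded PG and see where the remaining
copies sit:
# ceph pg ls degraded
# ceph pg dump_stuck undersized
# ceph osd tree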
Sure. Tx.
# ceph pg 3.1f query
{
"snap_trimq": "[]",
"snap_trimq_len": 0,
"state": "active+undersized+degraded",
"epoch": 2477,
"up": [
0,
2
],
"acting": [
0,
2
],
"acting_recovery_backfill": [
"0",
"2"
],
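Since "up" and "acting" only list OSDs 0 and 2 while the pool presumably has
size 3, two more checks worth running (a sketch; pool id 3 is inferred from the
PG id 3.1f):
# ceph osd pool ls detail      # confirm size/min_size of pool id 3
# ceph osd crush rule dump     # check whether CRUSH can still place a 3rd copy with osd.7 out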
I just grepped all the OSD pod logs for error and warn and nothing comes up:
# k logs -n rook-ceph rook-ceph-osd-10-659549cd48-nfqgk | grep -i warn
etc
I assume that would have turned up something if any of them were unhappy.
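For completeness, a quick way to sweep every OSD pod instead of one at a time
(a sketch, assuming the standard rook-ceph label app=rook-ceph-osd):
# for p in $(kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o name); do kubectl -n rook-ceph logs $p | grep -iE 'warn|error'; done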
On Thu, Nov 18, 2021 at 1:26 PM Stefan Kooman wrote:
> On 11/18/
On 17.11.21 at 20:14, Marc wrote:
a good choice. It lacks RBD encryption and read leases. But for us
upgrading from N to O or P is currently not
What about just using OSD encryption with N?
That would be Data at Rest encryption only. The keys for the OSDs are stored on
the mons. Data is tr
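For reference, data-at-rest encryption on N is just a flag at OSD creation time
(a sketch; /dev/sdX is a placeholder device):
# ceph-volume lvm create --bluestore --data /dev/sdX --dmcrypt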
Hi,
We sometimes have similar stuck client recall warnings.
To debug you can try:
(1) ceph health detail
that will show you the client IDs which are generating the
warning (e.g. 1234).
(2) ceph tell mds.* client ls id=1234
this will show lots of client statistics for the session. Notabl
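A possible follow-up once you have the session dump, to pull out just the cap
counts (a sketch; the id 1234 is the example from step 1):
# ceph tell mds.* client ls id=1234 | grep -E 'num_caps|num_leases'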
Hello Cephers,
I too am for LTS releases, or for some kind of middle ground like a longer
release cycle and/or having even-numbered releases designated for
production like before. We all use LTS releases for the base OS when
running Ceph, yet in reality we depend much more on the Ceph code than
th
If I ignore the dire warnings about losing data and do:
ceph osd purge 7
will I lose data? There are still 2 copies of everything, right?
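Before purging there are non-destructive checks that report whether the data is
still fully replicated elsewhere (a sketch, using osd.7 from my case):
# ceph osd safe-to-destroy osd.7
# ceph osd ok-to-stop osd.7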
I need to remove the node with the OSD from the k8s cluster, reinstall it
and have it re-join the cluster. This will bring in some new OSDs and maybe
Ceph w
Tx.
# ceph version
ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus (stable)
On Thu, Nov 18, 2021 at 3:28 PM Stefan Kooman wrote:
> On 11/18/21 13:20, David Tinker wrote:
> > I just grepped all the OSD pod logs for error and warn and nothing comes
> up:
> >
> > # k logs -
Perhaps I missed something, but does the survey conclude that users don't
value reliability improvements at all? This would explain why the developer
team wants to concentrate on performance and ease of management.
On Thu, Nov 18, 2021, 07:23 Stefan Kooman wrote:
> On 11/18/21 14:09, Maged Mokht
The weighted category prioritization clearly identifies reliability as the top
priority.
Daniel
> Am 18.11.2021 um 15:32 schrieb Sasha Litvak :
>
> Perhaps I missed something, but does the survey concludes that users don't
> value reliability improvements at all? This would explain why devel
Hi all,
Just to close the loop on this one - we ultimately found that there was an MTU
misconfiguration between the hosts that was causing Ceph and other things to
fail in strange ways. After fixing the MTU, cephadm etc immediately started
working.
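For anyone debugging something similar, a quick way to confirm whether
jumbo-sized frames actually pass between two hosts (a sketch; eth0, the 9000-byte
MTU and the hostname are placeholders):
# ip link show eth0 | grep mtu
# ping -M do -s 8972 other-host    # 9000 minus 28 bytes of IP/ICMP headers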
Cheers,
Lincoln
Would it be worth setting the OSD I removed back to "in" (or whatever the
opposite of "out" is) and seeing if things recover?
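If it helps, the opposite of "out" should just be "ceph osd in", e.g.:
# ceph osd in osd.7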
On Thu, Nov 18, 2021 at 3:44 PM David Tinker wrote:
> Tx. # ceph version
> ceph version 15.2.7 (88e41c6c49beb18add4fdb6b4326ca466d931db8) octopus
> (stable)
>
>
>
> On
> On 17.11.21 at 20:14, Marc wrote:
> >> a good choice. It lacks RBD encryption and read leases. But for us
> >> upgrading from N to O or P is currently not
> >>
> > what about just using osd encryption with N?
>
>
> That would be Data at Rest encryption only. The keys for the OSDs are
> stored
>
> docker itself is not the problem,
I would even argue the opposite. If the docker daemon crashes it takes down all
of its containers. Sorry, but these days that is really not necessary given the
other alternatives.
> We also use containers for ceph and love it. If for some reason we
> couldn't run ceph this way any longer, we would probably migrate
> everything to a different solution. We are absolutely committed to
> containerization.
I wonder if you are really using containers. Are you not just using ceph-
>
> Please remember, free software comes still with a price. You can not
> expect someone to work on your individual problem while being cheap on
> your highly critical data. If your data has value, then you should
> invest in ensuring data safety. There are companies out, paying Ceph
> developers
>
> If your building a ceph cluster, the state of a single node shouldn't
> matter. Docker crashing should not be a show stopper.
>
You remind me of this senior software engineer at Red Hat who told me it was
not that big of a deal that ceph.conf got deleted and the root fs was mounted via
a bin
I sense the concern about Ceph distributions via containers generally
has to do with what you might call a feeling of 'opaqueness'. The
feeling is amplified as most folks who choose open source solutions
prize being able to promptly address the particular concerns affecting
them without havin
That response is typically indicative of a pg whose OSD set has changed
since it was last scrubbed (typically from a disk failing).
Are you sure it's actually getting scrubbed when you issue the scrub? For
example you can issue "ceph pg query" and look for
"last_deep_scrub_stamp", which will tel
Okay, good news: on the osd start side, I identified the bug (and easily
reproduced locally). The tracker and fix are:
https://tracker.ceph.com/issues/53326
https://github.com/ceph/ceph/pull/44015
These will take a while to work through QA and get backported.
Also, to reiterate what I said on
Hey all,
We will be having a Ceph science/research/big cluster call on Wednesday
November 24th. If anyone wants to discuss something specific they can
add it to the pad linked below. If you have questions or comments you
can contact me.
This is an informal open call of community members most
May I ask which versions are affected by this bug, and which versions are
going to receive backports?
best regards,
samuel
huxia...@horebdata.cn
From: Sage Weil
Date: 2021-11-18 22:02
To: Manuel Lausch; ceph-users
Subject: [ceph-users] Re: OSD spend too much time on "waiting for readable"
It looks like the bug has been there since the read leases were introduced,
which I believe was octopus (15.2.z)
s
On Thu, Nov 18, 2021 at 3:55 PM huxia...@horebdata.cn
wrote:
> May i ask, which versions are affected by this bug? and which versions are
> going to receive backports?
>
> best reg
Hello!
Our test cluster is a few months old; it was
initially set up from scratch with Pacific and has since had two
separate small patches applied, 16.2.5 and then, a couple of weeks ago,
16.2.6. The issue I'm describing has been present
sin
Hi,
do you use more nodes than deployed mgrs with cephadm?
If so, it might be that the node you are connecting to no longer has an
instance of the mgr running, and you are only getting some leftovers from the
browser cache?
At least this was happening in my test cluster, but I was always able to
fi
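Two commands that show where the active mgr (and hence the dashboard) actually
lives (a sketch, assuming the cephadm orchestrator module is enabled):
# ceph mgr stat
# ceph orch ps --daemon-type mgr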
Hello,
I don't think the meeting was recorded but there are detailed notes in
https://pad.ceph.com/p/ceph-user-dev-monthly-minutes. The next meeting
is scheduled for December 16; feel free to add your discussion
topic to the agenda.
Thanks,
Neha
On Thu, Nov 18, 2021 at 11:04 AM Szabo, Istvan
On Thu, Nov 18, 2021 at 12:36 AM 胡 玮文 wrote:
>
> Hi all,
>
> We are consistently seeing the MDS_CLIENT_RECALL warning in our cluster, it
> seems harmless, but we cannot get HEALTH_OK, which is annoying.
>
> The clients that are reported failing to respond to cache pressure are
> constantly chang
How does one read/set that from the command line?
Thanks,
Lindsay
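(In case it helps: if the setting in question is an MDS config option, such as
the mds_min_caps_working_set mentioned elsewhere in the thread, the generic
read/set pattern would be the following; the value below is only a placeholder.)
# ceph config get mds mds_min_caps_working_set
# ceph config set mds mds_min_caps_working_set 20000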
Instead of complaining, taking some time to learn more about containers would help.
Tony
From: Marc
Sent: November 18, 2021 10:50 AM
To: Pickett, Neale T; Hans van den Bogert; ceph-users@ceph.io
Subject: [ceph-users] Re: [EXTERNAL] Re: Why you might want pack
I think Marc uses containers - but they've chosen Apache Mesos as
orchestrator and cephadm doesn't work with that.
Currently essentially two ceph container orchestrators exist - rook, which
is a ceph orchestrator for kubernetes, and cephadm, which is an orchestrator
expecting docker or podman.
Admittedly I do
Thanks Dan,
I chose one of the stuck clients to investigate; as shown below, it currently
holds ~269700 caps, which is pretty high with no obvious reason. I cannot
understand most of the output, and I failed to find any documentation about it.
# ceph tell mds.cephfs.gpu018.ovxvoz client ls id=7915658
Hi Patrick,
One of the stuck clients has num_caps at around 269700, well above the
number of files open on the client (about 9k). See my reply to Dan for
details. So I don't think this warning is simply caused by
"mds_min_caps_working_set" being set too low.
> -----Original Message-----
> From: Patric
> In this context, I find it quite disturbing that nobody is willing even to
> discuss an increase of the release cycle from say 2 to 4 years. What is so
> important about pumping out one version after the other that real issues
> caused by this speed are ignored?
One factor I think is that