Hi Igor,
Should we backport this to the P, Q and Reef releases?
Thanks,
k
Sent from my iPhone
> On 25 May 2023, at 23:13, Igor Fedotov wrote:
>
> You might be facing the issue fixed by https://github.com/ceph/ceph/pull/49885
Hi Patrick,
The disaster recovery process with the cephfs-data-scan tool didn't fix our MDS
issue; it still kept crashing. I've uploaded a detailed MDS log with the ID below.
The restore procedure below didn't get it working either. Should I set
mds_go_bad_corrupt_dentry to false alongside
mds_ab
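(Not from the original mail, just a hedged sketch of how such an MDS option is
usually toggled and verified at runtime; whether this is the right knob here is
exactly the open question:)

ceph config set mds mds_go_bad_corrupt_dentry false
ceph config get mds mds_go_bad_corrupt_dentry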
> To be honest I am not confident that "ceph osd set-require-min-compat-client
> nautilus" is a necessary step for you. What prompted you to run that command?
>
> That step is not listed here:
> https://docs.ceph.com/en/latest/releases/nautilus/#upgrading-from-mimic-or-luminous
You're correct.
This is my understanding as well: as with CRUSH tunable sets, features that
*happen* to be named after releases don't always correlate 1:1 with them.
> On May 25, 2023, at 15:49, Wesley Dillingham wrote:
>
> Fairly confident this is normal. I just checked a pacific cluster and they
> all report luminous as
Fairly confident this is normal. I just checked a Pacific cluster and they
all report luminous as well. I think part of the backstory here is that
luminous is the release where upmaps were introduced, and there hasn't been a
reason to increment the feature release of subsequent daemons.
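(Side note, not from Wesley's mail: the practical reason the luminous feature
bit matters is the upmap balancer, which refuses to run unless all clients
report at least luminous. A hedged example of the usual sequence:)

ceph osd set-require-min-compat-client luminous
ceph balancer mode upmap
ceph balancer on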
To be honest I
I now ran the command on every host, and I did find two that couldn't
connect. They were the last two I added, and they never got any daemons. I
fixed that (copied /etc/ceph and installed cephadm) and rebooted them,
but it didn't change a thing for now.
All the other hosts could connect to each other without problems.
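(For anyone wanting to repeat the per-host check, a rough sketch; the
hostnames are placeholders and cephadm check-host only verifies the local
prerequisites:)

for h in host1 host2; do ssh $h cephadm check-host; done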
Hi Marc,
>
> I think rocksdb is necessary for the upgrade. Check this on your monitors:
>
> cat /var/lib/ceph/mon/ceph-a/kv_backend
Thanks, but I had already migrated all mons to rocksdb when upgrading to
Luminous.
~ # cat /srv/ceph/mon/ceph-host1/kv_backend
rocksdb
Is this what you expected?
Hi,
So sorry I didn't see your reply. I had some tough weeks (my father-in-law
died, which caused some turmoil). I just came back to debugging and
didn't realize until now that you had in fact answered my e-mail.
I just ran your script on the host that is running the active manager.
Thanks a lot
>
> on our way towards getting our cluster to a current Ceph release, we
> updated all hosts and clients to Nautilus 14.2.22.
I think rocksdb is necessary for the upgrade. Check this on your monitors:
cat /var/lib/ceph/mon/ceph-a/kv_backend
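(A hedged one-liner to check all mons at once, assuming each mon id matches
the short hostname; the hostnames are placeholders:)

for m in mon-a mon-b mon-c; do ssh $m 'cat /var/lib/ceph/mon/ceph-$(hostname -s)/kv_backend'; done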
Dear Ceph community,
on our way towards getting our cluster to a current Ceph release, we updated
all hosts and clients to Nautilus 14.2.22. But despite setting `ceph osd
set-require-min-compat-client nautilus`, the release reported by `ceph
features` is still "luminous".
Is this supposed to be the case?
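(A hedged way to double-check both sides, not part of the original report:)

ceph osd get-require-min-compat-client   # what the cluster now requires
ceph features                            # what connected clients/daemons actually advertise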
Hi Chris,
I think you have missed one step, which is to change the mtime of the
directory explicitly. Please have a look at the highlighted steps.
CEPHFS
===
root@sds-ceph:/mnt/cephfs/volumes/_nogroup/test1/d5052b71-39ec-4d0a-9
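(The highlighted step presumably amounts to something like the following
sketch; the path and date are placeholders:)

touch -m -d '1970-01-01T00:00:00Z' /mnt/cephfs/volumes/_nogroup/test1/dir1
stat /mnt/cephfs/volumes/_nogroup/test1/dir1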
On 5/25/23 18:17, Igor Fedotov wrote:
Perhaps...
I don't like the idea of using the fragmentation score as a real index. IMO
it's mostly a very imprecise first-pass marker to alert that
something might be wrong, not a real quantitative, high-quality
estimate.
Chiming in on the high fragmentation
Yeah, this looks fine. Please collect all of them for a given OSD.
Then restart the OSD, wait for more probes to come (1-2 days) and collect them too.
A side note: in the attached probe I can't see any fragmentation at all;
the number of allocations is equal to the number of fragments, e.g.
cnt: 27637 frags: 2763
Ok, I'm gathering the "allocation stats probe" stuff. Not sure I follow what
you mean by the historic probes. just:
| egrep "allocation stats probe|probe" ?
That gets something like:
May 24 11:24:34 cf8 ceph-4e4184f5-7733-453b-b72c-2b43422fd027-osd-183[2282674]:
debug 2023-05-24T18:24:34.105+0
Just run through the available logs for a specific OSD (one which you suspect
suffers from high fragmentation) and collect all the allocation stats probes
you can find (the "allocation stats probe" string is a perfect grep pattern);
please append the lines with historic probes following the day-0 line as well.
Given thi
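(A hedged example of what that grep looks like in practice; the log path and
unit name depend on how the OSD is deployed:)

grep "allocation stats probe" /var/log/ceph/ceph-osd.183.log*
# or, for a cephadm/containerized deployment:
journalctl -u ceph-<fsid>@osd.183 | grep "allocation stats probe"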
If you can give me instructions on what you want me to gather before the
restart and after restart I can do it. I have some running away right now.
Thanks,
Kevin
From: Igor Fedotov
Sent: Thursday, May 25, 2023 9:17 AM
To: Fox, Kevin M; Hector Martin; cep
Perhaps...
I don't like the idea of using the fragmentation score as a real index. IMO
it's mostly a very imprecise first-pass marker to alert that
something might be wrong, not a real quantitative, high-quality estimate.
So in fact I'd like to see a series of allocation probes showing
eve
Hi Sandip
Ceph servers (debian11/ceph base with Proxmox installed on top - NOT the
ceph that comes with Proxmox!):
ceph@pve1:~$ uname -a
Linux pve1 5.15.107-2-pve #1 SMP PVE 5.15.107-2 (2023-05-10T09:10Z) x86_64
GNU/Linux
ceph@pve1:~$ ceph version
ceph version 17.2.6 (d7ff0d10654d2280e08f1ab9
Is this related to https://tracker.ceph.com/issues/58022 ?
We still see runaway OSDs at times, somewhat randomly, which cause runaway
fragmentation issues.
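(For anyone wanting to look at the score Igor mentions, I believe it can be
read from the admin socket, something like:)

ceph daemon osd.183 bluestore allocator score block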
Thanks,
Kevin
From: Igor Fedotov
Sent: Thursday, May 25, 2023 8:29 AM
To: Hector Martin; ceph-us
Copy-pasting reply from Joseph.
=
Hello Gregory,
We are setting the mtime to 01 Jan 1970 00:00:
1. Create a directory "dir1"
2. Set the mtime of "dir1" to 0, i.e. 1 Jan 1970
3. Create a child directory
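(Roughly, as a shell sketch of the repro; run inside a CephFS mount, the names
are placeholders:)

mkdir dir1
touch -m -d @0 dir1          # set mtime to 1 Jan 1970
mkdir dir1/child             # we expected dir1's mtime to move forward here
stat -c '%y %n' dir1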
Hi Hector,
I can suggest two tools for further fragmentation analysis:
1) One might want to use ceph-bluestore-tool's free-dump command to get
a list of free chunks for an OSD and try to analyze whether it's really
highly fragmented and lacks long enough extents. free-dump just returns
a list
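(For reference, a hedged invocation; the OSD generally has to be stopped first
and the path adjusted:)

ceph-bluestore-tool --path /var/lib/ceph/osd/ceph-183 --allocator block free-dump > osd.183-free.json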
I recently upgraded to Quincy and toggled on the BULK flag on a few pools. As
a result, my cluster has been spending the last several days shuffling data
while growing the pools' PG counts. That in turn has resulted in a steadily
increasing number of PGs being flagged PG_NOT_DEEP_SCRUBBED. And
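(Not part of the original post, just a hedged way to confirm the flag and
watch the backlog while the PG splits finish; the pool name is a placeholder:)

ceph osd pool get <pool> bulk
ceph health detail | grep -c 'not deep-scrubbed since'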
I haven’t checked the logs, but the most obvious way this happens is if the
mtime set on the directory is in the future compared to the time on the
client or server making changes — CephFS does not move times backwards.
(This causes some problems but prevents many, many others when times are
not synchronized.)
Hi Chris,
I kindly request that you follow the steps given in the previous mail and paste
the output here.
The reason behind this request is that we have encountered an issue which is
easily reproducible on the latest versions of both Quincy and Pacific; we have
also thoroughly investigated the matter and we
On 5/24/23 09:18, Hector Martin wrote:
On 24/05/2023 22.07, Mark Nelson wrote:
Yep, bluestore fragmentation is an issue. It's sort of a natural result
of using copy-on-write and never implementing any kind of
defragmentation scheme. Adam and I have been talking about doing it
now, probably pi
Hi Milind
I just tried this using the ceph kernel client and ceph-common 17.2.6
package in the latest Fedora kernel, against Ceph 17.2.6 and it worked
perfectly...
There must be some other factor in play.
Chris
On 25/05/2023 13:04, Sandip Divekar wrote:
Hello Milind,
We are using Ceph Kernel
What caught my eye is that this is also true for Disks on Hosts.
I added another disk to an OSD host. I can zap it with cephadm, I can
even make it an OSD with "ceph orch daemon add osd ceph06:/dev/sdb" and
it will be listed as a new OSD in the Ceph Dashboard.
But when I look at the "Physical Disk
Hello Milind,
We are using the Ceph kernel client,
but we found the same behavior while using the libcephfs library.
Should we treat this as a bug? Or
is there an existing bug for a similar issue?
Thanks and Regards,
Sandip Divekar
From: Milind Changire
Sent: Thursday, May 25, 2023 4:24 PM
To: Sa
try the command with the --id argument:
# ceph --id admin --cluster floki daemon mds.icadmin011 dump cache
/tmp/dump.txt
I presume that your keyring has an appropriate entry for the client.admin
user
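(Alternatively, if the cephx/keyring lookup is what fails, the admin socket
can usually be used directly; the socket path below is a guess based on the
cluster name:)

ceph --admin-daemon /var/run/ceph/floki-mds.icadmin011.asok dump cache /tmp/dump.txt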
On Wed, May 24, 2023 at 5:10 PM Emmanuel Jaep
wrote:
> Absolutely! :-)
>
> root@icadmin011:/t
Sandip,
What type of client are you using?
Kernel client or fuse client?
If it's the kernel client, then it's a bug.
FYI - Pacific and Quincy fuse clients do the right thing.
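(A quick, hedged way to tell which client a given mount is using:)

mount | grep ceph    # "type ceph" = kernel client, "fuse.ceph-fuse" = FUSE client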
On Wed, May 24, 2023 at 9:24 PM Sandip Divekar <
sandip.dive...@hitachivantara.com> wrote:
> Hi Team,
>
> I'm writing
Thanks. In the meantime we were able to narrow down the cause of the RAM
consumption a little.
ceph mds cache status shows that the cache is within the limit (32G):
{
"pool": {
"items": 758820483,
"bytes": 32642572344
}
}
The remaining memory belongs to buffer_anon:
c
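(For reference, the mempool breakdown that includes buffer_anon can be pulled
from the admin socket; a hedged example with a placeholder daemon name:)

ceph daemon mds.<name> dump_mempools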
Hi Emmanuel,
regarding the stopping state: we had a similar issue, see the thread with the
subject "MDS Upgrade from 17.2.5 to 17.2.6 not possible".
We solved this by failing the MDS, which was in the stopping state, but I don't
know if that's a good idea in general.
What does the log of the (stopping) MDS show? We obser
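(For reference, failing an MDS is done with ceph mds fail; a hedged example
with placeholders:)

ceph mds fail <fs_name>:<rank>     # or pass the daemon name directly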
Hi Eugen,
> Also, do you know why you use a multi-active MDS setup?
To be completely candid, I don't really know why this choice was made. I
assume the goal was to provide fault-tolerance and load-balancing.
> Was that a requirement for subtree pinning (otherwise multiple active
> daemons would balance
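(For context, the subtree pinning mentioned above is set per directory via an
xattr; a hedged example with a placeholder path:)

setfattr -n ceph.dir.pin -v 0 /mnt/cephfs/projects    # pin this subtree to MDS rank 0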
Hi Wes,
thanks for the heads-up.
Best,
Emmanuel
On Wed, May 24, 2023 at 5:47 PM Wesley Dillingham
wrote:
> There was a memory issue with standby-replay that may have been resolved
> since, and the fix may be in 16.2.10 (not sure); the suggestion at the time
> was to avoid standby-replay.
>
> Perhaps a
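(For reference, standby-replay is toggled per filesystem; a hedged example
with a placeholder fs name:)

ceph fs set <fs_name> allow_standby_replay false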