Hi Stefan,
We don’t have automatic conversion going on, and «bluestore_fsck_quick_fix_on_mount» is not set.
So we did an offline compaction as suggested, but this didn’t fix the problem of
the OSD crash.
In the meantime we are rebuilding all OSDs on the cluster and it seems to
improve the cl
Thanks for your input:
There are buckets with over 15m files and >300 shards, but yesterday a
customer with 2.5m files and 101 shards complained about the slowness of
listing files.
We do not have indexless buckets. I am not sure if a customer can create
such a bucket on their own via the usual to
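(For what it's worth, the object count versus the shard count of a bucket can be
checked with something like the command below; the bucket name is a placeholder,
and the output includes fields such as num_objects and, on recent releases,
num_shards:)
# radosgw-admin bucket stats --bucket=<bucket-name>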
Hi Frank,
I suspect this is a combination of issues.
1. You have "choose" instead of "chooseleaf" in rule 1.
2. osd.7 is destroyed but still "up" in the osdmap.
3. The _tries settings in rule 1 are not helping.
Here are my tests:
# osdmaptool --test-map-pg 4.1c osdmap.bin
osdmaptool: osdmap file
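For comparison, a freshly generated EC rule with the default _tries values looks
roughly like this (an illustrative sketch; the rule name, id and failure domain
are assumptions, not your exact rule):
rule ecpool {
	id 1
	type erasure
	min_size 3
	max_size 6
	step set_chooseleaf_tries 5
	step set_choose_tries 100
	step take default
	step chooseleaf indep 0 type host
	step emit
}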
> 2. osd.7 is destroyed but still "up" in the osdmap.
Oops, you can ignore this point -- this was an observation I had while
playing with the osdmap -- your osdmap.bin has osd.7 down correctly.
In case you're curious, here was what confused me:
# osdmaptool osdmap.bin2 --mark-up-in --mark-out 7
BTW, I vaguely recalled seeing this before. Yup, found it:
https://tracker.ceph.com/issues/55169
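(If you want to verify an OSD's state in a saved map yourself, something along
these lines works, with osd.7 as the example id:)
# osdmaptool osdmap.bin --print | grep '^osd\.7 '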
On Tue, Aug 30, 2022 at 11:46 AM Dan van der Ster wrote:
>
> > 2. osd.7 is destroyed but still "up" in the osdmap.
>
> Oops, you can ignore this point -- this was an observation I had while
> playing
Thanks a ton!
Yes, restarting the MDS fixed this. But I can’t confirm it hit bug 50840; it
seems we hit this when reading a huge number of small files (more than 10,000
small files in one directory).
Thanks
Xiong
> On 26 Aug 2022, at 19:13, Stefan Kooman wrote:
>
> On 8/26/22 12:33, zxcs wrote:
>> Hi, experts
>> w
BTW, the defaults for _tries seem to work too:
# diff -u crush.txt crush.txt2
--- crush.txt 2022-08-30 11:27:41.941836374 +0200
+++ crush.txt2 2022-08-30 11:55:45.601891010 +0200
@@ -90,10 +90,10 @@
type erasure
min_size 3
max_size 6
- step set_chooseleaf_tries 50
- step set_choose_tries 2
Hi Wissem,
sharing an OSD log snippet preceding the crash (e.g. the prior 20K lines) could
be helpful and hopefully will provide more insight - there might be some
errors/assertion details and/or other artefacts...
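For example, something like the following should capture it (the OSD id, unit
name and log path are assumptions that depend on your deployment):
# tail -n 20000 /var/log/ceph/ceph-osd.<id>.log > osd-<id>-precrash.log
or, for systemd-managed OSDs:
# journalctl -u ceph-osd@<id> -n 20000 > osd-<id>-precrash.log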
Thanks,
Igor
On 8/30/2022 10:51 AM, Wissem MIMOUNA wrote:
Hi Stefan,
We don’t have
Hi Robert,
Thanks for the input.
> -----Original Message-----
> From: Robert Sander
> Sent: Monday, 29 August 2022 16:23
> To: ceph-users@ceph.io
> Subject: [ceph-users] Re: Automanage block devices
>
> On 29.08.22 at 14:14, Dominique Ramaekers wrote:
>
> > Nevertheless, I woul
> Note: "step chose" was selected by creating the crush rule with ceph on pool
> creation. If the default should be "step choseleaf" (with OSD buckets), then
> the automatic crush rule generation in ceph ought to be fixed for EC profiles.
Interesting. Which exact command was used to create the p
>> Note: "step choose" was selected by creating the crush rule with ceph on pool
>> creation. If the default should be "step chooseleaf" (with OSD buckets), then
>> the automatic crush rule generation in ceph ought to be fixed for EC
>> profiles.
> Interesting. Which exact command was used to crea
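(For context, an EC profile and pool are typically created with something like
the commands below; the profile and pool names and the k/m values are made-up
examples:)
# ceph osd erasure-code-profile set myprofile k=4 m=2 crush-failure-domain=osd
# ceph osd pool create mypool 128 128 erasure myprofile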
Thank you Boris.
I have used cephadm to install that cluster,
so I re-added the mon / mgr with no issue; cluster health seems OK.
I reviewed ceph.conf and it's in place,
however I am still not able to run radosgw-admin.
Thank you in advance for your help
-- Forwarded message -
From: B
Yes, this cluster has both - a large cephfs FS (60TB) that is replicated
(2-copy) and a really large RGW data pool that is EC (12+4). We cannot
currently delete any data from either of them because commands to access them
are not responsive. The cephfs will not mount and radosgw-admin just h
OSDs are bluestore on HDD with SSD for DB/WAL. We already tuned the sleep_hdd
to 0 and cranked up the max_backfills and recovery parameters to much higher
values.
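For reference, that tuning amounts to something like the following (option names
are from memory and the values are just examples):
# ceph config set osd osd_recovery_sleep_hdd 0
# ceph config set osd osd_max_backfills 8
# ceph config set osd osd_recovery_max_active 8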
From: Josh Baergen
Sent: Tuesday, August 30, 2022 9:46 AM
To: Wyll Ingersoll
Cc: Dave Schulz ;
Hi Wyll,
The only way I could get my OSDs to start dropping their utilization
because of a similar "unable to access the fs" problem was to run "ceph
osd crush reweight 0" on the full OSDs, then wait while they start
to empty and get below the full ratio. Note this is different from ceph
osd
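Concretely, that is something like the following per full OSD (the id is a
placeholder):
# ceph osd crush reweight osd.<id> 0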
Thanks, we may resort to that if we can't make progress in rebalancing things.
From: Dave Schulz
Sent: Tuesday, August 30, 2022 11:18 AM
To: Wyll Ingersoll ; Josh Baergen
Cc: ceph-users@ceph.io
Subject: Re: [ceph-users] Re: OSDs growing beyond full ratio
Hi
TLDR - Last call for feedback & reviews on PR
https://github.com/ceph/ceph/pull/41855 before our deadline in two weeks.
---
The Ceph Orchestration team has had a long term project [1] to refactor the
'cephadm binary' into something more manageable. The first step in the process
is to turn the
A couple of questions, Alex.
Is it the case that the object does not appear when you list the RGW bucket it
was in?
You referred to “one side of my cluster”. Does that imply you’re using
multisite?
And just for completeness, this is not a versioned bucket?
With a size of 6252 bytes, it wouldn
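(If it helps, the object's current state can be checked directly with something
like this; the bucket and key names are placeholders:)
# radosgw-admin object stat --bucket=<bucket> --object=<key>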
On 30 Aug 2022, at 23:20, Dave Schulz wrote:
Is a file in ceph assigned to a specific PG? In my case it seems like a file
that's close to the size of a single OSD gets moved from one OSD to the next
filling it up and domino-ing around the cluster filling up OSDs.
I believe no. Each large file is split
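If you want to see where a given RADOS object (not a whole file) lands, the
following shows its PG and the OSDs it maps to (pool and object names are
placeholders):
# ceph osd map <pool> <object-name>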
Hi Weiwen,
Thanks for the reference link. That does indeed indicate the opposite.
I'm not sure why our issues became much less severe when the big files were
deleted.
deleting the big files.
-Dave
On 2022-08-30 11:56 a.m., 胡 玮文 wrote
One of our OSDs eventually reached 100% capacity (in spite of the full ratio
being 95%). Now it is down and we cannot restart the osd process on it because
there is not enough space on the device.
Is there a way to find PGs on that disk that can be safely removed without
destroying data so we
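One way to list the PGs that live on the stopped OSD is the offline tool,
roughly like this (the data path assumes a default layout and the id is a
placeholder):
# ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op list-pgs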
Hi, experts
we have a cephfs (15.2.13) cluster with kernel mounts, and when we read from
2000+ processes to one ceph path (called /path/to/A/), all of the processes
hang, and ls -lrth /path/to/A/ always gets stuck, but listing other directories
is healthy (/path/to/B/);
health detail always reports md