Hello,
I run a Ceph Nautilus cluster with 9 hosts and 144 OSDs. Last night we
lost two disks, so two OSDs (67, 90) are down. The two disks are on two
different hosts. A third OSD on a third host reports slow ops. Ceph is
repairing at the moment.
Pools affected are, e.g., these:
pool 35 'px
Hi David,
On Tuesday, March 30th, 2021 at 00:50, David Orman wrote:
> Sure enough, it is more than 200,000, just as the alert indicates.
> However, why did it not reshard further? Here's the kicker - we only
> see this with versioned buckets/objects. I don't see anything in the
> documentation th
Hi,
from what you've sent, my conclusion is that the stalled I/O is indeed
caused by the min_size of the EC pool.
There's only one PG reported as incomplete; I assume that is the EC
pool, not the replicated pxa-rbd, right? Both pools are for rbd, so I'm
guessing the rbd headers are in pxa-rbd while
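If it helps to confirm the theory, the pool settings can be checked with something along these lines (pool name taken from your mail; with a 4+2 profile I'd expect size=6 and min_size=5):
# ceph osd pool get pxa-ec size
# ceph osd pool get pxa-ec min_size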
Hi,
I have a couple of OSDs that are currently receiving a lot of data and are running
towards a 95% fill rate.
I would like to forcefully remap some PGs (they are around 100GB each) to
emptier OSDs and drop them from the full OSDs. I know this would lead to
degraded objects, but I am not sure how long the cluster
Hi everyone,
I didn't get enough responses on the previous Doodle to schedule a
meeting. I'm wondering if people are OK with the previous PDF I
released or if there's interest in the community to develop better
survey results?
https://ceph.io/community/ceph-user-survey-2019/
On Mon, Mar 22, 2021
I just moved one PG away from the OSD, but the disk space does not get freed.
Do I need to do something to clean obsolete objects from the OSD?
On Tue, 30 Mar 2021 at 11:47, Boris Behrens wrote:
> Hi,
> I have a couple OSDs that currently get a lot of data, and are running
> towards 95% fil
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours Ceph repaired most of the damage. One inactive PG
remained, and ceph health detail then told me:
-
HEALTH_WARN Reduced data availability: 1 p
Hello Frank,
the option is actually set. On one of my monitors:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show|grep
osd_allow_recovery_below_min_size
"osd_allow_recovery_below_min_size": "true",
Thank you very much
Rainer
On 30.03.21 at 13:20, Frank Schilder wrote:
Hi, this is
Hi,
On 30.03.21 13:05, Rainer Krienke wrote:
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours Ceph repaired most of the damage. One inactive PG
remained, and ceph health detail then told me:
-
One week later Ceph is still balancing.
What worries me like hell is the %USE on a lot of those OSDs. Does Ceph
resolve this on its own? We are currently down to 5TB of space in the cluster.
Rebalancing single OSDs doesn't work well, and it increases the "misplaced
objects".
I thought about letti
Are those PGs backfilling due to splitting or due to balancing?
If it's the former, I don't think there's a way to pause them with
upmap or any other trick.
-- dan
On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote:
>
> One week later the ceph is still balancing.
> What worries me like hell is
Hello,
in the meantime Ceph is running normally again, except for the two OSDs that
are down because of the failed disks.
What really helped in my situation was lowering min_size from 5 (k+1)
to 4 in my 4+2 erasure code setup. So I am also grateful to the
programmer who put the helpful hint in c
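For the archive, the change itself was roughly this one-liner:
# ceph osd pool set pxa-ec min_size 4
and the plan is to put it back to k+1 once recovery is complete:
# ceph osd pool set pxa-ec min_size 5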
Hi all!
We run a 1.5PB cluster with 12 hosts, 192 OSDs (a mix of NVMe and HDD) and need
to improve our failure domain by altering the crush rules and moving racks into
pods, which would imply a lot of data movement.
I wonder what the preferred order of operations would be when making such
changes to
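To make it concrete, the kind of change we have in mind would look roughly like this (bucket names are illustrative; 'pod' is one of the stock CRUSH bucket types), followed by a new crush rule that uses pod as the failure domain:
# ceph osd crush add-bucket pod1 pod
# ceph osd crush move pod1 root=default
# ceph osd crush move rack1 pod=pod1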
I would think due to splitting, because the balancer refuses to do its
work due to too many misplaced objects.
I am also thinking of turning it off for now, so it doesn't begin its work at 5%
misplaced objects.
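If we do turn it off, as far as I know that is just:
# ceph balancer off
# ceph balancer status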
Would adding more hardware help? We wanted to add another OSD node with
7x8TB disks any
Any time frame on 14.2.19?
On Fri, Mar 26, 2021, 1:43 AM Konstantin Shalygin wrote:
> Finally master is merged now
>
>
> k
>
> Sent from my iPhone
>
> > On 25 Mar 2021, at 23:09, Simon Oosthoek
> wrote:
> >
> > I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19
> will be avail
I reweighted the OSD to 0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was doing
this, but it could also be the "backup cleanup job" that removed images
from the buckets.
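(For reference, that was done with roughly the following; the OSD id and PG id are only examples:)
# ceph osd reweight 42 0
# ceph pg force-backfill 35.1f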
On Tue, 30 Mar 2021 at 14:41, Stefan Kooman wrote:
> On 3/30/21
It would be safe to turn off the balancer, yes go ahead.
To know if adding more hardware will help, we need to see how much
longer this current splitting should take. This will help:
ceph status
ceph osd pool ls detail
-- dan
On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote:
>
> I w
Hi Ben,
That was beyond helpful. Thank you so much for the thoughtful and
detailed explanation. That should definitely be added to the
documentation, until/unless the dynamic resharder/sharder handle this
case (if there is even desire to do so) with versioned objects.
Respectfully,
David
On Tue,
The output from ceph osd pool ls detail tells me nothing, except that the
pgp_num is not where it should be. Can you help me read the output? How
do I estimate how long the split will take?
[root@s3db1 ~]# ceph status
cluster:
id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
health: HEALTH
You started with 1024 PGs and are splitting to 2048.
Currently 1946 PGs are in use, so it is nearly at the goal.
You need to watch that value (1946) and see if it slowly increases. If
it does not increase, then those backfill_toofull PGs are probably
splitting PGs, and they are blocked
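A quick way to follow that (just a sketch, adjust the interval as you like) is something like:
# watch -n 60 "ceph osd pool ls detail | grep pgp_num"
and in parallel keep an eye on the misplaced/backfilling numbers in ceph -s.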
On 3/25/21 1:05 PM, Nico Schottelius wrote:
it seems there is no reference to it in the ceph documentation. Do you
have any pointers to it?
Not anymore with new Ceph documentation.
Out of curiosity, do you have any clue why it's not in there anymore?
It might still be, but I cannot find it
I raised the backfillfull_ratio to 0.91 to see what happens; now I am
waiting. Some OSDs were around 89-91%, some are around 50-60%.
The pgp_num has been at 1946 for a week. I think this will solve itself once
the cluster becomes a bit tidier.
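(In case someone finds this thread later: the ratio was raised with something like the following, 0.91 being the value I picked:
# ceph osd set-backfillfull-ratio 0.91
)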
Am Di., 30. März 2021 um 15:23 Uhr schrieb Dan van der
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html
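since s3cmd showed up in your output, this can be done roughly like the following (index/error document names are just examples):
s3cmd ws-create --ws-index=index.html --ws-error=error.html s3://sky
s3cmd ws-info s3://sky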
On Tue, Mar 30, 2021 at 10:05 AM Marcel Kuiper wro
Hi Robert,
We get a handful of verify_authorizer warnings on some of our clusters
too but they don't seem to pose any problems.
I've tried without success to debug this in the past -- IIRC I started
to suspect it was coming from old cephfs kernel clients but got
distracted and never reached the bo
Hi, this is odd. The problem with recovery when sufficiently many but less than
min_size shards are present should have been resolved with
osd_allow_recovery_below_min_size=true. It is really dangerous to reduce
min_size below k+1 and, in fact, should never be necessary for recovery. Can
you ch
On 3/30/21 12:55 PM, Boris Behrens wrote:
I just moved one PG away from the OSD, but the disk space does not get freed.
How did you move? I would suggest you use upmap:
ceph osd pg-upmap-items
Invalid command: missing required parameter pgid(<pgid>)
osd pg-upmap-items <pgid> <id|osd.id> [<id|osd.id>...] : set pg_upm
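The full form takes a PG id followed by from/to OSD pairs, e.g. (ids made up):
ceph osd pg-upmap-items 35.1f 12 67
which remaps that PG's shard from osd.12 to osd.67. It may also require
ceph osd set-require-min-compat-client luminous
first, if you haven't set that already.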
Dear Rainer,
hmm, maybe the option is ignored or not implemented properly. This option set
to true should have the same effect as reducing min_size *except* that new
writes will not go to non-redundant storage. When reducing min_size, a
critically degraded PG will accept new writes, which is th
Sorry about the flow of messages.
I forgot to mention this. Looking at the other replies: in particular, the fact that the PG
in question remained at 4 out of 6 OSDs until you reduced min_size might
indicate that peering was blocked for some reason but completed after the
reduction. If this was the order o
On 3/30/21 3:00 PM, Thomas Hukkelberg wrote:
Any thoughts or insight on how to achieve this with minimal data movement and
risk of cluster downtime would be welcome!
I would do so with Dan's "upmap-remap" script [1]. See [2] for his
presentation. We have used that quite a few times now (also
I thought that recovery below min_size for EC pools wasn't expected to work
until Octopus. From the Octopus release notes: "Ceph will allow recovery
below min_size for Erasure coded pools, wherever possible."
Josh
On Tue, Mar 30, 2021 at 6:53 AM Frank Schilder wrote:
> Dear Rainer,
>
> hmm, may
Ahh, right. I saw it fixed here https://tracker.ceph.com/issues/18749 a long
time ago, but it seems the back-port never happened.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Josh Baergen
Sent: 30 March 2021 1
Despite the examples that can be found on the internet, I have trouble
setting up a static website that serves from an S3 bucket. If anyone could
point me in the right direction, that would be much appreciated.
Marcel
I created an index.html in the bucket sky
gm-rc3-jumphost01@ceph/s3cmd (mast
On 3/30/21 3:02 PM, Boris Behrens wrote:
I reweighted the OSD to 0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was
doing this, but it could also be the "backup cleanup job" that removed
images from the buckets.
I don't have any numbers o
It's a bug: https://tracker.ceph.com/issues/50060
On Wed, Dec 23, 2020 at 5:53 PM Alex Taylor wrote:
>
> Hi Patrick,
>
> Any updates? Looking forward to your reply :D
>
>
> On Thu, Dec 17, 2020 at 11:39 AM Patrick Donnelly wrote:
> >
> > On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor wrote:
> > >
This is the 19th update to the Ceph Nautilus release series. This is a
hotfix release to prevent daemons from binding to loopback network
interfaces. All nautilus users are advised to upgrade to this release.
Notable Changes
---
* This release fixes a regression introduced in v14.2.18
Casey,
Many thanks. That did the trick.
Regards
Marcel
On 2021-03-30 16:48, Casey Bodley wrote:
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon
Thanks for the quick release! \o/
On Tue, 30 Mar 2021, 22:30, David Galloway wrote:
> This is the 19th update to the Ceph Nautilus release series. This is a
> hotfix release to prevent daemons from binding to loopback network
> interfaces. All nautilus users are advised to upgrade to this releas
I've not undertaken such a large data movement.
The pg-upmap script may be of use here, but assuming that it's not:
if I were, I would first take several backups of the current CRUSH map.
I would set the norebalance and norecover flags.
Then I would verify all of the backfill settings are as aggres
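Roughly, those first steps would look like this (the file name is a placeholder), and later the flags get lifted again with ceph osd unset:
# ceph osd getcrushmap -o crushmap.backup
# ceph osd set norebalance
# ceph osd set norecover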
Mabi;
We're running Nautilus, and I am not wholly convinced of the "everything in
containers" view of the world, so take this with a small grain of salt...
1) We don't run Ubuntu, sorry. I suspect the documentation highlights 18.04
because it's the current LTS release. Personally, if I had a
Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu
nodes and have a few questions, mostly regarding planning of the infrastructure.
1) Based on the documentation, the OS requirements mention Ubuntu 18.04 LTS; is
it OK to use Ubuntu 20.04 instead, or should I stick with 18