Hello,
I run a Ceph Nautilus cluster with 9 hosts and 144 OSDs. Last night we
lost two disks, so two OSDs (67, 90) are down. The two disks are on two
different hosts. A third OSD on a third host reports slow ops. Ceph is
repairing at the moment.
Pools affected are, e.g., these:
pool 35 'px
Hi David,
On Tuesday, March 30th, 2021 at 00:50, David Orman wrote:
> Sure enough, it is more than 200,000, just as the alert indicates.
> However, why did it not reshard further? Here's the kicker - we only
> see this with versioned buckets/objects. I don't see anything in the
> documentation th
Hi,
from what you've sent, my conclusion is that the stalled I/O is indeed
caused by the min_size of the EC pool.
There's only one PG reported as incomplete; I assume that is the EC
pool, not the replicated pxa-rbd, right? Both pools are for rbd, so I'm
guessing the rbd headers are in pxa-rbd while
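If it helps to confirm the theory, the pool settings can be checked with something along these lines (pool name taken from your mail; with a 4+2 profile I'd expect size=6 and min_size=5):
# ceph osd pool get pxa-ec size
# ceph osd pool get pxa-ec min_size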
Hi,
I have a couple of OSDs that are currently receiving a lot of data and are running
towards a 95% fill rate.
I would like to forcefully remap some PGs (they are around 100GB each) to
emptier OSDs and drop them from the full OSDs. I know this would lead to
degraded objects, but I am not sure how long the cluster
Hi everyone,
I didn't get enough responses on the previous Doodle to schedule a
meeting. I'm wondering if people are OK with the previous PDF I
released or if there's interest in the community to develop better
survey results?
https://ceph.io/community/ceph-user-survey-2019/
On Mon, Mar 22, 2021
I just moved one PG away from the OSD, but the disk space does not get freed.
Do I need to do something to clean obsolete objects from the OSD?
On Tue, 30 Mar 2021 at 11:47, Boris Behrens wrote:
> Hi,
> I have a couple OSDs that currently get a lot of data, and are running
> towards 95% fil
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours Ceph repaired most of the damage. One inactive PG
remained, and ceph health detail then told me:
-
HEALTH_WARN Reduced data availability: 1 p
Hello Frank,
the option is actually set. On one of my monitors:
# ceph daemon /var/run/ceph/ceph-mon.*.asok config show|grep
osd_allow_recovery_below_min_size
"osd_allow_recovery_below_min_size": "true",
Thank you very much
Rainer
On 30.03.21 at 13:20, Frank Schilder wrote:
Hi, this is
Hi,
On 30.03.21 13:05, Rainer Krienke wrote:
Hello,
yes, your assumptions are correct: pxa-rbd is the metadata pool for
pxa-ec, which uses an erasure coding 4+2 profile.
In the last hours Ceph repaired most of the damage. One inactive PG
remained, and ceph health detail then told me:
-
One week later Ceph is still balancing.
What worries me like hell is the %USE on a lot of those OSDs. Does Ceph
resolve this on its own? We are currently down to 5TB of space in the cluster.
Rebalancing single OSDs doesn't work well, and it increases the "misplaced
objects".
I thought about letti
Are those PGs backfilling due to splitting or due to balancing?
If it's the former, I don't think there's a way to pause them with
upmap or any other trick.
-- dan
On Tue, Mar 30, 2021 at 2:07 PM Boris Behrens wrote:
>
> One week later the ceph is still balancing.
> What worries me like hell is
Hello,
in the meantime Ceph is running normally again, except for the two OSDs that
are down because of the failed disks.
What really helped in my situation was lowering min_size from 5 (k+1)
to 4 in my 4+2 erasure code setup. So I am also grateful to the
programmer who put the helpful hint in c
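For the archive, the change itself was roughly this one-liner:
# ceph osd pool set pxa-ec min_size 4
and the plan is to put it back to k+1 once recovery is complete:
# ceph osd pool set pxa-ec min_size 5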
Hi all!
We run a 1.5PB cluster with 12 hosts, 192 OSDs (a mix of NVMe and HDD) and need
to improve our failure domain by altering the crush rules and moving racks into
pods, which would imply a lot of data movement.
I wonder what the preferred order of operations would be when making such
changes to
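To make it concrete, the kind of change we have in mind would look roughly like this (bucket names are illustrative; 'pod' is one of the stock CRUSH bucket types), followed by a new crush rule that uses pod as the failure domain:
# ceph osd crush add-bucket pod1 pod
# ceph osd crush move pod1 root=default
# ceph osd crush move rack1 pod=pod1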
I would think due to splitting, because the balancer refuses to do its
work due to too many misplaced objects.
I am also thinking of turning it off for now, so it doesn't begin its work at 5%
misplaced objects.
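If we do turn it off, as far as I know that is just:
# ceph balancer off
# ceph balancer status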
Would adding more hardware help? We wanted to add another OSD node with
7x8TB disks any
Any time frame on 14.2.19?
On Fri, Mar 26, 2021, 1:43 AM Konstantin Shalygin wrote:
> Finally master is merged now
>
>
> k
>
> Sent from my iPhone
>
> > On 25 Mar 2021, at 23:09, Simon Oosthoek
> wrote:
> >
> > I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19
> will be avail
I reweighted the OSD to 0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was doing
this, but it could also be the "backup cleanup job" that removed images
from the buckets.
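(For reference, that was done with roughly the following; the OSD id and PG id are only examples:)
# ceph osd reweight 42 0
# ceph pg force-backfill 35.1f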
On Tue, 30 Mar 2021 at 14:41, Stefan Kooman wrote:
> On 3/30/21
It would be safe to turn off the balancer, yes go ahead.
To know if adding more hardware will help, we need to see how much
longer this current splitting should take. This will help:
ceph status
ceph osd pool ls detail
-- dan
On Tue, Mar 30, 2021 at 3:00 PM Boris Behrens wrote:
>
> I w
Hi Ben,
That was beyond helpful. Thank you so much for the thoughtful and
detailed explanation. That should definitely be added to the
documentation, until/unless the dynamic resharder/sharder handle this
case (if there is even desire to do so) with versioned objects.
Respectfully,
David
On Tue,
The output from ceph osd pool ls detail tells me nothing, except that the
pgp_num is not where it should be. Can you help me read the output? How
do I estimate how long the split will take?
[root@s3db1 ~]# ceph status
cluster:
id: dca79fff-ffd0-58f4-1cff-82a2feea05f4
health: HEALTH
You started with 1024 PGs and are splitting to 2048.
Currently 1946 PGs are in use, so it is nearly at the goal.
You need to watch that value (1946) and see if it slowly increases. If
it does not increase, then those backfill_toofull PGs are probably
splitting PGs, and they are blocked
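A quick way to follow that (just a sketch, adjust the interval as you like) is something like:
# watch -n 60 "ceph osd pool ls detail | grep pgp_num"
and in parallel keep an eye on the misplaced/backfilling numbers in ceph -s.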
On 3/25/21 1:05 PM, Nico Schottelius wrote:
it seems there is no reference to it in the ceph documentation. Do you
have any pointers to it?
Not anymore with new Ceph documentation.
Out of curiosity, do you have any clue why it's not in there anymore?
It might still be, but I cannot find it
I raised the backfillfull_ratio to 0.91 to see what happens; now I am
waiting. Some OSDs were around 89-91%, some are around 50-60%.
The pgp_num has been at 1946 for a week. I think this will solve itself once
the cluster becomes a bit tidier.
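(In case someone finds this thread later: the ratio was raised with something like the following, 0.91 being the value I picked:
# ceph osd set-backfillfull-ratio 0.91
)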
Am Di., 30. März 2021 um 15:23 Uhr schrieb Dan van der
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon.com/AmazonS3/latest/API/API_PutBucketWebsite.html
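since s3cmd showed up in your output, this can be done roughly like the following (index/error document names are just examples):
s3cmd ws-create --ws-index=index.html --ws-error=error.html s3://sky
s3cmd ws-info s3://sky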
On Tue, Mar 30, 2021 at 10:05 AM Marcel Kuiper wro
Hi Robert,
We get a handful of verify_authorizer warnings on some of our clusters
too but they don't seem to pose any problems.
I've tried without success to debug this in the past -- IIRC I started
to suspect it was coming from old cephfs kernel clients but got
distracted and never reached the bo
Hi, this is odd. The problem with recovery when sufficiently many but less than
min_size shards are present should have been resolved with
osd_allow_recovery_below_min_size=true. It is really dangerous to reduce
min_size below k+1 and, in fact, should never be necessary for recovery. Can
you ch
On 3/30/21 12:55 PM, Boris Behrens wrote:
I just moved one PG away from the OSD, but the disk space does not get freed.
How did you move? I would suggest you use upmap:
ceph osd pg-upmap-items
Invalid command: missing required parameter pgid(<pgid>)
osd pg-upmap-items <pgid> <id|osd.id> [<id|osd.id>...] : set pg_upm
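The full form takes a PG id followed by from/to OSD pairs, e.g. (ids made up):
ceph osd pg-upmap-items 35.1f 12 67
which remaps that PG's shard from osd.12 to osd.67. It may also require
ceph osd set-require-min-compat-client luminous
first, if you haven't set that already.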
Dear Rainer,
hmm, maybe the option is ignored or not implemented properly. This option set
to true should have the same effect as reducing min_size *except* that new
writes will not go to non-redundant storage. When reducing min_size, a
critically degraded PG will accept new writes, which is th
Sorry about the flow of messages.
I forgot to mention this. Looking at the other replies: in particular, the fact that the PG
in question remained at 4 out of 6 OSDs until you reduced min_size might
indicate that peering was blocked for some reason but completed after the
reduction. If this was the order o
On 3/30/21 3:00 PM, Thomas Hukkelberg wrote:
Any thoughts or insight on how to achieve this with minimal data movement and
risk of cluster downtime would be welcome!
I would do so with Dan's "upmap-remap" script [1]. See [2] for his
presentation. We have used that quite a few times now (also
I thought that recovery below min_size for EC pools wasn't expected to work
until Octopus. From the Octopus release notes: "Ceph will allow recovery
below min_size for Erasure coded pools, wherever possible."
Josh
On Tue, Mar 30, 2021 at 6:53 AM Frank Schilder wrote:
> Dear Rainer,
>
> hmm, may
Ahh, right. I saw it fixed here https://tracker.ceph.com/issues/18749 a long
time ago, but it seems the back-port never happened.
Best regards,
=
Frank Schilder
AIT Risø Campus
Bygning 109, rum S14
From: Josh Baergen
Sent: 30 March 2021 1
Despite the examples that can be found on the internet, I have trouble
setting up a static website that serves from an S3 bucket. If anyone could
point me in the right direction, that would be much appreciated.
Marcel
I created an index.html in the bucket sky
gm-rc3-jumphost01@ceph/s3cmd (mast
On 3/30/21 3:02 PM, Boris Behrens wrote:
I reweighted the OSD to 0 and then forced the backfilling.
How long does it take for Ceph to free up space? It looks like it was
doing this, but it could also be the "backup cleanup job" that removed
images from the buckets.
I don't have any numbers o
It's a bug: https://tracker.ceph.com/issues/50060
On Wed, Dec 23, 2020 at 5:53 PM Alex Taylor wrote:
>
> Hi Patrick,
>
> Any updates? Looking forward to your reply :D
>
>
> On Thu, Dec 17, 2020 at 11:39 AM Patrick Donnelly wrote:
> >
> > On Wed, Dec 16, 2020 at 5:46 PM Alex Taylor wrote:
> > >
This is the 19th update to the Ceph Nautilus release series. This is a
hotfix release to prevent daemons from binding to loopback network
interfaces. All nautilus users are advised to upgrade to this release.
Notable Changes
---
* This release fixes a regression introduced in v14.2.18
Casey,
Many thanks. That did the trick.
Regards
Marcel
On 2021-03-30 16:48, Casey Bodley wrote:
this error 2039 is ERR_NO_SUCH_WEBSITE_CONFIGURATION. if you want to
access a bucket via rgw_dns_s3website_name, you have to set a website
configuration on the bucket - see
https://docs.aws.amazon
Thanks for the quick release! \o/
On Tue, 30 Mar 2021, 22:30, David Galloway wrote:
> This is the 19th update to the Ceph Nautilus release series. This is a
> hotfix release to prevent daemons from binding to loopback network
> interfaces. All nautilus users are advised to upgrade to this releas
I've not undertaken such a large data movement.
The pg-upmap script may be of use here, but assuming that it's not:
if I were, I would first take several backups of the current CRUSH map.
I would set the norebalance and norecover flags.
Then I would verify all of the backfill settings are as aggres
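Roughly, those first steps would look like this (the file name is a placeholder), and later the flags get lifted again with ceph osd unset:
# ceph osd getcrushmap -o crushmap.backup
# ceph osd set norebalance
# ceph osd set norecover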
Mabi;
We're running Nautilus, and I am not wholly convinced of the "everything in
containers" view of the world, so take this with a small grain of salt...
1) We don't run Ubuntu, sorry. I suspect the documentation highlights 18.04
because it's the current LTS release. Personally, if I had a
Hello,
I am planning to set up a small Ceph cluster for testing purposes with 6 Ubuntu
nodes and have a few questions, mostly regarding planning of the infrastructure.
1) Based on the documentation, the OS requirements mention Ubuntu 18.04 LTS; is
it OK to use Ubuntu 20.04 instead, or should I stick with 18