Here's the reason they exit:
7f1605dc9700 -1 osd.97 486896 _committed_osd_maps marked down 6 > osd_max_markdown_count 5 in last 600.00 seconds, shutting down
If an osd flaps (marked down, then up) 6 times in 10 minutes, it
exits. (This is a safety measure).
It's normally caused by a network
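If you need more headroom while you chase the underlying network issue, the
thresholds are tunable. A minimal sketch, assuming the current defaults of 5
flaps per 600 seconds (the values below are only an illustration):

  ceph config set osd osd_max_markdown_count 10
  ceph config set osd osd_max_markdown_period 600

Raising the count only hides the flapping, so treat it as a stopgap while
you find the real cause.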
Hi!
I run a cephadm-based 16.2.x cluster in production. It's been mostly fine,
but not without quirks. Hope this helps.
/Z
On Tue, Mar 8, 2022 at 6:17 AM norman.kern wrote:
> Dear Ceph folks,
>
> Is anyone using cephadm in production (version: Pacific)? I found several bugs
> on it and
> I really do
Yes, this is something we know about, and we disabled it because we ran into
the problem that PGs became unavailable when two or more OSDs went offline.
I am searching for the reason WHY this happens.
Currently we have set the service file to Restart=always and removed
StartLimitBurst from the service
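For reference, a systemd drop-in along these lines achieves that; the unit
name and path are illustrative and will differ on cephadm-managed hosts:

  # /etc/systemd/system/ceph-osd@.service.d/override.conf
  [Unit]
  StartLimitBurst=0

  [Service]
  Restart=always

followed by "systemctl daemon-reload". Keep in mind that with the rate limit
disabled, a genuinely broken OSD will keep restarting forever, so watch the
logs.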
Hi,
We also had this kind of problem after upgrading to Octopus. Maybe you
can play with the heartbeat grace time (
https://docs.ceph.com/en/latest/rados/configuration/mon-osd-interaction/
) to tell OSDs to wait a little longer before declaring another OSD down!
We also try to fix the problem
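For example, raising the grace period from its default of 20 seconds could
look like this (the value is only an illustration; the option is read by
both the monitors and the OSDs, hence both commands):

  ceph config set osd osd_heartbeat_grace 30
  ceph config set mon osd_heartbeat_grace 30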
Hi,
I just upgraded a small test cluster on Raspberry Pis from Pacific 16.2.6 to
16.2.7.
The upgrade went without major problems.
But now the Ceph Dashboard doesn't work anymore in Safari.
It complains about main..js "Line 3 invalid regular expression: invalid
group specifier name".
It works with
Proxmox = 6.4-8
CEPH = 15.2.15
Nodes = 3
Network = 2x100G / node
Disk = nvme Samsung PM-1733 MZWLJ3T8HBLS 4TB
nvme Samsung PM-1733 MZWLJ1T9HBJR 2TB
CPU = EPYC 7252
CEPH pools = 2 separate pools for each disk type and each disk split into 2
OSDs
Replica = 3
VM don't do many
Replying to myself :)
It seems to be this function:
replaceBraces(e) {
==>   return e.replace(/(?<=\d)\s*-\s*(?=\d)/g, "..").
        replace(/\(/g, "{").
        replace(/\)/g, "}").
The lookbehind assertion (?<=\d) is most likely what Safari is choking on:
its regex engine does not support lookbehind, and "invalid group specifier
name" is the error it raises for it.
We have an old Ceph cluster which is running fine without any problems with
cephadm and Pacific (16.2.7) on Ubuntu (it was originally deployed without
cephadm).
Now I am trying to set up one more cluster on CentOS Stream 8 with
cephadm, and the containers are killed or stopped for no apparent reason.
On Tue, Mar 8
>
> VMs don't do many writes, and I migrated the main testing VMs to the 2TB
> pool, which in turn fragments faster.
>
>
> Did a lot of tests and recreated pools and OSDs in many ways, but in a
> matter of days every time each OSD gets severely fragmented and loses up
> to 80% of write performance (tes
>
> We have an old ceph cluster, which is running fine without any problems
> with cephadm and pacific (16.2.7) on Ubuntu (which was deployed without
> using cephadm).
>
> Now, I am trying to setup one more cluster on CentOS Stream 8 with
> cephadm, containers are killed or stopped for no reaso
>
> Can't imagine there is no reason. Anyway I think there is a general
> misconception that using containers would make it easier for users.
>
> ceph = learn linux sysadmin + learn ceph
> cephadm = learn linux sysadmin + learn ceph + learn containers
>
Oh forgot ;)
croit ceph = learn not
Hi Francois,
thanks for the reminder. We offline-compacted all of the OSDs when we
reinstalled the hosts with the new OS.
But actually reinstalling them was never on my list.
I could try that, and in the same go I can remove all the cache SSDs (when
one SSD shares the cache for 10 OSDs this is a ho
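For anyone wondering, offline compaction of a BlueStore OSD is typically
done roughly like this (OSD id and path are placeholders; on cephadm hosts
the unit names differ and the tool has to be run inside a cephadm shell):

  systemctl stop ceph-osd@<id>
  ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-<id> compact
  systemctl start ceph-osd@<id>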
Hi,
The last 2 OSDs I recreated were on December 30 and February 8.
I totally agree that SSD caches are a terrible SPOF. I think that's an
option if you use 1 SSD/NVMe for 1 or 2 OSDs, but the cost is then very
high. Using 1 SSD for 10 OSDs increases the risk for almost no gain
because the SSD is
> Where is the rados bench before and after your problem?
Rados bench before deleting OSDs and recreating them + syncing, with
fragmentation 0.89:
> T1 = wr,4M:      Total time run 60.0405
> T2 = ro,seq,4M:  Total time run 250.486
> T3 = ro,rand,4M: Total time run 600.463
> Total writes made
> T1 = wr,4M
> Total time run       60.0405
> Total writes made    9997
> Write size           4194304
> Object size          4194304
> Bandwidth (MB/sec)   666.017
> Stddev Bandwidth     24.1108
> Max
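For anyone who wants to reproduce this comparison, the three runs above
roughly correspond to the following rados bench invocations (pool name is a
placeholder; --no-cleanup keeps the written objects around so the read tests
have something to read):

  rados bench -p <pool> 60 write --no-cleanup
  rados bench -p <pool> 60 seq
  rados bench -p <pool> 60 rand
  rados -p <pool> cleanup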
We use it without major issues, at this point. There are still flaws, but
there are flaws in almost any deployment and management system, and this is
not unique to cephadm. I agree with the general sentiment that you need to
have some knowledge about containers, however. I don't think that's
necess
Hi,
This was already fixed in master/quincy, but the Pacific backport was never
completed. I just did that: https://github.com/ceph/ceph/pull/45301 (it
should be there for 16.2.8).
Kind Regards,
Ernesto
On Tue, Mar 8, 2022 at 3:55 PM Jozef Rebjak wrote:
Thanks Eugen,
Yeah, unfortunately the OSDs have been replaced with new OSDs. Currently the
cluster is rebalancing. I was thinking that I would try the
'osd_find_best_info_ignore_history_les' trick after the cluster has calmed
down and there is no extra traffic on the OSDs.
Thing is ..
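For context, that trick usually means temporarily setting the flag on the
affected OSD(s), letting the PG peer, and then removing it again. It can
discard writes, so it should only ever be a last resort. A sketch (the OSD
id is a placeholder, and the OSD may need a restart for the setting to take
effect):

  ceph config set osd.<id> osd_find_best_info_ignore_history_les true
  # wait for the PG to peer and go active, then remove the override
  ceph config rm osd.<id> osd_find_best_info_ignore_history_les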
Unexpectedly, everything disappeared and the cluster health went back to
its previous state!
I think I’ll never have a definitive answer ^^
I’ve been able to find a really nice way to get the rbd stats/iotop into
our Prometheus using the mgr plugin too, and it's awesome as we can now
better chase
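In case it helps others, enabling per-image RBD metrics in the prometheus
mgr module looks roughly like this (pool names are placeholders; check the
mgr/prometheus documentation for your release):

  ceph mgr module enable prometheus
  ceph config set mgr mgr/prometheus/rbd_stats_pools "<pool1>,<pool2>"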
It has taken me too long to reply to you. I just wanted to say thanks - this
was very helpful and answered my question. Thanks for taking the time to
provide this information.
--
Mark Selby
Sr Linux Administrator, The Voleon Group
mse...@voleon.com
I am not sure that what I would like to do is even possible. I was hoping there
is someone out there who could chime in on this.
We use Ceph RBD and Ceph FS somewhat extensively and are starting on our RGW
journey.
We have a couple of different groups that would like to be their own tenan
We are starting to test out Ceph RGW and have run into a small issue with the
aws-cli that amazon publishes. We have a set of developers who use the aws-cli
heavily and it seems that this tool does not work with Ceph RGW tenancy.
Given user = test01$test01 with bucket buck01
Given user = tes
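For reference, RGW multi-tenancy exposes a tenanted user's bucket on the S3
API as "tenant:bucket", so the kind of call in question looks roughly like
this (endpoint and credentials are placeholders); the colon in the bucket
name is presumably what the aws-cli objects to:

  aws --endpoint-url https://rgw.example.com s3 ls s3://test01:buck01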
Hi Mark,
On Wed, Mar 9, 2022 at 6:57 AM Mark Selby wrote:
> I am not sure that what I would like to do is even possible. I was hoping
> there is someone out there who could chime in on this.
>
> We use Ceph RBD and Ceph FS somewhat extensively and are starting on our
> RGW journey.
>
> W
Alternatively, if you want to restrict access to S3 resources for different
groups of users, you can do so by creating a role in a tenant, creating the
S3 resources and attaching tags to them, and then using ABAC/tags to allow a
user to access a particular resource (bucket/object). Details can
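A very rough sketch of the first step, with an illustrative role name,
tenant and trust-policy file (none of these are taken from the thread):

  radosgw-admin role create --role-name=S3Access --tenant=test01 \
    --assume-role-policy-doc="$(cat trust-policy.json)"

where trust-policy.json is a standard STS trust policy listing who may
assume the role; the tag-based conditions then go into the permission
policy attached to the role.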
Ok, some progress…
I'm describing what I did here, hopefully it will help someone who ended up
in the same predicament.
I used "ceph-objectstore-tool … --op mark-complete" to mark the incomplete
PGs as complete on the primary OSD, and then brought the OSD up. The
incomplete PG now has a state
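For anyone in the same spot, the full invocation is along these lines (OSD
id, PG id and data path are placeholders; the OSD must be stopped first, and
mark-complete can throw away data, so treat it as a last resort):

  systemctl stop ceph-osd@<id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> \
    --pgid <pg_id> --op mark-complete
  systemctl start ceph-osd@<id>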
Just to report back the root cause of the above mentioned failures in "
ceph-osd -i ${osd_id} --mkfs -k /var/lib/ceph/osd/ceph-${osd_id}/keyring"
It turns out the culprit was using Samsung SM883 SSD disks as DB/WAL
partitions. Replacing SM883 with Intel S4510/4520 SSDs solved the issues.
It loo