Hi
Looking at this error in v15.2.13:
"
[ERR] MGR_MODULE_ERROR: Module 'devicehealth' has failed:
Module 'devicehealth' has failed:
"
It used to work. Since the module is always on I can't restart
it, and I've found no clue as to why it failed. I've tried rebooting all
hosts, to no avail.
On Mon, 14 June 2021 at 22:48, Matt Larson wrote:
>
> Looking at the documentation (
> https://docs.ceph.com/en/latest/cephadm/upgrade/) - I have a question about
> whether you need to upgrade sequentially through each minor version, 15.2.1 ->
> 15.2.3 -> ... -> 15.2.XX?
>
> Can you safely upgrade by
Hi Torkil,
you should see more information in the MGR log file.
Might be an idea to restart the MGR to get some recent logs.
On 15.06.21 at 09:41, Torkil Svensgaard wrote:
Hi
Looking at this error in v15.2.13:
"
[ERR] MGR_MODULE_ERROR: Module 'devicehealth' has failed:
Module 'devicehea
Hi
Thanks, I guess this might have something to do with it:
"
Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug
2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify devicehealth.notify:
Jun 15 09:44:22 dcn-ceph-01 bash[3278]: debug
2021-06-15T09:44:22.507+ 7f704e4b3700 -1 mgr notify Traceba
Dears,
We have a Ceph cluster with 4096 PGs, of which more than 100 are not active+clean.
On top of the Ceph cluster we have a CephFS with 3 active MDS servers.
It seems that we can’t get all the files out of it because of the affected PGs.
The object store has more than 400 million objects.
W
Filed https://tracker.ceph.com/issues/51223
k
> On 9 Jun 2021, at 13:20, Igor Fedotov wrote:
>
> Should we file another ticket for that?
Hi everyone,
Here's today's schedule for Ceph Month:
9:00 ET / 15:00 CEST Dashboard Update [Ernesto]
9:30 ET / 15:30 CEST [lightning] RBD latency with QD=1 bs=4k [Wido den Hollander]
9:40 ET / 15:40 CEST [lightning] From Open Source to Open Ended in
Ceph with Lua [Yuval Lifshitz]
Full schedule:
Hi,
I'm building a lab with virtual machines.
I built a setup with only 2 nodes, 2 OSDs per node, and I have a host that
mounts CephFS (mount.cephfs).
Each of the 2 Ceph nodes runs mon + mgr + mds services and has the cephadm command.
If I stop a node, all commands hang.
Can't use the dashboard, can't use ceph -s or any other ceph command.
On 15.06.21 15:16, nORKy wrote:
> Why is there no failover ??
Because one MON out of two is not a majority, so no quorum can be formed.
Regards
--
Robert Sander
Heinlein Support GmbH
Schwedter Str. 8/9b, 10119 Berlin
https://www.heinlein-support.de
Tel: 030 / 405051-43
Fax: 030 / 405051-19
A
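(As a rough illustration of the quorum math: with N MONs a quorum needs floor(N/2)+1 members, so 2 of 2 and 2 of 3, which is why an odd count of at least 3 is recommended. Assuming a cephadm-managed lab like the one described above, a hedged sketch of adding a third MON would be, host name being a placeholder:
  ceph orch apply mon 3
  ceph orch daemon add mon ceph-node-03
  ceph quorum_status -f json-pretty    # verify the quorum afterwards
)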
It's easy. The problem is that the OSDs are still marked up because there are not
enough down reporters (mon_osd_min_down_reporters), and because of this the MDS is getting stuck.
The solution is "mon_osd_min_down_reporters = 1".
Due to the "two node" cluster and "replicated 2" with "chooseleaf host",
the reporter count should be set to 1.
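(A minimal sketch of applying that setting through the centralized config, assuming a release that supports "ceph config"; alternatively it can go into the [mon] section of ceph.conf. The value 1 is only reasonable for such a tiny two-node setup:
  ceph config set mon mon_osd_min_down_reporters 1
  ceph config get mon mon_osd_min_down_reporters
)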
Dear All,
I have deployed the latest Ceph Pacific release in my lab and started to check
out the new "stable" NFS Ganesha features. First of all I'm a bit confused about
which method to actually use to deploy the NFS cluster:
cephadm or ceph nfs cluster create?
I used "nfs cluster create" for
Hi,
That's right!
We're currently evaluating a similar setup with two identical HW nodes
(on two different sites), with OSD, MON and MDS each, and both nodes
have CephFS mounted.
The goal is to build a minimal self-contained shared filesystem that
remains online during planned updates and c
You have incomplete PGs, which means you have inactive data, because the data
isn't there.
This will typically only happen when you have multiple concurrent disk
failures, or something like that, so I think there is some missing info.
>1 osds exist in the crush map but not in the osdmap
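(To gather that missing info, a quick hedged checklist of read-only commands, with <pgid> a placeholder for one of the incomplete PGs:
  ceph health detail
  ceph pg dump_stuck inactive
  ceph pg <pgid> query
  ceph osd tree
)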
Hi,
yeah. I wasn't aware that I had set "osd op complaint time" to 5
seconds. AFAIK the default is 32, so I get slow ops already after 5
seconds instead of 32. That's why I think no one has noticed
this before.
My application which uses the cluster will throw timeouts after 6
seconds, tha
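(A quick hedged way to double-check the current value and the compiled-in default, assuming a release with the centralized config database:
  ceph config get osd osd_op_complaint_time
  ceph config help osd_op_complaint_time    # shows the built-in default
)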
Hello,
I have a ceph cluster with 5 nodes (1 hdd each node). I want to add 5 more
drives (hdd) to expand my cluster. What is the best strategy for this?
I will add one drive to each node, but is it a good strategy to add one drive,
wait for the data to rebalance to the new OSD, and then add the next one? Or maybe..
Hi,
Thank you guys. I deployed a third monitor and failover works. Thank you.
On Tue, 15 June 2021 at 16:15, Christoph Brüning <
christoph.bruen...@uni-wuerzburg.de> wrote:
> Hi,
>
> That's right!
>
> We're currently evaluating a similar setup with two identical HW nodes
> (on two different si
Hi,
as far as I understand it,
you get no real benefit from doing them one by one, as each OSD you add can
cause a lot of data to be moved to a different OSD, even though you just
rebalanced it.
The algorithm determining the placement of PGs does not take the
current/historic placement into account.
Personally, when adding drives like this, I set noin (ceph osd set noin), and
norebalance (ceph osd set norebalance). Like your situation, we run smaller
clusters; our largest cluster only has 18 OSDs.
That keeps the cluster from starting data moves until all new drives are in
place. Don't fo
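(A minimal sketch of that flow, assuming the flags are cleared again once every new disk is in place; the OSD ids are placeholders:
  ceph osd set noin
  ceph osd set norebalance
  # ... create the new OSDs on each node; with noin set they stay "out" ...
  ceph osd in <new-osd-ids>
  ceph osd unset noin
  ceph osd unset norebalance
  ceph -s    # then watch one big rebalance instead of several small ones
)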
Hello all,
after upgrading CentOS clients to version 8.4 (kernel
4.18.0-305.3.1.el8) the CephFS mount fails. Message: *mount error 110 =
Connection timed out*
...unfortunately the kernel log was flooded with zeros... :-(
The monitor connection seems to be OK, but libceph said:
kernel: libceph:
Looks like this: https://tracker.ceph.com/issues/51112
On Tue, Jun 15, 2021 at 5:48 PM Ackermann, Christoph
wrote:
>
> Hello all,
>
> after upgrading Centos clients to version 8.4 CephFS ( Kernel
> 4.18.0-305.3.1.el8 ) mount did fail. Message: *mount error 110 =
> Connection timed out*
> ..unf
Dear Cephers,
I have encountered the following networking issue several times, and I wonder
whether there is a solution for network HA.
We build Ceph using L2 multi-chassis link aggregation groups (MC-LAG) to
provide switch redundancy. On each host, we use 802.3ad (LACP)
mode for NIC redundancy.
Note: I am not entirely sure here, and would love other input from the ML about
this, so take this with a grain of salt.
You don't show any unfound objects, which I think is excellent news as far as
data loss goes.
>>96 active+clean+scrubbing+deep+repair
The deep scrub + repair seems au
Hi,
On 15.06.21 16:15, Christoph Brüning wrote:
Hi,
That's right!
We're currently evaluating a similar setup with two identical HW nodes
(on two different sites), with OSD, MON and MDS each, and both nodes
have CephFS mounted.
The goal is to build a minimal self-contained shared filesystem
My big worry is when a single link under a bond breaks in such a way
that the whole bond stops working.
How can I make it fail over in such cases?
best regards,
samuel
huxia...@horebdata.cn
From: Anthony D'Atri
Date: 2021-06-15 18:22
To: huxia...@horebdata.cn
Subject: Re: [c
Do you observe the same behaviour when you pull a cable?
Maybe a flapping port might cause this kind of behaviour; other than
that you shouldn't see any network disconnects.
Are you sure about the LACP configuration? What is the output of 'cat
/proc/net/bonding/bond0'?
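(When reading that file, the fields that usually give it away are the bond and per-slave "MII Status", the per-slave "Link Failure Count", and, for 802.3ad, the "Aggregator ID" and "Partner Mac Address" of each slave. A rough way to pull just those lines:
  egrep -i 'Bonding Mode|MII Status|Link Failure Count|Aggregator ID|Partner Mac' /proc/net/bonding/bond0
A slave stuck in its own aggregator or a climbing failure count would match the flapping theory.)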
On Tue, Jun 15, 2021 at 7:19 PM hu
This also sounds like a possible GlusterFS use case.
Regards,
-Jamie
On Tue, Jun 15, 2021 at 12:30 PM Burkhard Linke <
burkhard.li...@computational.bio.uni-giessen.de> wrote:
> Hi,
>
> On 15.06.21 16:15, Christoph Brüning wrote:
> > Hi,
> >
> > That's right!
> >
> > We're currently evaluating a
When I pull out the cable, the bond works properly.
Does that mean the port is somehow flapping? Ping still works, but the
iperf test yields very low results.
huxia...@horebdata.cn
From: Serkan Çoban
Date: 2021-06-15 18:47
To: huxia...@horebdata.cn
CC: ceph-users
Subject: R
With an unstable link/port you could see the issues you describe. Ping doesn’t
have the packet rate for you to necessarily have a packet in transit at exactly
the same time as the port fails temporarily. Iperf on the other hand could
certainly show the issue, higher packet rate and more likely
> On Jun 15, 2021, at 10:26 AM, Andrew Walker-Brown
> wrote:
>
> With an unstable link/port you could see the issues you describe. Ping
> doesn’t have the packet rate for you to necessarily have a packet in transit
> at exactly the same time as the port fails temporarily. Iperf on the othe
Hi Ilya,
We're now hitting this on CentOS 8.4.
The "setmaxosd" workaround fixed access to one of our clusters, but
isn't working for another, where we have gaps in the osd ids, e.g.
# ceph osd getmaxosd
max_osd = 553 in epoch 691642
# ceph osd tree | sort -n -k1 | tail
541 ssd 0.87299
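(For reference, a hedged sketch of the workaround being discussed, on the assumption that max_osd has to be larger than the highest OSD id present, not the OSD count:
  ceph osd ls | sort -n | tail -1      # highest OSD id in use
  ceph osd setmaxosd <highest_id+1>    # placeholder; use the value printed above plus one
  ceph osd getmaxosd                   # confirm
)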
I'm trying to update a ceph octopus install, to add an iscsi gateway, using
ceph-ansible, and gwcli won't run for me.
The ansible run went well.. but when I try to actually use gwcli, I get
(blahblah)
ImportError: No module named rados
which isn't too surprising, since "python-rados" is not installed.
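(Assuming a CentOS/RHEL 8 host with the Ceph repos enabled, the missing binding would typically come from the python3 packages, roughly:
  dnf install python3-rados python3-rbd
while older python2-based gwcli builds wanted python-rados instead; which one applies depends on how ceph-ansible installed ceph-iscsi.)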
Replying to own mail...
On Tue, Jun 15, 2021 at 7:54 PM Dan van der Ster wrote:
>
> Hi Ilya,
>
> We're now hitting this on CentOS 8.4.
>
> The "setmaxosd" workaround fixed access to one of our clusters, but
> isn't working for another, where we have gaps in the osd ids, e.g.
>
> # ceph osd getmax
Dan,
sorry, we have no gaps in osd numbering:
isceph@ceph-deploy:~$ sudo ceph osd ls |wc -l; sudo ceph osd tree | sort -n
-k1 |tail
76
[..]
73   ssd   0.28600   osd.73   up   1.0   1.0
74   ssd   0.27689   osd.74   up   1.0   1.0
Hi Christoph,
What about the max osd? If "ceph osd getmaxosd" is not 76 on this
cluster, then set it: `ceph osd setmaxosd 76`.
-- dan
On Tue, Jun 15, 2021 at 8:54 PM Ackermann, Christoph
wrote:
>
> Dan,
>
> sorry, we have no gaps in osd numbering:
> isceph@ceph-deploy:~$ sudo ceph osd ls |wc -l
Hi Dan,
Thanks for the hint, I'll try this tomorrow on a test bed first. This
evening I had to fix some Bareos client systems to get a quiet sleep. ;-)
Will give you feedback asap.
Best regards,
Christoph
On Tue, 15 June 2021 at 21:03, Dan van der Ster <
d...@vanderster.com> wrote:
>
Hi Reed,
Thank you for getting back to us.
We had indeed several disk failures at the same time.
Regarding the OSD map, we have an OSD that failed and that we needed to remove, but
we didn't update the crush map.
The question here: is it safe to update the OSD crush map without affecting the
data availability?
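(If it helps, a hedged sketch of cleaning up a dead OSD, with <id> a placeholder. Removing an empty, failed OSD from CRUSH only changes the map, but any remapping it triggers is worth watching with ceph -s:
  ceph osd crush remove osd.<id>
  ceph auth del osd.<id>
  ceph osd rm <id>
or, on recent releases, the single command:
  ceph osd purge <id> --yes-i-really-mean-it
)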
I run 2x 10G on my hosts, and I would like the bond to tolerate one link being down.
From what you suggest, I will check link monitoring, to make sure a failing
link is removed from the bond automatically, without having to manually
pull out the cable.
thanks and best regards,
samuel
huxia.
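(A hedged example of what "link monitoring" usually means here, i.e. the kernel bonding options; the values are illustrative only:
  mode=802.3ad miimon=100 lacp_rate=fast xmit_hash_policy=layer3+4
MII monitoring every 100 ms takes a slave whose carrier drops out of the aggregate automatically; a port that stays "up" while misbehaving is harder to catch and usually needs the faster LACP timeouts or switch-side checks.)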
We also run with Dell VLT switches (40 GbE);
everything is active/active, so multiple paths, as Andrew describes in
his config.
Our config allows us to:
bring down one of the switches for upgrades,
bring down an iSCSI gateway for patching,
all the while at least one path is up and servicing.
Thanks
Hello Ceph-users,
I've upgraded my Ubuntu server from 18.04.5 LTS to Ubuntu 20.04.2 LTS via
'do-release-upgrade',
during that process ceph packages were upgraded from Luminous to Octopus and
now the ceph-mon daemon (I have only one) won't start; the log error is:
"2021-06-15T20:23:41.843+ 7fbb55e9b54
Hello
How can I use ceph orch apply to deploy single-site RGW
daemons with a custom frontend configuration?
Basically, I have three servers in a DNS round-robin, each
running a 15.2.12 rgw daemon with this configuration:
rgw_frontends = civetweb num_threads=5000 port=443s
ssl_certificate=/etc/
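(A heavily hedged sketch: on Octopus the orchestrator call takes a realm and zone plus a placement, and the frontend line itself can be pushed through the config database. The names and paths below are placeholders, and the civetweb "443s" syntax would need translating into beast options if you move off civetweb:
  ceph orch apply rgw myrealm myzone --placement="3 host1 host2 host3"
  ceph config set <rgw-config-section> rgw_frontends "beast ssl_port=443 ssl_certificate=/etc/pki/rgw.pem"
where <rgw-config-section> is whatever config entity actually matches your rgw daemons, per-daemon if need be.)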
Good day.
I'm writing some code for parsing output data for monitoring purposes.
The data is that of "ceph status -f json", "ceph df -f json", "ceph osd
perf -f json" and "ceph osd pool stats -f json".
I also need to support all major Ceph releases, from Jewel through
Pacific.
What I've st
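(Not sure how far you've got, but as a tiny hedged sketch of the kind of release difference to expect: the overall health key moved around Luminous, so a jq fallback like the following covers both layouts, assuming jq is available:
  ceph status -f json | jq -r '.health.status // .health.overall_status'
Jewel-era clusters report health.overall_status, newer ones health.status; other outputs changed across releases too, so pinning the schema per release is probably unavoidable.)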
Thanks for the replies folks.
This one was resolved. I wish I could tell you what I changed to fix
it, but there were several undocumented changes to the deployment script
I'm using while I was distracted by something else... Tearing down and
redeploying today does not seem to be suffering from
Hey folks,
I'm working through some basic ops drills, and noticed what I think is an
inconsistency in the Cephadm Docs. Some Googling appears to show this is a
known thing, but I didn't find a clear direction on cooking up a solution
yet.
On a cluster with 5 mons, 2 were abruptly removed when th