[ceph-users] Re: How to apply ceph.conf changes using new tool cephadm

2020-05-05 Thread ceph
I am not absolutely sure, but you should be able to do something like ceph config mon set Or try to restart the mon/osd daemon Hth On 29 April 2020 16:42:31 MESZ, "Gencer W. Genç" wrote: >Hi, > >I just deployed a new cluster with cephadm instead of ceph-deploy. In >the past, if I change ceph
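A minimal sketch of the centralized config approach suggested above (the option name below is only an example, not something from this thread):

    ceph config set mon mon_max_pg_per_osd 400   # store an option for all mons in the config database
    ceph config get mon mon_max_pg_per_osd       # verify what the mons will pick up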

[ceph-users] Add lvm in cephadm

2020-05-05 Thread Simon Sutter
Hello Everyone, The new cephadm is giving me a headache. I'm setting up a new test environment where I have to use LVM partitions, because I don't have more hardware. I couldn't find any information about the compatibility of existing LVM partitions and cephadm/octopus. I tried the old metho

[ceph-users] Re: adding block.db to OSD

2020-05-05 Thread Stefan Priebe - Profihost AG
Hello Igor, On 30.04.20 at 15:52, Igor Fedotov wrote: > 1) reset perf counters for the specific OSD > > 2) run bench > > 3) dump perf counters. This is OSD 0: # ceph tell osd.0 bench -f plain 12288000 4096 bench: wrote 12 MiB in blocks of 4 KiB in 6.70482 sec at 1.7 MiB/sec 447 IOPS https://
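For reference, the full reset/bench/dump sequence could look like this when run on the host carrying osd.0 (a sketch; it assumes the default admin socket setup):

    ceph daemon osd.0 perf reset all                 # 1) reset perf counters
    ceph tell osd.0 bench -f plain 12288000 4096     # 2) run the 4 KiB bench
    ceph daemon osd.0 perf dump > osd0-perf.json     # 3) dump counters for comparison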

[ceph-users] Re: Add lvm in cephadm

2020-05-05 Thread Simon Sutter
Sorry, I misclicked; here is the second part: ceph-volume --cluster ceph lvm prepare --data /dev/centos_node1/ceph But that gives me just: Running command: /usr/bin/ceph-authtool --gen-print-key Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd --keyring /var/lib/ceph/boots
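For comparison, ceph-volume's documented form takes the logical volume as vg/lv rather than a /dev path, so the equivalent call would be roughly (assuming the VG is centos_node1 and the LV is ceph):

    ceph-volume lvm prepare --data centos_node1/ceph
    ceph-volume lvm activate --all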

[ceph-users] Re: Bluestore - How to review config?

2020-05-05 Thread Herve Ballans
Hi Dave, Probably not complete, but I know two interesting ways to get the configuration of a Bluestore OSD: 1) the show-label option of the ceph-bluestore-tool command, e.g.: $ sudo ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/ 2) the config show and perf dump parameters of the
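Spelled out, the two approaches above would look roughly like this (osd.0 is just an example; config show and perf dump go through the admin socket on the OSD's host):

    sudo ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/
    sudo ceph daemon osd.0 config show | grep bluestore
    sudo ceph daemon osd.0 perf dump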

[ceph-users] Re: asynchronous/non-sequential example read and write test codes Librados

2020-05-05 Thread Bobby
Hi Casey, Hi all, Casey, thanks a lot for your reply! That was really helpful. A question, please: do these tests reflect a realistic workload? Basically, I am profiling (CPU profiling) the computations in these tests, and naturally I am interested in a big workload. I have started with CRUSH and her

[ceph-users] Re: RGW and the orphans

2020-05-05 Thread EDH - Manuel Rios
Hi Eric, In which version is your tool expected to be included? Nautilus? Maybe the next release? Best Regards Manuel -Original Message- From: Katarzyna Myrek Sent: Monday, 20 April 2020 12:19 To: Eric Ivancich CC: EDH - Manuel Rios ; ceph-users@ceph.io Subject: Re: [ceph-users] RGW and

[ceph-users] Re: adding block.db to OSD

2020-05-05 Thread Igor Fedotov
Hi Stefan, so (surprise!) some DB access counters show a significant difference, e.g. "kv_flush_lat": { "avgcount": 1423, "sum": 0.000906419, "avgtime": 0.00636 }, "kv_sync_lat": { "avgcount": 1423, "sum": 0.
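One way to pull just those counters out of a dump for a side-by-side comparison (a sketch assuming jq is installed and the counters sit in the bluestore section of the dump):

    ceph daemon osd.0 perf dump | jq '.bluestore | {kv_flush_lat, kv_sync_lat, kv_commit_lat}'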

[ceph-users] Re: How to apply ceph.conf changes using new tool cephadm

2020-05-05 Thread Sebastian Wagner
ceph@elchaka.de wrote: > I am not absolutely sure but you should be able to do something like > > ceph config mon set Yes, please use `ceph config ...`. cephadm only uses a minimal ceph.conf, which only contains the IPs of the other MONs. > > Or try to restart the mon/osd daemon > > Hth > > On
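As a rough illustration of what the minimal ceph.conf means in practice, the config database can regenerate that file and show everything else it stores (generate-minimal-conf exists since Nautilus; the path is just an example):

    ceph config generate-minimal-conf > /etc/ceph/ceph.conf   # essentially fsid plus mon addresses
    ceph config dump                                          # everything else lives in the mon config DB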

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Dan van der Ster
Hi Frank, Could you share any ceph-osd logs and also the ceph.log from a mon to see why the cluster thinks all those osds are down? Simply marking them up isn't going to help, I'm afraid. Cheers, Dan On Tue, May 5, 2020 at 4:12 PM Frank Schilder wrote: > > Hi all, > > a lot of OSDs crashed in

[ceph-users] Re: Add lvm in cephadm

2020-05-05 Thread Joshua Schmid
On 20/05/05 08:46, Simon Sutter wrote: > Sorry, I misclicked; here is the second part: > > > ceph-volume --cluster ceph lvm prepare --data /dev/centos_node1/ceph > But that gives me just: > > Running command: /usr/bin/ceph-authtool --gen-print-key > Running command: /usr/bin/ceph --cluster ceph --n

[ceph-users] Re: Bluestore - How to review config?

2020-05-05 Thread Igor Fedotov
Hi Dave, wouldn't this help (particularly "Viewing runtime settings" section): https://docs.ceph.com/docs/nautilus/rados/configuration/ceph-conf/ Thanks, Igor On 5/5/2020 2:52 AM, Dave Hall wrote: Hello, Sorry if this has been asked before... A few months ago I deployed a small Nautilus c
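The "Viewing runtime settings" section boils down to commands like these (osd.0 and the option name are examples; the daemon form has to run on the OSD's host):

    ceph daemon osd.0 config show                       # everything the running daemon currently uses
    ceph daemon osd.0 config get bluestore_cache_size   # a single option
    ceph config show osd.0                              # same idea via the mons, Nautilus and later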

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Alex Gorbachev
Hi Frank, On Tue, May 5, 2020 at 10:43 AM Frank Schilder wrote: > Dear Dan, > > thank you for your fast response. Please find the log of the first OSD > that went down and the ceph.log with these links: > > https://files.dtu.dk/u/tF1zv5zdc6mmXXO_/ceph.log?l > https://files.dtu.dk/u/hPb5qax2-b6

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Paul Emmerich
Check network connectivity on all configured networks between all hosts; OSDs running but being marked down is usually a network problem. Paul -- Paul Emmerich Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH Freseniusstr. 31h 81247 München www.croit.io Te

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Dan van der Ster
Hi, The osds are getting marked down due to this: 2020-05-05 15:18:42.893964 mon.ceph-01 mon.0 192.168.32.65:6789/0 292689 : cluster [INF] osd.40 marked down after no beacon for 903.781033 seconds 2020-05-05 15:18:42.894009 mon.ceph-01 mon.0 192.168.32.65:6789/0 292690 : cluster [INF] osd.60 mark
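To gauge how widespread this is and what the relevant timeouts are set to, something along these lines could help (defaults quoted from memory, so treat them as approximate; the log path is the usual one on a mon host):

    grep 'marked down after no beacon' /var/log/ceph/ceph.log | wc -l
    ceph config get mon mon_osd_report_timeout       # default ~900 s, matching the ~903 s above
    ceph config get osd osd_beacon_report_interval   # how often OSDs send beacons, default ~300 s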

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Alex Gorbachev
On Tue, May 5, 2020 at 11:27 AM Frank Schilder wrote: > I tried that and get: > > 2020-05-05 17:23:17.008 7fbbe700 0 -- 192.168.32.64:0/2061991714 >> > 192.168.32.68:6826/5216 conn(0x7fbbf01d6f80 :-1 > s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 > l=1).handle_connect_reply connect

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Dan van der Ster
OK, those requires look correct. While the PGs are inactive there will be no client I/O, so there's nothing to pause at this point. In general, I would evict those misbehaving clients with ceph tell mds.* client evict id= For now, keep nodown and noout, let all the PGs get active again. You might n
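Filled in, the eviction would look roughly like this (the client id is of course a placeholder):

    ceph tell mds.* client ls                 # list session ids of connected clients
    ceph tell mds.* client evict id=12345     # evict the misbehaving one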

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Dan van der Ster
ceph osd tree down # shows the down OSDs ceph osd tree out # shows the out OSDs There is no "active/inactive" state on an OSD. You can force an individual OSD to do a soft restart with "ceph osd down " -- this will cause it to restart and recontact mons and osd peers. If that doesn't work, restar
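In command form (osd.40 is only an example; the systemctl line assumes a package-based, non-containerized deployment):

    ceph osd tree down                 # which OSDs the mons consider down
    ceph osd tree out                  # which OSDs are out
    ceph osd down osd.40               # soft restart: the OSD re-peers and recontacts the mons
    systemctl restart ceph-osd@40      # hard restart on the OSD's host, if the above doesn't help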

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread brad . swanson
Ditto, I had a bad optic on a 48x10 switch. The only way I detected it was via my Prometheus TCP failed-retransmit count. Looking back over the previous 4 weeks, I could see it increment in small bursts, but Ceph was able to handle it; then it went crazy and a bunch of OSDs just dropped out.

[ceph-users] Workload in Unit testing

2020-05-05 Thread Bobby
Hi all, Ceph documentation mentions it has two types of tests: *unit tests* (also called make check tests) and *integration tests*. Strictly speaking, the *make check tests* are not “unit tests”, but rather tests that can be run easily on a single build machine after compiling Ceph from source.

[ceph-users] Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
Hi all, a lot of OSDs crashed in our cluster. Mimic 13.2.8. Current status included below. All daemons are running, no OSD process crashed. Can I start marking OSDs in and up to get them back talking to each other? Please advise on next steps. Thanks!! [root@gnosis ~]# ceph status cluster:

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
Dear Dan, thank you for your fast response. Please find the log of the first OSD that went down and the ceph.log with these links: https://files.dtu.dk/u/tF1zv5zdc6mmXXO_/ceph.log?l https://files.dtu.dk/u/hPb5qax2-b6W9vmp/ceph-osd.2.log?l I can collect more osd logs if this helps. Best regards

[ceph-users] radosgw garbage collection error

2020-05-05 Thread James, GleSYS
Hi, We’ve recently installed a new Ceph cluster running Octopus 15.2.1, and we’re using RGW with an erasure-coded backing pool. I started to get a suspicion that deleted objects were not getting cleaned up properly, and I wanted to verify this by checking the garbage collector. That’s when I di

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
The situation is improving very slowly. I set nodown, noout, and norebalance, since all daemons are running and nothing actually crashed. Current status: [root@gnosis ~]# ceph status cluster: id: health: HEALTH_WARN 2 MDSs report slow metadata IOs 1 MDSs report slow requests
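For reference, those flags are set and cleared like this (clear them again once the cluster has settled):

    ceph osd set nodown
    ceph osd set noout
    ceph osd set norebalance
    ceph osd unset norebalance    # likewise for the other two once recovery is done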

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
I tried that and get: 2020-05-05 17:23:17.008 7fbbe700 0 -- 192.168.32.64:0/2061991714 >> 192.168.32.68:6826/5216 conn(0x7fbbf01d6f80 :-1 s=STATE_CONNECTING_WAIT_CONNECT_REPLY_AUTH pgs=0 cs=0 l=1).handle_connect_reply connect got BADAUTHORIZER Strange. = Frank Schilder AIT

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
Thanks! Here it is: [root@gnosis ~]# ceph osd dump | grep require require_min_compat_client jewel require_osd_release mimic It looks like we had an extremely aggressive job running on our cluster, completely flooding everything with small I/O. I think the cluster built up a huge backlog and is/

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
It's not the time: [root@gnosis ~]# pdsh -w ceph-[01-20] date ceph-01: Tue May 5 17:34:52 CEST 2020 ceph-03: Tue May 5 17:34:52 CEST 2020 ceph-02: Tue May 5 17:34:52 CEST 2020 ceph-04: Tue May 5 17:34:52 CEST 2020 ceph-07: Tue May 5 17:34:52 CEST 2020 ceph-14: Tue May 5 17:34:52 CEST 2020 cep

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
Hi Dan, looking at an older thread, I found that "OSDs do not send beacons if they are not active". Is there any way to activate an OSD manually? Or check which ones are inactive? Also, I looked at this here: [root@gnosis ~]# ceph mon feature ls all features supported: [kraken,luminous

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Frank Schilder
Dear all, the command ceph config set mon.ceph-01 mon_osd_report_timeout 3600 saved the day. Within a few seconds, the cluster became: == [root@gnosis ~]# ceph status cluster: id: health: HEALTH_WARN 2 slow ops, oldest one blocked for 10884
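Once the backlog has drained, the override can presumably be dropped again so the mons fall back to the default timeout (a sketch; same target as the set command above):

    ceph config rm mon.ceph-01 mon_osd_report_timeout
    ceph config get mon.ceph-01 mon_osd_report_timeout   # should show the default (~900 s) again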

[ceph-users] Re: Ceph meltdown, need help

2020-05-05 Thread Marc Roos
But what does mon_osd_report_timeout do such that it resolved your issues? Is this related to the suggested NTP / time sync? From the name, I assume that your monitor now just waits longer before it reports the OSD as 'unreachable'(?), so your OSD has more time to 'announce' itself. And I am a little

[ceph-users] State of SMR support in Ceph?

2020-05-05 Thread Oliver Freyermuth
Dear Cephalopodians, seeing the recent moves of major HDD vendors to sell SMR disks targeted for use in consumer NAS devices (including RAID systems), I got curious and wonder what the current status of SMR support in Bluestore is. Of course, I'd expect disk vendors to give us host-managed SMR

[ceph-users] osd won't start

2020-05-05 Thread Mazzystr
I've been using CentOS 7 and the 5.6.10-1.el7.elrepo.x86_64 Linux kernel. After today's update and reboot, OSDs won't start. # podman run --privileged --pid=host --cpuset-cpus 0,1 --memory 2g --name ceph_osd0 --hostname ceph_osd0 --ip 172.30.0.10 -v /dev:/dev -v /etc/localtime:/etc/localtime:ro -v /etc

[ceph-users] Re: radosgw garbage collection error

2020-05-05 Thread Pritha Srivastava
Hi James, Does radosgw-admin gc list --include-all give the same error? If yes, can you please open a tracker issue and share rgw and osd logs? Thanks, Pritha On Wed, May 6, 2020 at 12:22 AM James, GleSYS wrote: > Hi, > > We’ve recently installed a new Ceph cluster running Octopus 15.2.1, and
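For completeness, the commands in question; gc process also takes --include-all to force processing of entries that are not yet due:

    radosgw-admin gc list --include-all
    radosgw-admin gc process --include-all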

[ceph-users] Re: Bluestore - How to review config?

2020-05-05 Thread lin yunfan
Is there a way to get the block, block.db, and block.wal paths and sizes? What if all or some of them are colocated on one disk? I can get the info from an OSD with colocated wal, db, and block like below: ceph-bluestore-tool show-label --path /var/lib/ceph/osd/ceph-0/ { "/var/lib/ceph/osd/ceph-0//block"
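Two more places that report those paths and sizes whether or not db/wal share the main device (a sketch; osd.0 is an example and jq is assumed to be available):

    ceph osd metadata 0 | grep -E 'bluestore_bdev|bluefs_db|bluefs_wal'
    ceph daemon osd.0 perf dump | jq '.bluefs | {db_total_bytes, db_used_bytes, wal_total_bytes, slow_total_bytes}'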

[ceph-users] Re: radosgw garbage collection error

2020-05-05 Thread James, GleSYS
Hi, Yes, it’s the same error with "--include-all". I am currently awaiting confirmation of my account creation on the tracker site. In the meantime, here are some logs which I’ve obtained: radosgw-admin gc list --debug-rgw=10 --debug-ms=10: 2020-05-06T06:06:33.922+ 7ff4ccffb700 1 -- [2a00:X