[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
Hi Eugen, Thanks for the reply. If rbd-mirror constantly synchronizes changes, how frequently does it replay? I can't find any options to configure this. Eugen Block wrote on Thu, Jun 4, 2020 at 2:54 PM: > Hi, > > that's the point of rbd-mirror, to constantly replay changes from the > primary image to the remote im

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Eugen Block
The initial sync is a full image sync, the rest is based on the object sets created. There are several options to control the mirroring, for example: rbd_journal_max_concurrent_object_sets, rbd_mirror_concurrent_image_syncs, rbd_mirror_leader_max_missed_heartbeats and many more. I'm not sure I
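A minimal sketch of how these options could be inspected and adjusted via the MON config database (the option names come from the message above; the "client" section and the example values are assumptions, not recommendations):

    # Show the current (or default) value of a mirroring option.
    ceph config get client rbd_mirror_concurrent_image_syncs

    # Raise the number of images synced in parallel (example value only).
    ceph config set client rbd_mirror_concurrent_image_syncs 5

    # Adjust journal object-set handling; check the option documentation for exact semantics.
    ceph config set client rbd_journal_max_concurrent_object_sets 4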

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
My situation is that the primary image is in use while rbd-mirror syncs. I want to know the interval between two successive rbd-mirror transfers of the incremental data. I will look into those options you provided, thanks a lot :) Eugen Block wrote on Thu, Jun 4, 2020 at 3:28 PM: > The initial sync is a full image sync,

[ceph-users] Re: 15.2.3 Crush Map Viewer problem.

2020-06-04 Thread Lenz Grimmer
Hi Marco, thank you. It seems as if the REST API output matches the output of "ceph osd tree", but the tree view in the dashboard somehow fails to display all nodes. We will investigate this. I've now submitted your report as a bug on our tracker: https://tracker.ceph.com/issues/45873 Please ma

[ceph-users] Re: Cephadm Hangs During OSD Apply

2020-06-04 Thread Sebastian Wagner
Encrypted OSDs should land in the next Octopus release: https://tracker.ceph.com/issues/44625 On 27.05.20 at 20:31, m...@silvenga.com wrote: > I noticed the luks volumes were open, even though luksOpen hung. I killed > cryptsetup (once per disk) and ceph-volume continued and eventually created

[ceph-users] Re: Octopus 15.2.2 unable to make drives available (reject reason locked)...

2020-06-04 Thread Sebastian Wagner
Hi Marco, note that encrypted OSDs will land in the next Octopus release. Regarding the locked state, you could run ceph-volume directly on the host to understand the issue better; c-v should give you the reasons. On 29.05.20 at 03:18, Marco Pizzolo wrote: > Rebooting addressed > > On Thu,
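A minimal sketch of running ceph-volume directly on the affected host to see why devices are rejected (the device path is a placeholder):

    # List all devices with their availability and reject reasons.
    ceph-volume inventory

    # Inspect a single device in more detail.
    ceph-volume inventory /dev/sdb

    # Show LVs already claimed by existing OSDs, which often explains a "locked" state.
    ceph-volume lvm list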

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Wido den Hollander
On 6/4/20 9:17 AM, Frank Schilder wrote: >> Yes and No. This will cause many CRUSHMap updates where a manual update >> is only a single change. >> >> I would do: >> >> $ ceph osd getcrushmap -o crushmap > > Well, that's a yes and a no as well. > > If you are experienced and edit crush maps on

[ceph-users] Re: Cephadm Setup Query

2020-06-04 Thread Sebastian Wagner
On 26.05.20 at 08:16, Shivanshi . wrote: > Hi, > > I am facing an issue on Cephadm cluster setup. Whenever I try to add > remote devices as OSDs, the command just hangs. > > The steps I have followed: > > sudo ceph orch daemon add osd node1:device > >   > > 1. For the setup I have followed t
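A few generic checks that may help narrow down where the orchestrator call hangs (a sketch, not a guaranteed fix):

    # Check that the host and its devices are visible to the orchestrator.
    ceph orch host ls
    ceph orch device ls

    # Review recent cephadm log messages for the stuck operation.
    ceph log last cephadm

    # See whether an OSD daemon was partially deployed.
    ceph orch ps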

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-06-04 Thread Sebastian Wagner
Sorry for the late response. I'm seeing > Upgrade: It is NOT safe to stop mon.vx-rg23-rk65-u43-130 in the logs. Please make sure `ceph mon ok-to-stop vx-rg23-rk65-u43-130` succeeds. On 22.05.20 at 19:28, Gencer W. Genç wrote: > Hi Sebastian, > > I cannot see my replies in here. So I put

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
Hi George, for replicated rules you can simply create a new crush rule with the new failure domain set to chassis and change any pool's crush rule to this new one. If you have EC pools, then the chooseleaf needs to be edited by hand. I did this before as well. (A really unfortunate side effect
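For the replicated-pool case described above, the steps would look roughly like this (a sketch; the rule and pool names are placeholders):

    # Create a replicated rule whose failure domain is the chassis bucket.
    ceph osd crush rule create-replicated replicated_chassis default chassis

    # Point an existing replicated pool at the new rule (this triggers data movement).
    ceph osd pool set mypool crush_rule replicated_chassis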

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
> Yes and No. This will cause many CRUSHMap updates where a manual update > is only a single change. > > I would do: > > $ ceph osd getcrushmap -o crushmap Well, that's a yes and a no as well. If you are experienced and edit crush maps on a regular basis, you can go that way. I would still enclo
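The manual edit cycle being discussed would look roughly like this (a sketch; file names are arbitrary, and the test rule id and replica count are examples):

    # Export and decompile the current CRUSH map.
    ceph osd getcrushmap -o crushmap
    crushtool -d crushmap -o crushmap.txt

    # ... edit crushmap.txt, e.g. change the chooseleaf failure domain ...

    # Recompile and sanity-check the result before injecting it.
    crushtool -c crushmap.txt -o crushmap.new
    crushtool -i crushmap.new --test --show-statistics --rule 0 --num-rep 3

    # Inject the new map (this can trigger large-scale data movement).
    ceph osd setcrushmap -i crushmap.new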

[ceph-users] Re: ceph orch upgrade stuck at the beginning.

2020-06-04 Thread Gencer W. Genç
Hi Sebastian, No worries about the delay. I just ran that command; however, it returns: $ ceph mon ok-to-stop vx-rg23-rk65-u43-130 Error EBUSY: not enough monitors would be available (vx-rg23-rk65-u43-130-1) after stopping mons [vx-rg23-rk65-u43-130] It seems we have some progress here. In the p

[ceph-users] speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi, I have 15628 misplaced objects that are currently backfilling as follows: 1. pgid:14.3ce1 from:osd.1321 to:osd.3313 2. pgid:14.4dd9 from:osd.1693 to:osd.2980 3. pgid:14.680b from:osd.362 to:osd.3313 These are remnant backfills from a pg-upmap/rebalance campaign after we've added 2
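One way to nudge specific backfills along is to raise the backfill limits on just the OSDs involved (a sketch; the OSD ids match the list above, the values are examples rather than recommendations):

    # Allow more concurrent backfills on the source and target OSDs.
    ceph config set osd.1321 osd_max_backfills 4
    ceph config set osd.3313 osd_max_backfills 4

    # Optionally raise recovery concurrency as well.
    ceph config set osd.3313 osd_recovery_max_active 4

    # Remove the per-OSD overrides once the backfills have finished.
    ceph config rm osd.1321 osd_max_backfills
    ceph config rm osd.3313 osd_max_backfills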

[ceph-users] Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Thomas Gradisnik
We have deployed a small test cluster consisting of three nodes. Each node is running a mon/mgr and two osds (Samsung PM983 3.84 TB NVMe split into two partitions), so six osds in total. We started with Ceph 14.2.7 some weeks ago (upgraded to 14.2.9 later) and ran different tests using fio agains

[ceph-users] Re: speed up individual backfills

2020-06-04 Thread Thomas Bennett
Hi, It turns out I was mapping to a problematic OSD, in this case OSD 3313. After disabling the OSD with systemctl on the host, recovery has picked up again and mapped the pgs to new osds. For posterity, I ran smartctl on osd.3313's device and then I noticed: 5 Reallocated_Sector_Ct 0x0033
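For reference, the sequence described would look roughly like this (a sketch; the OSD id matches the message, the device path is a placeholder, and unit names differ on containerized deployments):

    # Stop the suspect OSD so it no longer participates in the backfill.
    systemctl stop ceph-osd@3313

    # Mark it out so its PGs are remapped to healthy OSDs.
    ceph osd out 3313

    # Check the underlying drive's SMART data.
    smartctl -a /dev/sdX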

[ceph-users] changing acces vlan for all the OSDs - potential downtime ?

2020-06-04 Thread Adrian Nicolae
Hi all, I have a Ceph cluster with a standard setup: - the public network: MONs and OSDs connected to the same agg switch with ports in the same access vlan - private network: OSDs connected to another switch with a second eth connected to another access vlan I need to change the public

[ceph-users] Re: Nautilus latest builds for CentOS 8

2020-06-04 Thread Giulio Fidente
On 6/4/20 1:17 AM, Anthony D'Atri wrote: > cbs.centos.org offers 14.2.7 packages for el8 eg > https://cbs.centos.org/koji/buildinfo?buildID=28564 but I don’t know anything > about their provenance or nature. > For sure a downloads.ceph.com package would be desirable. upstream CentOS Storage SIG

[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread David Orman
* bluestore: common/options.cc: disable bluefs_preextend_wal_files <-- from the 15.2.3 changelog. There was a bug which led to issues on OSD restart, and I believe this was the attempt at mitigation until a proper bugfix could be put into place. I suspect this might be the cause of the symptoms y
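Checking and pinning the option is straightforward (a sketch; note that OSDs generally need a restart for it to take effect, and as mentioned further down it already defaults to false in these releases):

    # Check the current value of the option on the OSDs.
    ceph config get osd bluefs_preextend_wal_files

    # Explicitly pin it, should it ever need to be forced.
    ceph config set osd bluefs_preextend_wal_files false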

[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Janne Johansson
On Thu, Jun 4, 2020 at 16:29, David Orman wrote: >* bluestore: common/options.cc: disable bluefs_preextend_wal_files <-- > from 15.2.3 changelogs. There was a bug which lead to issues on OSD > Given that preextending WAL files was mentioned as a speed-increasing feature in Nautilus 14.2.3 re

[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Stephan
Thanks for your fast reply! We just tried all four possible combinations of bluefs_preextend_wal_files and bluefs_buffered_io, but the write IOPS in the "usecase1" test remain the same. By the way, bluefs_preextend_wal_files was already false in 14.2.9 (as in 15.2.3). Any other ideas? David Orman wrot

[ceph-users] nfs-ganesha mount hangs every day since upgrade to nautilus

2020-06-04 Thread Marc Roos
After having to revert to ceph-fuse when upgrading to Nautilus, I now also see that the nfs-ganesha mount stalls/breaks every day. Probably caused by: 1 clients failing to respond to capability release 2 clients failing to respond to cache pressure 1 MDSs repor
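To see which clients trigger those warnings, something like the following may help (a sketch; the MDS name is a placeholder):

    # Show the full health messages, including the offending client ids.
    ceph health detail

    # List sessions on the active MDS to identify the stuck client.
    ceph tell mds.cephmds01 session ls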

[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Mark Nelson
Hi Stephan, We recently ran a set of 3-sample tests looking at 2OSD/NVMe vs 1 OSD/NVMe RBD performance on Nautilus, Octopus, and Master on some of our newer performance nodes with Intel P4510 NVMe drives. Those tests use the librbd fio backend.  We also saw similar randread and seq write per
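For anyone wanting to reproduce this kind of test, a minimal librbd fio invocation might look like the following (a sketch; pool, image name, size and runtime are placeholders):

    # Create a test image first.
    rbd create rbd/fio_test --size 10G

    # 4k random writes through librbd, 60s time-based run.
    fio --name=rbd_randwrite --ioengine=rbd --clientname=admin \
        --pool=rbd --rbdname=fio_test --rw=randwrite --bs=4k \
        --iodepth=32 --numjobs=1 --direct=1 --runtime=60 --time_based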

[ceph-users] Re: Degradation of write-performance after upgrading to Octopus

2020-06-04 Thread Mark Nelson
Oh, one other thing: Check for background work, especially PG balancer.  In all of my tests the balancer was explicitly disabled.  During benchmarks there may be a high background workload affecting client IO if it's constantly rebalancing the number of PGs in the pool. Mark On 6/4/20 11
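Checking for that background work is quick (a sketch):

    # See whether the balancer is active and which mode it is in.
    ceph balancer status

    # Disable it for the duration of a benchmark run.
    ceph balancer off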

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Jason Dillaman
On Thu, Jun 4, 2020 at 3:43 AM Zhenshi Zhou wrote: > > My condition is that the primary image being used while rbd-mirror sync. > I want to get the period between the two times of rbd-mirror transfer the > increased data. > I will search those options you provided, thanks a lot :) When using the

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Thanks Frank, Interesting info about the EC profile. I do have an EC pool, but I noticed the following when I dumped the profile: # ceph osd erasure-code-profile get ec22 crush-device-class=hdd crush-failure-domain=host crush-root=default jerasure-per-chunk-alignment=false k=2 m=2 plugin=jerasu

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Yes, that makes total sense. Thanks, George > On Jun 4, 2020, at 2:17 AM, Frank Schilder wrote: > >> Yes and No. This will cause many CRUSHMap updates where a manual update >> is only a single change. >> >> I would do: >> >> $ ceph osd getcrushmap -o crushmap > > Well, that's a yes and a n

[ceph-users] diskprediction_local fails with python3-sklearn 0.22.2

2020-06-04 Thread Eric Dold
Hello, the mgr module diskprediction_local fails under Ubuntu 20.04 (Focal) with python3-sklearn version 0.22.2. Ceph version is 15.2.3. When the module is enabled I get the following error: File "/usr/share/ceph/mgr/diskprediction_local/module.py", line 112, in serve self.predict_all_devices()

[ceph-users] bad balacing (octopus)

2020-06-04 Thread Ml Ml
Hello, any idea why it's so badly balanced? e.g.: osd.52 (82%) vs osd.34 (29%) I ran "/usr/bin/ceph osd reweight-by-utilization" from cron for some time, since I was low on space, and that helped a bit. What should I do next? Here is some info: root@ceph01:~# ceph -s cluster:
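Instead of repeated reweight-by-utilization runs, the upmap balancer is usually the longer-term fix (a sketch; it requires all clients to be Luminous or newer):

    # Inspect per-OSD utilisation first.
    ceph osd df tree

    # Allow upmap and enable the automatic balancer.
    ceph osd set-require-min-compat-client luminous
    ceph balancer mode upmap
    ceph balancer on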

[ceph-users] log_channel(cluster) log [ERR] : Error -2 reading object

2020-06-04 Thread Frank Schilder
Hi all, I found these messages today: 2020-06-04 17:07:57.471 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error -2 reading object 14:e4c5ebb6:::1000203c59b.0002:head 2020-06-04 17:08:04.236 7fa0aa16e700 -1 log_channel(cluster) log [ERR] : Error -2 reading object 14:e4c9a1a1:::1000203ad
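To track down where such an object lives, it can be mapped to its PG and acting set (a sketch; the pool name is a placeholder for pool id 14, the object name is taken from the first log line, and the pgid in the last command is a placeholder):

    # Map the object name from the log message to its PG and OSDs.
    ceph osd map cephfs_data 1000203c59b.0002

    # Then inspect or deep-scrub the PG it reports, e.g.:
    ceph pg deep-scrub 14.xyz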

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Hmm, so I tried all that, and almost all of my PGs are being remapped. The crush map looks correct. Is that normal? Thanks, George On Jun 4, 2020, at 2:33 PM, Frank Schilder <fr...@dtu.dk> wrote: Hi George, you don't need to worry about that too much. The EC profile contains two typ

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
Hi George, you don't need to worry about that too much. The EC profile contains two types of information, one part about the actual EC encoding and another part about crush parameters. Unfortunately, actually. Part of this information is mutable after pool creation while the rest is not. Mutabl

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Kyriazis, George
Understand that it’s difficult to debug remotely. :-) In my current scenario I have 5 machines (1 host per chassis), but planning on adding some additional chassis with 4 hosts per chassis in the near future. Currently I am going through the first stage of adding “stub” chassis for the 5 hosts

[ceph-users] Re: rbd-mirror sync image continuously or only sync once

2020-06-04 Thread Zhenshi Zhou
Thank you for the clarification. That's very clear. Jason Dillaman wrote on Fri, Jun 5, 2020 at 12:46 AM: > On Thu, Jun 4, 2020 at 3:43 AM Zhenshi Zhou wrote: > > > > My condition is that the primary image being used while rbd-mirror sync. > > I want to get the period between the two times of rbd-mirror trans

[ceph-users] Re: Best way to change bucket hierarchy

2020-06-04 Thread Frank Schilder
It's hard to tell without knowing what the diff is, but from your description I take it that you changed the failure domain for every(?) pool from host to chassis. I don't know what a chassis is in your architecture, but if each chassis contains several host buckets, then yes, I would expect almo

[ceph-users] Re: changing acces vlan for all the OSDs - potential downtime ?

2020-06-04 Thread Konstantin Shalygin
On 6/4/20 4:26 PM, Adrian Nicolae wrote: Hi all, I have a Ceph cluster with a standard setup: - the public network: MONs and OSDs connected to the same agg switch with ports in the same access vlan - private network: OSDs connected to another switch with a second eth connected to another