[ceph-users] radosgw lost config during upgrade 14.2.16 -> 14.2.21

2021-05-14 Thread Jan Kasprzak
Hello, I have just upgraded my cluster from 14.2.16 to 14.2.21, and after the upgrade, radosgw was listening on the default port 7480 instead of the SSL port it used before the upgrade. It might be that I mishandled "ceph config assimilate-conf" previously or forgot to restart radosgw after the assimil
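For reference, a minimal sketch of how the SSL frontend is usually pinned down after such an upgrade; the instance name, port and certificate path below are illustrative, and a civetweb frontend string would look different:

  # ceph config set client.rgw.gateway1 rgw_frontends "beast ssl_port=443 ssl_certificate=/etc/ceph/rgw.pem"
  # systemctl restart ceph-radosgw@rgw.gateway1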

[ceph-users] rbd cp versus deep cp?

2021-05-24 Thread Jan Kasprzak
Hello, Ceph users, what is the difference between "rbd cp" and "rbd deep cp"? What I need to do is make a copy of an RBD volume that one of our users inadvertently resized to far too big a size, shrink the copied image to the expected size, verify that everything is OK, and then delete the origi

[ceph-users] Re: rbd cp versus deep cp?

2021-05-25 Thread Jan Kasprzak
Eugen, Eugen Block wrote: : Mykola explained it in this thread [1] a couple of months ago: : : `rbd cp` will copy only one image snapshot (or the image head) to the : destination. : : `rbd deep cp` will copy all image snapshots and the image head. Thanks for the explanation. I have created a pu
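A sketch of the copy-shrink-verify workflow from the question above; the pool/image names and the target size are illustrative:

  # rbd deep cp one/vm-disk one/vm-disk-copy        (copies all snapshots plus the image head)
  # rbd resize --size 100G --allow-shrink one/vm-disk-copy
  # rbd rm one/vm-disk                              (only after verifying the copy)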

[ceph-users] Unprotect snapshot: device or resource busy

2021-06-30 Thread Jan Kasprzak
Hello, Ceph users, How can I figure out why it is not possible to unprotect a snapshot in an RBD image? I use this RBD pool for OpenNebula, and somehow there is a snapshot in one image, which OpenNebula does not see. So I wanted to delete the snapshot: # rbd info one/one-1312 rbd image 'on

[ceph-users] Re: Unprotect snapshot: device or resource busy

2021-07-01 Thread Jan Kasprzak
Ilya Dryomov wrote: : On Thu, Jul 1, 2021 at 8:37 AM Jan Kasprzak wrote: : > : > Hello, Ceph users, : > : > How can I figure out why it is not possible to unprotect a snapshot : > in a RBD image? I use this RBD pool for OpenNebula, and somehow there : > is a snapshot i

[ceph-users] [solved] Unprotect snapshot: device or resource busy

2021-07-01 Thread Jan Kasprzak
Ilya Dryomov wrote: : On Thu, Jul 1, 2021 at 8:37 AM Jan Kasprzak wrote: : > : > # rbd snap unprotect one/one-1312@snap : > 2021-07-01 08:28:40.747 7f3cb6ffd700 -1 librbd::SnapshotUnprotectRequest: cannot unprotect: at least 1 child(ren) [68ba8e7bace188] in pool 'one' : > 2
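The usual way past the "at least 1 child(ren)" error, sketched with a hypothetical clone name: list the clones that still depend on the snapshot, flatten (or remove) them, then unprotect:

  # rbd children one/one-1312@snap
  # rbd flatten one/<child-image>
  # rbd snap unprotect one/one-1312@snap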

[ceph-users] Re: [solved] Unprotect snapshot: device or resource busy

2021-07-01 Thread Jan Kasprzak
Ilya Dryomov wrote: : On Thu, Jul 1, 2021 at 10:50 AM Jan Kasprzak wrote: : > : > Ilya Dryomov wrote: : > : On Thu, Jul 1, 2021 at 8:37 AM Jan Kasprzak wrote: : > : > : > : > # rbd snap unprotect one/one-1312@snap : > : > 2021-07-01 08:28:40.747 7f3cb6ffd700 -1 librb

[ceph-users] Radosgw replicated -> EC pool

2024-01-02 Thread Jan Kasprzak
Hello, Ceph users, what is the best way to change the storage layout of all buckets in radosgw? I have the default.rgw.buckets.data pool as replicated, and I want to use an erasure-coded layout instead. One way is to use cache tiering as described here: https://cephnotes.ksperis.com/blog
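For context, creating the EC target pool itself is the easy part (profile, pool name and PG counts below are illustrative); migrating the existing objects is what the cache-tiering trick in the linked article addresses:

  # ceph osd erasure-code-profile set rgw-ec k=4 m=2 crush-failure-domain=host
  # ceph osd pool create default.rgw.buckets.data.ec 64 64 erasure rgw-ec
  # ceph osd pool application enable default.rgw.buckets.data.ec rgw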

[ceph-users] Re: Ceph as rootfs?

2024-01-04 Thread Jan Kasprzak
Hello, Jeremy Hansen wrote: : Is it possible to use Ceph as a root filesystem for a pxe booted host? I am not sure about CephFS, but as for Ceph RBD, it should definitely be possible to get an IP address and run "rbd map" from the initrd, and then use /dev/rbd0 as a root device. -Yenya
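Roughly what the initrd would have to do once networking is up, with made-up pool, image and client names (a ceph.conf with the mon addresses has to be baked into the initrd as well):

  # rbd map rootpool/rootimg --id rootclient --keyring /etc/ceph/ceph.client.rootclient.keyring
  # mount /dev/rbd0 /sysroot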

[ceph-users] ceph -s: wrong host count

2024-01-08 Thread Jan Kasprzak
Hello, Ceph users! I have recently noticed that when I reboot a single ceph node, ceph -s reports "5 hosts down" instead of one. The following is captured during reboot of a node with two OSDs: health: HEALTH_WARN noout flag(s) set 2 osds down 5 hos

[ceph-users] Re: ceph -s: wrong host count

2024-01-08 Thread Jan Kasprzak
Hi Eugen, Eugen Block wrote: : you probably have empty OSD nodes in your crush tree. Can you send : the output of 'ceph osd tree'? You are right, there were 4 hosts in the crush tree, which I removed from the cluster and repurposed a while ago. I have edited the CRUSH map to remove the ho
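For anyone hitting the same symptom: the leftover empty host buckets can also be dropped without a full CRUSH map edit (host name illustrative):

  # ceph osd tree                (empty hosts show up with no OSDs beneath them)
  # ceph osd crush rm old-host-1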

[ceph-users] Keyring location for ceph-crash?

2024-01-18 Thread Jan Kasprzak
Hello, Ceph users, what is the correct location of the keyring for ceph-crash? I tried to follow this document: https://docs.ceph.com/en/latest/mgr/crash/ # ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' > /etc/ceph/ceph.client.crash.keyring and copy this file
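A hedged sketch of the setup that is generally expected to work, assuming ceph-crash (which runs as the ceph user) picks up the key from the standard /etc/ceph/ceph.client.crash.keyring path; newer releases also try a per-host client.crash.<hostname> name first:

  # ceph auth get-or-create client.crash mon 'profile crash' mgr 'profile crash' -o /etc/ceph/ceph.client.crash.keyring
  # chown ceph:ceph /etc/ceph/ceph.client.crash.keyring
  # systemctl restart ceph-crash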

[ceph-users] Re: Keyring location for ceph-crash?

2024-01-19 Thread Jan Kasprzak
suggest to open a tracker issue. : : Thanks, : Eugen : : Zitat von Jan Kasprzak : : : >Hello, Ceph users, : > : >what is the correct location of keyring for ceph-crash? : >I tried to follow this document: : > : >https://docs.ceph.com/en/latest/mgr/crash/ : > : ># ceph a

[ceph-users] RadosGW manual deployment

2024-01-28 Thread Jan Kasprzak
Hi all, how can radosgw be deployed manually? For Ceph cluster deployment, there is still (fortunately!) a documented method which works flawlessly even in Reef: https://docs.ceph.com/en/latest/install/manual-deployment/#monitor-bootstrapping But as for radosgw, there is no such descript
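A minimal sketch of the manual path, with an illustrative instance name "gw1" and a beast frontend; this mirrors the general shape of the old manual-install instructions rather than an authoritative recipe:

  # mkdir -p /var/lib/ceph/radosgw/ceph-rgw.gw1
  # ceph auth get-or-create client.rgw.gw1 mon 'allow rw' osd 'allow rwx' -o /var/lib/ceph/radosgw/ceph-rgw.gw1/keyring
  # chown -R ceph:ceph /var/lib/ceph/radosgw/ceph-rgw.gw1
  (add to /etc/ceph/ceph.conf:)
  [client.rgw.gw1]
  rgw_frontends = beast port=8080
  # systemctl enable --now ceph-radosgw@rgw.gw1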

[ceph-users] Re: RadosGW manual deployment

2024-01-29 Thread Jan Kasprzak
Hello, Janne, Janne Johansson wrote: > Den mån 29 jan. 2024 kl 08:11 skrev Jan Kasprzak : > > > > Is it possible to install a new radosgw instance manually? > > If so, how can I do it? > > We are doing it, and I found the same docs issue recently, so Zac > p

[ceph-users] Re: RadosGW manual deployment

2024-01-29 Thread Jan Kasprzak
Hello, Eugen, Eugen Block wrote: > Janne was a bit quicker than me, so I'll skip my short instructions > how to deploy it manually. But your (cephadm managed) cluster will > complain about "stray daemons". There doesn't seem to be a way to > deploy rgw daemons manually with the cephadm too

[ceph-users] RBD mirroring to an EC pool

2024-02-02 Thread Jan Kasprzak
Hello, Ceph users, I would like to use my secondary Ceph cluster for backing up RBD OpenNebula volumes from my primary cluster using mirroring in image+snapshot mode. Because it is for backups only, not a cold-standby, I would like to use erasure coding on the secondary side to save a disk
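One way this is commonly approached, stated here as an assumption rather than a tested recipe: keep a replicated pool for the RBD metadata on the secondary and point the image data at an EC pool via rbd_default_data_pool for the rbd-mirror client (pool and client names are illustrative):

  # ceph osd pool create rbd-ec-data 64 64 erasure
  # ceph osd pool set rbd-ec-data allow_ec_overwrites true
  # ceph config set client.rbd-mirror.backup rbd_default_data_pool rbd-ec-data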

[ceph-users] just-rebuilt mon does not join the cluster

2022-09-08 Thread Jan Kasprzak
Hello, I had to rebuild the data directory of one of my mons, but now I can't get the new mon to join the cluster. What I did was based on this documentation: https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/ ceph mon remove mon1 ssh root@mon1 mkdir /var/lib/ceph/mon/tmp mkdi
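For reference, the sequence from that documentation page, as a sketch (mon name and paths illustrative):

  # ceph auth get mon. -o /tmp/mon.keyring
  # ceph mon getmap -o /tmp/monmap
  # ceph-mon --mkfs -i mon1 --monmap /tmp/monmap --keyring /tmp/mon.keyring
  # chown -R ceph:ceph /var/lib/ceph/mon/ceph-mon1
  # systemctl start ceph-mon@mon1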

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-08 Thread Jan Kasprzak
Jan Kasprzak wrote: : Hello, : : I had to rebuild the data directory of one of my mons, but now I can't get : the new mon to join the cluster. What I did was based on this documentation: : https://docs.ceph.com/en/latest/rados/operations/add-or-rm-mons/ : : ceph mon remove mon1 : : ssh

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-08 Thread Jan Kasprzak
Hello, Frank, Frank Schilder wrote: : Might be a problem I had as well. Try setting : : mon_sync_max_payload_size 4096 : : If you search this list for that you will find the background. Thanks. I did ceph tell mon.* config set mon_sync_max_payload_size 4096 ceph config s

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-09 Thread Jan Kasprzak
TL;DR: my cluster is working now. Details and further problems below: Jan Kasprzak wrote: : I did : : ceph tell mon.* config set mon_sync_max_payload_size 4096 : ceph config set mon mon_sync_max_payload_size 4096 : : and added "mon_sync_max_payload_size = 4096" into the [global] se

[ceph-users] Re: just-rebuilt mon does not join the cluster

2022-09-13 Thread Jan Kasprzak
Hello, Stefan Kooman wrote: : Hi, : : On 9/9/22 10:53, Frank Schilder wrote: : >Is there a chance you might have seen this https://tracker.ceph.com/issues/49231 ? : > : >Do you have network monitoring with packet reports? It is possible though that you have observed something new. : > :

[ceph-users] OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Hello, I am trying to upgrade my Ceph cluster (v15.2.15) from CentOS 7 to CentOS 8 stream. I upgraded monitors (a month or so ago), and now I want to upgrade OSDs: for now I upgraded one host with two OSDs: I kept the partitions where OSD data live (I have separate db on NVMe partition and

[ceph-users] Re: OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Sebastian, Sebastian Knust wrote: : On 01.12.21 17:31, Jan Kasprzak wrote: : >In "ceph -s", they "2 osds down" : >message disappears, and the number of degraded objects steadily decreases. : >However, after some time the number of degraded objects starts go

[ceph-users] [solved] Re: OSD repeatedly marked down

2021-12-01 Thread Jan Kasprzak
Jan Kasprzak wrote: [...] : So I don't think my problem is OOM. It might be communication, : but I tried to tcpdump and look for example for ICMP port unreachable : messages, but nothing interesting there. D'oh. Wrong prefix length of public_network in ceph.conf, copied fr
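For the record, the kind of ceph.conf lines that were wrong here - the prefix length is what matters (the addresses below are illustrative documentation ranges):

  [global]
  public_network = 192.0.2.0/24
  cluster_network = 198.51.100.0/24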

[ceph-users] Re: Migration from CentOS7/Nautilus to CentOS Stream/Pacific

2021-12-08 Thread Jan Kasprzak
Carlos, Carlos Mogas da Silva wrote: : From what I can gather, this will not be smooth at all, since I can't make an in-place upgrade of the : OS first and then Ceph, nor the other way around. So the idea is to create a totally new Ceph : cluster from scratch and migrate the data from o

[ceph-users] What is "register_cache_with_pcm not using rocksdb"?

2022-03-22 Thread Jan Kasprzak
Hello, Ceph users, what does the following message mean? Mar 22 11:59:07 mon2.host.name ceph-mon[1148]: 2022-03-22T11:59:07.286+0100 7f32d2b07700 -1 mon.mon2@1(peon).osd e2619840 register_cache_with_pcm not using rocksdb It appears in the journalctl -u ceph-mon@ on all three mons of my

[ceph-users] Re: Ceph SSH orchestrator?

2020-07-07 Thread Jan Kasprzak
Hello, Ceph users, Lars Täuber wrote: : +1 from me : : I also hope for a bare metal solution for the upcoming versions. At the moment it is a show stopper for an upgrade to Octopus. Also +1. Not having a unix-style deployment tool is a show-stopper for me as well. So far I keep my serve

[ceph-users] Ceph on CentOS 8?

2019-12-02 Thread Jan Kasprzak
Hello, Ceph users, does anybody use Ceph on recently released CentOS 8? Apparently there are no el8 packages either at download.ceph.com or in the native CentOS package tree. I am thinking about upgrading my cluster to C8 (because of other software running on it apart from Ceph). Do el7

[ceph-users] Merge DB/WAL back to the main device?

2025-02-03 Thread Jan Kasprzak
Hi all, while reading a sibling thread about moving DB/WAL to a separate device, I wonder whether it is possible to go the other way round as well, i.e. to remove a metadata device from an OSD and merge the metadata back to the main storage? What I am trying to do: My OSD nodes are 1U boxes

[ceph-users] Re: Merge DB/WAL back to the main device?

2025-02-05 Thread Jan Kasprzak
Holger, Eugen, thanks! I tried the ceph-volume approach, and it worked. The only strange thing was that "ceph osd metadata $ID | grep devices" reports "bluefs_db_devices": "nvme1n1", while in fact the db+wal LV/PV/VG is on a _partition_ of that device, nvme1n1p4. Another problem is
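The ceph-volume step referred to above, roughly (OSD id, fsid and the target LV are illustrative, and the OSD has to be stopped first):

  # systemctl stop ceph-osd@12
  # ceph-volume lvm migrate --osd-id 12 --osd-fsid <osd-fsid> --from db wal --target ceph-block-vg/osd-block-12
  # systemctl start ceph-osd@12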

[ceph-users] Re: Measuring write latency (ceph osd perf)

2025-02-06 Thread Jan Kasprzak
hs ago on a large bare-metal server running RHEL 9 - for some reason, the storage operations went slower and slower, and only the reboot fixed this. Cheers, -Yenya > > On 3 Jan 2025, at 11:37, Jan Kasprzak wrote: > > > > Hello, ceph users, > > > > TL;DR: how c

[ceph-users] pgs not deep-scrubbed in time

2024-12-18 Thread Jan Kasprzak
Hello, Ceph users, a question/problem related to deep scrubbing: I have a HDD-based Ceph 18 cluster currently with 34 osds and 600-ish pgs. In order to avoid latency peaks which apparently correlate with HDD being 100 % busy for several hours during a deep scrub, I wanted to relax the scr

[ceph-users] Re: pgs not deep-scrubbed in time

2024-12-18 Thread Jan Kasprzak
Hi Eugen, Eugen Block wrote: > check out the docs [0] or my blog post [1]. Either set the new interval > globally, or at least for the mgr as well, otherwise it will still check for > the default interval. Thanks for the pointers. I did ceph config set global osd_deep_scrub_interval 2592
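The point from the reply, as a one-liner sketch: set the interval globally so the mgr health check and the OSDs agree (the value below is merely an example, in seconds, roughly four weeks):

  # ceph config set global osd_deep_scrub_interval 2419200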

[ceph-users] Measuring write latency (ceph osd perf)

2025-01-03 Thread Jan Kasprzak
Hello, ceph users, TL;DR: how can I look into ceph cluster write latency issues? Details: we have a HDD-based cluster (with NVMe for metadata), about 20 hosts, 2 OSDs per host, mostly used as RBD storage for QEMU/KVM virtual machines. From time to time our users complain about write laten
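Two starting points the thread title already hints at: the cluster-wide per-OSD latency view, and the per-daemon counters (osd.3 is an arbitrary example and must be queried on the host where it runs):

  # ceph osd perf                 (commit/apply latency per OSD, in milliseconds)
  # ceph daemon osd.3 perf dump | grep -A 5 op_w_latency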

[ceph-users] How to clear the "slow operations" warning?

2025-05-29 Thread Jan Kasprzak
Hello, Ceph users, TL;DR: how to clear/acknowledge the following warning from ceph -s? health: HEALTH_WARN 1 OSD(s) experiencing slow operations in BlueStore Details: This is caused by a bad sector on the physical HDD - the time of this warning is the same as the most rec

[ceph-users] Re: How to clear the "slow operations" warning?

2025-06-17 Thread Jan Kasprzak
Hello again, I still have a problem with the occasional "slow operations in BlueStore" warning, which I don't know how to clear/acknowledge except by restarting that OSD process (or doing a "ceph osd down/up" cycle). When I previously asked about this several weeks ago, the underlying cause was HDD
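The workaround mentioned above, spelled out with an illustrative OSD id; a marked-down OSD reasserts itself as up on its own, so no separate "up" command is needed:

  # ceph osd down 7
  (or)
  # systemctl restart ceph-osd@7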