[ceph-users] Re: Re: [ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
Hi, I just tried with 'ceph-volume lvm migrate' in Octopus but it doesn't really work. I'm not sure if I'm missing something here, but I believe it's again the already discussed containers issue. To be able to run the command for an OSD the OSD has to be offline, but then you don't have a

[ceph-users] Re: osd marked down

2021-09-29 Thread Eugen Block
Just to clarify, you didn't simply import the unchanged keyring but modified it to reflect the actual key of OSD.3, correct? If not, run 'ceph auth get osd.3' first and set the key in the osd.3.export file before importing it to ceph. Zitat von Abdelillah Asraoui : i have created keyring
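
A rough sketch of the workflow described here, assuming the OSD's actual key is known (osd.3.export is the file name used in this thread):

  # dump the cluster's current auth entry for osd.3, keeping its caps as a template
  ceph auth get osd.3 -o osd.3.export
  # replace the "key = ..." line in osd.3.export with the OSD's actual key,
  # then re-import the corrected entry
  ceph auth import -i osd.3.export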

[ceph-users] Re: 16.2.6: clients being incorrectly directed to the OSDs cluster_network address

2021-09-29 Thread Javier Cacheiro
Hi David, After some investigation, it seems that it affects all osd nodes, in a way that the same node is sometimes announced through the cluster_network (10.114) and other times through the public_network (10.113). Since each node has 12 OSDs I suspect it could depend on the specific OSD. Thi

[ceph-users] Re: S3 Bucket Notification requirement

2021-09-29 Thread Yuval Lifshitz
aws-cli v2 does not support the old signature types. Can you please install aws-cli v1 [1] and try with it? [1] https://docs.aws.amazon.com/cli/latest/userguide/install-linux.html On Mon, Sep 27, 2021 at 6:45 PM Sanjeev Jha wrote: > Hi Yuval, > > I have changed the sns signature version as sugges
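
For reference, one way to get aws-cli v1 next to (or instead of) v2 is via pip, which still ships the v1 series (a sketch; the virtualenv path is just an example):

  python3 -m venv awscli-v1 && . awscli-v1/bin/activate
  pip install awscli          # the PyPI 'awscli' package is the v1 series
  aws --version               # should report aws-cli/1.x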

[ceph-users] Re: 16.2.6: clients being incorrectly directed to the OSDs cluster_network address

2021-09-29 Thread Javier Cacheiro
Digging further and checking the osd metadata, I have what seems like a bug in assigning the addresses. For most OSDs, looking at ceph osd metadata, the result is fine and the front addresses are correctly configured through the 10.113 public network, like this one: == osd.0 == "back_addr": "[v2: 10.
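
One way to spot mis-announced OSDs across the whole cluster is to dump the metadata for all OSDs and compare the address fields (a sketch; the field names are the ones quoted above):

  # front_addr should be on the public network (10.113),
  # back_addr on the cluster network (10.114)
  ceph osd metadata | grep -E '"(id|hostname|front_addr|back_addr)"'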

[ceph-users] Re: Cephadm set rgw SSL port

2021-09-29 Thread Sebastian Wagner
Here you go: https://github.com/ceph/ceph/pull/43332 Am 28.09.21 um 15:49 schrieb Sebastian Wagner: > Am 28.09.21 um 15:12 schrieb Daniel Pivonka: >> Hi, >> >> 1. I believe the field is called 'rgw_frontend_port' >> 2. I don't think something like that exists but probably should > > At least for
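
For reference, the spec-based way to set these fields looks roughly like this (a sketch; the service id, placement and port are placeholders, and SSL additionally needs rgw_frontend_ssl_certificate in the spec):

  cat > rgw-spec.yaml <<EOF
  service_type: rgw
  service_id: myrgw
  placement:
    count: 2
  spec:
    rgw_frontend_port: 8443
    ssl: true
  EOF
  ceph orch apply -i rgw-spec.yaml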

[ceph-users] Re: [EXTERNAL] RE: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-29 Thread Igor Fedotov
Hi Dave, I think it's your disk sizing/utilization that makes your setup rather unique and apparently causes the issue. First of all, you're using a custom 4K min_alloc_size, which wasn't adapted before Pacific, aren't you? 2021-09-08T10:42:02.049+ 7f705c4f2f00  1 bluestore(/var/lib/ceph/o

[ceph-users] Re: Re: Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Igor Fedotov
Hi Eugen, indeed this looks like an issue related to containerized deployment, "ceph-volume lvm migrate" expects osd folder to be under /var/lib/ceph/osd: > stderr: 2021-09-29T06:56:24.787+ 7fde05b96180 -1 bluestore(/var/lib/ceph/osd/ceph-1) _lock_fsid failed to lock /var/lib/ceph/osd/ce

[ceph-users] Re: [EXTERNAL] RE: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-29 Thread Igor Fedotov
On 9/21/2021 10:44 AM, Dave Piper wrote: I still can't find a way to get ceph-bluestore-tool working in my containerized deployment. As soon as the OSD daemon stops, the contents of /var/lib/ceph/osd/ceph- are unreachable. Some speculations on the above. /var/lib/ceph/osd/ceph- is just a m
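
One workaround discussed later in this thread, if the cluster is managed by cephadm, is to stop the daemon and let cephadm start a throwaway container that mounts the OSD's directory (a sketch; the fsid and osd id are placeholders, and bluefs-bdev-sizes is just an example subcommand):

  systemctl stop ceph-<fsid>@osd.0.service     # stop the OSD daemon
  cephadm shell --name osd.0                   # /var/lib/ceph/osd/ceph-0 is mounted inside
  ceph-bluestore-tool bluefs-bdev-sizes --path /var/lib/ceph/osd/ceph-0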

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
I’ve not tried it, but how about: cephadm shell -n osd.0 then run “ceph-volume” commands in the newly opened shell. The directory structure seems fine. $ sudo cephadm shell -n osd.0 Inferring fsid e88d509a-f6fc-11ea-b25d-a0423f3ac864 Inferring config /var/lib/ceph/e88d509a-f6fc-11ea-b25d-a0423f

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
The OSD has to be stopped in order to migrate DB/WAL, it can't be done live. ceph-volume requires a lock on the device. Zitat von 胡 玮文 : I’ve not tried it, but how about: cephadm shell -n osd.0 then run “ceph-volume” commands in the newly opened shell. The directory structure seems fine.

[ceph-users] Leader election loop reappears

2021-09-29 Thread Manuel Holtgrewe
Dear all, I was a bit too optimistic in my previous email. It looks like the leader election loop reappeared. I could fix it by stopping the rogue mon daemon but I don't know how to fix it for good. I'm running a 16.2.6 Ceph cluster on CentOS 7.9 servers (6 servers in total). I have about 35 HDDs

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread 胡 玮文
Yes. And the “cephadm shell” command does not depend on the running daemon, it will start a new container. So I think it is perfectly fine to stop the OSD first, then run the “cephadm shell” command, and run ceph-volume in the new shell. From: Eugen Block Sent: September 29, 2021, 21:40 To
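
Putting the suggestions from this thread together, the sequence would look roughly like this (a sketch; the osd id, fsid and target LV are placeholders, and the migrate syntax follows the ceph-volume documentation):

  systemctl stop ceph-<fsid>@osd.0.service
  cephadm shell -n osd.0
  # inside the container:
  ceph-volume lvm migrate --osd-id 0 --osd-fsid <osd-fsid> --from db wal --target <vg>/<data-lv>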

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
That's what I did and pasted the results in my previous comments. Zitat von 胡 玮文 : Yes. And “cephadm shell” command does not depend on the running daemon, it will start a new container. So I think it is perfectly fine to stop the OSD first then run the “cephadm shell” command, and run cep

[ceph-users] Re: prometheus - figure out which mgr (metrics endpoint) that is active

2021-09-29 Thread Karsten Nielsen
OK thanks for that explanation. Would be awesome if you got time to do the patches upstream. It does seem like a lot of work. I will get cracking at it. On 28-09-2021 22:38, David Orman wrote: We scrape all mgr endpoints since we use external Prometheus clusters, as well. The query results will

[ceph-users] rgw user metadata default_storage_class not honnored

2021-09-29 Thread Scheurer François
Dear All The rgw user metadata "default_storage_class" is not working as expected on Nautilus 14.2.15. See the doc: https://docs.ceph.com/en/nautilus/radosgw/placement/#user-placement S3 API PUT with the header x-amz-storage-class:NVME is working as expected. But without this header RGW sh
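
For context, the user default is stored in the user metadata and is usually edited along these lines (a sketch; the uid and storage class name are placeholders):

  radosgw-admin metadata get user:myuser > user.json
  # set "default_storage_class": "NVME" in user.json, then write it back
  radosgw-admin metadata put user:myuser < user.json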

[ceph-users] Failing to mount PVCs

2021-09-29 Thread Fatih Ertinaz
Hi, We recently started to observe issues similar to the following in our cluster environment: Warning FailedMount 31s (x8 over 97s) kubelet, ${NODEIP} MountVolume.SetUp failed for volume "${PVCNAME}" : mount command failed, status: Failure, reason: failed to mount volume /dev/rbd2 [ext4] to /va

[ceph-users] Re: [EXTERNAL] RE: OSDs flapping with "_open_alloc loaded 132 GiB in 2930776 extents available 113 GiB"

2021-09-29 Thread Dave Piper
Some interesting updates on our end. This cluster (condor) is in a multisite RGW zonegroup with another cluster (albans). Albans is still on nautilus and was healthy back when we started this thread. As a last resort, we decided to destroy condor and recreate it, putting it back in the zonegro

[ceph-users] Write Order during Concurrent S3 PUT on RGW

2021-09-29 Thread Scheurer François
Dear All, RGW provides atomic PUT in order to guarantee write consistency. cf: https://ceph.io/en/news/blog/2011/atomicity-of-restful-radosgw-operations/ But my understanding is that there are no guarantees regarding the PUT order sequence. So basically, if doing a storage class migration: aws s
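
The migration in question is presumably an in-place copy that rewrites every object with the new storage class, roughly like this (a sketch; bucket and class names are placeholders):

  # server-side copy of each object onto itself with a new storage class;
  # a concurrent client PUT on the same key could race with this rewrite
  aws s3 cp s3://mybucket s3://mybucket --recursive --storage-class NVME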

[ceph-users] Re: osd marked down

2021-09-29 Thread Abdelillah Asraoui
I must have imported the osd.2 key instead; now osd.3 has the same key as osd.2: ceph auth import -i osd.3.export How do we update this? Thanks! On Wed, Sep 29, 2021 at 2:13 AM Eugen Block wrote: > Just to clarify, you didn't simply import the unchanged keyring but > modified it to reflect th

[ceph-users] [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread Andrew Gunnerson
Hello all, I'm trying to troubleshoot a test cluster that is attempting to deploy an old quay.io/ceph/ceph@sha256: image that no longer exists when adding a new host. The cluster is running 16.2.6 and was deployed last week with: cephadm bootstrap --mon-ip $(facter -p ipaddress) --allow-fqdn

[ceph-users] Re: Leader election loop reappears

2021-09-29 Thread DHilsbos
Manuel; Reading through this mailing list this morning, I can't help but mentally connect your issue to Javier's issue. In part because you're both running 16.2.6. Javier's issue seems to be that OSDs aren't registering public / cluster network addresses correctly. His most recent message in

[ceph-users] Re: Leader election loop reappears

2021-09-29 Thread Manuel Holtgrewe
Hi, thanks for the suggestion. In the case that I again get a rogue MON, I'll try to do this. I'll also need to figure out then how to pull the meta data from the host, might be visible with `docker inspect`. Cheers, On Wed, Sep 29, 2021 at 6:06 PM wrote: > Manuel; > > Reading through this mai

[ceph-users] Re: [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread David Orman
It appears that when an updated container for 16.2.6 was pushed (the first release included a remoto version with a bug), the old one was removed from quay. We had to update our 16.2.6 clusters to the 'new' 16.2.6 version, and just did the typical upgrade with the image specified. This shou
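
For reference, re-pinning to the re-pushed image by tag rather than by digest can be done with (the tag shown is the one discussed in this thread):

  ceph orch upgrade start --image quay.io/ceph/ceph:v16.2.6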

[ceph-users] Re: prometheus - figure out which mgr (metrics endpoint) that is active

2021-09-29 Thread Ernesto Puerta
Hi Karsten, Endpoints returning no data shouldn't be an issue. If all endpoints are scraped under the same job, they'll only differ on the "instance" label. The "instance" label is being progressively removed from the ceph_* metric queries (as it only makes sense for node exporter ones). In the m
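
A scrape job that covers all mgr endpoints under a single job might look like this (a sketch; host names are placeholders, 9283 is the default port of the mgr prometheus module):

  scrape_configs:
    - job_name: 'ceph'
      honor_labels: true
      static_configs:
        - targets: ['mon1:9283', 'mon2:9283', 'mon3:9283']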

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Szabo, Istvan (Agoda)
Actually I don't have a containerized deployment, mine is a normal one. So the lvm migrate should work. Istvan Szabo Senior Infrastructure Engineer --- Agoda Services Co., Ltd. e: istvan.sz...@agoda.com

[ceph-users] reducing mon_initial_members

2021-09-29 Thread Rok Jaklič
Can I reduce mon_initial_members to one host after already being set to two hosts? ___ ceph-users mailing list -- ceph-users@ceph.io To unsubscribe send an email to ceph-users-le...@ceph.io

[ceph-users] Re: [16.2.6] When adding new host, cephadm deploys ceph image that no longer exists

2021-09-29 Thread Andrew Gunnerson
Thank you very much! The previous attempts at adding new hosts with the missing image seem to have left cephadm in a bad state. We restarted the mgrs and then did an upgrade to the same version using: ceph orch upgrade start --ceph-version 16.2.6 and that seems to have deployed new images w

[ceph-users] Re: osd_memory_target=level0 ?

2021-09-29 Thread Christian Wuerdig
Bluestore memory targets have nothing to do with spillover. It's already been said several times: The spillover warning is simply telling you that instead of writing data to your supposedly fast wal/blockdb device it's now hitting your slow device. You've stated previously that your fast device is
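
How much has actually spilled over can be read from the bluefs perf counters of the affected OSD (a sketch; osd.0 is a placeholder, and the command has to be run on the host where that OSD lives):

  ceph daemon osd.0 perf dump | grep -E '"db_total_bytes"|"db_used_bytes"|"slow_used_bytes"'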

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-09-29 Thread Eugen Block
Yes, I believe for you it should work without containers although I haven't tried the migrate command in a non-containerized cluster yet. But I believe this is a general issue for containerized clusters with regards to maintenance. I haven't checked yet if there are existing tracker issues

[ceph-users] Re: osd marked down

2021-09-29 Thread Eugen Block
Is the content of OSD.3 still available in the filesystem? If the answer is yes, you can get the OSD's keyring from /var/lib/ceph/osd/ceph-3/keyring. Then update your osd.3.export file with the correct keyring and import the corrected file back to ceph. Zitat von Abdelillah Asraoui : I must

[ceph-users] Re: New Ceph cluster in PRODUCTION

2021-09-29 Thread Eugen Block
Hi, there is no information about your ceph cluster, e.g. hdd/ssd/nvme disks. This information can be crucial with regards to performance. Also, why would you use osd_pool_default_min_size = 1 and osd_pool_default_size = 2? There have been endless discussions in this list why a pool size of 2
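
For completeness, the commonly recommended defaults would be set like this (a sketch; these only affect pools created afterwards, existing pools need 'ceph osd pool set <pool> size ...'):

  ceph config set global osd_pool_default_size 3
  ceph config set global osd_pool_default_min_size 2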

[ceph-users] Re: New Ceph cluster in PRODUCTION

2021-09-29 Thread Michel Niyoyita
Hello Eugen, We planned to start with a small cluster with HDD disks and a replica count of 2. It will consist of 6 hosts: 3 of them are MONs, which will also hold 2 MGRs, and the remaining 3 are for OSDs, and we will use VMs for deployment. One of the OSD hosts will also be used as an RGW client for Object st