[ceph-users] Re: Ganesha NFS hangs on any rebalancing or degraded data redundancy

2021-10-13 Thread Eugen Block
Hi, what does your 'ceph osd tree' look like and which rules are in place for the affected pools? Can you provide more details about those pools like size, min_size, replicated or erasure-coded? The first thing coming to mind is min_size. For example, if you have six hosts and an erasure-co
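
As a rough illustration of the min_size point above: a placement group keeps serving I/O only while the number of available replicas (or erasure-coded shards) is at least the pool's min_size. The sketch below is a hypothetical helper, not a Ceph API. The pool values in the usage lines are illustrative assumptions and are not taken from the affected cluster.

```python
def pg_serves_io(size: int, min_size: int, unavailable: int) -> bool:
    """Return True if a PG with `size` replicas/shards still accepts I/O
    when `unavailable` of them are down.

    Hypothetical helper for illustration only: it models just the min_size
    rule and ignores peering, backfill, and CRUSH placement details.
    """
    if not 0 <= unavailable <= size:
        raise ValueError("unavailable must be between 0 and size")
    return (size - unavailable) >= min_size


# Assumed example: replicated pool with size=3, min_size=2.
print(pg_serves_io(size=3, min_size=2, unavailable=1))  # True
# Assumed example: erasure-coded pool with 6 shards and min_size=6.
print(pg_serves_io(size=6, min_size=6, unavailable=1))  # False
```

If min_size equals the pool size, as in the second assumed case, the loss of a single host or OSD is enough to block I/O on the affected PGs until recovery completes.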

[ceph-users] Re: Cluster down

2021-10-13 Thread Alex Gorbachev
Hi Jorge, This looks like a corosync problem to me. If corosync loses connectivity, the Proxmox nodes would fence and reboot. Ideally, you'd have a second ring on different switch(es), even a cheap 1Gb switch will do. -- Alex Gorbachev ISS - Storcium On Wed, Oct 13, 2021 at 7:07 AM Jorge JP

[ceph-users] Snap-schedule stopped working?

2021-10-13 Thread Kyriazis, George
Hello ceph-users, I am running Proxmox 7 with ceph 16.2.6 with 46 OSDs. I enabled snap_schedule about a month ago, and it seemed to be going fine, at least at the beginning. I’ve noticed, however, that snapshots stopped happening, as shown below: root@vis-mgmt:/ceph/backups/nassie/NAS/.snap# l

[ceph-users] Re: OSD's fail to start after power loss

2021-10-13 Thread Orbiting Code, Inc.
I have an update on the topic "OSD's fail to start after power loss". We have fixed the issue. After our last "apt upgrade" procedure about 90 days ago, the package python-pkg-resources was removed via "apt autoremove" after rebooting the OSD host. The command below shows that the module pkg_re
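
A quick way to check for the same condition on another host is to ask Python whether the pkg_resources module can be located at all. This is a minimal sketch, not the command from the original message; the helper name `module_available` is hypothetical, and the check only reflects the interpreter it is run with, which is assumed here to be the same one the Ceph tooling uses.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be located for import."""
    return importlib.util.find_spec(name) is not None


if __name__ == "__main__":
    # pkg_resources is the module named in the message above.
    status = "present" if module_available("pkg_resources") else "missing"
    print(f"pkg_resources: {status}")
```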

[ceph-users] Re: RGW pubsub deprecation

2021-10-13 Thread Dave Piper
Hi Yuval, We're using pubsub! We opted for pubsub over bucket notifications as the pull mode fits well with our requirements. 1) We want to be able to guarantee that our client (the external server) has received and processed each event. My initial understanding of bucket notifications was th

[ceph-users] Re: OSD's fail to start after power loss

2021-10-13 Thread DHilsbos
Todd; What version of ceph are you running? Are you running containers or packages? Was the cluster installed manually, or using a deployment tool? Logs provided are for osd ID 31, is ID 31 appropriate for that server? Have you verified that the ceph.conf on that server is intact, and correc

[ceph-users] Re: Cluster down

2021-10-13 Thread DHilsbos
Jorge; This sounds, to me, like something to discuss with the proxmox folks. Unless there was an IP conflict between the rebooted server, and one of the existing mons, I can't see the ceph cluster going unavailable. Further, I don't see where anything ceph related would cause hypervisors, on o

[ceph-users] Re: Adopting "unmanaged" OSDs into OSD service specification

2021-10-13 Thread David Orman
That's the exact situation we've found too. We'll add it to our backlog to investigate on the development side since it seems nobody else has run into this issue before. David On Wed, Oct 13, 2021 at 4:24 AM Luis Domingues wrote: > Hi, > > We have the same issue on our lab cluster. The only way

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-13 Thread Stefan Kooman
On 10/13/21 16:22, von Hoesslin, Volker wrote: Okay, I did it :S I have run this command: cephfs-data-scan scan_links. It ends in the next error; see the attachment. I think I will re-deploy my complete Ceph storage and recover my external backup files... Thanks for the help. It's the same asser

[ceph-users] Re: Do people still use LevelDBStore?

2021-10-13 Thread Casey Bodley
+1 from a dev's perspective. we don't test leveldb, and we don't expect it to perform as well as rocksdb in ceph, so i don't see any value in keeping it. the rados team put a ton of effort into converting existing clusters to rocksdb, so i would be very surprised if removing leveldb left any users

[ceph-users] Re: Do people still use LevelDBStore?

2021-10-13 Thread Ken Dreyer
I think it's a great idea to remove it. - Ken On Wed, Oct 13, 2021 at 12:52 PM Adam C. Emerson wrote: > > Good day, > > Some time ago, the LevelDB maintainers turned -fno-rtti on in their > build. As we don't use -fno-rtti, building LevelDBStore > against newer LevelDB packages can fail. > > Thi

[ceph-users] Do people still use LevelDBStore?

2021-10-13 Thread Adam C. Emerson
Good day, Some time ago, the LevelDB maintainers turned -fno-rtti on in their build. As we don't use -fno-rtti, building LevelDBStore against newer LevelDB packages can fail. This has made me wonder, are there still people who use LevelDBStore and rely on it, or can we deprecate and/or remove it?

[ceph-users] Ganesha NFS hangs on any rebalancing or degraded data redundancy

2021-10-13 Thread Jeff Turmelle
We are using NFS-Ganesha to serve data from our Nautilus cluster to older clients. We recently had an OSD fail and the NFS server will not respond while we have degraded data redundancy. This also happens on the rare occasion when we have some lost objects on a PG. Is this a known issue and i

[ceph-users] Default policy for bucket creation

2021-10-13 Thread Dante F . B . Colò
Hello everyone, I'm not a very experienced Ceph user/administrator. I'm looking for a way to set a default policy for newly created buckets. I can set a policy on a user's existing buckets, but I need the policy applied at bucket creation. Is there any way I can accomplish this? Best Regards Dant

[ceph-users] Accessing Ceph storage from a Windows guest.

2021-10-13 Thread open infra
Hi, I have deployed OpenStack with Ceph. To get better performance from a Windows guest, do I need any specific client-side configuration? My use case is hundreds of OpenStack Windows guests that are supposed to access a 2TB shared volume (with executable files, multiattach) as drive D, and each VM has 100

[ceph-users] Re: MDS: corrupted header/values: decode past end of struct encoding: Malformed input

2021-10-13 Thread von Hoesslin, Volker
Okay, I did it :S I have run this command: cephfs-data-scan scan_links. It ends in the next error; see the attachment. I think I will re-deploy my complete Ceph storage and recover my external backup files... Thanks for the help. Volker. From: Stefan Kooman Gesen

[ceph-users] Re: RFP for arm64 test nodes

2021-10-13 Thread Mark Nelson
There are a lot of advantages to going bare metal if you can make use of all of the cores.  It's sort of ironic that it's one of the things Ceph is fairly good at.  If you need more parallelism you can throw more OSDs at the problem.  Failure domain and general simplicity have always been the w

[ceph-users] OSD's fail to start after power loss

2021-10-13 Thread Orbiting Code, Inc.
Hello Everyone, I have 3 OSD hosts with 12 OSD's each. After a power failure on 1 host, all 12 OSD's fail to start on that host. The other 2 hosts did not lose power, and are functioning. Obviously I don't want to restart the working hosts at this time. Syslog shows: Oct 12 17:24:07 osd3 sys

[ceph-users] Re: Adopting "unmanaged" OSDs into OSD service specification

2021-10-13 Thread Luis Domingues
Hi, We have the same issue on our lab cluster. The only way I found to get the OSDs onto the new specification was to drain, remove and re-add the host. The orchestrator was happy to recreate the OSDs under the correct specification. But I do not think this is a good solution for production cluster

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Szabo, Istvan (Agoda)
Is it possible to extend the block.db LV of that specific OSD with the lvextend command, or does it need some special BlueStore extend step? I want to extend that LV by the size of the spillover, compact it, and migrate afterwards. Istvan Szabo Senior Infrastructure Engineer --

[ceph-users] Re: Cluster down

2021-10-13 Thread Jorge JP
Hello Marc, To add a node to the Ceph cluster with Proxmox, I first have to install Proxmox, hehe; that is not the problem. The configuration file has been reviewed and is correct. I understand your point, but this is not a configuration problem. I can understand that the cluster can have problems if any servers not config

[ceph-users] Datacenter migration: How to change cluster network.

2021-10-13 Thread mhnx
Hello. We're moving our Cluster to a different datacenter and I need to change the Cluster and Public network. Is there any procedure guide for doing this? I think I should follow these steps: 1- Power-on all nodes. 2- Do not start any Mon,Mgr,Mds,Osd. 3- Set up the old network ip's as vlan0 acc

[ceph-users] Re: Cluster down

2021-10-13 Thread Marc
> > We currently have a ceph cluster in Proxmox, with 5 ceph nodes with the > public and private network correctly configured and without problems. > The state of ceph was optimal. > > We had prepared a new server to add to the ceph cluster. We did the > first step of installing Proxmox with the

[ceph-users] Cluster down

2021-10-13 Thread Jorge JP
Hello, We currently have a ceph cluster in Proxmox, with 5 ceph nodes with the public and private network correctly configured and without problems. The state of ceph was optimal. We had prepared a new server to add to the ceph cluster. We did the first step of installing Proxmox with the same

[ceph-users] Re: Multisite Pubsub - Duplicates Growing Uncontrollably

2021-10-13 Thread Yuval Lifshitz
Hi Alex, How many overall zones do you have configured in the system? We have an issue with pubsub based notifications, where we may get as many as (#zone-1) duplicates per object. This, however, won't explain 13 events per object. Did you verify that these are indeed the same events? For the same

[ceph-users] Re: is it possible to remove the db+wal from an external device (nvme)

2021-10-13 Thread Igor Fedotov
Yes. For the DB volume, expanding the underlying device/LV should be enough... -- Igor Fedotov Ceph Lead Developer Looking for help with your Ceph cluster? Contact us at https://croit.io croit GmbH, Freseniusstr. 31h, 81247 Munich CEO: Martin Verges - VAT-ID: DE310638492 Com. register: Amtsgericht Munic