[ceph-users] All pgs unknown

2023-01-29 Thread Daniel Brunner
Hi, my ceph cluster started to show HEALTH_WARN, there are no healthy pgs left, all are unknown, but it seems my cephfs is still readable. How can I investigate this further? $ sudo ceph -s cluster: id: ddb7ebd8-65b5-11ed-84d7-22aca0408523 health: HEALTH_WARN failed to
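
In recent Ceph releases, PG state is reported through the active mgr, so "all PGs unknown" while CephFS stays readable usually points at a dead or wedged mgr rather than actual data loss. A minimal first-pass check, assuming a cephadm deployment as in the message:

```shell
# Is there an active mgr at all?
sudo ceph mgr stat

# If the active mgr is hung, fail over to a standby
sudo ceph mgr fail

# PG states should repopulate shortly afterwards
sudo ceph -s
```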

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
$ sudo ceph osd pool set cephfs_data pg_num 16 $ sudo ceph osd pool get cephfs_data pg_num pg_num: 128 On Fri, 2 Dec 2022 at 14:30, Anthony D'Atri <anthony.da...@gmail.com> wrote: > Could be that you’re fighting with the autoscaler? > > > On Dec 2, 2022, at 4:58 AM
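
If the autoscaler owns the pool, a manual pg_num change is overridden back toward the autoscaler's own target, which matches the behavior in the message. A sketch of checking and disabling it for this pool (pool name taken from the message; verify the option names on your release):

```shell
# Is the autoscaler active for this pool?
sudo ceph osd pool get cephfs_data pg_autoscale_mode

# Turn it off for the pool, then retry the manual change
sudo ceph osd pool set cephfs_data pg_autoscale_mode off
sudo ceph osd pool set cephfs_data pg_num 16
```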

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Can I get rid of PGs after trying to decrease the number on the pool again? Doing a backup and nuking the cluster seems a little too much work for me :) $ sudo ceph osd pool get cephfs_data pg_num pg_num: 128 $ sudo ceph osd pool set cephfs_data pg_num 16 $ sudo ceph osd pool get cephfs_data pg_
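
Decreasing pg_num is gradual: the mons merge PGs a few at a time, so the value read back immediately after the `set` still shows the old figure. One way to watch the target versus the current value, assuming a Nautilus-or-later release where pg merging exists:

```shell
# pg_num is where the pool is now; pg_num_target is where it is headed
sudo ceph osd pool ls detail | grep cephfs_data

# ongoing merges also show up in the cluster status
sudo ceph -s
```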

[ceph-users] Re: OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
-rss:0kB, shmem-rss:0kB [ +1.042284] oom_reaper: reaped process 3061175 (ceph-osd), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB On Fri, 2 Dec 2022 at 09:47, Daniel Brunner wrote: > Hi, > > my OSDs are running odroid-hc4's and they only have about 4GB of memory, > and e
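
The oom_reaper lines in the preview come from the kernel log, so even when the host wedges and ssh is unavailable, the OOM killer's verdict can still be read after a reboot. A sketch, assuming systemd with a persistent journal:

```shell
# kernel messages from the previous boot (requires persistent journald storage)
sudo journalctl -k -b -1 | grep -i -E 'oom|killed process'

# or the current kernel ring buffer
sudo dmesg | grep -i oom
```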

[ceph-users] OSDs do not respect my memory tune limit

2022-12-02 Thread Daniel Brunner
Hi, my OSDs are running odroid-hc4's and they only have about 4GB of memory, and every 10 minutes a random OSD crashes due to running out of memory. Sadly the whole machine gets unresponsive when the memory gets completely full, so no ssh access or prometheus output in the meantime. After the osd success
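
The default osd_memory_target is 4 GiB, which already exceeds what a 4 GB odroid-hc4 can give a single OSD once the OS is counted, so lowering it is the usual first step. A hedged sketch (the 1.5 GiB figure is an illustrative guess, not a recommendation; BlueStore treats the target as best-effort, so leave extra headroom):

```shell
# set the target for all OSDs, in bytes (1.5 GiB here as an example)
sudo ceph config set osd osd_memory_target 1610612736

# confirm an individual OSD picked the new value up
sudo ceph config show osd.0 osd_memory_target
```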

[ceph-users] Re: lost all monitors at the same time

2022-11-16 Thread Daniel Brunner
Hi once again, what does this error message "e18 handle_auth_request failed to assign global_id" actually mean? Is this an indication that I need to manually rebuild the monmap from the OSDs? thanks, daniel On Wed, 16 Nov 2022 at 00:39, Daniel Brunner wrote: > Hi, >
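
"failed to assign global_id" by itself usually means the mon that received the request has no quorum from which to hand out IDs, not that its store is corrupt. Before rebuilding anything, it is worth asking each mon directly over its admin socket (under cephadm, run this from inside `cephadm shell` on the mon host; the mon name is a placeholder):

```shell
# query the local mon's view of the world
sudo ceph daemon mon.$(hostname -s) mon_status

# in the output, "state" probing/electing means the mons cannot find each
# other; "monmap" shows which peers this mon expects to see
```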

[ceph-users] lost all monitors at the same time

2022-11-15 Thread Daniel Brunner
Hi, I accidentally lost power on all nodes at the same time and now my monitors do not get in quorum anymore. I cannot mount my cephfs anymore, I cannot see the status anymore, seems like all data is lost. I used cephadm to deploy all nodes, I only see the monitor docker container on 2 hosts, and b
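
If every mon store really is damaged, Ceph's troubleshooting documentation describes a last-resort procedure that rebuilds the mon DB from the copies of the maps held by the OSDs. A heavily abridged sketch only (stop all OSDs first, back everything up, and follow the official "recovery using OSDs" procedure rather than this outline; paths and OSD ids are illustrative):

```shell
ms=/tmp/mon-store
mkdir -p "$ms"

# for every OSD on every host, accumulate its map info into the store
sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-0 \
    --op update-mon-db --mon-store-path "$ms"

# then rebuild the mon store, supplying the admin keyring
sudo ceph-monstore-tool "$ms" rebuild -- \
    --keyring /etc/ceph/ceph.client.admin.keyring
```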

[ceph-users] Mails not getting through?

2022-11-15 Thread Daniel Brunner
Hi, are my mails not getting through? Is anyone receiving my emails? best regards, daniel

[ceph-users] Lots of OSDs with failed asserts

2022-11-02 Thread Daniel Brunner
Hi, more and more OSDs now crash all the time and I've lost more OSDs than my replication allows, all my data is currently down or inactive. Can somebody help me fix those asserts and get them up again (so I can start my disaster recovery backup)? $ sudo /usr/bin/ceph-osd -f --cluster ceph --id
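
When more OSDs assert than replication can absorb, the usual way to get at the data without running the daemons is ceph-objectstore-tool, which can export PGs from a stopped OSD's store for import on a healthy one. A hedged sketch (the OSD must be stopped; the OSD id and pgid below are placeholders):

```shell
# list the PGs held by the down OSD
sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 --op list-pgs

# export one PG to a file, to be imported elsewhere later
sudo ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-3 \
    --op export --pgid 2.1a --file /mnt/backup/2.1a.export
```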

[ceph-users] OSD crashes

2022-10-27 Thread Daniel Brunner
Hi, I noticed one of my OSDs keeps crashing even when run manually, this is my homelab and nothing too critical is going on my cluster, but I'd like to know what's the issue. I am running on archlinux arm (aarch64 on an odroid-hc4) and compiled everything ceph related myself, ceph version 17.2.4 (13
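
For a repeatedly crashing OSD, the crash module usually has the backtrace recorded already, and running the daemon in the foreground with raised debug levels narrows things down further. A sketch (the OSD id is a placeholder; substitute the crash id from the `ls` output):

```shell
# recorded crashes, then the full backtrace of one of them
sudo ceph crash ls
sudo ceph crash info <crash-id>

# reproduce in the foreground with verbose OSD/BlueStore logging
sudo /usr/bin/ceph-osd -f --cluster ceph --id 5 \
    --debug-osd 20 --debug-bluestore 20
```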