[ceph-users] Re: osd out can't bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-11-30 15:55, Oliver Weinmann wrote: > I have another error "pgs undersized", maybe this is also causing trouble? This is a result of the loss of one OSD, and the PGs located on it. As you only have 1 OSD left, the cluster cannot recover on a third OSD (assuming defaults here). The cluste
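
A minimal sketch of how one might verify this on such a cluster, assuming default replicated pools (size 3); these are standard status commands, not taken from the thread:

# How many replicas each pool wants, and how many OSDs are actually up/in
ceph osd pool ls detail
ceph osd tree
# PGs that are stuck undersized (fewer copies than the pool's size)
ceph pg dump_stuck undersized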

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-12-01 Thread Kalle Happonen
Hi all, back to this. Dan, it seems we're following exactly in your footsteps. We recovered from our large pg_log and got the cluster running. A week after the cluster was OK, we started seeing big memory increases again. I don't know if we had buffer_anon issues before or if our big pg_logs we
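
A small sketch of how the mempools mentioned here can be watched per OSD, assuming access to the OSD's admin socket on the host (osd.0 is just an example):

# Dump the OSD's memory pools; osd_pglog and buffer_anon both appear in this output
ceph daemon osd.0 dump_mempools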

[ceph-users] OSD Metadata Imbalance

2020-12-01 Thread Paul Kramme
Hello, my cluster is currently showing a metadata imbalance. Normally, all OSDs have around 23 GB of metadata (META column), but 4 OSDs out of 56 have 34 GB. Compacting reduces the data for some OSDs, but not for others. OSDs where the compaction worked quickly grow back to 34 GB. Our clus
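
A minimal sketch of the check and the compaction step described above, assuming a Nautilus-or-newer release; osd.12 is a placeholder:

# Per-OSD usage including the META column
ceph osd df tree
# Online compaction of a single OSD's RocksDB
ceph tell osd.12 compact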

[ceph-users] Re: osd out can't bring it back online

2020-12-01 Thread Oliver Weinmann
Hi Stefan, unfortunately it doesn't start. The failed OSD (osd.0) is located on gedaopl02. [root@gedasvl02 ~]# ceph osd tree INFO:cephadm:Inferring fsid d0920c36-2368-11eb-a5de-005056b703af INFO:cephadm:Inferring config /var/lib/ceph/d0920c36-2368-11eb-a5de-005056b703af/mon.gedasvl02/config IN

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-12-01 Thread Kalle Happonen
Quick update: restarting OSDs is not enough for us to compact the DB. So we stop the OSD, run ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd compact, and start the OSD again. It seems to fix the spillover, until it grows again. Cheers, Kalle - Original Message - > From: "Kalle Happonen" > To
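
A sketch of that stop/compact/start cycle, assuming a non-containerized OSD managed by systemd ($osd is a placeholder for the OSD id):

osd=3
systemctl stop ceph-osd@$osd
# Offline compaction of the stopped OSD's RocksDB
ceph-kvstore-tool bluestore-kv /var/lib/ceph/osd/ceph-$osd compact
systemctl start ceph-osd@$osd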

[ceph-users] Re: osd out can't bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-12-01 10:21, Oliver Weinmann wrote: > Hi Stefan, > > unfortunately It doesn't start. > > The failed osd (osd.0) is located on gedaopl02 > > I can start the service but then after a minute or so it fails. Maybe > I'm looking at the wrong log file, but it's empty: Maybe it hits a timeout

[ceph-users] Re: osd_pglog memory hoarding - another case

2020-12-01 Thread Dan van der Ster
Hi Kalle, Thanks for the update. Unfortunately I haven't made any progress on understanding the root cause of this issue. (We are still tracking our mempools closely in grafana and in our case they are no longer exploding like in the incident.) Cheers, Dan On Tue, Dec 1, 2020 at 3:49 PM Kalle Ha

[ceph-users] add OSDs to cluster

2020-12-01 Thread mj
Hi, we are wondering why adding an OSD to a healthy cluster results in a (very small percentage of) "Degraded data redundancy" (0.020%). We understand a large percentage of misplaced objects (7.622%), but since we're adding an OSD to a HEALTH_OK cluster, there should really not be any degrade
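
For anyone wanting to watch the degraded vs. misplaced counts while the backfill runs, a minimal sketch using only standard status commands:

# Overall recovery/backfill progress, with degraded and misplaced object counts
ceph -s
# Per-check detail behind the "Degraded data redundancy" warning
ceph health detail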

[ceph-users] Re: osd out can't bring it back online

2020-12-01 Thread Oliver Weinmann
Yes, I deployed via cephadm on CentOS 7; it is using podman. The container doesn't even start up, so I don't get a container ID. But I checked journalctl -xe, and it seems that it's trying to use a container name that still exists. -- Unit ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
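
A minimal diagnostic sketch for this situation, assuming a cephadm deployment with podman (the fsid is taken from the quoted unit name):

systemctl status ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service
journalctl -u ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service --since "1 hour ago"
# Look for a leftover container with the same name
podman ps -a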

[ceph-users] reliability of rados_stat() function

2020-12-01 Thread Peter Lieven
Hi all, the rados_stat() function has a TODO in the comments: * TODO: when are these set, and by whom? can they be out of date? Can anyone help with this? How reliably is the pmtime updated? Is there a minimum update interval? Thank you, Peter

[ceph-users] Re: osd out can't bring it back online

2020-12-01 Thread Stefan Kooman
On 2020-12-01 13:19, Oliver Weinmann wrote: > > podman ps -a didn't show that container. So I googled and stumbled over > this post: > > https://github.com/containers/podman/issues/2553 > > I was able to fix it by running: > > podman rm --storage > e43f8533d6418267d7e6f3a408a566b4221df4fb51b13
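
A sketch of the fix quoted above, assuming a stale container left behind in podman's storage backend (the container ID is deliberately left as a placeholder, and the unit name is the one from earlier in the thread):

# The stale container is not listed by "podman ps -a", only in the storage backend,
# so it has to be removed with --storage (see the linked podman issue)
podman rm --storage <container-id>
# Then let systemd bring the OSD back up
systemctl restart ceph-d0920c36-2368-11eb-a5de-005056b703af@osd.0.service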

[ceph-users] Upgrade to 15.2.7 fails on mixed x86_64/arm64 cluster

2020-12-01 Thread Bryan Stillwell
I tried upgrading my home cluster to 15.2.7 (from 15.2.5) today and it appears to be entering a loop when trying to match docker images for ceph:v15.2.7: 2020-12-01T16:47:26.761950-0700 mgr.aladdin.liknom [INF] Upgrade: Checking mgr daemons... 2020-12-01T16:47:26.769581-0700 mgr.aladdin.liknom [
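
A minimal sketch for inspecting and, if needed, pausing such a cephadm upgrade, assuming Octopus's ceph orch commands:

ceph orch upgrade status
# Stop the looping upgrade, then restart it pinned to an explicit version
ceph orch upgrade stop
ceph orch upgrade start --ceph-version 15.2.7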

[ceph-users] ceph in docker the log_file config is empty

2020-12-01 Thread goodluck
When I use kolla to deploy Ceph in Docker, I found there is no Ceph log output for the OSDs and MONs, but the MGR does have Ceph log output in its log file. I also found that log_file is empty even though I set the log_file config in ceph.conf. ceph daemon /var/run/ceph/ceph-osd.0.asok config show|g

[ceph-users] Re: replace osd with Octopus

2020-12-01 Thread Tony Liu
Hi Frank, a basic question: what's this all-to-all rebuild/copy? Is that PG remapping when the broken disk is taken out? In your case, does "shut the OSD down" mark the OSD "out"? Did "rebuilt to full redundancy" take 2 hours (I assume there was PG remapping)? What's the disk size? Regarding your fu

[ceph-users] Determine effective min_alloc_size for a specific OSD

2020-12-01 Thread 胡 玮文
Hi all, I've read on this mailing list that too high a bluestore_min_alloc_size will result in too much wasted space if I have many small objects, but too low a bluestore_min_alloc_size will reduce performance. I've also read that this config can't be changed after OSD creation. Now I want to tune

[ceph-users] Re: ceph in docker the log_file config is empty

2020-12-01 Thread 胡 玮文
Hi, these configs may be overridden by command-line arguments. Check with "ps -ef | grep ceph-osd". Also check "docker logs " for the logs of the OSDs and MONs. > On 2 Dec 2020, at 09:28, goodluck wrote: > > when I use kolla to deploy ceph in docker. I found there is no ceph logs > output with osds and mons. But the m

[ceph-users] Re: ceph in docker the log_file config is empty

2020-12-01 Thread goodluck
Thanks for your reply. Yes, all the OSD logs go to the docker logs. How can I redirect the OSD log output back to the OSD log file? ps -ef|grep ceph-osd root 7399 7378 1 10:59 ? 00:02:43 /usr/bin/ceph-osd -f -d --public-addr * --cluster-addr ** -i 2 --osd-journal /dev
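
A small sketch for checking which logging settings the running OSD actually uses, assuming access to its admin socket inside the container (the socket path mirrors the one used earlier in the thread, adjusted to osd.2 from the ps output; note that the -d flag shown there runs the daemon in the foreground and sends its log to stderr, which would explain why log_file is not used):

ceph daemon /var/run/ceph/ceph-osd.2.asok config show | grep -E 'log_file|log_to_stderr|err_to_stderr'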

[ceph-users] Re: Determine effective min_alloc_size for a specific OSD

2020-12-01 Thread Eugen Block
Hi, there are several ways to retrieve config information: osd-host:~ # ceph daemon osd.0 config show | grep bluestore_min_alloc_size "bluestore_min_alloc_size": "0", "bluestore_min_alloc_size_hdd": "65536", "bluestore_min_alloc_size_ssd": "4096", osd-host:~ # ceph daemon osd.0 confi
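
As an additional, hedged note rather than a definitive answer: the value actually baked in at OSD creation time is fixed at mkfs, so the currently configured option may not match what the OSD was built with. Two other ways to query the option itself:

ceph daemon osd.0 config get bluestore_min_alloc_size_hdd
ceph config get osd.0 bluestore_min_alloc_size_hdd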