[ceph-users] Re: upgrade problem nautilus 14.2.15 -> 14.2.18? (Broken ceph!)

2021-04-09 Thread Simon Oosthoek
On 25/03/2021 21:08, Simon Oosthoek wrote:
> I'll wait a bit before upgrading the remaining nodes. I hope 14.2.19 will be available quickly.
Hi Dan, Just FYI, I upgraded the cluster this week to 14.2.19 and all systems are good now. I've removed the workaround configuration in the /etc/ce

[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-09 Thread mabi
Thank you for confirming that podman 3.0.1 is fine. I have now bootstrapped my first node with the cephadm bootstrap command on my RasPi 4 (8 GB RAM) running Ubuntu 20.04 LTS. That worked well, but in the logs I can see that it fails to deploy the grafana container, as you can see from the log below: Tr

[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-09 Thread David Orman
That container (ceph-grafana) is not built for ARM-based processors, only AMD64: https://hub.docker.com/r/ceph/ceph-grafana/tags?page=1&ordering=last_updated . You'll probably need to disable that (I think it's part of the dashboard module - I don't know - we run our own Prometheus/Grafana infrast

[ceph-users] Re: Abandon incomplete (damaged EC) pgs - How to manage the impact on cephfs?

2021-04-09 Thread Michael Thomas
Hi Joshua, I'll dig into this output a bit more later, but here are my thoughts right now. I'll preface this by saying that I've never had to clean up from unrecoverable incomplete PGs, so some of what I suggest may not work/apply or be the ideal fix in your case. Correct me if I'm wrong, b

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
I'm attempting to deep scrub all the PGs to see if that helps clear up some accounting issues, but that's going to take a really long time on 2PB of data. Robert LeBlanc PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1 On Thu, Apr 8, 2021 at 9:48 PM Robert LeBlan
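For reference, one way to queue deep scrubs on every PG is a loop like the following; this is a sketch, assuming pg IDs appear in column 1 of `ceph pg dump pgs_brief` output (the `sample` data below is illustrative, so the extraction step can be checked without a cluster):

```shell
# Against a live cluster the loop would be:
#   ceph pg dump pgs_brief | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}' \
#     | while read -r pg; do ceph pg deep-scrub "$pg"; done
# Demonstrated here on captured sample output so the pg-ID extraction is testable:
sample='PG_STAT STATE
1.0     active+clean
1.1a    active+clean
2.3f    active+clean'
printf '%s\n' "$sample" | awk '$1 ~ /^[0-9]+\.[0-9a-f]+$/ {print $1}'
```

Note that OSDs throttle concurrent scrubs (`osd_max_scrubs`), so on 2PB of data the queue drains slowly regardless of how fast it is filled.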

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 9:25 AM Stefan Kooman wrote:
> Are you running with 1 mon now? Have you tried adding mons from scratch? So with a fresh database? And then maybe after they have joined, kill the donor mon and start from scratch.
>
> You have for sure not missed a step during the upgrade

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
The only step not yet taken was the move to straw2; that was the next step we were planning to do.
Robert LeBlanc
PGP Fingerprint 79A2 9CA4 6CC4 45DD A904 C70E E654 3BB2 FA62 B9F1
On Fri, Apr 9, 2021 at 10:41 AM Robert LeBlanc wrote:
> On Fri, Apr 9, 2021 at 9:25 AM Stefan Ko
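For anyone following along, the straw-to-straw2 conversion can be done cluster-wide with a single command (available since Luminous); this is a sketch, and it does trigger some data movement, so check the impact first:

```shell
# Preview: dump the current crush map and inspect bucket algorithms offline
ceph osd getcrushmap -o crush.bin && crushtool -d crush.bin -o crush.txt

# Convert every straw bucket in the crush map to straw2 in one step
ceph osd crush set-all-straw-buckets-to-straw2
```

Expect a modest amount of rebalancing afterwards, since straw2 computes placements slightly differently.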

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
Hi Robert, Have you checked a log with debug_mon=20 yet to try to see what it's doing? .. Dan
On Fri, Apr 9, 2021, 7:02 PM Robert LeBlanc wrote:
> The only step not yet taken was to move to straw2. That was the last step we were going to do next.
>
> Robert LeBlanc
> PGP F
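As a sketch, the debug level can be raised on a running mon without a restart, then dropped back to the default once the high-CPU window has been captured:

```shell
# Turn up mon debugging on all mons without restarting them
ceph tell mon.\* injectargs '--debug_mon 20'

# ...reproduce/capture the high-CPU period, then restore the default
ceph tell mon.\* injectargs '--debug_mon 1/5'
```

Level 20 is very verbose, so keep the window short or the mon log will grow quickly.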

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster wrote:
> Hi Robert,
>
> Have you checked a log with debug_mon=20 yet to try to see what it's doing?
I've posted the logs with debug_mon=20 for a period during high CPU here https://owncloud.leblancnet.us/owncloud/index.php/s/OtHsBAYN9r5eSbU You

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 7:24 PM Robert LeBlanc wrote:
> On Fri, Apr 9, 2021 at 11:05 AM Dan van der Ster wrote:
> >
> > Hi Robert,
> >
> > Have you checked a log with debug_mon=20 yet to try to see what it's doing?
> I've posted the logs with debug_mon=20 for a period during high CPU here

[ceph-users] Re: Version of podman for Ceph 15.2.10

2021-04-09 Thread mabi
I see, I guess I can live without grafana anyway. What would be the "cephadm bootstrap" command or parameter in order to skip installing grafana? And now that it is already installed which command can I use to disable it? It is trying to get deployed every 10 minutes... ‐‐‐ Original Messa
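A sketch of both options, assuming a recent cephadm (the placeholder `<mon-ip>` stands in for your node's address):

```shell
# At bootstrap time: skip grafana along with the rest of the monitoring stack
cephadm bootstrap --mon-ip <mon-ip> --skip-monitoring-stack

# On an already-bootstrapped cluster: remove the service so the
# orchestrator stops trying to redeploy it every few minutes
ceph orch rm grafana
```

`--skip-monitoring-stack` also skips prometheus, alertmanager and node-exporter; if you only want grafana gone, removing just that one service after bootstrap is the narrower option.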

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
> Thanks. I didn't see anything ultra obvious to me.
>
> But I did notice the nearfull warnings so I wonder if this cluster is churning through osdmaps? Did you see a large increase in inbound or outbound network traffic on this mon fol

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Stefan Kooman
On 4/9/21 3:40 PM, Robert LeBlanc wrote:
> I'm attempting to deep scrub all the PGs to see if that helps clear up some accounting issues, but that's going to take a really long time on 2PB of data.
Are you running with 1 mon now? Have you tried adding mons from scratch? So with a fresh database?
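For reference, seeding a fresh mon from an existing one follows the manual add-a-monitor procedure; this is a sketch (the mon ID `newmon` and the /tmp paths are placeholders, adjust for your deployment):

```shell
# Export the current monmap and mon keyring from the healthy cluster
ceph mon getmap -o /tmp/monmap
ceph auth get mon. -o /tmp/mon.keyring

# Initialize the new mon's store (a fresh database) from those
ceph-mon --mkfs -i newmon --monmap /tmp/monmap --keyring /tmp/mon.keyring

# Then add the new mon to the mon host list / ceph.conf and start the daemon
```

Once the new mons have joined and are in quorum, the original mon can be removed and rebuilt the same way.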

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
> >
> > Thanks. I didn't see anything ultra obvious to me.
> >
> > But I did notice the nearfull warnings so I wonder if this cluster is churning through osdmaps? Did you see a lar

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
> On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> > On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
> > >
> > > Thanks. I didn't see anything ultra obvious to me.
> > >
> > > But I did notice the nearfull warnings s

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 2:04 PM Dan van der Ster wrote:
> On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
> > On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> > > On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster wrote:
> > > >
> > > > Thanks. I didn't see anything

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Dan van der Ster
On Fri, Apr 9, 2021 at 11:50 PM Robert LeBlanc wrote:
> On Fri, Apr 9, 2021 at 2:04 PM Dan van der Ster wrote:
> > On Fri, Apr 9, 2021 at 9:37 PM Dan van der Ster wrote:
> > > On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc wrote:
> > > > On Fri, Apr 9, 2021 at 11:49 AM

[ceph-users] working ansible based crush map?

2021-04-09 Thread Philip Brown
I'm trying to follow the directions in ceph-ansible for having it automatically set up the crush map. I've also looked at https://access.redhat.com/documentation/en-us/red_hat_ceph_storage/4/html/installation_guide/installing-red-hat-ceph-storage-using-ansible But my setup isn't working. ansibl

[ceph-users] Re: working ansible based crush map?

2021-04-09 Thread Philip Brown
Oops. I found the problem with the duplicates: there were actually duplicates elsewhere that were presenting as if they were in this section. So on the one hand, I got rid of the runtime errors. But on the other hand... this isn't actually modifying the crush map for me? The "chassis: hostA" doesn't

[ceph-users] Re: working ansible based crush map?

2021-04-09 Thread Philip Brown
AAAND final update: problem fixed. I had enabled create_crush_tree: true in group_vars/osd.yml but I had neglected to ALSO set crush_rule_config: true. So now it's all happy.
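For anyone hitting the same thing, the two variables have to be enabled together; a minimal group_vars sketch, using the variable names from recent ceph-ansible branches (the rule name and bucket type here are illustrative, not from the thread):

```yaml
# group_vars/osds.yml (sketch)
create_crush_tree: true     # build the crush hierarchy from host vars
crush_rule_config: true     # ...and actually apply the crush rules

crush_rules:
  - name: replicated_by_chassis   # illustrative rule
    root: default
    type: chassis
    default: true
```

With `crush_rule_config: false` (the default), the hierarchy settings are parsed but never pushed into the crush map, which matches the "no errors but nothing changes" symptom described above.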

[ceph-users] Re: Nautilus 14.2.19 mon 100% CPU

2021-04-09 Thread Robert LeBlanc
On Fri, Apr 9, 2021 at 4:04 PM Dan van der Ster wrote:
> Here's what you should look for, with debug_mon=10. It shows clearly that it takes the mon 23 seconds to run through get_removed_snaps_range. So if this is happening every 30s, it explains at least part of why this mon is busy.
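The "23 seconds" figure above comes from comparing timestamps on two log lines. A small sketch of that measurement, assuming the standard leading `date time` fields of a ceph log line (the marker text in the sample lines is illustrative, not exact mon output):

```python
from datetime import datetime

def elapsed_seconds(start_line, end_line):
    """Seconds between the timestamps of two ceph log lines.

    Assumes the line starts with 'YYYY-MM-DD HH:MM:SS.ffffff', as in the
    default ceph log format; the rest of the line is ignored.
    """
    fmt = "%Y-%m-%d %H:%M:%S.%f"
    def ts(line):
        date, time = line.split()[:2]
        return datetime.strptime(f"{date} {time}", fmt)
    return (ts(end_line) - ts(start_line)).total_seconds()

# Illustrative lines bracketing one pass through get_removed_snaps_range
start = "2021-04-09 16:04:01.123456 7f2a3c 10 mon.a get_removed_snaps_range start"
end   = "2021-04-09 16:04:24.123456 7f2a3c 10 mon.a get_removed_snaps_range done"
print(elapsed_seconds(start, end))  # 23.0
```

Grepping the debug_mon log for the function name and feeding consecutive matches through something like this makes it easy to see whether the slow pass really recurs every ~30s.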