[ceph-users] Re: one cephfs volume becomes very slow

2023-11-10 Thread Eugen Block
Did you check how many caps the clients are using? The thread you refer to contains some instructions. Quote from Ben: checked that disk utilization had been normal during the incident. The slow CephFS performance seemingly could not be attributed to the OSDs. All HDDs, no SSDs in fact. I found this t
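
A minimal sketch of the caps check suggested above, assuming the stock output of "ceph tell mds.<name> session ls" (the daemon name and the jq filtering are placeholders, not something taken from the thread):

# Pick an active MDS from `ceph fs status`, then list client sessions
# sorted by how many caps each client currently holds.
MDS=mds.myfs.host1.abcdef        # placeholder daemon name
ceph tell "$MDS" session ls \
  | jq -r '.[] | "\(.id)\t\(.num_caps)"' \
  | sort -k2 -rn | head          # clients holding the most caps first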

[ceph-users] Re: one cephfs volume becomes very slow

2023-11-10 Thread Ben
checked that disk utilization had been normal during the incident. The slow CephFS performance seemingly could not be attributed to the OSDs. All HDDs, no SSDs in fact. I found this related thread: https://lists.ceph.io/hyperkitty/list/ceph-users@ceph.io/thread/B7K6B5VXM3I7TODM4GRF3N7S254O5ETY/ Does it ha

[ceph-users] Re: ceph fs (meta) data inconsistent

2023-11-10 Thread Frank Schilder
>>> It looks like the cap update request was dropped by the MDS. >>> [...] >>> If you can reproduce it, then please provide the MDS logs by setting: >>> [...] >> I can run a test with MDS logs at a high debug level. Before I do that, looking at the Python findings above, is this something t
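
For reference, raising the MDS debug level for such a reproducer commonly looks like the sketch below; the exact settings requested in the elided part of the quote may differ, so treat these values as illustrative only:

# Raise MDS logging, reproduce the issue, collect the ceph-mds log files
# from the MDS hosts (usually under /var/log/ceph/), then revert.
ceph config set mds debug_mds 20
ceph config set mds debug_ms 1
# ... reproduce the dropped cap update ...
ceph config rm mds debug_mds      # back to the default level
ceph config rm mds debug_ms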

[ceph-users] Re: Permanent KeyError: 'TYPE' ->17.2.7: return self.blkid_api['TYPE'] == 'part'

2023-11-10 Thread Sascha Lucas
Hi, On Wed, 8 Nov 2023, Sascha Lucas wrote: On Tue, 7 Nov 2023, Harry G Coin wrote: "/usr/lib/python3.6/site-packages/ceph_volume/util/device.py", line 482, in is_partition /usr/bin/docker: stderr return self.blkid_api['TYPE'] == 'part' /usr/bin/docker: stderr KeyError: 'TYPE' Problem
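
A quick way to spot the devices that trigger this, assuming ceph-volume's blkid_api dict is filled from a low-level blkid probe (so a device for which blkid reports no TYPE at all is what makes is_partition() raise KeyError: 'TYPE'):

# Probe each disk roughly the way ceph-volume does; blkid exits non-zero
# and prints nothing when it finds no recognizable signature on a device.
for dev in /dev/sd[a-z] /dev/nvme[0-9]n1; do
    [ -b "$dev" ] || continue
    blkid -p "$dev" || echo "$dev: no signature/TYPE reported"
done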

[ceph-users] Re: Redeploy ceph orch OSDs after reboot, but don't mark as 'unmanaged'

2023-11-10 Thread Eugen Block
Hi, Indeed! It says "osd" for all the unmanaged OSDs. When I change it to the name of my managed service and restart the daemon, it shows up in ceph orch ps --service-name. I checked whether cephadm deploy perhaps has an undocumented flag for setting the service name, but couldn't find an
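
For anyone following along, the manual fix described here looks roughly like the sketch below; the unit.meta location and the "service_name" key are assumptions based on a standard cephadm layout, and osd.12 / osd.my-spec are placeholders:

FSID=$(ceph fsid)
# Inspect the metadata cephadm wrote for the daemon; unmanaged OSDs
# show "service_name": "osd" here.
cat /var/lib/ceph/$FSID/osd.12/unit.meta
# After editing service_name to match the managed spec (e.g. osd.my-spec),
# restart the daemon and it should show up under that service:
systemctl restart ceph-$FSID@osd.12.service
ceph orch ps --service-name osd.my-spec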

[ceph-users] Re: IO stalls when primary OSD device blocks in 17.2.6

2023-11-10 Thread David C.
Hi Daniel, it's perfectly normal for a PG to freeze when its primary OSD is not stable. It can sometimes happen that the disk fails but doesn't immediately return I/O errors (which would crash the OSD). When the OSD is stopped, there's a 5-minute delay before it goes down in the crushmap. Le ve
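
The delays described here are governed by a couple of monitor/OSD timers; the options below are the stock ones to look at on a given cluster (listed as things to check, not as a fix):

# Grace period before peers report an unresponsive OSD as down:
ceph config get osd osd_heartbeat_grace
# How long a down OSD stays "in" before the monitors mark it out:
ceph config get mon mon_osd_down_out_interval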

[ceph-users] IO stalls when primary OSD device blocks in 17.2.6

2023-11-10 Thread Daniel Schreiber
Dear cephers, we are sometimes observing stalling IO on our Ceph 17.2.6 cluster when the backing device for the primary OSD of a PG fails and seems to block read IO to objects in that PG. If I set the OSD with the broken device to down, the IO continues. Setting the OSD to down is not suffic
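
A sketch of the mitigation described in this thread (osd.12 is a placeholder). Marking the OSD down alone is usually not enough, because a daemon that is still partly alive will mark itself up again, so the hung process has to be stopped as well:

FSID=$(ceph fsid)
ceph osd down osd.12                       # force re-peering away from the bad primary
systemctl stop ceph-$FSID@osd.12.service   # keep the hung daemon from rejoining
ceph osd out osd.12                        # optional: start rebalancing off the broken disk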

[ceph-users] Re: Help needed with Grafana password

2023-11-10 Thread Sake Ceph
Thank you Eugen! This worked :) > On 09-11-2023 14:55 CET, Eugen Block wrote: > It's the '#' character; everything after it (including '#' itself) is cut off. I tried with single and double quotes, which also failed. But as I already said, use a simple password and then change it with
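
A sketch of the workaround Eugen describes, assuming a cephadm Grafana service spec with the initial_admin_password field (field names and the password are illustrative; start with a simple password without '#', then change it inside Grafana afterwards):

cat > grafana.yaml <<'EOF'
service_type: grafana
placement:
  count: 1
spec:
  initial_admin_password: SimplePassword123
EOF
ceph orch apply -i grafana.yaml
ceph orch redeploy grafana       # redeploy so the new spec takes effect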