[ceph-users] Re: libceph: mds1 IP+PORT wrong peer at address

2023-09-20 Thread Frank Schilder
Hi, in our case the problem was on the client side. When you write "logs from a host", do you mean an OSD host or a host where client connections come from? It's not clear from your problem description *who* is requesting the wrong peer. The reason for this message is that something tries to talk
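One illustrative way to narrow this down (the address below is a placeholder): the "wrong peer at address" line is logged by the kernel client, so check the kernel log on the hosts that mount CephFS and compare the reported address against the current maps:

    dmesg -T | grep -i 'wrong peer'
    ceph osd dump | grep '<ip>:<port>'   # does a current OSD own that address?
    ceph fs dump | grep '<ip>:<port>'    # or a current MDS?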

[ceph-users] Re: S3website range requests - possible issue

2023-09-20 Thread Ondřej Kukla
I was checking the tracker again and I found an already-fixed issue that seems to be connected with this one. https://tracker.ceph.com/issues/44508 Here is the PR that fixes it: https://github.com/ceph/ceph/pull/33807 What I’m still not understanding is why this is only happening when using s3we

[ceph-users] Re: Ceph MDS OOM in combination with 6.5.1 kernel client

2023-09-20 Thread Mark Nelson
Hi Stefan, Can you tell if the memory being used is due to the cache not being trimmed fast enough or something else? You might want to see if you can track down whether the 6.5.1 client isn't releasing CAPS properly. Dan Van der Ster might have some insight here as well. Ma
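For reference, two places to look (a sketch; mds.<name> is a placeholder for the local daemon name): cache status shows whether the MDS cache is above its target size, and session ls shows how many caps each client currently holds:

    ceph daemon mds.<name> cache status
    ceph daemon mds.<name> session ls | grep -E '"num_caps"|"hostname"'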

[ceph-users] Re: CephFS warning: clients laggy due to laggy OSDs

2023-09-20 Thread Venky Shankar
Hi Janek, On Tue, Sep 19, 2023 at 4:44 PM Janek Bevendorff < janek.bevendo...@uni-weimar.de> wrote: > Hi Venky, > > As I said: There are no laggy OSDs. The maximum ping I have for any OSD in > ceph osd perf is around 60ms (just a handful, probably aging disks). The > vast majority of OSDs have pi
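As a side note for anyone comparing numbers (illustrative, assuming a Nautilus-or-newer release; osd.0 is just an example): ceph osd perf reports commit/apply latency, while per-OSD heartbeat ping times can be dumped from the daemon's admin socket:

    ceph osd perf
    ceph daemon osd.0 dump_osd_network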

[ceph-users] Re: CephFS warning: clients laggy due to laggy OSDs

2023-09-20 Thread Venky Shankar
Hey Janek, I took a closer look at the various places where the MDS would consider a client laggy. A wide variety of reasons are taken into consideration, and not all of them are necessarily a reason to defer client eviction, so the warning is a bit misleading. I'll post a PR for this. In

[ceph-users] Re: cephfs mount 'stalls'

2023-09-20 Thread Marc
> > William, this is the fuse client, not the kernel one > > Mark, you can use the kernel client. Stock c7, or install, for example, kernel-ml from ELrepo [1], and use the latest krbd version > > I think I had to move to the fuse client because with one of the latest releases of luminous I was getting issues
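For completeness, switching between the two clients is just a different mount command (a sketch; the monitor address, user name and secret file are placeholders):

    # kernel client
    mount -t ceph 192.168.1.10:6789:/ /mnt/cephfs -o name=myuser,secretfile=/etc/ceph/myuser.secret
    # FUSE client
    ceph-fuse -n client.myuser /mnt/cephfs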

[ceph-users] Re: CephFS warning: clients laggy due to laggy OSDs

2023-09-20 Thread Dhairya Parmar
Hi Janek, The PR Venky mentioned makes use of the OSD's laggy parameters (laggy_interval and laggy_probability) to determine whether any OSD is laggy. These laggy parameters can reset to 0 if the interval between the last modification to the OSDMap and the timestamp when the OSD was marked down exceeds the
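For anyone who wants to check these values on their own cluster, they are visible in the OSDMap dump (a sketch; requires jq, and field names may vary slightly by release):

    ceph osd dump -f json | jq '.osd_xinfo[] | {osd, laggy_probability, laggy_interval}'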

[ceph-users] Re: Clients failing to respond to capability release

2023-09-20 Thread Tim Bishop
Hi Stefan, On Wed, Sep 20, 2023 at 11:00:12AM +0200, Stefan Kooman wrote: > On 19-09-2023 13:35, Tim Bishop wrote: > > The Ceph cluster is running Pacific 16.2.13 on Ubuntu 20.04. Almost all > > clients are working fine, with the exception of our backup server. This > > is using the kernel CephFS

[ceph-users] Re: S3website range requests - possible issue

2023-09-20 Thread Ondřej Kukla
When checking the RGW logs I can confirm that it is in fact the same issue as the one in the tracker. 2023-09-20T12:52:06.670+ 7f216d702700 1 -- xxx.xxx.58.15:0/758879303 --> [v2:xxx.xxx.58.2:6816/8556,v1:xxx.xxx.58.2:6817/8556] -- osd_op(unknown.0.0:238 18.651 18:8a75a7b2:::39078a70-7768-48
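In case someone wants to capture the same messenger-level log lines, debug logging can be raised temporarily for the RGW instance (illustrative; client.rgw.<instance> is a placeholder, and these levels are very verbose, so revert them afterwards):

    ceph config set client.rgw.<instance> debug_ms 1
    ceph config set client.rgw.<instance> debug_rgw 20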

[ceph-users] millions of hex 80 0_0000 omap keys in single index shard for single bucket

2023-09-20 Thread Christopher Durham
I am using ceph 17.2.6 on Rocky 8. I have a system that started giving me large omap object warnings. I tracked this down to a specific index shard for a single s3 bucket. rados -p listomapkeys .dir..bucketid.nn.shardid shows over 3 million keys for that shard. There are only about 2 million obj
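To reproduce the per-shard counts, something along these lines works (a sketch; the index pool name, bucket name, instance id and shard count are placeholders for your setup):

    # find the bucket instance id
    radosgw-admin bucket stats --bucket=mybucket
    # count omap keys per index shard
    for s in $(seq 0 10); do
        printf 'shard %s: ' "$s"
        rados -p default.rgw.buckets.index listomapkeys ".dir.<bucket-instance-id>.$s" | wc -l
    done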

[ceph-users] After upgrading from 17.2.6 to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures

2023-09-20 Thread sbengeri
Since upgrading to 18.2.0, OSDs are very frequently restarting due to livenessprobe failures, making the cluster unusable. Has anyone else seen this behavior? Upgrade path: ceph 17.2.6 to 18.2.0 (and rook from 1.11.9 to 1.12.1) on Ubuntu 20.04, kernel 5.15.0-79-generic. Thanks. _
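Not an answer, but the probe failures and their reasons should be visible on the Kubernetes side (illustrative; namespace and pod name are placeholders):

    kubectl -n rook-ceph describe pod <osd-pod> | grep -A3 Liveness
    kubectl -n rook-ceph get events --sort-by=.lastTimestamp | grep -i liveness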

[ceph-users] Re: millions of hex 80 0_0000 omap keys in single index shard for single bucket

2023-09-20 Thread Casey Bodley
these keys starting with "<80>0_" appear to be replication log entries for multisite. can you confirm that this is a multisite setup? is the 'bucket sync status' mostly caught up on each zone? in a healthy multisite configuration, these log entries would eventually get trimmed automatically On Wed
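For checking that on each zone (the bucket name is a placeholder):

    radosgw-admin sync status
    radosgw-admin bucket sync status --bucket=mybucket
    radosgw-admin bilog list --bucket=mybucket --max-entries=10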

[ceph-users] Error adding OSD

2023-09-20 Thread Budai Laszlo
Hi all, I am trying to add an OSD using cephadm but it fails with the message found below. Do you have any idea what may be wrong? The given device used to be in the cluster but it has been removed, and now the device appears as available in `ceph orch device ls`. Thank you, Laszlo root@m
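Without the full error it is hard to say, but when a device was previously part of the cluster, a common pattern is to zap it and re-add it explicitly (a sketch; host and device are placeholders, and zap is destructive):

    ceph orch device zap myhost /dev/sdX --force
    ceph orch daemon add osd myhost:/dev/sdX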