[ceph-users] Re: MDS restarts after enabling msgr2

2020-10-30 Thread Dan van der Ster
Hi Stefan, Yeah, the same happened to us on around half of our clusters. I never did find the pattern behind what triggered it (or not) on the different clusters: https://tracker.ceph.com/issues/43596 Cheers, Dan On Fri, Oct 30, 2020 at 1:23 AM Stefan Kooman wrote: > > Hi List, > > After a successful

[ceph-users] Re: Corrupted RBD image

2020-10-30 Thread Wido den Hollander
On 30/10/2020 06:09, Ing. Luis Felipe Domínguez Vega wrote: Hi: I tried to get info from an RBD image but: - root@fond-beagle:/# rbd list --pool cinder-ceph | grep volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda volume-dfcca6c8

[ceph-users] Re: Fix PGs states

2020-10-30 Thread Wido den Hollander
On 30/10/2020 05:20, Ing. Luis Felipe Domínguez Vega wrote: Great, and thanks, I fixed all the unknowns with the command; now the incomplete, down, etc. are left. Start with a query: $ ceph pg <pgid> query That will tell you why it's down and incomplete. The force-create-pg has probably corrupted and de
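A minimal sketch of the full query Wido is referring to, with 2.1f as a placeholder PG id (substitute one of the incomplete/down PGs from ceph health detail):
$ ceph pg 2.1f query | less
$ ceph pg 2.1f query | jq '.recovery_state'   # optional, assumes jq; the recovery_state section explains why the PG is stuck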

[ceph-users] Re: monitor sst files continue growing

2020-10-30 Thread Wido den Hollander
On 29/10/2020 19:29, Zhenshi Zhou wrote: Hi Alex, We found that there was a huge number of keys in the "logm" and "osdmap" tables while using ceph-monstore-tool. I think that could be the root cause. But that is exactly how Ceph works. It might need that very old OSDMap to get all the PGs
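For checking how big the mon store has grown and forcing a manual compaction, a rough sketch (the store path and the mon name mon01 are assumptions for illustration):
# du -sh /var/lib/ceph/mon/ceph-mon01/store.db
# ceph tell mon.mon01 compact
Compaction only reclaims what the mon is allowed to trim; while PGs are not all active+clean, the old osdmap (and matching logm) entries are retained on purpose.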

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Dan van der Ster
Hi, You said you dropped caches -- can you try again: echo 3 > /proc/sys/vm/drop_caches ? Otherwise, does an umount then mount from one of the clients clear the warning? (I don't believe this is due to a "busy client", but rather a kernel client bug where it doesn't release caps in some cases -- we'
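The two tests Dan suggests would look roughly like this on a kernel client (the mount point /mnt/cephfs is only an example):
# echo 3 > /proc/sys/vm/drop_caches
# umount /mnt/cephfs && mount /mnt/cephfs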

[ceph-users] Re: bluefs mount failed(crash) after a long time

2020-10-30 Thread Igor Fedotov
Hi Elians, you might want to create a ticket in the Ceph bug tracker and attach the failing OSD startup log with debug-bluefs set to 20. It can be pretty large though... Also wondering what preceded the first failure - maybe an unexpected shutdown or something else? Thanks, Igor On 10/30/2
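One way such a log could be captured, assuming a systemd deployment and osd.3 as a placeholder id for the failing OSD:
# systemctl stop ceph-osd@3
# ceph-osd -f --cluster ceph -i 3 --setuser ceph --setgroup ceph --debug-bluefs=20 2>&1 | tee /tmp/osd.3-bluefs.log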

[ceph-users] Re: Not all OSDs in rack marked as down when the rack fails

2020-10-30 Thread Wido den Hollander
On 29/10/2020 18:58, Dan van der Ster wrote: Hi Wido, Could it be one of these? mon osd min up ratio mon osd min in ratio 36/120 is 0.3 so it might be one of those magic ratios at play. I thought of those settings and looked at them. The weird thing is that all 3 racks are equal and it work
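For reference, the current values can be read back like this (ceph config get assumes Nautilus or newer; on older releases ceph daemon mon.<id> config get gives the same answer):
$ ceph config get mon mon_osd_min_up_ratio
$ ceph config get mon mon_osd_min_in_ratio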

[ceph-users] Re: Corrupted RBD image

2020-10-30 Thread Ing . Luis Felipe Domínguez Vega
root@fond-beagle:/home/ubuntu# rbd info --debug-rados=20 --pool cinder-ceph volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda 2020-10-30T06:56:11.567-0400 7ff7302b8380 1 librados: starting msgr at 2020-10-30T06:56:11.567-0400 7ff7302b8380 1 librados: starting objecter 2020-10-30T06:56:11.567-0400 7ff

[ceph-users] Re: frequent Monitor down

2020-10-30 Thread Frank Schilder
I remember exactly this discussion some time ago, where one of the developers gave some more subtle reasons for not using even numbers. The maths sounds simple: with 4 mons you can tolerate the loss of 1, just like with 3 mons. The added benefit seems to be the extra copy of a mon. However, the
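The quorum arithmetic behind that: a monitor quorum needs floor(N/2)+1 members, so
3 mons -> quorum of 2 -> tolerates 1 failure
4 mons -> quorum of 3 -> tolerates 1 failure
5 mons -> quorum of 3 -> tolerates 2 failures
i.e. a 4th mon adds another copy of the mon data but no extra failure tolerance.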

[ceph-users] Mon crashes when adding 4th OSD

2020-10-30 Thread Lalit Maganti
(sending this email again as the first time was blocked because my attached log file was too big) Hi all, *Context*: I'm running Ceph Octopus 15.2.5 (the latest as of this email) using Rook on a toy Kubernetes cluster of two nodes. I've got a single Ceph mon node running perfectly with 3 OSDs. T

[ceph-users] Re: Corrupted RBD image

2020-10-30 Thread Jason Dillaman
If the remove command is interrupted after it deletes the data and image header but before it deletes the image listing in the directory, this can occur. If you run "rbd rm " again (assuming it was your intent), it should take care of removing the directory listing entry. On Fri, Oct 30, 2020 at 6
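With the pool and image named earlier in the thread, that cleanup would look roughly like the following (double-check the name first, since this removes the remaining directory entry for good):
$ rbd rm cinder-ceph/volume-dfcca6c8-cb96-4b79-bc85-b200a061dcda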

[ceph-users] MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Frank Schilder
Dear cephers, I have a somewhat strange situation. I have the health warning: # ceph health detail HEALTH_WARN 3 clients failing to respond to capability release MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability release mdsceph-12(mds.0): Client sn106.hpc.ait.dtu.dk:con-fs2-h
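To see which sessions the warning is pointing at, the MDS can be asked directly; a sketch, assuming the active MDS is the ceph-12 named in the warning:
$ ceph tell mds.ceph-12 client ls
$ ceph tell mds.ceph-12 session ls   # equivalent listing under a different name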

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Frank Schilder
umount + mount worked. Thanks! Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Dan van der Ster Sent: 30 October 2020 10:22:38 To: Frank Schilder Cc: ceph-users Subject: Re: [ceph-users] MDS_CLIENT_LATE_RELEASE: 3

[ceph-users] RBD low iops with 4k object size

2020-10-30 Thread w1kl4s
Hello! I'm currently in the process of playing around with Ceph, and before migrating my servers to use it I wanted to benchmark it to get some idea of what performance it might achieve. I am using two 400GB Intel S3700 SSDs with a crush rule created as follows: ➜  ~ ceph osd crush rule create-rep
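The crush command is cut off above; the full form follows the pattern below, where the rule name, root, and failure domain are placeholders since the original line is truncated:
➜  ~ ceph osd crush rule create-replicated ssd-rule default osd ssd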

[ceph-users] Re: Fix PGs states

2020-10-30 Thread DHilsbos
This line is telling: 1 osds down This is likely the cause of everything else. Why is one of your OSDs down? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perform Air International, Inc. dhils...@performair.com www.PerformAir.com -Original Message---
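A quick way to find out which OSD is down and why, sketched with osd.5 as a placeholder id:
$ ceph osd tree | grep down
# systemctl status ceph-osd@5
# journalctl -u ceph-osd@5 --since "1 hour ago"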

[ceph-users] Re: MDS_CLIENT_LATE_RELEASE: 3 clients failing to respond to capability release

2020-10-30 Thread Patrick Donnelly
On Fri, Oct 30, 2020 at 2:13 AM Frank Schilder wrote: > > Dear cephers, > > I have a somewhat strange situation. I have the health warning: > > # ceph health detail > HEALTH_WARN 3 clients failing to respond to capability release > MDS_CLIENT_LATE_RELEASE 3 clients failing to respond to capability

[ceph-users] Re: Fix PGs states

2020-10-30 Thread Ing . Luis Felipe Domínguez Vega
https://pastebin.ubuntu.com/p/tHSpzWp8Cx/ On 2020-10-30 11:47, dhils...@performair.com wrote: This line is telling: 1 osds down This is likely the cause of everything else. Why is one of your OSDs down? Thank you, Dominic L. Hilsbos, MBA Director - Information Technology Perfo

[ceph-users] Re: OSD down, how to reconstruct it from its main and block.db parts ?

2020-10-30 Thread Wladimir Mutel
Dear Mr. Caro, here are the next steps I decided to take at my own discretion: 1. running "ceph-volume lvm list", I discovered the following section about my non-activating osd.1: == osd.1 === [block] /dev/ceph-e53b65ba-5eb0-44f5-9160-a2328f787a0f/osd-block-8c6324a3-0364-4fad-9d
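If ceph-volume still lists the LVs but the OSD does not start, the usual next step is to activate it again; a sketch, leaving the osd fsid (truncated in the listing above) as a placeholder:
# ceph-volume lvm activate 1 <osd-fsid>
# ceph-volume lvm activate --all   # alternatively, activates every OSD it can find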

[ceph-users] Re: Very high read IO during backfilling

2020-10-30 Thread Frank Schilder
Are you a victim of bluefs_buffered_io=false: https://www.mail-archive.com/ceph-users@ceph.io/msg05550.html ? Best regards, = Frank Schilder AIT Risø Campus Bygning 109, rum S14 From: Kamil Szczygieł Sent: 27 October 2020 21:39:22 To: cep
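The setting in question can be checked and changed cluster-wide; a sketch using ceph config (Nautilus or newer; depending on the release, an OSD restart may be needed for the change to take effect):
$ ceph config get osd bluefs_buffered_io
$ ceph config set osd bluefs_buffered_io true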