[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-18 Thread Marco Pizzolo
Hi Everyone, Update on this. 5.4 kernel wasn't working well for us and we had to reinstall the HWE and 5.11 kernel. We can now get all OSDs more or less up, but on a clean OS reinstall we are seeing this type of behavior that is causing slow ops even before any pool and filesystem has been create

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Zakhar Kirpichenko
Indeed, this is the PVE forum post I saw earlier. /Z On Tue, Oct 12, 2021 at 9:27 PM Marco Pizzolo wrote: > Igor, > > Thanks for the response. One that I found was: > https://forum.proxmox.com/threads/pve-7-0-bug-kernel-null-pointer-dereference-address-00c0-pf-error_code-0x-no-

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Marco Pizzolo
Igor, Thanks for the response. One that I found was: https://forum.proxmox.com/threads/pve-7-0-bug-kernel-null-pointer-dereference-address-00c0-pf-error_code-0x-no-web-access-no-ssh.96598/ In regards to your questions, this is a new cluster deployed at 16.2.6. It currently has l

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Igor Fedotov
Zakhar, could you please point me to the similar reports on the Proxmox forum? Curious what's the Ceph release mentioned there... Thanks, Igor On 10/12/2021 8:53 PM, Zakhar Kirpichenko wrote: Hi, This could be kernel-related, as I've seen similar reports in Proxmox forum. Specifically, 5.11.x w

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Igor Fedotov
FYI: telemetry reports that triggered the above-mentioned ticket creation indicate kernel v4.18... "utsname_release": "4.18.0-305.10.2.el8_4.x86_64" On 10/12/2021 8:53 PM, Zakhar Kirpichenko wrote: Hi, This could be kernel-related, as I've seen similar reports in Proxmox forum. Specifically,

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Zakhar Kirpichenko
Can't say much about kernel 5.4 and connectx-6, as we have no experience with this combination. 5.4 + connectx-5 works well though :-) / Z On Tue, Oct 12, 2021 at 9:06 PM Marco Pizzolo wrote: > Hi Zakhar, > > Thanks for the quick response. I was coming across some of those Proxmox > forum post

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Igor Fedotov
Hi Marco, this reminds me of the following ticket: https://tracker.ceph.com/issues/52234 Unfortunately that's all we have so far about that issue. Could you please answer some questions: 1) Is this a new or upgraded cluster? 2) If you upgraded it - what was the previous Ceph version and did y

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Marco Pizzolo
Hi Zakhar, Thanks for the quick response. I was coming across some of those Proxmox forum posts as well. I'm not sure if going to the 5.4 kernel will create any other challenges for us, as we're using dual-port Mellanox ConnectX-6 200G NICs in the hosts, but it is definitely something we can try

[ceph-users] Re: OSD Crashes in 16.2.6

2021-10-12 Thread Zakhar Kirpichenko
Hi, This could be kernel-related, as I've seen similar reports in Proxmox forum. Specifically, 5.11.x with Ceph seems to be hitting kernel NULL pointer dereference. Perhaps a newer kernel would help. If not, I'm running 16.2.6 with kernel 5.4.x without any issues. Best regards, Z On Tue, Oct 12,