Hey Sebastian,

thanks a lot for the update; please see more questions inline.


Thanks,

Igor

On 1/22/2022 2:13 AM, Sebastian Mazza wrote:
Hey Igor,

thank you for your response and your suggestions.

I've tried to simulate every imaginable load that the cluster might have handled 
before the three OSDs crashed.
I rebooted the servers many times while the cluster was under load. If more than 
a single node was rebooted at the same time, the clients hung until enough 
servers were up again, which is perfectly fine!
I really tried hard to crash it, but I failed. That is excellent in general, 
but unfortunately not helpful for finding the root cause of the problem with 
the corrupted RocksDBs.
And you haven't made any environment/config changes, e.g. disk caching 
disablement, since the last issue, right?
There is an environmental change, since I’m currently missing one of my two 
Ethernet switches for the cluster. The switches (should) provide an MLAG for 
every server, so every server uses a Linux interface bond that is connected 
with one cable to each switch. However, one of the switches is currently out 
for RMA because it sporadically failed to (re)boot. I did not change anything 
in the network config of the servers, but of course the Linux bond driver is 
currently not able to balance the network traffic across two links, since only 
one is active. Could this have an influence?

Apart from disconnecting half of the network cables, I did not change anything. 
All the HDDs are the same and are inserted into the same drive bays.
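
A quick way to confirm the per-slave link state of such a bond (a minimal 
sketch; `bond0` is an assumed interface name, adjust to your setup):

    # Show bond mode and the link state of each slave interface
    cat /proc/net/bonding/bond0

    # Alternatively, list the slave interfaces and their state via iproute2
    ip -brief link show type bond_slave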

Configuration-wise, I’m not aware of any change. I only destroyed and recreated 
the 3 failed OSDs.

I have now checked the write cache settings of all HDDs with `hdparm -W /dev/sdX`, 
which always returns “write-caching =  1 (on)”.
I also checked the OSD setting “bluefs_buffered_io” with `ceph daemon osd.X 
config show | grep bluefs_buffered_io`, which returned true for all OSDs.
I’m pretty sure that all of these caches were always on.
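
For completeness, a small sketch that repeats both checks across a whole node 
(assumptions: the data disks appear as /dev/sd? and the OSD admin sockets live 
under /var/run/ceph):

    # Report the on-disk write cache state of every sdX device
    for dev in /dev/sd?; do
        echo "$dev: $(hdparm -W "$dev" | grep write-caching)"
    done

    # Report bluefs_buffered_io for every OSD running on this node
    for sock in /var/run/ceph/ceph-osd.*.asok; do
        echo "$sock: $(ceph daemon "$sock" config get bluefs_buffered_io)"
    done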


Do you suggest disabling the HDD write caching and/or bluefs_buffered_io for 
production clusters?

Generally, the upstream recommendation is to disable disk write caching; there have been multiple complaints that it might negatively impact performance in some setups.

As for bluefs_buffered_io - please keep it on; disabling it is known to cause a performance drop.
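
If you do decide to disable the on-disk write cache, a minimal sketch (note that 
`hdparm -W 0` alone does not survive a reboot; the udev rule below is one 
possible way to persist it, and its path and match pattern are assumptions you 
would adapt to your disks):

    # Disable the volatile write cache on one drive (immediate, but not persistent)
    hdparm -W 0 /dev/sdX

    # One way to persist it across reboots: a udev rule for all rotational sd disks,
    # e.g. in /etc/udev/rules.d/99-disable-write-cache.rules
    ACTION=="add|change", KERNEL=="sd[a-z]", ATTR{queue/rotational}=="1", \
        RUN+="/usr/sbin/hdparm -W 0 /dev/%k"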


When rebooting a node, did you perform it with a regular OS command (reboot or 
poweroff) or with a power switch?
I never did a hard reset or used the power switch. I used `init 6` to perform 
a reboot. Each server has redundant power supplies, one connected to a battery 
backup and the other to the grid. Therefore, I don't think that any of the 
servers ever faced an unclean shutdown or reboot.

So the original reboot that caused the failures was performed in the same manner, right?
Best regards,
Sebastian

--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx

_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
