On 1/26/2022 1:18 AM, Sebastian Mazza wrote:
Hey Igor,

thank you for your response!

Do you suggest disabling the HDD write caching and/or the bluefs_buffered_io 
option for production clusters?

Generally, the upstream recommendation is to disable disk write caching; there 
have been multiple complaints that it can negatively impact performance in some setups.

As for bluefs_buffered_io - please keep it on; disabling it is known to 
cause a performance drop.
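
(For reference, a minimal sketch of how that advice could be applied; /dev/sdX is a 
placeholder for an OSD's data device, and hdparm applies to SATA drives - SAS drives 
would use sdparm instead:)

# query the current state of the drive's volatile write cache
hdparm -W /dev/sdX

# disable the on-disk write cache (note: this may not persist across power cycles on all drives)
hdparm -W 0 /dev/sdX

# verify that bluefs_buffered_io is still at its default (true) for an OSD, e.g. osd.4
ceph config get osd.4 bluefs_buffered_io

# and re-enable it for all OSDs if it has been turned off
ceph config set osd bluefs_buffered_io true
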
Thanks for the explanation. Regarding the enabled disk write cache you only mentioned 
possible performance problems, but can an enabled disk write cache also lead to 
data corruption, or make such a problem more likely than with a disabled disk cache?

Definitely it can, particularly if the cache isn't protected from power loss or the implementation isn't so good ;)


When rebooting a node - did you perform it with a regular OS command (reboot or 
poweroff) or with a power switch?
I never did a hard reset or used the power switch; I used `init 6` to 
perform a reboot. Each server has redundant power supplies, one 
connected to a battery backup and the other to the grid. Therefore, I don't think 
any of the servers ever faced an unclean shutdown or reboot.
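
(If it helps, a minimal sketch of how that could be double-checked from the logs; the 
unit name assumes the OSDs run under the usual ceph-osd@<id> systemd units, e.g. osd.4:)

# shutdown/startup messages of osd.4 from the previous boot
journalctl -b -1 -u ceph-osd@4

# the same information is kept in the OSD's own log file
less /var/log/ceph/ceph-osd.4.log
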

So the original reboot which caused the failures was made in the same manner, 
right?
Yes, exactly.
And the OSD logs confirm that:

OSD 4:
2021-12-12T21:33:07.780+0100 7f464a944700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal Terminated ***
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1608
...
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002145.sst
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db:


OSD 7:
2021-12-12T21:20:11.141+0100 7f9714894700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1937
...
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db:


OSD 8:
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received  signal: Terminated from /sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal Terminated ***
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown (osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 set uid:gid to 64045:64045 (ceph:ceph)
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 ceph version 16.2.6 (1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, pid 1938
...
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db:
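
(For reference, a minimal sketch of checks that could be run against one of the affected 
OSDs, e.g. osd.4; the paths assume the default OSD data directory, and the OSD has to 
be stopped first:)

# stop the OSD before touching its store
systemctl stop ceph-osd@4

# run a BlueStore consistency check
ceph-bluestore-tool fsck --path /var/lib/ceph/osd/ceph-4

# export the BlueFS files (including the RocksDB .sst files) for offline inspection
ceph-bluestore-tool bluefs-export --path /var/lib/ceph/osd/ceph-4 --out-dir /root/osd.4-bluefs

# confirm the fast-shutdown behaviour seen in the logs above
ceph config get osd.4 osd_fast_shutdown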



Best regards,
Sebastian


--
Igor Fedotov
Ceph Lead Developer

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH, Freseniusstr. 31h, 81247 Munich
CEO: Martin Verges - VAT-ID: DE310638492
Com. register: Amtsgericht Munich HRB 231263
Web: https://croit.io | YouTube: https://goo.gl/PGE1Bx
