Hey Igor,

thank you for your response!

>> 
>> Do you suggest to disable the HDD write-caching and / or the 
>> bluefs_buffered_io for productive clusters?
>> 
> Generally upstream recommendation is to disable disk write caching, there 
> were multiple complains it might negatively impact the performance in some 
> setups.
> 
> As for bluefs_buffered_io - please keep it on, the disablement is known to 
> cause performance drop.

Thanks for the explanation. Regarding the enabled disk write cache, you only 
mentioned possible performance problems. But can an enabled disk write cache 
also lead to data corruption, or at least make such a problem more likely than 
with the cache disabled?
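For anyone following along, acting on the upstream recommendation would look 
roughly like this. This is only a sketch: the device name and the need to 
re-apply the setting after a power cycle depend on the drives in use.

```shell
# Query the current volatile write-cache state of an HDD
# (assumes a SATA/SAS drive at /dev/sdb; adjust for your setup)
hdparm -W /dev/sdb

# Disable the on-disk write cache, per the upstream recommendation
# (on many drives this does not persist across power cycles, so a
# udev rule or boot-time script may be needed to re-apply it)
hdparm -W0 /dev/sdb

# Verify that bluefs_buffered_io is still enabled for the OSDs,
# as recommended above
ceph config get osd bluefs_buffered_io
```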

> 
>> 
>>> When rebooting a node  - did you perform it by regular OS command (reboot 
>>> or poweroff) or by a power switch?
>> I never did a hard reset or used the power switch. I used `init 6` for 
>> performing a reboot. Each server has redundant power supplies, with one 
>> connected to a battery backup and the other to the grid. Therefore, I 
>> think that none of the servers ever faced a non-clean shutdown or reboot.
>> 
> So the original reboot which caused the failures was made in the same manner, 
> right?

Yes, exactly.
And the OSD logs confirm that:

OSD 4:
2021-12-12T21:33:07.780+0100 7f464a944700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Got signal 
Terminated ***
2021-12-12T21:33:07.780+0100 7f464a944700 -1 osd.4 2606 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:35:29.918+0100 7ffa5ce42f00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1608
:...
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002145.sst
2021-12-12T21:35:32.509+0100 7ffa5ce42f00 -1 
bluestore(/var/lib/ceph/osd/ceph-4) _open_db erroring opening db: 


OSD 7:
2021-12-12T21:20:11.141+0100 7f9714894700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Got signal 
Terminated ***
2021-12-12T21:20:11.141+0100 7f9714894700 -1 osd.7 2591 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:21:41.881+0100 7f63c6557f00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1937
:...
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.557+0100 7f63c6557f00 -1 
bluestore(/var/lib/ceph/osd/ceph-7) _open_db erroring opening db: 


OSD 8:
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 received  signal: Terminated from 
/sbin/init  (PID: 1) UID: 0
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Got signal 
Terminated ***
2021-12-12T21:20:11.141+0100 7fd1ccf01700 -1 osd.8 2591 *** Immediate shutdown 
(osd_fast_shutdown=true) ***
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 set uid:gid to 64045:64045 
(ceph:ceph)
2021-12-12T21:21:41.881+0100 7f6d18d2bf00  0 ceph version 16.2.6 
(1a6b9a05546f335eeeddb460fdc89caadf80ac7a) pacific (stable), process ceph-osd, 
pid 1938
:...
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 rocksdb: Corruption: Bad table 
magic number: expected 9863518390377041911, found 0 in db/002182.sst
2021-12-12T21:21:44.577+0100 7f6d18d2bf00 -1 
bluestore(/var/lib/ceph/osd/ceph-8) _open_db erroring opening db: 



Best regards,
Sebastian


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io