Okay - I've finally got full debug logs from the flapping OSDs. The raw logs 
are about 100M each - I can email them directly if necessary. (Igor, I've 
already sent these your way.)

Both flapping OSDs are reporting the same "bluefs _allocate failed to allocate" 
errors as before. I've also spotted additional errors about corrupt blocks 
which I hadn't seen previously. E.g.

2021-09-08T10:42:13.316+0000 7f705c4f2f00  3 rocksdb: 
[table/block_based_table_reader.cc:1117] Encountered error while reading data 
from compression dictionary block Corruption: block checksum mismatch: expected 
0, got 2324967111  in db/501397.sst offset 18446744073709551615 size 
18446744073709551615
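One thing worth noting about that log line: the suspicious offset/size value is just the maximum unsigned 64-bit integer, i.e. -1 cast to uint64, which suggests RocksDB is printing a placeholder for an unknown/invalid block handle rather than a genuine 16 EiB offset. A quick sanity check (my own sketch, not from the logs):

```python
# Show that the offset/size printed in the RocksDB corruption message
# is simply uint64 max (-1 reinterpreted as unsigned 64-bit), not a
# real file offset.
UINT64_MAX = 2**64 - 1
logged_value = 18446744073709551615

print(logged_value == UINT64_MAX)          # True
print(logged_value == (-1) % 2**64)        # True: -1 wrapped to uint64
```

So the checksum mismatch itself (expected 0, got 2324967111) is probably the meaningful part of the error, not the offset/size.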


FTR (I realised I never posted this before), our osd tree is:

[qs-admin@condor_sc0 ~]$ sudo docker exec fe4eb75fc98b ceph osd tree
ID  CLASS  WEIGHT   TYPE NAME            STATUS  REWEIGHT  PRI-AFF
-1         1.02539  root default
-7         0.34180      host condor_sc0
 1    ssd  0.34180          osd.1          down         0  1.00000
-5         0.34180      host condor_sc1
 0    ssd  0.34180          osd.0            up   1.00000  1.00000
-3         0.34180      host condor_sc2
 2    ssd  0.34180          osd.2          down   1.00000  1.00000


I still haven't managed to get the ceph-bluestore-tool output - I'll get back 
to you on that.


_______________________________________________
ceph-users mailing list -- ceph-users@ceph.io
To unsubscribe send an email to ceph-users-le...@ceph.io
