Hello,
I followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
After the upgrade the MDS cluster is down: mds rank 0 and the purge_queue
journal are reported damaged. Resetting the purge_queue does not seem to
help, as the journal still appears damaged afterwards.
Can anybody help?
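
For reference, this is how I'm confirming the damaged rank from the
monitor side ("cephfs" here stands in for my filesystem name; these only
report state, they don't change anything):

# ceph health detail
# ceph fs status cephfs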

MDS log (excerpt):

  -789> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.mds2 Updating MDS map to version 586 from mon.2
  -788> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map i am now mds.0.583
  -787> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 handle_mds_map state change up:rejoin --> up:active
  -786> 2018-09-26 18:42:32.527 7f70f78b1700  1 mds.0.583 recovery_done -- successful recovery!
<skip>
   -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume: Decode error at read_pos=0x322ec6636
   -37> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 set_want_state: up:active -> down:damaged
   -36> 2018-09-26 18:42:32.707 7f70f28a7700  5 mds.beacon.mds2 _send down:damaged seq 137
   -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message to mon.ceph3 at mon:6789/0
   -34> 2018-09-26 18:42:32.707 7f70f28a7700  1 -- mds:6800/e4cc09cf --> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 -- 0x563b321ad480 con 0
<skip>
    -3> 2018-09-26 18:42:32.743 7f70f98b5700  5 -- mds:6800/3838577103 >> mon:6789/0 conn(0x563b3213e000 :-1 s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2 seq 29 0x563b321ab880 mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7
    -2> 2018-09-26 18:42:32.743 7f70f98b5700  1 -- mds:6800/3838577103 <== mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e000
    -1> 2018-09-26 18:42:32.743 7f70f98b5700  5 mds.beacon.mds2 handle_mds_beacon down:damaged seq 311 rtt 0.038261
     0> 2018-09-26 18:42:32.743 7f70f28a7700  1 mds.mds2 respawn!
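
To see where that decode error sits relative to the journal pointers, the
purge_queue header can be dumped with the same tool (as far as I
understand its subcommands; this just prints the header, nothing more):

# cephfs-journal-tool --journal=purge_queue header get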

# cephfs-journal-tool --journal=purge_queue journal inspect
Overall journal integrity: DAMAGED
Corrupt regions:
  0x322ec65d9-ffffffffffffffff
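
Before resetting, my understanding is that whatever is still readable can
be exported as a backup first (the output path below is just an example):

# cephfs-journal-tool --journal=purge_queue journal export /root/purge_queue.bin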

# cephfs-journal-tool --journal=purge_queue journal reset
old journal was 13470819801~8463
new journal start will be 13472104448 (1276184 bytes past old end)
writing journal head
done
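
As I understand the disaster recovery docs, after a journal reset the
damaged rank also has to be marked repaired before the mons will let an
MDS take it again:

# ceph mds repaired 0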

# cephfs-journal-tool --journal=purge_queue journal inspect
2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
Overall journal integrity: DAMAGED
Objects missing:
  0xc8c
Corrupt regions:
  0x323000000-ffffffffffffffff
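
Since the purge queue stores its journal under the 500. object prefix (as
the "Missing object 500.00000c8c" message above shows), the missing object
can be checked for directly in the metadata pool ("cephfs_metadata" here
stands in for my metadata pool name):

# rados -p cephfs_metadata stat 500.00000c8c
# rados -p cephfs_metadata ls | grep '^500\.'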