> On 8.10.2018, at 12:37, Yan, Zheng <uker...@gmail.com> wrote:
>
> On Mon, Oct 8, 2018 at 4:37 PM Sergey Malinin <h...@newmail.com> wrote:
>>
>> What additional steps need to be taken in order to (try to) regain access to
>> the fs, provided that I backed up the metadata pool, created an alternate
>> metadata pool, and ran scan_extents, scan_links, scan_inodes, and a recursive
>> scrub?
>> After that I only mounted the fs read-only to back up the data.
>> Would anything even work given that the MDS journal and purge queue were truncated?
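>>
>> For reference, the scan stages against the alternate pool were roughly the
>> following, per the mimic disaster-recovery docs (pool/fs names and the mds id
>> below are placeholders, not our actual ones):
>>
>> # cephfs-data-scan scan_extents --alternate-pool recovery --filesystem <original fs> <original data pool>
>> # cephfs-data-scan scan_inodes --alternate-pool recovery --filesystem <original fs> --force-corrupt --force-init <original data pool>
>> # cephfs-data-scan scan_links --filesystem recovery-fs
>> # ceph daemon mds.<id> scrub_path / recursive repair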
>>
>
> Did you back up the whole metadata pool? Did you make any modifications
> to the original metadata pool? If so, what were they?
I backed up both the journal and the purge queue and used cephfs-journal-tool to
recover dentries, then reset the journal and purge queue on the original metadata pool.
Before proceeding to the alternate metadata pool recovery I was able to start the MDS,
but it soon failed, throwing lots of 'loaded dup inode' errors; I am not sure whether
that changed anything in the pool.
I have left the original metadata pool untouched since then.
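
The journal steps before that were along these lines, run against the original
metadata pool (13.2.x syntax, default rank 0):

# cephfs-journal-tool event recover_dentries summary
# cephfs-journal-tool journal reset
# cephfs-journal-tool --journal=purge_queue journal reset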
>
> Yan, Zheng
>
>>
>>> On 8.10.2018, at 05:15, Yan, Zheng <uker...@gmail.com> wrote:
>>>
>>> Sorry, this is caused by a wrong backport. Downgrading the MDS to 13.2.1 and
>>> marking the MDS repaired can resolve this.
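>>>
>>> Marking rank 0 repaired is roughly the following (fs name is a placeholder):
>>>
>>> # ceph mds repaired <fs_name>:0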
>>>
>>> Yan, Zheng
>>> On Sat, Oct 6, 2018 at 8:26 AM Sergey Malinin <h...@newmail.com> wrote:
>>>>
>>>> Update:
>>>> I discovered http://tracker.ceph.com/issues/24236 and
>>>> https://github.com/ceph/ceph/pull/22146
>>>> Make sure that it is not relevant in your case before proceeding to
>>>> operations that modify on-disk data.
>>>>
>>>>
>>>> On 6.10.2018, at 03:17, Sergey Malinin <h...@newmail.com> wrote:
>>>>
>>>> I ended up rescanning the entire fs using the alternate metadata pool approach
>>>> as in http://docs.ceph.com/docs/mimic/cephfs/disaster-recovery/
>>>> The process has not completed yet because during the recovery our cluster
>>>> encountered another problem with the OSDs, which I got fixed yesterday (thanks
>>>> to Igor Fedotov @ SUSE).
>>>> The first stage (scan_extents) completed in 84 hours (120M objects in the data
>>>> pool on 8 HDD OSDs across 4 hosts). The second (scan_inodes) was interrupted
>>>> by the OSD failure, so I have no timing stats, but it seems to be running 2-3
>>>> times faster than the extents scan.
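>>>>
>>>> (In hindsight, the scan stages can also be split across several parallel
>>>> workers, something like the following with 4 workers, one instance per
>>>> worker_n; option names as in the cephfs-data-scan docs:
>>>>
>>>> # cephfs-data-scan scan_extents --worker_n 0 --worker_m 4 <data pool>
>>>> # cephfs-data-scan scan_extents --worker_n 1 --worker_m 4 <data pool>
>>>> # ...likewise for worker_n 2 and 3, then the same pattern for scan_inodes.)
>>>>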
>>>> As to the root cause -- in my case I recall that during the upgrade I had
>>>> forgotten to restart 3 OSDs, one of which was holding metadata pool
>>>> contents, before restarting the MDS daemons, and that seemed to have had an
>>>> impact on the MDS journal corruption, because when I restarted those OSDs
>>>> the MDS was able to start up but soon failed, throwing lots of 'loaded dup
>>>> inode' errors.
>>>>
>>>>
>>>> On 6.10.2018, at 00:41, Alfredo Daniel Rezinovsky <alfrenov...@gmail.com>
>>>> wrote:
>>>>
>>>> Same problem...
>>>>
>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>> 2018-10-05 18:37:10.704 7f01f60a9bc0 -1 Missing object 500.0000016c
>>>> Overall journal integrity: DAMAGED
>>>> Objects missing:
>>>> 0x16c
>>>> Corrupt regions:
>>>> 0x5b000000-ffffffffffffffff
>>>>
>>>> Just after upgrade to 13.2.2
>>>>
>>>> Did you fix it?
>>>>
>>>>
>>>> On 26/09/18 13:05, Sergey Malinin wrote:
>>>>
>>>> Hello,
>>>> Followed the standard upgrade procedure to upgrade from 13.2.1 to 13.2.2.
>>>> After the upgrade the MDS cluster is down; mds rank 0 and the purge_queue
>>>> journal are damaged. Resetting the purge_queue does not seem to work well,
>>>> as the journal still appears to be damaged.
>>>> Can anybody help?
>>>>
>>>> mds log:
>>>>
>>>> -789> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.mds2 Updating MDS map to
>>>> version 586 from mon.2
>>>> -788> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 handle_mds_map i
>>>> am now mds.0.583
>>>> -787> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 handle_mds_map
>>>> state change up:rejoin --> up:active
>>>> -786> 2018-09-26 18:42:32.527 7f70f78b1700 1 mds.0.583 recovery_done --
>>>> successful recovery!
>>>> <skip>
>>>> -38> 2018-09-26 18:42:32.707 7f70f28a7700 -1 mds.0.purge_queue _consume:
>>>> Decode error at read_pos=0x322ec6636
>>>> -37> 2018-09-26 18:42:32.707 7f70f28a7700 5 mds.beacon.mds2
>>>> set_want_state: up:active -> down:damaged
>>>> -36> 2018-09-26 18:42:32.707 7f70f28a7700 5 mds.beacon.mds2 _send
>>>> down:damaged seq 137
>>>> -35> 2018-09-26 18:42:32.707 7f70f28a7700 10 monclient: _send_mon_message
>>>> to mon.ceph3 at mon:6789/0
>>>> -34> 2018-09-26 18:42:32.707 7f70f28a7700 1 -- mds:6800/e4cc09cf -->
>>>> mon:6789/0 -- mdsbeacon(14c72/mds2 down:damaged seq 137 v24a) v7 --
>>>> 0x563b321ad480 con 0
>>>> <skip>
>>>> -3> 2018-09-26 18:42:32.743 7f70f98b5700 5 -- mds:6800/3838577103 >>
>>>> mon:6789/0 conn(0x563b3213e000 :-1
>>>> s=STATE_OPEN_MESSAGE_READ_FOOTER_AND_DISPATCH pgs=8 cs=1 l=1). rx mon.2
>>>> seq 29 0x563b321ab880 mdsbeacon(85106/mds2 down:damaged seq 311 v587) v7
>>>> -2> 2018-09-26 18:42:32.743 7f70f98b5700 1 -- mds:6800/3838577103 <==
>>>> mon.2 mon:6789/0 29 ==== mdsbeacon(85106/mds2 down:damaged seq 311 v587)
>>>> v7 ==== 129+0+0 (3296573291 0 0) 0x563b321ab880 con 0x563b3213e000
>>>> -1> 2018-09-26 18:42:32.743 7f70f98b5700 5 mds.beacon.mds2
>>>> handle_mds_beacon down:damaged seq 311 rtt 0.038261
>>>> 0> 2018-09-26 18:42:32.743 7f70f28a7700 1 mds.mds2 respawn!
>>>>
>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>> Overall journal integrity: DAMAGED
>>>> Corrupt regions:
>>>> 0x322ec65d9-ffffffffffffffff
>>>>
>>>> # cephfs-journal-tool --journal=purge_queue journal reset
>>>> old journal was 13470819801~8463
>>>> new journal start will be 13472104448 (1276184 bytes past old end)
>>>> writing journal head
>>>> done
>>>>
>>>> # cephfs-journal-tool --journal=purge_queue journal inspect
>>>> 2018-09-26 19:00:52.848 7f3f9fa50bc0 -1 Missing object 500.00000c8c
>>>> Overall journal integrity: DAMAGED
>>>> Objects missing:
>>>> 0xc8c
>>>> Corrupt regions:
>>>> 0x323000000-ffffffffffffffff
>>
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com