Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread John Spray
On Thu, Jun 21, 2018 at 4:39 PM Benjeman Meekhof wrote:
>
> I do have one follow-up related question: While doing this I took
> offline all the standby MDS, and max_mds on our cluster is at 1. Were
> I to enable multiple MDS would they all actively split up processing
> the purge queue?
When yo

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread Benjeman Meekhof
I do have one follow-up related question: While doing this I took offline all the standby MDS, and max_mds on our cluster is at 1. Were I to enable multiple MDS would they all actively split up processing the purge queue? We have never yet allowed multiple active MDS but plan to en

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread Benjeman Meekhof
Thanks very much, John! Skipping over the corrupt entry by setting a new expire_pos seems to have worked. The journal expire_pos is now advancing and pools are being purged. It has a little while to go to catch up to the current write_pos, but the journal inspect command gives an 'OK' for overall inte

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-21 Thread John Spray
On Wed, Jun 20, 2018 at 2:17 PM Benjeman Meekhof wrote:
>
> Thanks for the response. I was also hoping to be able to debug better
> once we got onto Mimic. We just finished that upgrade yesterday and
> cephfs-journal-tool does find a corruption in the purge queue though
> our MDS continues to st

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-20 Thread Benjeman Meekhof
Thanks for the response. I was also hoping to be able to debug better once we got onto Mimic. We just finished that upgrade yesterday, and cephfs-journal-tool does find a corruption in the purge queue, though our MDS continues to start up and the filesystem appears to be functional as usual. How ca

Re: [ceph-users] MDS: journaler.pq decode error

2018-06-15 Thread John Spray
On Fri, Jun 15, 2018 at 2:55 PM, Benjeman Meekhof wrote:
> Have seen some posts and issue trackers related to this topic in the
> past but haven't been able to put it together to resolve the issue I'm
> having. All on Luminous 12.2.5 (upgraded over time from past
> releases). We are going to upg