We have been using a CephFS pool to store machine data. The data is not
overly critical at this time.


It's grown to around 8 TB, and we started to see kernel panics on the hosts
that had the mounts in place.
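(These were, as far as I can tell, kernel-client mounts. If the panic traces
would be useful I can pull the kernel log from the previous boot on one of the
affected hosts, e.g. something like the following, assuming the journal is
persistent, and post the relevant section:

$ journalctl -k -b -1 | grep -iA40 ceph
)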


Now when we try to start the MDSs they cycle through Active, Replay and
ClientReplay about 10 times and then just fail in an active (laggy) state.
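(For what it's worth, I have been watching the state transitions with
something along these lines; the interval is arbitrary:

$ watch -n 5 'ceph fs status; ceph mds stat'
)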


So I deleted the MDSs.
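(Roughly speaking that meant failing the rank and then stopping the MDS
services on each host. croit manages the daemons for us, so treat these as
illustrative rather than the exact commands I used:

$ ceph mds fail 0
# then stop / remove the ceph-mds services on the MDS hosts via croit
)

The filesystem map afterwards looks like this: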


(docker-croit)@us-croit-enc-deploy01 ~ $ ceph fs dump
dumped fsmap epoch 5307
e5307
enable_multiple, ever_enabled_multiple: 0,0
compat: compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
legacy client fscid: 1

Filesystem 'cephfs' (1)
fs_name cephfs
epoch   5307
flags   12
created 2019-10-26 20:43:02.087584
modified        2019-10-26 21:35:17.285598
tableserver     0
root    0
session_timeout 60
session_autoclose       300
max_file_size   1099511627776
min_compat_client       -1 (unspecified)
last_failure    0
last_failure_osd_epoch  2122066
compat  compat={},rocompat={},incompat={1=base v0.20,2=client writeable 
ranges,3=default file layouts on dirs,4=dir inode in separate object,5=mds uses 
versioned encoding,6=dirfrag is stored in omap,8=no anchor table,9=file layout 
v2,10=snaprealm v2}
max_mds 1
in      0
up      {0=267576193}
failed
damaged
stopped
data_pools      [5,14]
metadata_pool   3
inline_data     disabled
balancer
standby_count_wanted    1
267576193:      v1:100.129.255.186:6800/1355970155 'us-ceph-enc-svc02' 
mds.0.5301 up:active seq 16 laggy since 2019-10-26 21:12:08.027863

Which looks more or less OK, apart from rank 0 still being shown as up:active
but laggy.
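(If that stale entry matters, I'm assuming I can clear it before retrying with:

$ ceph mds fail 0

but I haven't wanted to guess.)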

Then I ran:

(docker-croit)@us-croit-enc-deploy01 ~ $ ceph fs ls
name: cephfs, metadata pool: cephfs_metadata, data pools: [cephfs_data 
us_enc_datarefuge_001a ]
(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all 
journal export lee.bak
journal is 523855986688~99768
wrote 99768 bytes at offset 523855986688 to lee.bak
NOTE: this is a _sparse_ file; you can
        $ tar cSzf lee.bak.tgz lee.bak
      to efficiently compress it while preserving sparseness.
(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all 
event recover_dentries summary
Events by type:
  RESETJOURNAL: 1
  SESSION: 363
  SESSIONS: 17
  UPDATE: 14
Errors: 0
(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-journal-tool --rank cephfs:all 
journal reset
old journal was 523855986688~99768
new journal start will be 523860180992 (4094536 bytes past old end)
writing journal head
writing EResetJournal entry
done
(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset session
{
    "0": {
        "data": {},
        "result": 0
    }
}

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset snap
{
    "result": 0
}

(docker-croit)@us-croit-enc-deploy01 ~ $ cephfs-table-tool all reset inode
{
    "0": {
        "data": {},
        "result": 0
    }
}
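(Everything above reports success. If it would help I can also post the
output of an integrity check on the journal and a dump of the session table
before re-adding the MDSs, e.g.:

$ cephfs-journal-tool --rank cephfs:all journal inspect
$ cephfs-table-tool all show session
)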

I then re-add the MDSs and we go round in a circle again.

Am I missing something? Do I need to drop the metadata pool and recreate it,
maybe? If it comes to it I can drop all the data and start over, but I really
don't want to.
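(The only other things I can see to try from the disaster-recovery docs are a
`ceph fs reset cephfs --yes-i-really-mean-it` to clear the old MDS map, and
the cephfs-data-scan steps, roughly:

$ cephfs-data-scan init
$ cephfs-data-scan scan_extents cephfs_data
$ cephfs-data-scan scan_inodes cephfs_data
$ cephfs-data-scan scan_links

I haven't run any of these yet. The pool name above is just the first data
pool, and I assume us_enc_datarefuge_001a would need the same treatment, so
I'd appreciate a pointer on whether that's the right direction before I make
things worse.)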


