Hi,
I deployed rook v0.8.3 with ceph 12.2.7. This is a production system that has 
been running for a long time.
For an unknown reason, the mons could no longer form a quorum, so I tried to 
restore the mon store from the OSDs by following the document below:
https://github.com/ceph/ceph/blob/v12.2.7/doc/rados/troubleshooting/troubleshooting-mon.rst#recovery-using-osds

After collecting the cluster map data, replacing store.db and restarting the 
mon, the monitor log shows that it tried to load "000000.sst", which does not 
exist. The log also shows that the mon found all the .sst files during startup.
The detailed log is below.
2020-07-28 09:44:38.100932 I | rook-ceph-mon2: 2020-07-28 09:44:38.100799 
7f2e4abd0ec0  4 rocksdb: CURRENT file:  CURRENT
2020-07-28 09:44:38.100946 I | rook-ceph-mon2:
2020-07-28 09:44:38.100951 I | rook-ceph-mon2: 2020-07-28 09:44:38.100847 
7f2e4abd0ec0  4 rocksdb: IDENTITY file:  IDENTITY
2020-07-28 09:44:38.100958 I | rook-ceph-mon2:
2020-07-28 09:44:38.100963 I | rook-ceph-mon2: 2020-07-28 09:44:38.100865 
7f2e4abd0ec0  4 rocksdb: MANIFEST file:  MANIFEST-000014 size: 284 Bytes
2020-07-28 09:44:38.100967 I | rook-ceph-mon2:
2020-07-28 09:44:38.100972 I | rook-ceph-mon2: 2020-07-28 09:44:38.100869 
7f2e4abd0ec0  4 rocksdb: SST files in 
/var/lib/rook/rook-ceph-mon2/data/store.db dir, Total Num: 3, files: 000004.sst 
000007.sst 000010.sst
2020-07-28 09:44:38.100976 I | rook-ceph-mon2:
2020-07-28 09:44:38.100981 I | rook-ceph-mon2: 2020-07-28 09:44:38.100872 
7f2e4abd0ec0  4 rocksdb: Write Ahead Log file in 
/var/lib/rook/rook-ceph-mon2/data/store.db: 000015.log size: 0 ;
2020-07-28 09:44:38.100985 I | rook-ceph-mon2:
2020-07-28 09:44:38.100989 I | rook-ceph-mon2: 2020-07-28 09:44:38.100874 
7f2e4abd0ec0  4 rocksdb:                         Options.error_if_exists: 0
...
2020-07-28 09:44:38.101528 I | rook-ceph-mon2: 2020-07-28 09:44:38.101467 
7f2e4abd0ec0  4 rocksdb: Fast CRC32 supported: 1
2020-07-28 09:44:38.104667 I | rook-ceph-mon2: 2020-07-28 09:44:38.104317 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2609]
 Recovering from manifest file: MANIFEST-000014
2020-07-28 09:44:38.104726 I | rook-ceph-mon2:
2020-07-28 09:44:38.104926 I | rook-ceph-mon2: 2020-07-28 09:44:38.104582 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/column_family.cc:407]
 --------------- Options for column family [default]:
...
2020-07-28 09:44:38.105633 I | rook-ceph-mon2: 2020-07-28 09:44:38.104857 
7f2e4abd0ec0  4 rocksdb:                Options.report_bg_io_stats: 0
2020-07-28 09:44:38.111205 I | rook-ceph-mon2: 2020-07-28 09:44:38.110905 
7f2e4abd0ec0  2 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:1062]
 Unable to load table properties for file 0 --- IO error: 
/var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.111266 I | rook-ceph-mon2:
2020-07-28 09:44:38.111693 I | rook-ceph-mon2: 2020-07-28 09:44:38.110999 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2859]
 Recovered from manifest 
file:/var/lib/rook/rook-ceph-mon2/data/store.db/MANIFEST-000014 
succeeded,manifest_file_number is 14, next_file_number is 17, last_sequence is 
111, log_number is 0,prev_log_number is 0,max_column_family is 0
2020-07-28 09:44:38.111723 I | rook-ceph-mon2:
2020-07-28 09:44:38.111732 I | rook-ceph-mon2: 2020-07-28 09:44:38.111006 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/version_set.cc:2867]
 Column family [default] (ID 0), log number is 13
2020-07-28 09:44:38.111738 I | rook-ceph-mon2:
2020-07-28 09:44:38.111746 I | rook-ceph-mon2: 2020-07-28 09:44:38.111101 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:217]
 Shutdown: canceling all background work
2020-07-28 09:44:38.111764 I | rook-ceph-mon2: 2020-07-28 09:44:38.111214 
7f2e4abd0ec0  4 rocksdb: 
[/home/jenkins-build/build/workspace/ceph-build/ARCH/x86_64/AVAILABLE_ARCH/x86_64/AVAILABLE_DIST/centos7/DIST/centos7/MACHINE_SIZE/huge/release/12.2.7/rpm/el7/BUILD/ceph-12.2.7/src/rocksdb/db/db_impl.cc:343]
 Shutdown complete
2020-07-28 09:44:38.112077 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 
7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: 
/var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.112096 I | rook-ceph-mon2:
2020-07-28 09:44:38.112103 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 
7f2e4abd0ec0 -1 error opening mon data directory at 
'/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
2020-07-28 09:44:38.116813 I | rook-ceph-mon2: 2020-07-28 09:44:38.111862 
7f2e4abd0ec0 -1 rocksdb: Corruption: Can't access /000000.sst: IO error: 
/var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: No such file or directory
2020-07-28 09:44:38.116874 I | rook-ceph-mon2:
2020-07-28 09:44:38.116883 I | rook-ceph-mon2: 2020-07-28 09:44:38.111941 
7f2e4abd0ec0 -1 error opening mon data directory at 
'/var/lib/rook/rook-ceph-mon2/data': (22) Invalid argument
failed to run mon. failed to start mon: Failed to complete 'rook-ceph-mon2': 
exit status 1.

It's a little weird that it tried to load "000000.sst" even though the mon 
found the correct three .sst files during startup. I could not find any 
reference to "000000.sst" in store.db.
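
To make the mismatch concrete, here is a minimal Python sketch that pulls the two relevant facts out of the log: the SST files listed at startup and the file reported as inaccessible. The LOG string is just two lines copied from the log above with the mail wrapping undone, and the helper names (sst_files_listed, sst_files_missing) are made up for illustration. It only parses text and says nothing about why file number 0 is requested.

```python
import re

# Two lines copied from the log above, with the mail line-wrapping undone.
LOG = (
    "rocksdb: SST files in /var/lib/rook/rook-ceph-mon2/data/store.db dir, "
    "Total Num: 3, files: 000004.sst 000007.sst 000010.sst\n"
    "rocksdb: Corruption: Can't access /000000.sst: IO error: "
    "/var/lib/rook/rook-ceph-mon2/data/store.db/000000.sst: "
    "No such file or directory\n"
)


def sst_files_listed(log):
    """Return the names from the 'SST files in ... files: ...' startup line."""
    match = re.search(r"SST files in .*? files: (.*)", log)
    return set(match.group(1).split()) if match else set()


def sst_files_missing(log):
    """Return the SST names that RocksDB reported it could not access."""
    return set(re.findall(r"Can't access /(\d+\.sst)", log))


listed = sst_files_listed(LOG)
missing = sst_files_missing(LOG)
print("listed at startup:", sorted(listed))
print("requested but not listed:", sorted(missing - listed))
```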

    Any advice on this problem? Is it possible that I performed one of the 
steps incorrectly? Or is there a workaround?

root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# ls -al store.db/
total 92
drwxr-xr-x 2 root root   188 Jul 28 01:35 .
drwxr--r-- 3 root root    55 Jul 28 02:37 ..
-rw-r--r-- 1 root root 56547 Jul 28 02:38 000004.sst
-rw-r--r-- 1 root root  1179 Jul 28 02:38 000007.sst
-rw-r--r-- 1 root root  1243 Jul 28 02:38 000010.sst
-rw-r--r-- 1 root root     0 Jul 28 02:38 000015.log
-rw-r--r-- 1 root root    16 Jul 28 02:38 CURRENT
-rw-r--r-- 1 root root    37 Jul 28 02:38 IDENTITY
-rw-r--r-- 1 root root     0 Jul 28 02:38 LOCK
-rw-r--r-- 1 root root   284 Jul 28 02:38 MANIFEST-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000014
-rw-r--r-- 1 root root  4620 Jul 28 02:38 OPTIONS-000017
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000.sst"
root@bagwig4:/var/lib/rook/rook-ceph-mon2/data# find . -type f | xargs grep "000000"
./store.db/OPTIONS-000017:  delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000017:  memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000017:  max_bytes_for_level_multiplier=10.000000
./store.db/OPTIONS-000014:  delete_obsolete_files_period_micros=21600000000
./store.db/OPTIONS-000014:  memtable_prefix_bloom_size_ratio=0.000000
./store.db/OPTIONS-000014:  max_bytes_for_level_multiplier=10.000000
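
As far as I understand, the MANIFEST stores file numbers in a binary encoding rather than as file names, so a text grep for "000000.sst" finding nothing may not say much either way. A check that does not depend on grep is to confirm that CURRENT names a manifest that exists and to list the SST file numbers actually on disk. Below is a minimal Python sketch of that idea. The check_store helper is hypothetical, not part of ceph or rocksdb, and the demo runs against a throwaway directory of empty files that mirrors the listing above rather than the real store.db.

```python
import tempfile
from pathlib import Path


def check_store(store):
    """Report basic facts about a RocksDB directory without opening the DB."""
    current = store / "CURRENT"
    # CURRENT holds the name of the active manifest followed by a newline.
    manifest_name = current.read_text().strip() if current.is_file() else None
    manifest_present = bool(manifest_name) and (store / manifest_name).is_file()
    # Numeric part of every NNNNNN.sst file found in the directory.
    sst_numbers = sorted(
        int(p.stem) for p in store.glob("*.sst") if p.stem.isdigit()
    )
    return {
        "manifest": manifest_name,
        "manifest_present": manifest_present,
        "sst_numbers": sst_numbers,
    }


# Demo on a temporary directory mirroring the ls output above (empty files).
with tempfile.TemporaryDirectory() as tmp:
    store = Path(tmp)
    for name in ("000004.sst", "000007.sst", "000010.sst", "MANIFEST-000014"):
        (store / name).touch()
    (store / "CURRENT").write_text("MANIFEST-000014\n")
    report = check_store(store)
    print(report)
```

On the real system the same function could be pointed at /var/lib/rook/rook-ceph-mon2/data/store.db. It only reads CURRENT and lists file names, so it should be safe to run against a copy of the store.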




Thanks,

Jared, (韦煜)
Software developer
Interested in open source software, big data, Linux