Hi all,
My cluster is running 12.2.0 with BlueStore. Yesterday we ran an I/O test using
fio with the librbd ioengine, and several OSDs crashed one after another.
Hardware: 3 nodes, 30 OSDs; each OSD has a 1 TB SATA HDD for data, a 1 GB SATA
SSD partition for the DB, and a 576 MB SATA SSD partition for the WAL.
Ceph options:
bluestore_shard_finishers = true
mon_osd_prime_pg_temp = false
mon_allow_pool_delete = true
mgr_op_latency_sample_interval = 300
Log from one of the crashed OSDs (osd.11):
-9> 2017-09-15 12:20:38.879807 7f079d1a4700 4 rocksdb:
[/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1403]
[default] [JOB 3] Compacting 1@1 + 1@2 files to L2, score 1.22
-8> 2017-09-15 12:20:38.879814 7f079d1a4700 4 rocksdb:
[/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:1407]
[default] Compaction start summary: Base version 2 Base level 1, inputs:
[792(66MB)], [406(65MB)]
-7> 2017-09-15 12:20:38.879831 7f079d1a4700 4 rocksdb: EVENT_LOG_v1
{"time_micros": 1505449238879818, "job": 3, "event": "compaction_started",
"files_L1": [792], "files_L2": [406], "score": 1.2195, "input_data_size":
138472863}
-6> 2017-09-15 12:20:38.946227 7f07b7e07d00 1 freelist init
-5> 2017-09-15 12:20:40.633404 7f079d1a4700 3 rocksdb:
[/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1591]
Compaction error: Corruption: block checksum mismatch
-4> 2017-09-15 12:20:40.633487 7f079d1a4700 4 rocksdb: (Original Log Time
2017/09/15-12:20:40.633205)
[/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/compaction_job.cc:621]
[default] compacted to: base level 1 max bytes base 268435456 files[1 5 4 0 0
0 0] max score 0.96, MB/sec: 79.0 rd, 38.3 wr, level 2, files in(1, 1) out(1)
MB in(66.5, 65.5) out(64.0), read-write-amplify(2.9) write-amplify(1.0)
Corruption: block checksum mismatch, records in: 870254, records dropped: 500216
-3> 2017-09-15 12:20:40.633502 7f079d1a4700 4 rocksdb: (Original Log Time
2017/09/15-12:20:40.633373) EVENT_LOG_v1 {"time_micros": 1505449240633323,
"job": 3, "event": "compaction_finished", "compaction_time_micros": 1753285,
"output_level": 2, "num_output_files": 1, "total_output_size": 67111607,
"num_input_records": 857815, "num_output_records": 357599,
"num_subcompactions": 1, "num_single_delete_mismatches": 0,
"num_single_delete_fallthrough": 1, "lsm_state": [1, 5, 4, 0, 0, 0, 0]}
-2> 2017-09-15 12:20:40.633505 7f079d1a4700 2 rocksdb:
[/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/rocksdb/db/db_impl_compaction_flush.cc:1275]
Waiting after background compaction error: Corruption: block checksum
mismatch, Accumulated background error counts: 1
-1> 2017-09-15 12:20:40.671905 7f07b7e07d00 1
bluestore(/var/lib/ceph/osd/ceph-11) _open_alloc opening allocation metadata
0> 2017-09-15 12:20:40.678281 7f07b7e07d00 -1
/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc:
In function 'virtual void BitAllocator::free_blocks(int64_t, int64_t)' thread
7f07b7e07d00 time 2017-09-15 12:20:40.675594
/clove/vm/clove/ceph/rpmbuild/BUILD/ceph-12.2.0/src/os/bluestore/BitAllocator.cc:
1270: FAILED assert(start_block + num_blocks <= size())
ceph version 12.2.0-2 (d177b39d8bf8a81dfacff53487d7d9747e6eadad) luminous
(stable)
1: (ceph::__ceph_assert_fail(char const*, char const*, int, char
const*)+0x110) [0x7f07b88b9970]
2: (()+0xa3411a) [0x7f07b887011a]
3: (BitMapAllocator::insert_free(unsigned long, unsigned long)+0x9d)
[0x7f07b886ebcd]
4: (BitMapAllocator::init_add_free(unsigned long, unsigned long)+0xd3)
[0x7f07b886f173]
5: (BlueStore::_open_alloc()+0x1c0) [0x7f07b8727970]
6: (BlueStore::_mount(bool)+0x443) [0x7f07b8794fa3]
7: (OSD::init()+0x3ba) [0x7f07b834e35a]
8: (main()+0x2def) [0x7f07b825552f]
9: (__libc_start_main()+0xf5) [0x7f07b4473af5]
10: (()+0x4b7cc6) [0x7f07b82f3cc6]
NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
interpret this.
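
For anyone reading the trace: the assert that fires is a bounds check in
BitAllocator::free_blocks, reached from _open_alloc while the free extents are
loaded at mount. Below is a minimal Python sketch of the condition in that
assert. It is not the Ceph code; the function names, the byte-to-block
conversion, and the 4096-byte block size in the usage lines are my assumptions
for illustration only.

```python
from typing import Tuple


def extent_to_blocks(offset: int, length: int, block_size: int) -> Tuple[int, int]:
    """Convert a byte extent to (start_block, num_blocks).

    Hypothetical helper: assumes offset and length are multiples of block_size.
    """
    return offset // block_size, length // block_size


def free_range_in_bounds(start_block: int, num_blocks: int, total_blocks: int) -> bool:
    """Same condition as the failed assert: start_block + num_blocks <= size()."""
    return start_block + num_blocks <= total_blocks


if __name__ == "__main__":
    block_size = 4096                      # assumed block size, not taken from the log
    total_blocks = (1 << 40) // block_size  # a hypothetical 1 TiB device

    # A free extent that ends exactly at the end of the device passes the check.
    start, count = extent_to_blocks((1 << 40) - 8192, 8192, block_size)
    print(free_range_in_bounds(start, count, total_blocks))

    # A free extent that runs past the end of the device would trip the assert.
    start, count = extent_to_blocks((1 << 40) - 8192, 16384, block_size)
    print(free_range_in_bounds(start, count, total_blocks))
```

In other words, a free extent read back at mount time appears to extend past
the size the allocator was initialized with.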
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com