Hi 

 

We have a pair of OSDs on a node that have been crashing randomly since 14.2.2.

 

OS version: CentOS 7.6

 

There are a ton of lines in the log before the crash. An excerpt:

 

--

-3045> 2019-08-07 00:39:32.013 7fe9a4996700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3044> 2019-08-07 00:39:32.013 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3043> 2019-08-07 00:39:32.033 7fe9a4195700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

-3042> 2019-08-07 00:39:32.033 7fe9a4996700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

--

-----

 

Then several hundred lines of:

-164> 2019-08-07 00:47:36.628 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -163> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

  -162> 2019-08-07 00:47:36.632 7fe9a3994700  1 heartbeat_map is_healthy
'OSD::osd_op_tp thread 0x7fe98964c700' had timed out after 60

-----
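
Since there are far too many of these heartbeat lines to paste, a small script along the following lines could summarise them per thread. This is only a sketch: the regular expression is an assumption based on the log format shown above, and `summarize_timeouts` is a made-up helper name, not part of Ceph.

```python
import re
from collections import Counter

# Matches the heartbeat_map messages shown above. \s+ tolerates the line
# wrapping that the mail client introduced between "is_healthy" and the
# quoted thread name.
PATTERN = re.compile(
    r"heartbeat_map\s+(?:is_healthy|clear_timeout)\s+"
    r"'(?P<pool>[\w:]+)\s+thread\s+(?P<thread>0x[0-9a-f]+)'\s+"
    r"had\s+(?P<kind>suicide timed out|timed out)\s+after\s+(?P<secs>\d+)"
)


def summarize_timeouts(log_text):
    """Count heartbeat timeout messages per (thread, kind, seconds)."""
    counts = Counter()
    for m in PATTERN.finditer(log_text):
        key = (m.group("thread"), m.group("kind"), int(m.group("secs")))
        counts[key] += 1
    return counts


if __name__ == "__main__":
    import sys

    # Usage (the log path is whatever your OSD log is called):
    #   python summarize_timeouts.py < ceph-osd.N.log
    for (thread, kind, secs), n in summarize_timeouts(sys.stdin.read()).most_common():
        print(f"{thread}  {kind} after {secs}s  x{n}")
```

Feeding the OSD log to it on standard input prints one line per thread and timeout value, most frequent first.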

 

   -78> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient: tick

   -77> 2019-08-07 00:50:51.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:21.756453)

   -76> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient: tick

   -75> 2019-08-07 00:51:01.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:31.756604)

   -74> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient: tick

   -73> 2019-08-07 00:51:11.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:41.756788)

   -72> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient: tick

   -71> 2019-08-07 00:51:21.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:50:51.756982)

   -70> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient: tick

   -69> 2019-08-07 00:51:31.755 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:01.757206)

   -68> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient: tick

   -67> 2019-08-07 00:51:41.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:11.757364)

   -66> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient: tick

   -65> 2019-08-07 00:51:51.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:21.757535)

   -64> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fe987e49700' had timed out after 15

   -63> 2019-08-07 00:51:52.861 7fe987e49700  1 heartbeat_map clear_timeout
'OSD::osd_op_tp thread 0x7fe987e49700' had suicide timed out after 150

   -62> 2019-08-07 00:51:52.948 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1721180160 mapped: 4297818112 old cache_size:
1994018210 new cache size: 1992784572

   -61> 2019-08-07 00:51:52.948 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1992784572
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654593191 data_alloc: 452984832 data_used: 455929856

   -60> 2019-08-07 00:51:57.923 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1994110827
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -59> 2019-08-07 00:51:57.973 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1994110827 new cache size: 1994442069

   -58> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient: tick

   -57> 2019-08-07 00:52:01.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:31.757684)

   -56> 2019-08-07 00:52:02.933 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1995765747
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -55> 2019-08-07 00:52:02.983 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1995765747 new cache size: 1996096345

   -54> 2019-08-07 00:52:07.943 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1997417449
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -53> 2019-08-07 00:52:07.993 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1997417449 new cache size: 1997747404

   -52> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient: tick

   -51> 2019-08-07 00:52:11.757 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:41.757855)

   -50> 2019-08-07 00:52:12.952 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 1999065941
kv_alloc: 763363328 kv_used: 749381098 meta_alloc: 763363328 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -49> 2019-08-07 00:52:13.002 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
1999065941 new cache size: 1999395254

   -48> 2019-08-07 00:52:17.962 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2000711226
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -47> 2019-08-07 00:52:18.012 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2000711226 new cache size: 2001039899

   -46> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient: tick

   -45> 2019-08-07 00:52:21.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:51:51.758043)

   -44> 2019-08-07 00:52:22.971 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2002353314
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -43> 2019-08-07 00:52:23.022 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2002353314 new cache size: 2002681348

   -42> 2019-08-07 00:52:27.982 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2003992210
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -41> 2019-08-07 00:52:28.031 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2003992210 new cache size: 2004319607

   -40> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient: tick

   -39> 2019-08-07 00:52:31.756 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:01.758219)

   -38> 2019-08-07 00:52:32.991 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2005627920
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -37> 2019-08-07 00:52:33.041 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2005627920 new cache size: 2005954680

   -36> 2019-08-07 00:52:38.001 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2007260450
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -35> 2019-08-07 00:52:38.051 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2007260450 new cache size: 2007586575

   -34> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient: tick

   -33> 2019-08-07 00:52:41.757 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:11.758447)

   -32> 2019-08-07 00:52:43.011 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2008889806
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -31> 2019-08-07 00:52:43.061 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2008889806 new cache size: 2009215297

   -30> 2019-08-07 00:52:48.021 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2010515995
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -29> 2019-08-07 00:52:48.071 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2010515995 new cache size: 2010840853

   -28> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient: tick

   -27> 2019-08-07 00:52:51.757 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:21.758631)

   -26> 2019-08-07 00:52:53.031 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2012139023
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -25> 2019-08-07 00:52:53.081 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2012139023 new cache size: 2012463250

   -24> 2019-08-07 00:52:58.042 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2013758896
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -23> 2019-08-07 00:52:58.092 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2013758896 new cache size: 2014082492

   -22> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient: tick

   -21> 2019-08-07 00:53:01.758 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:31.758799)

   -20> 2019-08-07 00:53:03.052 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2015375620
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -19> 2019-08-07 00:53:03.102 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2015375620 new cache size: 2015698587

   -18> 2019-08-07 00:53:08.062 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2016989201
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -17> 2019-08-07 00:53:08.112 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2016989201 new cache size: 2017311541

   -16> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient: tick

   -15> 2019-08-07 00:53:11.758 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:41.759013)

   -14> 2019-08-07 00:53:13.071 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2018599645
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -13> 2019-08-07 00:53:13.121 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2018599645 new cache size: 2018921358

   -12> 2019-08-07 00:53:18.081 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2020206960
kv_alloc: 771751936 kv_used: 749381098 meta_alloc: 771751936 meta_used:
654590799 data_alloc: 452984832 data_used: 451538944

   -11> 2019-08-07 00:53:18.130 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2020206960 new cache size: 2020528048

   -10> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient: tick

    -9> 2019-08-07 00:53:21.757 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:52:51.759214)

    -8> 2019-08-07 00:53:23.090 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2021811150
kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used:
654590799 data_alloc: 461373440 data_used: 451538944

    -7> 2019-08-07 00:53:23.140 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2021811150 new cache size: 2022131613

    -6> 2019-08-07 00:53:28.100 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2023412220
kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used:
654590799 data_alloc: 461373440 data_used: 451538944

    -5> 2019-08-07 00:53:28.150 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2023412220 new cache size: 2023732060

    -4> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient: tick

    -3> 2019-08-07 00:53:31.758 7fe995bfa700 10 monclient:
_check_auth_rotating have uptodate secrets (they expire after 2019-08-07
00:53:01.759334)

    -2> 2019-08-07 00:53:33.110 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _trim_shards cache_size: 2025010178
kv_alloc: 780140544 kv_used: 749381098 meta_alloc: 780140544 meta_used:
654590799 data_alloc: 461373440 data_used: 451538944

    -1> 2019-08-07 00:53:33.160 7fe99966c700  5
bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296
heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:
2025010178 new cache size: 2025329397

     0> 2019-08-07 00:53:37.655 7fe987e49700 -1 *** Caught signal (Aborted)
**

in thread 7fe987e49700 thread_name:tp_osd_tp

 

ceph version 14.2.2 (4f8fa0a0024755aae7d95567c63f11d6862d55be) nautilus
(stable)

1: (()+0xf5d0) [0x7fe9a7cba5d0]

2: (pthread_kill()+0x31) [0x7fe9a7cb79d1]

3: (ceph::HeartbeatMap::_check(ceph::heartbeat_handle_d const*, char const*,
unsigned long)+0x466) [0x55fef8748176]

4: (ceph::HeartbeatMap::clear_timeout(ceph::heartbeat_handle_d*)+0x7b)
[0x55fef874878b]

5:
(BlueStore::queue_transactions(boost::intrusive_ptr<ObjectStore::CollectionI
mpl>&, std::vector<ObjectStore::Transaction,
std::allocator<ObjectStore::Transaction> >&,
boost::intrusive_ptr<TrackedOp>, ThreadPool::TPHandle*)+0xa9e)
[0x55fef86085de]

6:
(ObjectStore::queue_transaction(boost::intrusive_ptr<ObjectStore::Collection
Impl>&, ObjectStore::Transaction&&, boost::intrusive_ptr<TrackedOp>,
ThreadPool::TPHandle*)+0x7f) [0x55fef81cd7ff]

7: (OSD::dispatch_context_transaction(PG::RecoveryCtx&, PG*,
ThreadPool::TPHandle*)+0x58) [0x55fef8118298]

8: (OSD::dequeue_peering_evt(OSDShard*, PG*,
std::shared_ptr<PGPeeringEvent>, ThreadPool::TPHandle&)+0x202)
[0x55fef81767c2]

9: (PGPeeringItem::run(OSD*, OSDShard*, boost::intrusive_ptr<PG>&,
ThreadPool::TPHandle&)+0x50) [0x55fef83eb490]

10: (OSD::ShardedOpWQ::_process(unsigned int,
ceph::heartbeat_handle_d*)+0x9f4) [0x55fef816aef4]

11: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x433)
[0x55fef8769ce3]

12: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x55fef876cd80]

13: (()+0x7dd5) [0x7fe9a7cb2dd5]

14: (clone()+0x6d) [0x7fe9a6b7202d]
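
To make the `_tune_cache_size` entries in the dump above easier to read, the byte counts can be pulled out and converted to GiB. The sketch below assumes the field names exactly as they appear in this log; `parse_tune_cache_size` and `to_gib` are illustrative helper names, not Ceph APIs.

```python
import re

# Field labels as they appear in the _tune_cache_size entries above.
# \s+ tolerates the line wrapping introduced by the mail client.
FIELD = re.compile(
    r"\b(target|heap|unmapped|mapped|old\s+cache_size|new\s+cache\s+size):\s+(\d+)"
)


def parse_tune_cache_size(entry):
    """Return the byte counts from one _tune_cache_size log entry as a dict."""
    return {
        " ".join(name.split()): int(value)  # collapse any wrapped whitespace
        for name, value in FIELD.findall(entry)
    }


def to_gib(n_bytes):
    """Convert a byte count to GiB."""
    return n_bytes / 2**30


if __name__ == "__main__":
    entry = (
        "bluestore.MempoolThread(0x55ff04ad6a88) _tune_cache_size target: 4294967296\n"
        "heap: 6018998272 unmapped: 1725702144 mapped: 4293296128 old cache_size:\n"
        "2025010178 new cache size: 2025329397"
    )
    for name, value in parse_tune_cache_size(entry).items():
        print(f"{name:>15}: {to_gib(value):.2f} GiB")
```

In the entries pasted here, `heap` equals `unmapped` plus `mapped`, and `mapped` sits just under `target`.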

 

About server load:  

 

[root@CEPH008 ~]# top

top - 00:57:30 up 186 days, 22 min,  1 user,  load average: 11.65, 13.42,
13.51

Tasks: 316 total,   1 running, 315 sleeping,   0 stopped,   0 zombie

%Cpu(s):  2.3 us,  1.2 sy,  0.0 ni, 74.1 id, 22.4 wa,  0.0 hi,  0.1 si,  0.0
st

KiB Mem : 65737480 total,   431824 free, 49046608 used, 16259048 buff/cache

KiB Swap: 29241340 total, 19406504 free,  9834836 used. 15917556 avail Mem

 

 

The server is currently running deep scrubs, which we had turned off for the
last two weeks because of a node eviction and a new node installation.

 

 

 

_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
