I followed:

$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' \
    --partition-guid=1:$journal_uuid \
    --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk

Then

$ sudo ceph-osd --mkjournal -i 20
$ sudo service ceph start osd.20

These steps are from
https://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

After that, the OSDs all started without a problem.
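
For reference, the steps above could be sketched per OSD roughly as follows. This is only a sketch: the function name and its arguments are made up for illustration, and it assumes the journal_uuid file is read from the directory of the same OSD id that is passed to --mkjournal. It only prints the commands so they can be checked before anything is run.

```shell
# Hypothetical wrapper around the steps above; it prints the commands instead
# of running them, so nothing is touched until the output has been reviewed.
# Arguments (names are illustrative): OSD id, journal device, partition number.
print_journal_recreate_cmds() {
  osd_id=$1 dev=$2 part=${3:-1}
  # Assumption: the journal UUID is stored under this OSD's own data directory.
  uuid_file="/var/lib/ceph/osd/ceph-${osd_id}/journal_uuid"
  cat <<EOF
journal_uuid=\$(sudo cat ${uuid_file})
sudo sgdisk --new=${part}:0:+20480M --change-name=${part}:'ceph journal' \\
  --partition-guid=${part}:\$journal_uuid \\
  --typecode=${part}:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- ${dev}
sudo ceph-osd --mkjournal -i ${osd_id}
sudo service ceph start osd.${osd_id}
EOF
}
```

For example, `print_journal_recreate_cmds 20 /dev/sdk` prints the sequence for osd.20 with the journal on partition 1 of /dev/sdk.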


On Sun, 2 Sep 2018 at 15:43, David Turner <drakonst...@gmail.com> wrote:

> It looks like osds on the first failed node are having problems. What
> commands did you run to bring it back online?
>
> On Sun, Sep 2, 2018, 10:27 AM Lee <lqui...@gmail.com> wrote:
>
>> Ok I have a lot in the health detail...
>>
>> root@node31-a4:~# ceph health detail
>> HEALTH_ERR 64 pgs backfill; 27 pgs backfill_toofull; 39 pgs backfilling;
>> 26 pgs degraded; 4 pgs down; 31 pgs incomplete; 1 pgs inconsistent; 12 pgs
>> recovery_wait; 1 pgs stale; 26 pgs stuck degraded; 31 pgs stuck inactive; 1
>> pgs stuck stale; 161 pgs stuck unclean; 9 pgs stuck undersized; 9 pgs
>> undersized; 726 requests are blocked > 32 sec; 9 osds have slow requests;
>> recovery 59636/5032695 objects degraded (1.185%); recovery 1280976/5032695
>> objects misplaced (25.453%); 1 scrub errors; noscrub,nodeep-scrub flag(s)
>> set
>> pg 2.2a is stuck inactive for 97629.478505, current state incomplete,
>> last acting [24,5]
>> pg 2.b0 is stuck inactive for 98000.688979, current state incomplete,
>> last acting [24,7]
>> pg 9.42 is stuck inactive for 108836.103738, current state incomplete,
>> last acting [31,12]
>> pg 9.de is stuck inactive since forever, current state incomplete, last
>> acting [6,5]
>> pg 2.75 is stuck inactive since forever, current state down+incomplete,
>> last acting [7,15]
>> pg 9.dc is stuck inactive for 113491.800208, current state incomplete,
>> last acting [6,7]
>> pg 2.74 is stuck inactive for 97658.382960, current state incomplete,
>> last acting [13,5]
>> pg 9.1e is stuck inactive since forever, current state incomplete, last
>> acting [7,15]
>> pg 2.15 is stuck inactive since forever, current state incomplete, last
>> acting [7,31]
>> pg 11.1c is stuck inactive since forever, current state down+incomplete,
>> last acting [6,7]
>> pg 2.a1 is stuck inactive for 98785.888826, current state incomplete,
>> last acting [14,12]
>> pg 9.d8 is stuck inactive for 115082.575098, current state
>> down+incomplete, last acting [21,5]
>> pg 9.a8 is stuck inactive for 118575.035210, current state incomplete,
>> last acting [14,7]
>> pg 9.78 is stuck inactive since forever, current state incomplete, last
>> acting [5,24]
>> pg 2.a2 is stuck inactive since forever, current state incomplete, last
>> acting [5,13]
>> pg 7.16 is stuck inactive since forever, current state incomplete, last
>> acting [6,7]
>> pg 2.13 is stuck inactive since forever, current state incomplete, last
>> acting [7,10]
>> pg 9.f5 is stuck inactive for 103009.439003, current state incomplete,
>> last acting [18,5]
>> pg 2.d is stuck inactive since forever, current state incomplete, last
>> acting [5,10]
>> pg 9.5 is stuck inactive since forever, current state incomplete, last
>> acting [5,18]
>> pg 9.3 is stuck inactive since forever, current state incomplete, last
>> acting [7,15]
>> pg 9.fc is stuck inactive for 201476.092908, current state incomplete,
>> last acting [13,5]
>> pg 11.33 is stuck inactive since forever, current state down+incomplete,
>> last acting [7,6]
>> pg 9.3f is stuck inactive since forever, current state incomplete, last
>> acting [5,14]
>> pg 9.a is stuck inactive for 113328.467457, current state incomplete,
>> last acting [18,7]
>> pg 2.63 is stuck inactive for 97665.176520, current state incomplete,
>> last acting [31,7]
>> pg 2.3 is stuck inactive for 97655.279670, current state incomplete, last
>> acting [14,5]
>> pg 2.32 is stuck inactive since forever, current state incomplete, last
>> acting [5,13]
>> pg 2.bf is stuck inactive for 99913.875808, current state incomplete,
>> last acting [15,7]
>> pg 9.26 is stuck inactive since forever, current state incomplete, last
>> acting [5,24]
>> pg 9.22 is stuck inactive since forever, current state incomplete, last
>> acting [7,24]
>> pg 9.25 is stuck unclean for 20091.777921, current state
>> active+degraded+remapped+wait_backfill, last acting [15,2]
>> pg 7.2b is stuck unclean for 98830.660179, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 11.27 is stuck unclean for 1777813.502308, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [4,36]
>> pg 2.f1 is stuck unclean for 26585.481715, current state
>> active+recovery_wait+degraded, last acting [13,8]
>> pg 9.22 is stuck unclean since forever, current state incomplete, last
>> acting [7,24]
>> pg 2.29 is stuck unclean for 5629.190514, current state
>> active+remapped+wait_backfill, last acting [24,40]
>> pg 9.fb is stuck unclean for 3640.777545, current state
>> active+remapped+wait_backfill, last acting [8,39]
>> pg 9.23 is stuck unclean for 3595.306511, current state
>> active+remapped+wait_backfill, last acting [35,9]
>> pg 2.f3 is stuck unclean for 4993.558900, current state
>> active+remapped+wait_backfill, last acting [6,9]
>> pg 2.f2 is stuck unclean for 8871.835444, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.2a is stuck unclean for 97629.478922, current state incomplete, last
>> acting [24,5]
>> pg 2.ed is stuck unclean for 3595.395657, current state
>> active+remapped+backfilling, last acting [9,40]
>> pg 2.24 is stuck unclean for 6391.873856, current state
>> active+remapped+wait_backfill, last acting [13,40]
>> pg 2.27 is stuck unclean for 6814.809178, current state
>> active+recovery_wait+degraded, last acting [13,3]
>> pg 2.e8 is stuck unclean for 11759.373756, current state
>> active+remapped+wait_backfill, last acting [15,36]
>> pg 11.29 is stuck unclean for 6907.684021, current state
>> active+remapped+wait_backfill, last acting [14,40]
>> pg 2.eb is stuck unclean for 14474.951608, current state
>> active+remapped+backfilling, last acting [0,31]
>> pg 2.ea is stuck unclean for 3595.396597, current state
>> active+remapped+backfilling, last acting [9,34]
>> pg 12.13 is stuck unclean for 5629.177184, current state active+remapped,
>> last acting [8,31]
>> pg 2.1d is stuck unclean for 12245.891518, current state
>> active+remapped+backfilling, last acting [3,6]
>> pg 11.15 is stuck unclean for 14683.173113, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [34,9]
>> pg 2.1c is stuck unclean for 14683.755228, current state
>> active+degraded+remapped+backfilling, last acting [14,11]
>> pg 11.16 is stuck unclean for 5629.180301, current state
>> active+remapped+wait_backfill, last acting [15,40]
>> pg 2.1f is stuck unclean for 11858.149360, current state
>> active+remapped+wait_backfill, last acting [15,3]
>> pg 0.1c is stuck unclean for 6907.683196, current state
>> active+remapped+wait_backfill, last acting [12,3]
>> pg 2.1e is stuck unclean for 102531.318993, current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.e0 is stuck unclean for 3571.898995, current state
>> active+remapped+inconsistent+wait_backfill, last acting [6,9]
>> pg 2.18 is stuck unclean for 3502.358091, current state
>> active+remapped+backfilling, last acting [18,9]
>> pg 2.e3 is stuck unclean for 12047.716242, current state
>> active+remapped+backfilling, last acting [4,41]
>> pg 11.13 is stuck unclean for 6907.682681, current state
>> active+remapped+wait_backfill, last acting [14,8]
>> pg 9.d6 is stuck unclean for 7416.596559, current state
>> active+remapped+wait_backfill, last acting [1,9]
>> pg 9.1e is stuck unclean since forever, current state incomplete, last
>> acting [7,15]
>> pg 11.1c is stuck unclean since forever, current state down+incomplete,
>> last acting [6,7]
>> pg 2.15 is stuck unclean since forever, current state incomplete, last
>> acting [7,31]
>> pg 2.dc is stuck unclean for 11709.774640, current state
>> active+remapped+backfilling, last acting [40,4]
>> pg 2.14 is stuck unclean for 3504.589025, current state
>> active+remapped+backfilling, last acting [18,9]
>> pg 2.df is stuck unclean for 5047.489499, current state
>> active+remapped+wait_backfill, last acting [0,13]
>> pg 11.1e is stuck unclean for 1968924.322629, current state
>> active+remapped+wait_backfill, last acting [3,38]
>> pg 2.de is stuck unclean for 97621.617826, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 9.1d is stuck unclean for 48349.818420, current state
>> active+remapped+backfill_toofull, last acting [12,36]
>> pg 3.17 is stuck unclean for 5629.187939, current state active+remapped,
>> last acting [5,13]
>> pg 2.d8 is stuck unclean for 7418.583365, current state
>> active+remapped+backfilling, last acting [21,41]
>> pg 7.15 is stuck unclean for 98830.449502, current state
>> active+remapped+wait_backfill, last acting [13,2]
>> pg 11.19 is stuck unclean for 3925.828027, current state
>> active+remapped+wait_backfill, last acting [15,38]
>> pg 2.db is stuck unclean for 3595.396853, current state
>> active+remapped+backfilling, last acting [9,40]
>> pg 9.18 is stuck unclean for 27500.110917, current state
>> active+remapped+backfill_toofull, last acting [18,13]
>> pg 7.16 is stuck unclean since forever, current state incomplete, last
>> acting [6,7]
>> pg 2.13 is stuck unclean since forever, current state incomplete, last
>> acting [7,10]
>> pg 9.de is stuck unclean since forever, current state incomplete, last
>> acting [6,5]
>> pg 9.6 is stuck unclean for 219342.087677, current state
>> active+remapped+backfill_toofull, last acting [2,41]
>> pg 2.d is stuck unclean since forever, current state incomplete, last
>> acting [5,10]
>> pg 9.df is stuck unclean for 48360.843924, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [35,2]
>> pg 8.6 is stuck unclean for 5629.183555, current state active+remapped,
>> last acting [12,13]
>> pg 2.d7 is stuck unclean for 83782.680541, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 9.dc is stuck unclean for 113491.800754, current state incomplete,
>> last acting [6,7]
>> pg 7.a is stuck unclean for 3844.286529, current state
>> active+remapped+wait_backfill, last acting [38,2]
>> pg 9.5 is stuck unclean since forever, current state incomplete, last
>> acting [5,18]
>> pg 4.8 is stuck unclean for 3893.186289, current state
>> active+recovery_wait+degraded, last acting [15,2]
>> pg 3.d0 is stuck unclean for 7418.584435, current state
>> active+remapped+wait_backfill, last acting [12,2]
>> pg 2.d1 is stuck unclean for 83769.259615, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 9.3 is stuck unclean since forever, current state incomplete, last
>> acting [7,15]
>> pg 9.d8 is stuck unclean for 115082.575647, current state
>> down+incomplete, last acting [21,5]
>> pg 2.b is stuck unclean for 7418.564413, current state
>> active+remapped+backfilling, last acting [40,24]
>> pg 9.d9 is stuck unclean for 14681.601684, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,4]
>> pg 9.1 is stuck unclean for 3930.973909, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,3]
>> pg 2.cc is stuck unclean for 5078.643356, current state active+remapped,
>> last acting [40,24]
>> pg 11.d is stuck unclean for 14592.297817, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [36,4]
>> pg 9.c5 is stuck unclean for 3844.281162, current state
>> active+remapped+wait_backfill, last acting [5,38]
>> pg 9.a is stuck unclean for 113328.467988, current state incomplete, last
>> acting [18,7]
>> pg 11.9 is stuck unclean for 7418.578072, current state
>> active+remapped+wait_backfill, last acting [21,39]
>> pg 2.0 is stuck unclean for 97873.488751, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.cb is stuck unclean for 25031.035830, current state
>> active+degraded+remapped+wait_backfill+backfill_toofull, last acting [1,4]
>> pg 9.8 is stuck unclean for 24341.317696, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [5,24]
>> pg 2.3 is stuck unclean for 97655.280232, current state incomplete, last
>> acting [14,5]
>> pg 2.2 is stuck unclean for 97734.492834, current state
>> active+recovery_wait+degraded+remapped, last acting [13,9]
>> pg 2.c4 is stuck unclean for 3595.525931, current state
>> active+remapped+backfilling, last acting [34,9]
>> pg 2.c7 is stuck unclean for 8871.729496, current state
>> active+recovery_wait+degraded, last acting [13,2]
>> pg 9.cb is stuck unclean for 5629.175300, current state active+remapped,
>> last acting [11,31]
>> pg 9.c9 is stuck unclean for 14683.752701, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [5,34]
>> pg 2.c2 is stuck unclean for 3504.738005, current state
>> active+remapped+wait_backfill, last acting [9,15]
>> pg 2.bd is stuck unclean for 3571.325492, current state
>> active+remapped+backfilling, last acting [39,9]
>> pg 2.bf is stuck unclean for 99913.876400, current state incomplete,
>> last acting [15,7]
>> pg 9.b3 is stuck unclean for 3925.828356, current state
>> active+remapped+wait_backfill, last acting [15,35]
>> pg 2.b5 is stuck unclean for 28026.340079, current state active+remapped,
>> last acting [2,40]
>> pg 2.b6 is stuck unclean for 11859.834286, current state
>> active+remapped+backfilling, last acting [1,31]
>> pg 2.b0 is stuck unclean for 98000.689674, current state incomplete, last
>> acting [24,7]
>> pg 2.b3 is stuck unclean for 5629.182841, current state
>> active+remapped+backfilling, last acting [3,0]
>> pg 2.ad is stuck unclean for 6907.677050, current state
>> active+remapped+backfilling, last acting [2,39]
>> pg 2.ae is stuck unclean for 11862.967346, current state
>> active+remapped+backfilling, last acting [34,13]
>> pg 9.a0 is stuck unclean for 14683.746136, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [1,3]
>> pg 2.aa is stuck unclean for 3571.307756, current state
>> active+remapped+backfilling, last acting [40,9]
>> pg 2.a7 is stuck unclean for 25030.658836, current state
>> active+remapped+wait_backfill, last acting [2,1]
>> pg 2.a6 is stuck unclean for 3930.913873, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [2,35]
>> pg 9.ad is stuck unclean for 8871.819919, current state
>> active+recovery_wait+degraded, last acting [6,8]
>> pg 2.a1 is stuck unclean for 98785.889529, current state incomplete, last
>> acting [14,12]
>> pg 1.a0 is stuck unclean for 5629.186426, current state active+remapped,
>> last acting [5,40]
>> pg 9.a8 is stuck unclean for 118575.035913, current state incomplete,
>> last acting [14,7]
>> pg 2.a2 is stuck unclean since forever, current state incomplete, last
>> acting [5,13]
>> pg 2.9d is stuck unclean for 11861.496234, current state
>> active+remapped+backfilling, last acting [6,38]
>> pg 2.9c is stuck unclean for 3506.888979, current state
>> active+remapped+wait_backfill, last acting [35,11]
>> pg 2.9b is stuck unclean for 5629.183979, current state
>> active+remapped+wait_backfill, last acting [6,0]
>> pg 9.91 is stuck unclean for 85752.028652, current state
>> active+remapped+wait_backfill, last acting [31,9]
>> pg 2.97 is stuck unclean for 9736.783735, current state
>> active+remapped+backfilling, last acting [35,24]
>> pg 2.91 is stuck unclean for 28553.979772, current state
>> active+remapped+backfilling, last acting [0,24]
>> pg 2.90 is stuck unclean for 30364.623932, current state
>> active+degraded+remapped+backfill_toofull, last acting [41,24]
>> pg 2.92 is stuck unclean for 25031.211566, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 9.99 is stuck unclean for 11862.827419, current state
>> active+remapped+wait_backfill, last acting [13,4]
>> pg 2.8f is stuck unclean for 17426.148382, current state
>> active+remapped+wait_backfill, last acting [15,9]
>> pg 2.88 is stuck unclean for 3591.054564, current state
>> active+remapped+wait_backfill, last acting [14,9]
>> pg 9.8f is stuck unclean for 3595.395794, current state
>> active+remapped+wait_backfill, last acting [9,15]
>> pg 2.87 is stuck unclean for 3844.271547, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [1,2]
>> pg 2.81 is stuck unclean for 83759.347793, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.8a is stuck unclean for 27697.026446, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [12,1]
>> pg 2.79 is stuck unclean for 12137.676488, current state
>> active+remapped+backfilling, last acting [7,40]
>> pg 2.78 is stuck unclean for 29127.120125, current state
>> active+remapped+backfilling, last acting [0,6]
>> pg 2.75 is stuck unclean since forever, current state down+incomplete,
>> last acting [7,15]
>> pg 2.74 is stuck unclean for 97658.383751, current state incomplete, last
>> acting [13,5]
>> pg 9.7c is stuck unclean for 114170.469704, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7d is stuck unclean for 14077.123326, current state
>> active+remapped+backfilling, last acting [5,24]
>> pg 2.71 is stuck unclean for 11859.344208, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [21,3]
>> pg 2.73 is stuck unclean for 11859.417605, current state
>> active+remapped+backfilling, last acting [39,15]
>> pg 9.78 is stuck unclean since forever, current state incomplete, last
>> acting [5,24]
>> pg 9.79 is stuck unclean for 14595.569162, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,3]
>> pg 2.6d is stuck unclean for 27802.265038, current state
>> active+remapped+backfilling, last acting [4,13]
>> pg 9.62 is stuck unclean for 25030.488507, current state
>> active+remapped+backfill_toofull, last acting [36,2]
>> pg 2.6a is stuck unclean for 20323.517565, current state
>> active+remapped+wait_backfill, last acting [6,40]
>> pg 9.6c is stuck unclean for 14234.077824, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [41,2]
>> pg 9.6a is stuck unclean for 27035.043476, current state
>> active+remapped+backfill_toofull, last acting [36,4]
>> pg 2.63 is stuck unclean for 97665.177288, current state incomplete, last
>> acting [31,7]
>> pg 2.5d is stuck unclean for 3549.763078, current state
>> active+remapped+wait_backfill, last acting [9,34]
>> pg 2.5e is stuck unclean for 97736.064280, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [35,36]
>> pg 2.52 is stuck unclean for 8871.832670, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 9.59 is stuck unclean for 26868.986032, current state
>> active+remapped+wait_backfill, last acting [31,34]
>> pg 2.4f is stuck unclean for 12108.325792, current state
>> active+remapped+backfilling, last acting [11,40]
>> pg 2.49 is stuck unclean for 30446.302835, current state
>> active+remapped+wait_backfill, last acting [9,24]
>> pg 9.42 is stuck unclean for 108836.104626, current state incomplete,
>> last acting [31,12]
>> pg 2.45 is stuck unclean for 11284.580305, current state
>> active+degraded+remapped+backfilling, last acting [24,2]
>> pg 9.4f is stuck unclean for 3893.672356, current state
>> active+remapped+wait_backfill, last acting [0,21]
>> pg 2.44 is stuck unclean for 27623.439527, current state
>> active+recovery_wait+degraded+remapped, last acting [6,11]
>> pg 9.4c is stuck unclean for 6907.681859, current state
>> active+remapped+wait_backfill, last acting [15,36]
>> pg 2.46 is stuck unclean for 6907.682263, current state
>> active+remapped+backfilling, last acting [11,24]
>> pg 9.49 is stuck unclean for 14683.624639, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [2,31]
>> pg 11.35 is stuck unclean for 5872394.444913, current state
>> active+remapped+wait_backfill, last acting [40,36]
>> pg 2.3e is stuck unclean for 6907.683506, current state
>> active+remapped+backfilling, last acting [4,41]
>> pg 2.38 is stuck unclean for 5140.320861, current state
>> active+remapped+wait_backfill, last acting [0,5]
>> pg 2.3b is stuck unclean for 14456.624593, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [18,2]
>> pg 11.33 is stuck unclean since forever, current state down+incomplete,
>> last acting [7,6]
>> pg 10.3d is stuck unclean for 3595.395921, current state
>> active+remapped+wait_backfill, last acting [9,36]
>> pg 2.35 is stuck unclean for 8872.226171, current state
>> active+recovery_wait+degraded, last acting [6,11]
>> pg 2.fc is stuck unclean for 5820.330202, current state
>> active+remapped+backfilling, last acting [31,0]
>> pg 9.3f is stuck unclean since forever, current state incomplete, last
>> acting [5,14]
>> pg 2.ff is stuck unclean for 3595.396088, current state
>> active+remapped+backfilling, last acting [9,39]
>> pg 2.fe is stuck unclean for 6904.439076, current state
>> active+remapped+backfilling, last acting [21,0]
>> pg 9.f5 is stuck unclean for 103009.439909, current state incomplete,
>> last acting [18,5]
>> pg 7.34 is stuck unclean for 3886.510000, current state
>> active+remapped+wait_backfill, last acting [13,39]
>> pg 2.fb is stuck unclean for 57173.985429, current state
>> active+recovery_wait+degraded+remapped, last acting [6,8]
>> pg 2.32 is stuck unclean since forever, current state incomplete, last
>> acting [5,13]
>> pg 9.fe is stuck unclean for 7418.564930, current state
>> active+recovery_wait+degraded+remapped, last acting [6,3]
>> pg 9.26 is stuck unclean since forever, current state incomplete, last
>> acting [5,24]
>> pg 2.f7 is stuck unclean for 6915.532617, current state
>> active+remapped+backfilling, last acting [4,15]
>> pg 9.fc is stuck unclean for 201476.093824, current state incomplete,
>> last acting [13,5]
>> pg 7.2b is stuck undersized for 64282.169836, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.1e is stuck undersized for 3895.207475, current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.de is stuck undersized for 3886.529396, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 2.d7 is stuck undersized for 7417.316099, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 2.d1 is stuck undersized for 6903.297196, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 2.0 is stuck undersized for 4999.401505, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.92 is stuck undersized for 4999.406547, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 2.81 is stuck undersized for 7417.378668, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7c is stuck undersized for 3894.953894, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.25 is stuck degraded for 7413.083043, current state
>> active+degraded+remapped+wait_backfill, last acting [15,2]
>> pg 7.2b is stuck degraded for 64282.169913, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.f1 is stuck degraded for 3848.032008, current state
>> active+recovery_wait+degraded, last acting [13,8]
>> pg 2.f2 is stuck degraded for 7411.108195, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.27 is stuck degraded for 3893.230317, current state
>> active+recovery_wait+degraded, last acting [13,3]
>> pg 2.1c is stuck degraded for 7414.316299, current state
>> active+degraded+remapped+backfilling, last acting [14,11]
>> pg 2.1e is stuck degraded for 3895.207564, current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.de is stuck degraded for 3886.529484, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 2.d7 is stuck degraded for 7417.316187, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 4.8 is stuck degraded for 3490.406821, current state
>> active+recovery_wait+degraded, last acting [15,2]
>> pg 2.d1 is stuck degraded for 6903.297288, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 2.0 is stuck degraded for 4999.401597, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.cb is stuck degraded for 7413.316930, current state
>> active+degraded+remapped+wait_backfill+backfill_toofull, last acting [1,4]
>> pg 2.2 is stuck degraded for 3894.930841, current state
>> active+recovery_wait+degraded+remapped, last acting [13,9]
>> pg 2.c7 is stuck degraded for 3886.500328, current state
>> active+recovery_wait+degraded, last acting [13,2]
>> pg 9.ad is stuck degraded for 7411.181412, current state
>> active+recovery_wait+degraded, last acting [6,8]
>> pg 2.90 is stuck degraded for 3893.715235, current state
>> active+degraded+remapped+backfill_toofull, last acting [41,24]
>> pg 2.92 is stuck degraded for 4999.406655, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 2.81 is stuck degraded for 7417.378776, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7c is stuck degraded for 3894.954001, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 2.52 is stuck degraded for 7411.108431, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.45 is stuck degraded for 3892.755878, current state
>> active+degraded+remapped+backfilling, last acting [24,2]
>> pg 2.44 is stuck degraded for 7411.213966, current state
>> active+recovery_wait+degraded+remapped, last acting [6,11]
>> pg 2.35 is stuck degraded for 7411.295348, current state
>> active+recovery_wait+degraded, last acting [6,11]
>> pg 2.fb is stuck degraded for 6903.301076, current state
>> active+recovery_wait+degraded+remapped, last acting [6,8]
>> pg 9.fe is stuck degraded for 7413.453955, current state
>> active+recovery_wait+degraded+remapped, last acting [6,3]
>> pg 7.2b is stuck stale for 64232.262041, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.fc is active+remapped+backfilling, acting [31,0]
>> pg 2.ff is active+remapped+backfilling, acting [9,39]
>> pg 9.f5 is incomplete, acting [18,5]
>> pg 2.fe is active+remapped+backfilling, acting [21,0]
>> pg 2.fb is active+recovery_wait+degraded+remapped, acting [6,8]
>> pg 9.fe is active+recovery_wait+degraded+remapped, acting [6,3]
>> pg 9.fc is incomplete, acting [13,5]
>> pg 2.f7 is active+remapped+backfilling, acting [4,15]
>> pg 2.f1 is active+recovery_wait+degraded, acting [13,8]
>> pg 9.fb is active+remapped+wait_backfill, acting [8,39]
>> pg 2.f3 is active+remapped+wait_backfill, acting [6,9]
>> pg 2.f2 is active+recovery_wait+degraded, acting [6,4]
>> pg 2.ed is active+remapped+backfilling, acting [9,40]
>> pg 2.e8 is active+remapped+wait_backfill, acting [15,36]
>> pg 2.eb is active+remapped+backfilling, acting [0,31]
>> pg 2.ea is active+remapped+backfilling, acting [9,34]
>> pg 2.e0 is active+remapped+inconsistent+wait_backfill, acting [6,9]
>> pg 2.e3 is active+remapped+backfilling, acting [4,41]
>> pg 9.d6 is active+remapped+wait_backfill, acting [1,9]
>> pg 2.dc is active+remapped+backfilling, acting [40,4]
>> pg 2.df is active+remapped+wait_backfill, acting [0,13]
>> pg 2.de is active+undersized+degraded+remapped+backfilling, acting [3]
>> pg 2.d8 is active+remapped+backfilling, acting [21,41]
>> pg 2.db is active+remapped+backfilling, acting [9,40]
>> pg 9.de is incomplete, acting [6,5]
>> pg 9.df is active+remapped+wait_backfill+backfill_toofull, acting [35,2]
>> pg 9.dc is incomplete, acting [6,7]
>> pg 2.d7 is active+undersized+degraded+remapped+backfilling, acting [36]
>> pg 2.d1 is active+undersized+degraded+remapped+backfill_toofull, acting
>> [36]
>> pg 3.d0 is active+remapped+wait_backfill, acting [12,2]
>> pg 9.d8 is down+incomplete, acting [21,5]
>> pg 9.d9 is active+remapped+wait_backfill+backfill_toofull, acting [39,4]
>> pg 9.c5 is active+remapped+wait_backfill, acting [5,38]
>> pg 2.cb is active+degraded+remapped+wait_backfill+backfill_toofull,
>> acting [1,4]
>> pg 2.c4 is active+remapped+backfilling, acting [34,9]
>> pg 2.c7 is active+recovery_wait+degraded, acting [13,2]
>> pg 2.c2 is active+remapped+wait_backfill, acting [9,15]
>> pg 9.c9 is active+remapped+wait_backfill+backfill_toofull, acting [5,34]
>> pg 2.bd is active+remapped+backfilling, acting [39,9]
>> pg 2.bf is incomplete, acting [15,7]
>> pg 9.b3 is active+remapped+wait_backfill, acting [15,35]
>> pg 2.b6 is active+remapped+backfilling, acting [1,31]
>> pg 2.b0 is incomplete, acting [24,7]
>> pg 2.b3 is active+remapped+backfilling, acting [3,0]
>> pg 2.ad is active+remapped+backfilling, acting [2,39]
>> pg 2.ae is active+remapped+backfilling, acting [34,13]
>> pg 9.a0 is active+remapped+wait_backfill+backfill_toofull, acting [1,3]
>> pg 2.aa is active+remapped+backfilling, acting [40,9]
>> pg 2.a7 is active+remapped+wait_backfill, acting [2,1]
>> pg 2.a6 is active+remapped+wait_backfill+backfill_toofull, acting [2,35]
>> pg 9.ad is active+recovery_wait+degraded, acting [6,8]
>> pg 2.a1 is incomplete, acting [14,12]
>> pg 9.a8 is incomplete, acting [14,7]
>> pg 2.a2 is incomplete, acting [5,13]
>> pg 2.9d is active+remapped+backfilling, acting [6,38]
>> pg 2.9c is active+remapped+wait_backfill, acting [35,11]
>> pg 2.9b is active+remapped+wait_backfill, acting [6,0]
>> pg 9.91 is active+remapped+wait_backfill, acting [31,9]
>> pg 2.97 is active+remapped+backfilling, acting [35,24]
>> pg 2.91 is active+remapped+backfilling, acting [0,24]
>> pg 2.90 is active+degraded+remapped+backfill_toofull, acting [41,24]
>> pg 2.92 is active+undersized+degraded+remapped+backfilling, acting [8]
>> pg 9.99 is active+remapped+wait_backfill, acting [13,4]
>> pg 2.8f is active+remapped+wait_backfill, acting [15,9]
>> pg 2.88 is active+remapped+wait_backfill, acting [14,9]
>> pg 9.8f is active+remapped+wait_backfill, acting [9,15]
>> pg 2.87 is active+remapped+wait_backfill+backfill_toofull, acting [1,2]
>> pg 2.81 is active+undersized+degraded+remapped+wait_backfill, acting [39]
>> pg 9.8a is active+remapped+wait_backfill+backfill_toofull, acting [12,1]
>> pg 2.79 is active+remapped+backfilling, acting [7,40]
>> pg 2.78 is active+remapped+backfilling, acting [0,6]
>> pg 2.75 is down+incomplete, acting [7,15]
>> pg 2.74 is incomplete, acting [13,5]
>> pg 9.7c is active+undersized+degraded+remapped+wait_backfill, acting [39]
>> pg 9.7d is active+remapped+backfilling, acting [5,24]
>> pg 2.71 is active+remapped+wait_backfill+backfill_toofull, acting [21,3]
>> pg 2.73 is active+remapped+backfilling, acting [39,15]
>> pg 9.78 is incomplete, acting [5,24]
>> pg 9.79 is active+remapped+wait_backfill+backfill_toofull, acting [39,3]
>> pg 2.6d is active+remapped+backfilling, acting [4,13]
>> pg 9.62 is active+remapped+backfill_toofull, acting [36,2]
>> pg 2.6a is active+remapped+wait_backfill, acting [6,40]
>> pg 9.6c is active+remapped+wait_backfill+backfill_toofull, acting [41,2]
>> pg 9.6a is active+remapped+backfill_toofull, acting [36,4]
>> pg 2.63 is incomplete, acting [31,7]
>> pg 2.5d is active+remapped+wait_backfill, acting [9,34]
>> pg 2.5e is active+remapped+wait_backfill+backfill_toofull, acting [35,36]
>> pg 2.52 is active+recovery_wait+degraded, acting [6,4]
>> pg 9.59 is active+remapped+wait_backfill, acting [31,34]
>> pg 2.4f is active+remapped+backfilling, acting [11,40]
>> pg 2.49 is active+remapped+wait_backfill, acting [9,24]
>> pg 9.42 is incomplete, acting [31,12]
>> pg 2.45 is active+degraded+remapped+backfilling, acting [24,2]
>> pg 2.44 is active+recovery_wait+degraded+remapped, acting [6,11]
>> pg 9.4f is active+remapped+wait_backfill, acting [0,21]
>> pg 9.4c is active+remapped+wait_backfill, acting [15,36]
>> pg 2.46 is active+remapped+backfilling, acting [11,24]
>> pg 9.49 is active+remapped+wait_backfill+backfill_toofull, acting [2,31]
>> pg 11.35 is active+remapped+wait_backfill, acting [40,36]
>> pg 2.3e is active+remapped+backfilling, acting [4,41]
>> pg 2.38 is active+remapped+wait_backfill, acting [0,5]
>> pg 2.3b is active+remapped+wait_backfill+backfill_toofull, acting [18,2]
>> pg 11.33 is down+incomplete, acting [7,6]
>> pg 2.35 is active+recovery_wait+degraded, acting [6,11]
>> pg 10.3d is active+remapped+wait_backfill, acting [9,36]
>> pg 9.3f is incomplete, acting [5,14]
>> pg 7.34 is active+remapped+wait_backfill, acting [13,39]
>> pg 2.32 is incomplete, acting [5,13]
>> pg 9.26 is incomplete, acting [5,24]
>> pg 11.27 is active+remapped+wait_backfill+backfill_toofull, acting [4,36]
>> pg 9.25 is active+degraded+remapped+wait_backfill, acting [15,2]
>> pg 2.29 is active+remapped+wait_backfill, acting [24,40]
>> pg 9.22 is incomplete, acting [7,24]
>> pg 9.23 is active+remapped+wait_backfill, acting [35,9]
>> pg 2.2a is incomplete, acting [24,5]
>> pg 2.24 is active+remapped+wait_backfill, acting [13,40]
>> pg 2.27 is active+recovery_wait+degraded, acting [13,3]
>> pg 11.29 is active+remapped+wait_backfill, acting [14,40]
>> pg 2.1d is active+remapped+backfilling, acting [3,6]
>> pg 2.1c is active+degraded+remapped+backfilling, acting [14,11]
>> pg 11.15 is active+remapped+wait_backfill+backfill_toofull, acting [34,9]
>> pg 2.1f is active+remapped+wait_backfill, acting [15,3]
>> pg 11.16 is active+remapped+wait_backfill, acting [15,40]
>> pg 2.1e is active+undersized+degraded+remapped+backfilling, acting [13]
>> pg 0.1c is active+remapped+wait_backfill, acting [12,3]
>> pg 2.18 is active+remapped+backfilling, acting [18,9]
>> pg 11.13 is active+remapped+wait_backfill, acting [14,8]
>> pg 2.15 is incomplete, acting [7,31]
>> pg 11.1c is down+incomplete, acting [6,7]
>> pg 9.1e is incomplete, acting [7,15]
>> pg 2.14 is active+remapped+backfilling, acting [18,9]
>> pg 11.1e is active+remapped+wait_backfill, acting [3,38]
>> pg 9.1d is active+remapped+backfill_toofull, acting [12,36]
>> pg 11.19 is active+remapped+wait_backfill, acting [15,38]
>> pg 7.15 is active+remapped+wait_backfill, acting [13,2]
>> pg 2.13 is incomplete, acting [7,10]
>> pg 7.16 is incomplete, acting [6,7]
>> pg 9.18 is active+remapped+backfill_toofull, acting [18,13]
>> pg 2.d is incomplete, acting [5,10]
>> pg 9.6 is active+remapped+backfill_toofull, acting [2,41]
>> pg 7.a is active+remapped+wait_backfill, acting [38,2]
>> pg 4.8 is active+recovery_wait+degraded, acting [15,2]
>> pg 9.5 is incomplete, acting [5,18]
>> pg 9.3 is incomplete, acting [7,15]
>> pg 2.b is active+remapped+backfilling, acting [40,24]
>> pg 9.1 is active+remapped+wait_backfill+backfill_toofull, acting [39,3]
>> pg 11.d is active+remapped+wait_backfill+backfill_toofull, acting [36,4]
>> pg 9.a is incomplete, acting [18,7]
>> pg 2.0 is active+undersized+degraded+remapped+wait_backfill+backfill_toofull, acting [1]
>> pg 11.9 is active+remapped+wait_backfill, acting [21,39]
>> pg 2.3 is incomplete, acting [14,5]
>> pg 9.8 is active+remapped+wait_backfill+backfill_toofull, acting [5,24]
>> pg 2.2 is active+recovery_wait+degraded+remapped, acting [13,9]
>> 33 ops are blocked > 16777.2 sec
>> 368 ops are blocked > 8388.61 sec
>> 238 ops are blocked > 4194.3 sec
>> 87 ops are blocked > 1048.58 sec
>> 2 ops are blocked > 8388.61 sec on osd.5
>> 98 ops are blocked > 4194.3 sec on osd.5
>> 98 ops are blocked > 8388.61 sec on osd.6
>> 1 ops are blocked > 8388.61 sec on osd.7
>> 27 ops are blocked > 4194.3 sec on osd.7
>> 12 ops are blocked > 4194.3 sec on osd.13
>> 87 ops are blocked > 1048.58 sec on osd.13
>> 2 ops are blocked > 16777.2 sec on osd.14
>> 98 ops are blocked > 8388.61 sec on osd.14
>> 3 ops are blocked > 16777.2 sec on osd.15
>> 97 ops are blocked > 8388.61 sec on osd.15
>> 1 ops are blocked > 4194.3 sec on osd.18
>> 100 ops are blocked > 4194.3 sec on osd.24
>> 28 ops are blocked > 16777.2 sec on osd.31
>> 72 ops are blocked > 8388.61 sec on osd.31
>> 9 osds have slow requests
>> recovery 59636/5032695 objects degraded (1.185%)
>> recovery 1280976/5032695 objects misplaced (25.453%)
>> 1 scrub errors
>> noscrub,nodeep-scrub flag(s) set
>>
>>
>> The OSDs on the first failed host are 6, 13, 14, 15, 18, 24 and 31.
>>
>> The OSDs on the second host that went down are 5 and 7.
>>
>>
>>
>> On Sun, 2 Sep 2018 at 15:15, David Turner <drakonst...@gmail.com> wrote:
>>
>>> When the first node went offline with a dead SSD journal, all of the
>>> data on the OSDs was useless. Unless you could flush the journals, you
>>> can't guarantee that a write the cluster thinks happened actually made it to
>>> the disk.  The proper procedure here is to remove those OSDs and add them
>>> again as new OSDs.
>>>
>>> `ceph health detail` will give you some more information on the blocked
>>> requests. Depending on what that shows you can often find the OSD that is
>>> causing the problems.  But your biggest problem is that you have disks
>>> with potentially inconsistent data in your cluster.
>>>
>>> On Sun, Sep 2, 2018, 4:42 AM Lee <lqui...@gmail.com> wrote:
>>>
>>>> We are running 0.94.5 as part of an OpenStack environment; our Ceph setup
>>>> is 3x OSD nodes and 3x MON nodes. Yesterday we had an aircon outage in our
>>>> hosting environment. 1 OSD node failed (offline, with the journal SSD
>>>> dead), leaving 2 nodes running correctly. 2 hours later a second OSD node
>>>> failed, complaining of read/write errors to the physical drives. I assume
>>>> this was a heat issue, as when rebooted it came back online OK and Ceph
>>>> started to repair itself. We have since brought the first failed node back
>>>> online by replacing the SSD and recreating the journals, hoping it would
>>>> all repair. Our pools are min 2 repl.
>>>>
>>>> The problem we have is that client IO (read) is totally blocked, and when
>>>> I query the stuck PGs it just hangs.
>>>>
>>>> For example the check version command just errors with:
>>>>
>>>> Error EINTR: problem getting command descriptions from on various OSDs,
>>>> so I cannot even query the inactive PGs.
>>>>
>>>> root@node31-a4:~# ceph -s
>>>>     cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>>>>      health HEALTH_WARN
>>>>             83 pgs backfill
>>>>             2 pgs backfill_toofull
>>>>             3 pgs backfilling
>>>>             48 pgs degraded
>>>>             1 pgs down
>>>>             31 pgs incomplete
>>>>             1 pgs recovering
>>>>             29 pgs recovery_wait
>>>>             1 pgs stale
>>>>             48 pgs stuck degraded
>>>>             31 pgs stuck inactive
>>>>             1 pgs stuck stale
>>>>             148 pgs stuck unclean
>>>>             17 pgs stuck undersized
>>>>             17 pgs undersized
>>>>             599 requests are blocked > 32 sec
>>>>             recovery 111489/4697618 objects degraded (2.373%)
>>>>             recovery 772268/4697618 objects misplaced (16.440%)
>>>>             recovery 1/2171314 unfound (0.000%)
>>>>      monmap e5: 3 mons at {bc07s12-a7=172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0}
>>>>             election epoch 198, quorum 0,1,2 bc07s12-a7,bc07s14-a7,bc07s13-a7
>>>>      osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>>>>       pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
>>>>             16783 GB used, 6487 GB / 23270 GB avail
>>>>             111489/4697618 objects degraded (2.373%)
>>>>             772268/4697618 objects misplaced (16.440%)
>>>>             1/2171314 unfound (0.000%)
>>>>                 1639 active+clean
>>>>                   66 active+remapped+wait_backfill
>>>>                   30 incomplete
>>>>                   25 active+recovery_wait+degraded
>>>>                   15 active+undersized+degraded+remapped+wait_backfill
>>>>                    4 active+recovery_wait+degraded+remapped
>>>>                    4 active+clean+scrubbing
>>>>                    2 active+remapped+wait_backfill+backfill_toofull
>>>>                    1 down+incomplete
>>>>                    1 active+remapped+backfilling
>>>>                    1 active+clean+scrubbing+deep
>>>>                    1 stale+active+undersized+degraded
>>>>                    1 active+undersized+degraded+remapped+backfilling
>>>>                    1 active+degraded+remapped+backfilling
>>>>                    1 active+recovering+degraded
>>>> recovery io 29385 kB/s, 7 objects/s
>>>>   client io 5877 B/s wr, 1 op/s
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
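To follow the suggestion quoted above of using `ceph health detail` to find the OSDs causing problems, the pasted output can be tallied per OSD. The following is only a rough sketch in Python, not part of any Ceph tooling: the function names (`strip_quote`, `blocked_ops_per_osd`, `incomplete_pgs_per_osd`) are made up for illustration, and the regular expressions assume the line formats seen in this thread ("N ops are blocked > T sec on osd.X" and "pg P is STATE, acting [a,b]"). A PG entry that the mail client wrapped across several lines would be skipped.

```python
import re
from collections import Counter

# Per-OSD lines, e.g. "2 ops are blocked > 8388.61 sec on osd.5".
# The cluster-wide totals (no "on osd.N" suffix) deliberately do not match.
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > [\d.]+ sec on osd\.(\d+)")

# Short-form PG lines, e.g. "pg 2.75 is down+incomplete, acting [7,15]".
PG_RE = re.compile(r"pg (\S+) is (\S+), acting \[([\d,]+)\]")


def strip_quote(line):
    """Drop leading mail quote markers ('>> ') and surrounding whitespace."""
    return line.lstrip("> ").strip()


def blocked_ops_per_osd(lines):
    """Sum blocked ops per OSD id across all latency buckets."""
    totals = Counter()
    for line in lines:
        match = BLOCKED_RE.search(strip_quote(line))
        if match:
            totals[int(match.group(2))] += int(match.group(1))
    return totals


def incomplete_pgs_per_osd(lines):
    """Count, per OSD id, the incomplete PGs that list it in their acting set."""
    counts = Counter()
    for line in lines:
        match = PG_RE.search(strip_quote(line))
        if match and "incomplete" in match.group(2).split("+"):
            for osd in match.group(3).split(","):
                counts[int(osd)] += 1
    return counts


# A few lines copied from the output in this thread.
sample = [
    ">> pg 2.75 is down+incomplete, acting [7,15]",
    ">> pg 2.74 is incomplete, acting [13,5]",
    ">> pg 2.73 is active+remapped+backfilling, acting [39,15]",
    ">> 33 ops are blocked > 16777.2 sec",
    ">> 2 ops are blocked > 8388.61 sec on osd.5",
    ">> 98 ops are blocked > 4194.3 sec on osd.5",
    ">> 98 ops are blocked > 8388.61 sec on osd.6",
]

print(blocked_ops_per_osd(sample).most_common())
print(incomplete_pgs_per_osd(sample).most_common())
```

To run it against the full output, the `sample` list could be replaced by the lines of a saved copy of `ceph health detail`. OSDs that rank high in both tallies are the ones to look at first.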