I followed:

$ journal_uuid=$(sudo cat /var/lib/ceph/osd/ceph-0/journal_uuid)
$ sudo sgdisk --new=1:0:+20480M --change-name=1:'ceph journal' --partition-guid=1:$journal_uuid --typecode=1:45b0969e-9b03-4f30-b4c6-b4b80ceff106 --mbrtogpt -- /dev/sdk

Then:

$ sudo ceph-osd --mkjournal -i 20
$ sudo service ceph start osd.20

From https://www.sebastien-han.fr/blog/2014/11/27/ceph-recover-osds-after-ssd-journal-failure/

They all started without a problem.

On Sun, 2 Sep 2018 at 15:43, David Turner <drakonst...@gmail.com> wrote:

> It looks like osds on the first failed node are having problems. What
> commands did you run to bring it back online?
>
> On Sun, Sep 2, 2018, 10:27 AM Lee <lqui...@gmail.com> wrote:
>
>> Ok I have a lot in the health detail...
>>
>> root@node31-a4:~# ceph health detail
>> HEALTH_ERR 64 pgs backfill; 27 pgs backfill_toofull; 39 pgs backfilling;
>> 26 pgs degraded; 4 pgs down; 31 pgs incomplete; 1 pgs inconsistent; 12 pgs
>> recovery_wait; 1 pgs stale; 26 pgs stuck degraded; 31 pgs stuck inactive; 1
>> pgs stuck stale; 161 pgs stuck unclean; 9 pgs stuck undersized; 9 pgs
>> undersized; 726 requests are blocked > 32 sec; 9 osds have slow requests;
>> recovery 59636/5032695 objects degraded (1.185%); recovery 1280976/5032695
>> objects misplaced (25.453%); 1 scrub errors; noscrub,nodeep-scrub flag(s)
>> set
>> pg 2.2a is stuck inactive for 97629.478505, current state incomplete,
>> last acting [24,5]
>> pg 2.b0 is stuck inactive for 98000.688979, current state incomplete,
>> last acting [24,7]
>> pg 9.42 is stuck inactive for 108836.103738, current state incomplete,
>> last acting [31,12]
>> pg 9.de is stuck inactive since forever, current state incomplete, last
>> acting [6,5]
>> pg 2.75 is stuck inactive since forever, current state down+incomplete,
>> last acting [7,15]
>> pg 9.dc is stuck inactive for 113491.800208, current state incomplete,
>> last acting [6,7]
>> pg 2.74 is stuck inactive for 97658.382960, current state incomplete,
>> last acting [13,5]
>> pg 9.1e is stuck inactive since forever, current state incomplete, last
>> acting [7,15]
>> pg 2.15 is stuck inactive since forever, current state incomplete, last
>> acting [7,31]
>> pg 11.1c is stuck inactive since forever,
>> current state down+incomplete,
>> last acting [6,7]
>> pg 2.a1 is stuck inactive for 98785.888826, current state incomplete,
>> last acting [14,12]
>> pg 9.d8 is stuck inactive for 115082.575098, current state
>> down+incomplete, last acting [21,5]
>> pg 9.a8 is stuck inactive for 118575.035210, current state incomplete,
>> last acting [14,7]
>> pg 9.78 is stuck inactive since forever, current state incomplete, last
>> acting [5,24]
>> pg 2.a2 is stuck inactive since forever, current state incomplete, last
>> acting [5,13]
>> pg 7.16 is stuck inactive since forever, current state incomplete, last
>> acting [6,7]
>> pg 2.13 is stuck inactive since forever, current state incomplete, last
>> acting [7,10]
>> pg 9.f5 is stuck inactive for 103009.439003, current state incomplete,
>> last acting [18,5]
>> pg 2.d is stuck inactive since forever, current state incomplete, last
>> acting [5,10]
>> pg 9.5 is stuck inactive since forever, current state incomplete, last
>> acting [5,18]
>> pg 9.3 is stuck inactive since forever, current state incomplete, last
>> acting [7,15]
>> pg 9.fc is stuck inactive for 201476.092908, current state incomplete,
>> last acting [13,5]
>> pg 11.33 is stuck inactive since forever, current state down+incomplete,
>> last acting [7,6]
>> pg 9.3f is stuck inactive since forever, current state incomplete, last
>> acting [5,14]
>> pg 9.a is stuck inactive for 113328.467457, current state incomplete,
>> last acting [18,7]
>> pg 2.63 is stuck inactive for 97665.176520, current state incomplete,
>> last acting [31,7]
>> pg 2.3 is stuck inactive for 97655.279670, current state incomplete, last
>> acting [14,5]
>> pg 2.32 is stuck inactive since forever, current state incomplete, last
>> acting [5,13]
>> pg 2.bf is stuck inactive for 99913.875808, current state incomplete,
>> last acting [15,7]
>> pg 9.26 is stuck inactive since forever, current state incomplete, last
>> acting [5,24]
>> pg 9.22 is stuck inactive since forever, current state incomplete,
>> last
>> acting [7,24]
>> pg 9.25 is stuck unclean for 20091.777921, current state
>> active+degraded+remapped+wait_backfill, last acting [15,2]
>> pg 7.2b is stuck unclean for 98830.660179, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 11.27 is stuck unclean for 1777813.502308, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [4,36]
>> pg 2.f1 is stuck unclean for 26585.481715, current state
>> active+recovery_wait+degraded, last acting [13,8]
>> pg 9.22 is stuck unclean since forever, current state incomplete, last
>> acting [7,24]
>> pg 2.29 is stuck unclean for 5629.190514, current state
>> active+remapped+wait_backfill, last acting [24,40]
>> pg 9.fb is stuck unclean for 3640.777545, current state
>> active+remapped+wait_backfill, last acting [8,39]
>> pg 9.23 is stuck unclean for 3595.306511, current state
>> active+remapped+wait_backfill, last acting [35,9]
>> pg 2.f3 is stuck unclean for 4993.558900, current state
>> active+remapped+wait_backfill, last acting [6,9]
>> pg 2.f2 is stuck unclean for 8871.835444, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.2a is stuck unclean for 97629.478922, current state incomplete, last
>> acting [24,5]
>> pg 2.ed is stuck unclean for 3595.395657, current state
>> active+remapped+backfilling, last acting [9,40]
>> pg 2.24 is stuck unclean for 6391.873856, current state
>> active+remapped+wait_backfill, last acting [13,40]
>> pg 2.27 is stuck unclean for 6814.809178, current state
>> active+recovery_wait+degraded, last acting [13,3]
>> pg 2.e8 is stuck unclean for 11759.373756, current state
>> active+remapped+wait_backfill, last acting [15,36]
>> pg 11.29 is stuck unclean for 6907.684021, current state
>> active+remapped+wait_backfill, last acting [14,40]
>> pg 2.eb is stuck unclean for 14474.951608, current state
>> active+remapped+backfilling, last acting [0,31]
>> pg 2.ea is stuck unclean for 3595.396597, current state
>> active+remapped+backfilling, last acting [9,34]
>> pg 12.13 is stuck unclean for 5629.177184, current state active+remapped,
>> last acting [8,31]
>> pg 2.1d is stuck unclean for 12245.891518, current state
>> active+remapped+backfilling, last acting [3,6]
>> pg 11.15 is stuck unclean for 14683.173113, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [34,9]
>> pg 2.1c is stuck unclean for 14683.755228, current state
>> active+degraded+remapped+backfilling, last acting [14,11]
>> pg 11.16 is stuck unclean for 5629.180301, current state
>> active+remapped+wait_backfill, last acting [15,40]
>> pg 2.1f is stuck unclean for 11858.149360, current state
>> active+remapped+wait_backfill, last acting [15,3]
>> pg 0.1c is stuck unclean for 6907.683196, current state
>> active+remapped+wait_backfill, last acting [12,3]
>> pg 2.1e is stuck unclean for 102531.318993, current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.e0 is stuck unclean for 3571.898995, current state
>> active+remapped+inconsistent+wait_backfill, last acting [6,9]
>> pg 2.18 is stuck unclean for 3502.358091, current state
>> active+remapped+backfilling, last acting [18,9]
>> pg 2.e3 is stuck unclean for 12047.716242, current state
>> active+remapped+backfilling, last acting [4,41]
>> pg 11.13 is stuck unclean for 6907.682681, current state
>> active+remapped+wait_backfill, last acting [14,8]
>> pg 9.d6 is stuck unclean for 7416.596559, current state
>> active+remapped+wait_backfill, last acting [1,9]
>> pg 9.1e is stuck unclean since forever, current state incomplete, last
>> acting [7,15]
>> pg 11.1c is stuck unclean since forever, current state down+incomplete,
>> last acting [6,7]
>> pg 2.15 is stuck unclean since forever, current state incomplete, last
>> acting [7,31]
>> pg 2.dc is stuck unclean for 11709.774640, current state
>> active+remapped+backfilling, last acting [40,4]
>> pg 2.14 is stuck unclean for 3504.589025, current state
>> active+remapped+backfilling, last acting [18,9]
>> pg 2.df is stuck unclean for 5047.489499, current state
>> active+remapped+wait_backfill, last acting [0,13]
>> pg 11.1e is stuck unclean for 1968924.322629, current state
>> active+remapped+wait_backfill, last acting [3,38]
>> pg 2.de is stuck unclean for 97621.617826, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 9.1d is stuck unclean for 48349.818420, current state
>> active+remapped+backfill_toofull, last acting [12,36]
>> pg 3.17 is stuck unclean for 5629.187939, current state active+remapped,
>> last acting [5,13]
>> pg 2.d8 is stuck unclean for 7418.583365, current state
>> active+remapped+backfilling, last acting [21,41]
>> pg 7.15 is stuck unclean for 98830.449502, current state
>> active+remapped+wait_backfill, last acting [13,2]
>> pg 11.19 is stuck unclean for 3925.828027, current state
>> active+remapped+wait_backfill, last acting [15,38]
>> pg 2.db is stuck unclean for 3595.396853, current state
>> active+remapped+backfilling, last acting [9,40]
>> pg 9.18 is stuck unclean for 27500.110917, current state
>> active+remapped+backfill_toofull, last acting [18,13]
>> pg 7.16 is stuck unclean since forever, current state incomplete, last
>> acting [6,7]
>> pg 2.13 is stuck unclean since forever, current state incomplete, last
>> acting [7,10]
>> pg 9.de is stuck unclean since forever, current state incomplete, last
>> acting [6,5]
>> pg 9.6 is stuck unclean for 219342.087677, current state
>> active+remapped+backfill_toofull, last acting [2,41]
>> pg 2.d is stuck unclean since forever, current state incomplete, last
>> acting [5,10]
>> pg 9.df is stuck unclean for 48360.843924, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [35,2]
>> pg 8.6 is stuck unclean for 5629.183555, current state active+remapped,
>> last acting [12,13]
>> pg 2.d7 is stuck unclean for 83782.680541, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 9.dc is stuck unclean for 113491.800754, current state incomplete,
>> last acting [6,7]
>> pg 7.a is stuck unclean for 3844.286529, current state
>> active+remapped+wait_backfill, last acting [38,2]
>> pg 9.5 is stuck unclean since forever, current state incomplete, last
>> acting [5,18]
>> pg 4.8 is stuck unclean for 3893.186289, current state
>> active+recovery_wait+degraded, last acting [15,2]
>> pg 3.d0 is stuck unclean for 7418.584435, current state
>> active+remapped+wait_backfill, last acting [12,2]
>> pg 2.d1 is stuck unclean for 83769.259615, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 9.3 is stuck unclean since forever, current state incomplete, last
>> acting [7,15]
>> pg 9.d8 is stuck unclean for 115082.575647, current state
>> down+incomplete, last acting [21,5]
>> pg 2.b is stuck unclean for 7418.564413, current state
>> active+remapped+backfilling, last acting [40,24]
>> pg 9.d9 is stuck unclean for 14681.601684, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,4]
>> pg 9.1 is stuck unclean for 3930.973909, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,3]
>> pg 2.cc is stuck unclean for 5078.643356, current state active+remapped,
>> last acting [40,24]
>> pg 11.d is stuck unclean for 14592.297817, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [36,4]
>> pg 9.c5 is stuck unclean for 3844.281162, current state
>> active+remapped+wait_backfill, last acting [5,38]
>> pg 9.a is stuck unclean for 113328.467988, current state incomplete, last
>> acting [18,7]
>> pg 11.9 is stuck unclean for 7418.578072, current state
>> active+remapped+wait_backfill, last acting [21,39]
>> pg 2.0 is stuck unclean for 97873.488751, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.cb is stuck
>> unclean for 25031.035830, current state
>> active+degraded+remapped+wait_backfill+backfill_toofull, last acting [1,4]
>> pg 9.8 is stuck unclean for 24341.317696, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [5,24]
>> pg 2.3 is stuck unclean for 97655.280232, current state incomplete, last
>> acting [14,5]
>> pg 2.2 is stuck unclean for 97734.492834, current state
>> active+recovery_wait+degraded+remapped, last acting [13,9]
>> pg 2.c4 is stuck unclean for 3595.525931, current state
>> active+remapped+backfilling, last acting [34,9]
>> pg 2.c7 is stuck unclean for 8871.729496, current state
>> active+recovery_wait+degraded, last acting [13,2]
>> pg 9.cb is stuck unclean for 5629.175300, current state active+remapped,
>> last acting [11,31]
>> pg 9.c9 is stuck unclean for 14683.752701, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [5,34]
>> pg 2.c2 is stuck unclean for 3504.738005, current state
>> active+remapped+wait_backfill, last acting [9,15]
>> pg 2.bd is stuck unclean for 3571.325492, current state
>> active+remapped+backfilling, last acting [39,9]
>> pg 2.bf is stuck unclean for 99913.876400, current state incomplete,
>> last acting [15,7]
>> pg 9.b3 is stuck unclean for 3925.828356, current state
>> active+remapped+wait_backfill, last acting [15,35]
>> pg 2.b5 is stuck unclean for 28026.340079, current state active+remapped,
>> last acting [2,40]
>> pg 2.b6 is stuck unclean for 11859.834286, current state
>> active+remapped+backfilling, last acting [1,31]
>> pg 2.b0 is stuck unclean for 98000.689674, current state incomplete, last
>> acting [24,7]
>> pg 2.b3 is stuck unclean for 5629.182841, current state
>> active+remapped+backfilling, last acting [3,0]
>> pg 2.ad is stuck unclean for 6907.677050, current state
>> active+remapped+backfilling, last acting [2,39]
>> pg 2.ae is stuck unclean for 11862.967346, current state
>> active+remapped+backfilling, last acting [34,13]
>> pg 9.a0 is stuck
>> unclean for 14683.746136, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [1,3]
>> pg 2.aa is stuck unclean for 3571.307756, current state
>> active+remapped+backfilling, last acting [40,9]
>> pg 2.a7 is stuck unclean for 25030.658836, current state
>> active+remapped+wait_backfill, last acting [2,1]
>> pg 2.a6 is stuck unclean for 3930.913873, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [2,35]
>> pg 9.ad is stuck unclean for 8871.819919, current state
>> active+recovery_wait+degraded, last acting [6,8]
>> pg 2.a1 is stuck unclean for 98785.889529, current state incomplete, last
>> acting [14,12]
>> pg 1.a0 is stuck unclean for 5629.186426, current state active+remapped,
>> last acting [5,40]
>> pg 9.a8 is stuck unclean for 118575.035913, current state incomplete,
>> last acting [14,7]
>> pg 2.a2 is stuck unclean since forever, current state incomplete, last
>> acting [5,13]
>> pg 2.9d is stuck unclean for 11861.496234, current state
>> active+remapped+backfilling, last acting [6,38]
>> pg 2.9c is stuck unclean for 3506.888979, current state
>> active+remapped+wait_backfill, last acting [35,11]
>> pg 2.9b is stuck unclean for 5629.183979, current state
>> active+remapped+wait_backfill, last acting [6,0]
>> pg 9.91 is stuck unclean for 85752.028652, current state
>> active+remapped+wait_backfill, last acting [31,9]
>> pg 2.97 is stuck unclean for 9736.783735, current state
>> active+remapped+backfilling, last acting [35,24]
>> pg 2.91 is stuck unclean for 28553.979772, current state
>> active+remapped+backfilling, last acting [0,24]
>> pg 2.90 is stuck unclean for 30364.623932, current state
>> active+degraded+remapped+backfill_toofull, last acting [41,24]
>> pg 2.92 is stuck unclean for 25031.211566, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 9.99 is stuck unclean for 11862.827419, current state
>> active+remapped+wait_backfill, last acting [13,4]
>> pg 2.8f
>> is stuck unclean for 17426.148382, current state
>> active+remapped+wait_backfill, last acting [15,9]
>> pg 2.88 is stuck unclean for 3591.054564, current state
>> active+remapped+wait_backfill, last acting [14,9]
>> pg 9.8f is stuck unclean for 3595.395794, current state
>> active+remapped+wait_backfill, last acting [9,15]
>> pg 2.87 is stuck unclean for 3844.271547, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [1,2]
>> pg 2.81 is stuck unclean for 83759.347793, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.8a is stuck unclean for 27697.026446, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [12,1]
>> pg 2.79 is stuck unclean for 12137.676488, current state
>> active+remapped+backfilling, last acting [7,40]
>> pg 2.78 is stuck unclean for 29127.120125, current state
>> active+remapped+backfilling, last acting [0,6]
>> pg 2.75 is stuck unclean since forever, current state down+incomplete,
>> last acting [7,15]
>> pg 2.74 is stuck unclean for 97658.383751, current state incomplete, last
>> acting [13,5]
>> pg 9.7c is stuck unclean for 114170.469704, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7d is stuck unclean for 14077.123326, current state
>> active+remapped+backfilling, last acting [5,24]
>> pg 2.71 is stuck unclean for 11859.344208, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [21,3]
>> pg 2.73 is stuck unclean for 11859.417605, current state
>> active+remapped+backfilling, last acting [39,15]
>> pg 9.78 is stuck unclean since forever, current state incomplete, last
>> acting [5,24]
>> pg 9.79 is stuck unclean for 14595.569162, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [39,3]
>> pg 2.6d is stuck unclean for 27802.265038, current state
>> active+remapped+backfilling, last acting [4,13]
>> pg 9.62 is stuck unclean for 25030.488507, current
>> state
>> active+remapped+backfill_toofull, last acting [36,2]
>> pg 2.6a is stuck unclean for 20323.517565, current state
>> active+remapped+wait_backfill, last acting [6,40]
>> pg 9.6c is stuck unclean for 14234.077824, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [41,2]
>> pg 9.6a is stuck unclean for 27035.043476, current state
>> active+remapped+backfill_toofull, last acting [36,4]
>> pg 2.63 is stuck unclean for 97665.177288, current state incomplete, last
>> acting [31,7]
>> pg 2.5d is stuck unclean for 3549.763078, current state
>> active+remapped+wait_backfill, last acting [9,34]
>> pg 2.5e is stuck unclean for 97736.064280, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [35,36]
>> pg 2.52 is stuck unclean for 8871.832670, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 9.59 is stuck unclean for 26868.986032, current state
>> active+remapped+wait_backfill, last acting [31,34]
>> pg 2.4f is stuck unclean for 12108.325792, current state
>> active+remapped+backfilling, last acting [11,40]
>> pg 2.49 is stuck unclean for 30446.302835, current state
>> active+remapped+wait_backfill, last acting [9,24]
>> pg 9.42 is stuck unclean for 108836.104626, current state incomplete,
>> last acting [31,12]
>> pg 2.45 is stuck unclean for 11284.580305, current state
>> active+degraded+remapped+backfilling, last acting [24,2]
>> pg 9.4f is stuck unclean for 3893.672356, current state
>> active+remapped+wait_backfill, last acting [0,21]
>> pg 2.44 is stuck unclean for 27623.439527, current state
>> active+recovery_wait+degraded+remapped, last acting [6,11]
>> pg 9.4c is stuck unclean for 6907.681859, current state
>> active+remapped+wait_backfill, last acting [15,36]
>> pg 2.46 is stuck unclean for 6907.682263, current state
>> active+remapped+backfilling, last acting [11,24]
>> pg 9.49 is stuck unclean for 14683.624639, current state
>> active+remapped+wait_backfill+backfill_toofull, last
>> acting [2,31]
>> pg 11.35 is stuck unclean for 5872394.444913, current state
>> active+remapped+wait_backfill, last acting [40,36]
>> pg 2.3e is stuck unclean for 6907.683506, current state
>> active+remapped+backfilling, last acting [4,41]
>> pg 2.38 is stuck unclean for 5140.320861, current state
>> active+remapped+wait_backfill, last acting [0,5]
>> pg 2.3b is stuck unclean for 14456.624593, current state
>> active+remapped+wait_backfill+backfill_toofull, last acting [18,2]
>> pg 11.33 is stuck unclean since forever, current state down+incomplete,
>> last acting [7,6]
>> pg 10.3d is stuck unclean for 3595.395921, current state
>> active+remapped+wait_backfill, last acting [9,36]
>> pg 2.35 is stuck unclean for 8872.226171, current state
>> active+recovery_wait+degraded, last acting [6,11]
>> pg 2.fc is stuck unclean for 5820.330202, current state
>> active+remapped+backfilling, last acting [31,0]
>> pg 9.3f is stuck unclean since forever, current state incomplete, last
>> acting [5,14]
>> pg 2.ff is stuck unclean for 3595.396088, current state
>> active+remapped+backfilling, last acting [9,39]
>> pg 2.fe is stuck unclean for 6904.439076, current state
>> active+remapped+backfilling, last acting [21,0]
>> pg 9.f5 is stuck unclean for 103009.439909, current state incomplete,
>> last acting [18,5]
>> pg 7.34 is stuck unclean for 3886.510000, current state
>> active+remapped+wait_backfill, last acting [13,39]
>> pg 2.fb is stuck unclean for 57173.985429, current state
>> active+recovery_wait+degraded+remapped, last acting [6,8]
>> pg 2.32 is stuck unclean since forever, current state incomplete, last
>> acting [5,13]
>> pg 9.fe is stuck unclean for 7418.564930, current state
>> active+recovery_wait+degraded+remapped, last acting [6,3]
>> pg 9.26 is stuck unclean since forever, current state incomplete, last
>> acting [5,24]
>> pg 2.f7 is stuck unclean for 6915.532617, current state
>> active+remapped+backfilling, last acting [4,15]
>> pg 9.fc is stuck unclean for
>> 201476.093824, current state incomplete,
>> last acting [13,5]
>> pg 7.2b is stuck undersized for 64282.169836, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.1e is stuck undersized for 3895.207475, current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.de is stuck undersized for 3886.529396, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 2.d7 is stuck undersized for 7417.316099, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 2.d1 is stuck undersized for 6903.297196, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 2.0 is stuck undersized for 4999.401505, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.92 is stuck undersized for 4999.406547, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 2.81 is stuck undersized for 7417.378668, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7c is stuck undersized for 3894.953894, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.25 is stuck degraded for 7413.083043, current state
>> active+degraded+remapped+wait_backfill, last acting [15,2]
>> pg 7.2b is stuck degraded for 64282.169913, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.f1 is stuck degraded for 3848.032008, current state
>> active+recovery_wait+degraded, last acting [13,8]
>> pg 2.f2 is stuck degraded for 7411.108195, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.27 is stuck degraded for 3893.230317, current state
>> active+recovery_wait+degraded, last acting [13,3]
>> pg 2.1c is stuck degraded for 7414.316299, current state
>> active+degraded+remapped+backfilling, last acting [14,11]
>> pg 2.1e is stuck degraded for 3895.207564,
>> current state
>> active+undersized+degraded+remapped+backfilling, last acting [13]
>> pg 2.de is stuck degraded for 3886.529484, current state
>> active+undersized+degraded+remapped+backfilling, last acting [3]
>> pg 2.d7 is stuck degraded for 7417.316187, current state
>> active+undersized+degraded+remapped+backfilling, last acting [36]
>> pg 4.8 is stuck degraded for 3490.406821, current state
>> active+recovery_wait+degraded, last acting [15,2]
>> pg 2.d1 is stuck degraded for 6903.297288, current state
>> active+undersized+degraded+remapped+backfill_toofull, last acting [36]
>> pg 2.0 is stuck degraded for 4999.401597, current state
>> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, last
>> acting [1]
>> pg 2.cb is stuck degraded for 7413.316930, current state
>> active+degraded+remapped+wait_backfill+backfill_toofull, last acting [1,4]
>> pg 2.2 is stuck degraded for 3894.930841, current state
>> active+recovery_wait+degraded+remapped, last acting [13,9]
>> pg 2.c7 is stuck degraded for 3886.500328, current state
>> active+recovery_wait+degraded, last acting [13,2]
>> pg 9.ad is stuck degraded for 7411.181412, current state
>> active+recovery_wait+degraded, last acting [6,8]
>> pg 2.90 is stuck degraded for 3893.715235, current state
>> active+degraded+remapped+backfill_toofull, last acting [41,24]
>> pg 2.92 is stuck degraded for 4999.406655, current state
>> active+undersized+degraded+remapped+backfilling, last acting [8]
>> pg 2.81 is stuck degraded for 7417.378776, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 9.7c is stuck degraded for 3894.954001, current state
>> active+undersized+degraded+remapped+wait_backfill, last acting [39]
>> pg 2.52 is stuck degraded for 7411.108431, current state
>> active+recovery_wait+degraded, last acting [6,4]
>> pg 2.45 is stuck degraded for 3892.755878, current state
>> active+degraded+remapped+backfilling, last acting [24,2]
>> pg 2.44 is stuck degraded for
>> 7411.213966, current state
>> active+recovery_wait+degraded+remapped, last acting [6,11]
>> pg 2.35 is stuck degraded for 7411.295348, current state
>> active+recovery_wait+degraded, last acting [6,11]
>> pg 2.fb is stuck degraded for 6903.301076, current state
>> active+recovery_wait+degraded+remapped, last acting [6,8]
>> pg 9.fe is stuck degraded for 7413.453955, current state
>> active+recovery_wait+degraded+remapped, last acting [6,3]
>> pg 7.2b is stuck stale for 64232.262041, current state
>> stale+active+undersized+degraded, last acting [5]
>> pg 2.fc is active+remapped+backfilling, acting [31,0]
>> pg 2.ff is active+remapped+backfilling, acting [9,39]
>> pg 9.f5 is incomplete, acting [18,5]
>> pg 2.fe is active+remapped+backfilling, acting [21,0]
>> pg 2.fb is active+recovery_wait+degraded+remapped, acting [6,8]
>> pg 9.fe is active+recovery_wait+degraded+remapped, acting [6,3]
>> pg 9.fc is incomplete, acting [13,5]
>> pg 2.f7 is active+remapped+backfilling, acting [4,15]
>> pg 2.f1 is active+recovery_wait+degraded, acting [13,8]
>> pg 9.fb is active+remapped+wait_backfill, acting [8,39]
>> pg 2.f3 is active+remapped+wait_backfill, acting [6,9]
>> pg 2.f2 is active+recovery_wait+degraded, acting [6,4]
>> pg 2.ed is active+remapped+backfilling, acting [9,40]
>> pg 2.e8 is active+remapped+wait_backfill, acting [15,36]
>> pg 2.eb is active+remapped+backfilling, acting [0,31]
>> pg 2.ea is active+remapped+backfilling, acting [9,34]
>> pg 2.e0 is active+remapped+inconsistent+wait_backfill, acting [6,9]
>> pg 2.e3 is active+remapped+backfilling, acting [4,41]
>> pg 9.d6 is active+remapped+wait_backfill, acting [1,9]
>> pg 2.dc is active+remapped+backfilling, acting [40,4]
>> pg 2.df is active+remapped+wait_backfill, acting [0,13]
>> pg 2.de is active+undersized+degraded+remapped+backfilling, acting [3]
>> pg 2.d8 is active+remapped+backfilling, acting [21,41]
>> pg 2.db is active+remapped+backfilling, acting [9,40]
>> pg 9.de is incomplete, acting [6,5]
>> pg
>> 9.df is active+remapped+wait_backfill+backfill_toofull, acting [35,2]
>> pg 9.dc is incomplete, acting [6,7]
>> pg 2.d7 is active+undersized+degraded+remapped+backfilling, acting [36]
>> pg 2.d1 is active+undersized+degraded+remapped+backfill_toofull, acting
>> [36]
>> pg 3.d0 is active+remapped+wait_backfill, acting [12,2]
>> pg 9.d8 is down+incomplete, acting [21,5]
>> pg 9.d9 is active+remapped+wait_backfill+backfill_toofull, acting [39,4]
>> pg 9.c5 is active+remapped+wait_backfill, acting [5,38]
>> pg 2.cb is active+degraded+remapped+wait_backfill+backfill_toofull,
>> acting [1,4]
>> pg 2.c4 is active+remapped+backfilling, acting [34,9]
>> pg 2.c7 is active+recovery_wait+degraded, acting [13,2]
>> pg 2.c2 is active+remapped+wait_backfill, acting [9,15]
>> pg 9.c9 is active+remapped+wait_backfill+backfill_toofull, acting [5,34]
>> pg 2.bd is active+remapped+backfilling, acting [39,9]
>> pg 2.bf is incomplete, acting [15,7]
>> pg 9.b3 is active+remapped+wait_backfill, acting [15,35]
>> pg 2.b6 is active+remapped+backfilling, acting [1,31]
>> pg 2.b0 is incomplete, acting [24,7]
>> pg 2.b3 is active+remapped+backfilling, acting [3,0]
>> pg 2.ad is active+remapped+backfilling, acting [2,39]
>> pg 2.ae is active+remapped+backfilling, acting [34,13]
>> pg 9.a0 is active+remapped+wait_backfill+backfill_toofull, acting [1,3]
>> pg 2.aa is active+remapped+backfilling, acting [40,9]
>> pg 2.a7 is active+remapped+wait_backfill, acting [2,1]
>> pg 2.a6 is active+remapped+wait_backfill+backfill_toofull, acting [2,35]
>> pg 9.ad is active+recovery_wait+degraded, acting [6,8]
>> pg 2.a1 is incomplete, acting [14,12]
>> pg 9.a8 is incomplete, acting [14,7]
>> pg 2.a2 is incomplete, acting [5,13]
>> pg 2.9d is active+remapped+backfilling, acting [6,38]
>> pg 2.9c is active+remapped+wait_backfill, acting [35,11]
>> pg 2.9b is active+remapped+wait_backfill, acting [6,0]
>> pg 9.91 is active+remapped+wait_backfill, acting [31,9]
>> pg 2.97 is active+remapped+backfilling, acting
>> [35,24]
>> pg 2.91 is active+remapped+backfilling, acting [0,24]
>> pg 2.90 is active+degraded+remapped+backfill_toofull, acting [41,24]
>> pg 2.92 is active+undersized+degraded+remapped+backfilling, acting [8]
>> pg 9.99 is active+remapped+wait_backfill, acting [13,4]
>> pg 2.8f is active+remapped+wait_backfill, acting [15,9]
>> pg 2.88 is active+remapped+wait_backfill, acting [14,9]
>> pg 9.8f is active+remapped+wait_backfill, acting [9,15]
>> pg 2.87 is active+remapped+wait_backfill+backfill_toofull, acting [1,2]
>> pg 2.81 is active+undersized+degraded+remapped+wait_backfill, acting [39]
>> pg 9.8a is active+remapped+wait_backfill+backfill_toofull, acting [12,1]
>> pg 2.79 is active+remapped+backfilling, acting [7,40]
>> pg 2.78 is active+remapped+backfilling, acting [0,6]
>> pg 2.75 is down+incomplete, acting [7,15]
>> pg 2.74 is incomplete, acting [13,5]
>> pg 9.7c is active+undersized+degraded+remapped+wait_backfill, acting [39]
>> pg 9.7d is active+remapped+backfilling, acting [5,24]
>> pg 2.71 is active+remapped+wait_backfill+backfill_toofull, acting [21,3]
>> pg 2.73 is active+remapped+backfilling, acting [39,15]
>> pg 9.78 is incomplete, acting [5,24]
>> pg 9.79 is active+remapped+wait_backfill+backfill_toofull, acting [39,3]
>> pg 2.6d is active+remapped+backfilling, acting [4,13]
>> pg 9.62 is active+remapped+backfill_toofull, acting [36,2]
>> pg 2.6a is active+remapped+wait_backfill, acting [6,40]
>> pg 9.6c is active+remapped+wait_backfill+backfill_toofull, acting [41,2]
>> pg 9.6a is active+remapped+backfill_toofull, acting [36,4]
>> pg 2.63 is incomplete, acting [31,7]
>> pg 2.5d is active+remapped+wait_backfill, acting [9,34]
>> pg 2.5e is active+remapped+wait_backfill+backfill_toofull, acting [35,36]
>> pg 2.52 is active+recovery_wait+degraded, acting [6,4]
>> pg 9.59 is active+remapped+wait_backfill, acting [31,34]
>> pg 2.4f is active+remapped+backfilling, acting [11,40]
>> pg 2.49 is active+remapped+wait_backfill, acting [9,24]
>> pg 9.42 is
incomplete, acting [31,12] >> pg 2.45 is active+degraded+remapped+backfilling, acting [24,2] >> pg 2.44 is active+recovery_wait+degraded+remapped, acting [6,11] >> pg 9.4f is active+remapped+wait_backfill, acting [0,21] >> pg 9.4c is active+remapped+wait_backfill, acting [15,36] >> pg 2.46 is active+remapped+backfilling, acting [11,24] >> pg 9.49 is active+remapped+wait_backfill+backfill_toofull, acting [2,31] >> pg 11.35 is active+remapped+wait_backfill, acting [40,36] >> pg 2.3e is active+remapped+backfilling, acting [4,41] >> pg 2.38 is active+remapped+wait_backfill, acting [0,5] >> pg 2.3b is active+remapped+wait_backfill+backfill_toofull, acting [18,2] >> pg 11.33 is down+incomplete, acting [7,6] >> pg 2.35 is active+recovery_wait+degraded, acting [6,11] >> pg 10.3d is active+remapped+wait_backfill, acting [9,36] >> pg 9.3f is incomplete, acting [5,14] >> pg 7.34 is active+remapped+wait_backfill, acting [13,39] >> pg 2.32 is incomplete, acting [5,13] >> pg 9.26 is incomplete, acting [5,24] >> pg 11.27 is active+remapped+wait_backfill+backfill_toofull, acting [4,36] >> pg 9.25 is active+degraded+remapped+wait_backfill, acting [15,2] >> pg 2.29 is active+remapped+wait_backfill, acting [24,40] >> pg 9.22 is incomplete, acting [7,24] >> pg 9.23 is active+remapped+wait_backfill, acting [35,9] >> pg 2.2a is incomplete, acting [24,5] >> pg 2.24 is active+remapped+wait_backfill, acting [13,40] >> pg 2.27 is active+recovery_wait+degraded, acting [13,3] >> pg 11.29 is active+remapped+wait_backfill, acting [14,40] >> pg 2.1d is active+remapped+backfilling, acting [3,6] >> pg 2.1c is active+degraded+remapped+backfilling, acting [14,11] >> pg 11.15 is active+remapped+wait_backfill+backfill_toofull, acting [34,9] >> pg 2.1f is active+remapped+wait_backfill, acting [15,3] >> pg 11.16 is active+remapped+wait_backfill, acting [15,40] >> pg 2.1e is active+undersized+degraded+remapped+backfilling, acting [13] >> pg 0.1c is active+remapped+wait_backfill, acting [12,3] >> pg 2.18 
is active+remapped+backfilling, acting [18,9] >> pg 11.13 is active+remapped+wait_backfill, acting [14,8] >> pg 2.15 is incomplete, acting [7,31] >> pg 11.1c is down+incomplete, acting [6,7] >> pg 9.1e is incomplete, acting [7,15] >> pg 2.14 is active+remapped+backfilling, acting [18,9] >> pg 11.1e is active+remapped+wait_backfill, acting [3,38] >> pg 9.1d is active+remapped+backfill_toofull, acting [12,36] >> pg 11.19 is active+remapped+wait_backfill, acting [15,38] >> pg 7.15 is active+remapped+wait_backfill, acting [13,2] >> pg 2.13 is incomplete, acting [7,10] >> pg 7.16 is incomplete, acting [6,7] >> pg 9.18 is active+remapped+backfill_toofull, acting [18,13] >> pg 2.d is incomplete, acting [5,10] >> pg 9.6 is active+remapped+backfill_toofull, acting [2,41] >> pg 7.a is active+remapped+wait_backfill, acting [38,2] >> pg 4.8 is active+recovery_wait+degraded, acting [15,2] >> pg 9.5 is incomplete, acting [5,18] >> pg 9.3 is incomplete, acting [7,15] >> pg 2.b is active+remapped+backfilling, acting [40,24] >> pg 9.1 is active+remapped+wait_backfill+backfill_toofull, acting [39,3] >> pg 11.d is active+remapped+wait_backfill+backfill_toofull, acting [36,4] >> pg 9.a is incomplete, acting [18,7] >> pg 2.0 is >> active+undersized+degraded+remapped+wait_backfill+backfill_toofull, acting >> [1] >> pg 11.9 is active+remapped+wait_backfill, acting [21,39] >> pg 2.3 is incomplete, acting [14,5] >> pg 9.8 is active+remapped+wait_backfill+backfill_toofull, acting [5,24] >> pg 2.2 is active+recovery_wait+degraded+remapped, acting [13,9] >> 33 ops are blocked > 16777.2 sec >> 368 ops are blocked > 8388.61 sec >> 238 ops are blocked > 4194.3 sec >> 87 ops are blocked > 1048.58 sec >> 2 ops are blocked > 8388.61 sec on osd.5 >> 98 ops are blocked > 4194.3 sec on osd.5 >> 98 ops are blocked > 8388.61 sec on osd.6 >> 1 ops are blocked > 8388.61 sec on osd.7 >> 27 ops are blocked > 4194.3 sec on osd.7 >> 12 ops are blocked > 4194.3 sec on osd.13 >> 87 ops are blocked > 1048.58 sec 
on osd.13 >> 2 ops are blocked > 16777.2 sec on osd.14 >> 98 ops are blocked > 8388.61 sec on osd.14 >> 3 ops are blocked > 16777.2 sec on osd.15 >> 97 ops are blocked > 8388.61 sec on osd.15 >> 1 ops are blocked > 4194.3 sec on osd.18 >> 100 ops are blocked > 4194.3 sec on osd.24 >> 28 ops are blocked > 16777.2 sec on osd.31 >> 72 ops are blocked > 8388.61 sec on osd.31 >> 9 osds have slow requests >> recovery 59636/5032695 objects degraded (1.185%) >> recovery 1280976/5032695 objects misplaced (25.453%) >> 1 scrub errors >> noscrub,nodeep-scrub flag(s) set >> >> >> On the first failed host is 6, 13, 14, 15, 18, 24, 31 >> >> On the second host that went down was 5 and 7 >> >> >> >> On Sun, 2 Sep 2018 at 15:15, David Turner <drakonst...@gmail.com> wrote: >> >>> When the first node went offline with a dead SSD journal, all of the >>> dates on the OSDs was useless. Unless you could flush the journals, you >>> can't guarantee that a wire the cluster think happened actually made it to >>> the disk. The proper procedure here is to remove those OSDs and add them >>> again as new OSDs. >>> >>> `ceph health detail` will give you some more information on the blocked >>> requests. Depending on what that shows you can often find the OSD that is >>> causing the problems. But your biggest problem is that you have dishes >>> with potentially inconsistent data in your closer. >>> >>> On Sun, Sep 2, 2018, 4:42 AM Lee <lqui...@gmail.com> wrote: >>> >>>> Running 0.94.5 as part of a Openstack enviroment, our ceph setup is 3x >>>> OSD Nodes 3x MON Nodes, yesterday we had a aircon outage in our hosting >>>> enviroment, 1 OSD node failed (offline with a the journal SSD dead) left >>>> with 2 nodes running correctly, 2 hours later a second OSD node failed >>>> complaining of readwrite errors to the physical drives, i assume this was a >>>> heat issue as when rebooted this came back online ok and ceph started to >>>> repair itself. 
>>>> We have since brought the first failed node back online by replacing
>>>> the SSD and recreating the journals, hoping it would all repair.
>>>> Our pools are min 2 repl.
>>>>
>>>> The problem we have is that client IO (read) is totally blocked, and
>>>> when I query the stuck PGs it just hangs.
>>>>
>>>> For example, the check version command just errors on various OSDs with:
>>>>
>>>> Error EINTR: problem getting command descriptions from
>>>>
>>>> so I cannot even query the inactive PGs.
>>>>
>>>> root@node31-a4:~# ceph -s
>>>>     cluster 7c24e1b9-24b3-4a1b-8889-9b2d7fd88cd2
>>>>      health HEALTH_WARN
>>>>             83 pgs backfill
>>>>             2 pgs backfill_toofull
>>>>             3 pgs backfilling
>>>>             48 pgs degraded
>>>>             1 pgs down
>>>>             31 pgs incomplete
>>>>             1 pgs recovering
>>>>             29 pgs recovery_wait
>>>>             1 pgs stale
>>>>             48 pgs stuck degraded
>>>>             31 pgs stuck inactive
>>>>             1 pgs stuck stale
>>>>             148 pgs stuck unclean
>>>>             17 pgs stuck undersized
>>>>             17 pgs undersized
>>>>             599 requests are blocked > 32 sec
>>>>             recovery 111489/4697618 objects degraded (2.373%)
>>>>             recovery 772268/4697618 objects misplaced (16.440%)
>>>>             recovery 1/2171314 unfound (0.000%)
>>>>      monmap e5: 3 mons at {bc07s12-a7=172.27.16.11:6789/0,bc07s13-a7=172.27.16.21:6789/0,bc07s14-a7=172.27.16.15:6789/0}
>>>>             election epoch 198, quorum 0,1,2 bc07s12-a7,bc07s14-a7,bc07s13-a7
>>>>      osdmap e18727: 25 osds: 25 up, 25 in; 90 remapped pgs
>>>>       pgmap v70996322: 1792 pgs, 13 pools, 8210 GB data, 2120 kobjects
>>>>             16783 GB used, 6487 GB / 23270 GB avail
>>>>             111489/4697618 objects degraded (2.373%)
>>>>             772268/4697618 objects misplaced (16.440%)
>>>>             1/2171314 unfound (0.000%)
>>>>                 1639 active+clean
>>>>                   66 active+remapped+wait_backfill
>>>>                   30 incomplete
>>>>                   25 active+recovery_wait+degraded
>>>>                   15 active+undersized+degraded+remapped+wait_backfill
>>>>                    4 active+recovery_wait+degraded+remapped
>>>>                    4 active+clean+scrubbing
>>>>                    2 active+remapped+wait_backfill+backfill_toofull
>>>>                    1 down+incomplete
>>>>                    1 active+remapped+backfilling
>>>>                    1 active+clean+scrubbing+deep
>>>>                    1 stale+active+undersized+degraded
>>>>                    1 active+undersized+degraded+remapped+backfilling
>>>>                    1 active+degraded+remapped+backfilling
>>>>                    1 active+recovering+degraded
>>>> recovery io 29385 kB/s, 7 objects/s
>>>>   client io 5877 B/s wr, 1 op/s
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list
>>>> ceph-users@lists.ceph.com
>>>> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>>
>>>
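David's suggestion to use `ceph health detail` to find the OSD that is causing the problems can be partly scripted. The following is a minimal Python sketch. It assumes you pass in the raw (unquoted) output of `ceph health detail` as a string. The regular expressions are written against the Hammer-era line formats pasted above (`pg X is STATE, acting [...]` and `N ops are blocked > T sec on osd.N`) and may not match other releases. The function names and host labels are hypothetical.

The sketch totals blocked ops per OSD and maps each OSD to the incomplete or down PGs whose acting set includes it. It can then bucket those OSDs by the host lists Lee gave.

```python
import re
from collections import defaultdict

# Assumed Hammer-era `ceph health detail` line formats, e.g.
#   "pg 2.75 is down+incomplete, acting [7,15]"
#   "98 ops are blocked > 4194.3 sec on osd.5"
PG_RE = re.compile(r"pg (\S+) is (\S+), acting \[([\d,]*)\]")
BLOCKED_RE = re.compile(r"(\d+) ops are blocked > [\d.]+ sec on osd\.(\d+)")


def blocked_ops_per_osd(text):
    """Sum the per-OSD blocked-op counts across all age buckets.

    Cluster-wide lines without "on osd.N" are ignored."""
    totals = defaultdict(int)
    for count, osd in BLOCKED_RE.findall(text):
        totals[int(osd)] += int(count)
    return dict(totals)


def osds_in_bad_pgs(text, bad_states=("incomplete", "down")):
    """Map each OSD id to the sorted ids of PGs it is acting for whose
    '+'-separated state includes any of `bad_states`."""
    result = defaultdict(set)
    for pgid, state, acting in PG_RE.findall(text):
        if set(state.split("+")) & set(bad_states):
            for osd in filter(None, acting.split(",")):
                result[int(osd)].add(pgid)
    return {osd: sorted(pgs) for osd, pgs in result.items()}


def group_by_host(osd_ids, hosts):
    """Bucket OSD ids using a caller-supplied {host: {ids}} map.

    Ids that are not on any listed host end up under None."""
    grouped = defaultdict(list)
    for osd in sorted(osd_ids):
        host = next((h for h, ids in hosts.items() if osd in ids), None)
        grouped[host].append(osd)
    return dict(grouped)
```

For example, you could feed the full output above into `osds_in_bad_pgs` and pass its keys to `group_by_host` with `{"first": {6, 13, 14, 15, 18, 24, 31}, "second": {5, 7}}`. That would show how many incomplete PGs have acting sets touching the node whose journal SSD was replaced.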
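David's recommended fix is to remove the OSDs from the node that lost its journal SSD and add them back as new OSDs. Below is a minimal Python sketch of the removal half. It only builds the command lines (a dry run) and executes nothing.

It assumes the manual removal sequence from the Ceph documentation: `ceph osd out`, stop the daemon, `ceph osd crush remove`, `ceph auth del`, `ceph osd rm`. It also assumes a sysvinit-style `service ceph stop osd.N`, mirroring the `service ceph start osd.20` command used earlier in the thread. `removal_plan` is a hypothetical helper name. The documentation suggests letting rebalancing settle after `ceph osd out` before stopping the daemon. Re-adding the OSDs is not covered.

```python
import shlex
from typing import List


def removal_plan(osd_ids: List[int]) -> List[List[str]]:
    """Return, without running anything, the manual OSD-removal commands
    for each id in the order the Ceph docs describe.

    Assumption: daemons are managed with sysvinit-style
    `service ceph stop osd.N`, as in this thread's Hammer (0.94) setup.
    """
    plan: List[List[str]] = []
    for osd in osd_ids:
        name = f"osd.{osd}"
        plan += [
            ["ceph", "osd", "out", str(osd)],          # stop mapping data to it
            ["sudo", "service", "ceph", "stop", name],  # stop the daemon
            ["ceph", "osd", "crush", "remove", name],   # drop it from the CRUSH map
            ["ceph", "auth", "del", name],              # delete its cephx key
            ["ceph", "osd", "rm", str(osd)],            # remove it from the OSD map
        ]
    return plan


if __name__ == "__main__":
    # Dry run for the OSDs Lee listed on the first failed host.
    for cmd in removal_plan([6, 13, 14, 15, 18, 24, 31]):
        print(shlex.join(cmd))
```

Because the plan is returned as argv lists rather than run, you can review it or pass each entry to `subprocess.run` one step at a time after checking `ceph -s`.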