Hi,

The host that was taken down has 12 disks in it?

Have a look at the down PGs ('18 pgs down'); I suspect these are what is causing the I/O freeze. Is your CRUSH map set up correctly to split data over different hosts?
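To see what those PGs are blocked on, something like this should help (the pg id below is just a placeholder; substitute one of your actual down PGs):

    ceph health detail | grep down   # lists the down PGs and their state
    ceph pg 2.1f query               # "recovery_state" shows which OSDs the PG is waiting for

And to check the failure domain, assuming a standard replicated CRUSH rule:

    ceph osd crush rule dump         # the chooseleaf step should show "type": "host"
    ceph osd tree                    # confirms the OSDs really sit under separate host buckets

If the rule chooses over "osd" instead of "host", all replicas of a PG can land on the one node you rebooted, which would explain PGs going down rather than merely degraded.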
Thanks

On Tue, Sep 13, 2016 at 11:45 AM, Daznis <daz...@gmail.com> wrote:
> No, no errors about that. I had set noout before it happened, but it
> still started recovery. I added
> nobackfill,norebalance,norecover,noscrub,nodeep-scrub once I noticed
> it started doing crazy stuff. Recovery I/O stopped, but the cluster
> can't read any data; only writes to the cache layer go through.
>
>     cluster cdca2074-4c91-4047-a607-faebcbc1ee17
>      health HEALTH_WARN
>             2225 pgs degraded
>             18 pgs down
>             18 pgs peering
>             89 pgs stale
>             2225 pgs stuck degraded
>             18 pgs stuck inactive
>             89 pgs stuck stale
>             2257 pgs stuck unclean
>             2225 pgs stuck undersized
>             2225 pgs undersized
>             recovery 4180820/11837906 objects degraded (35.317%)
>             recovery 24016/11837906 objects misplaced (0.203%)
>             12/39 in osds are down
>             noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub flag(s) set
>      monmap e9: 7 mons at {}
>             election epoch 170, quorum 0,1,2,3,4,5,6
>      osdmap e40290: 40 osds: 27 up, 39 in; 14 remapped pgs
>             flags noout,nobackfill,norebalance,norecover,noscrub,nodeep-scrub
>       pgmap v39326300: 4096 pgs, 4 pools, 21455 GB data, 5780 kobjects
>             42407 GB used, 75772 GB / 115 TB avail
>             4180820/11837906 objects degraded (35.317%)
>             24016/11837906 objects misplaced (0.203%)
>                 2136 active+undersized+degraded
>                 1837 active+clean
>                   89 stale+active+undersized+degraded
>                   18 down+peering
>                   14 active+remapped
>                    2 active+clean+scrubbing+deep
>   client io 0 B/s rd, 9509 kB/s wr, 3469 op/s
>
> On Tue, Sep 13, 2016 at 1:34 PM, M Ranga Swami Reddy
> <swamire...@gmail.com> wrote:
> > Please check if any OSD is hitting the nearfull error. Can you please
> > share the ceph -s output?
> >
> > Thanks
> > Swami
> >
> > On Tue, Sep 13, 2016 at 3:54 PM, Daznis <daz...@gmail.com> wrote:
> >>
> >> Hello,
> >>
> >> I have encountered a strange I/O freeze while rebooting one OSD node
> >> for maintenance purposes. It is one of the 3 nodes in the entire
> >> cluster. Before this, rebooting or shutting down an entire node only
> >> slowed Ceph down; it never froze it completely.
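PS: keep in mind that the norecover/nobackfill flags will also stop the cluster from healing those degraded and down PGs once the node is back. After the OSDs rejoin, you will want to clear them, e.g.:

    ceph osd unset norecover
    ceph osd unset nobackfill
    ceph osd unset norebalance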
_______________________________________________
ceph-users mailing list
ceph-users@lists.ceph.com
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com