Looks like the cluster is overloaded and the OSDs are running into heartbeat timeouts. For a test/dev environment: try setting the nodown flag for this experiment if you just want to ignore these timeouts completely.
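A minimal sketch of that, assuming you run it from a node that has the client.admin keyring (the flag should work the same way on Hammer):

    ceph osd set nodown     # monitors stop marking reported OSDs as down
    # ... increase pgp_num and let the recovery experiment finish ...
    ceph osd unset nodown   # restore normal down handling afterwards

Keep in mind that with nodown set, genuinely dead OSDs also won't be marked down, so only leave it in place for the duration of the test.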
Paul

--
Paul Emmerich

Looking for help with your Ceph cluster? Contact us at https://croit.io

croit GmbH
Freseniusstr. 31h
81247 München
www.croit.io
Tel: +49 89 1896585 90


On Wed, Jun 26, 2019 at 1:26 PM zhanrzh...@teamsun.com.cn <zhanrzh...@teamsun.com.cn> wrote:
> Hi all,
> I started a Ceph cluster on my machine in development mode to estimate
> the recovery time after increasing pgp_num.
> All daemons run on a single machine:
> CPU: Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
> Memory: 377 GB
> OS: CentOS Linux release 7.6.1810
> Ceph version: Hammer
>
> I built Ceph according to
> http://docs.ceph.com/docs/hammer/dev/quick_guide/.
> ceph -s shows:
>     cluster 15ec2f3f-86e5-46bc-bf98-4b35841ee6a5
>      health HEALTH_WARN
>             pool rbd pg_num 512 > pgp_num 256
>      monmap e1: 1 mons at {a=172.30.250.25:6789/0}
>             election epoch 2, quorum 0 a
>      osdmap e88: 30 osds: 30 up, 30 in
>       pgmap v829: 512 pgs, 1 pools, 57812 MB data, 14454 objects
>             5691 GB used, 27791 GB / 33483 GB avail
>                  512 active+clean
> and ceph osd tree [3].
> Recovery started after I increased pgp_num. ceph -w shows some OSDs going
> down, but the process keeps running. All OSD and mon configuration items
> are at their defaults [1].
> Some of the messages from ceph -w [2] are shown below:
>
> 2019-06-26 15:03:21.839750 mon.0 [INF] pgmap v842: 512 pgs: 127 active+degraded, 84 activating+degraded, 256 active+clean, 45 active+recovering+degraded; 57812 MB data, 5714 GB used, 27769 GB / 33483 GB avail; 22200/43362 objects degraded (51.197%); 50789 kB/s, 12 objects/s recovering
> 2019-06-26 15:03:21.840884 mon.0 [INF] osd.1 172.30.250.25:6804/22500 failed (3 reports from 3 peers after 24.867116 >= grace 20.000000)
> 2019-06-26 15:03:21.841459 mon.0 [INF] osd.9 172.30.250.25:6836/25078 failed (3 reports from 3 peers after 24.867645 >= grace 20.000000)
> 2019-06-26 15:03:21.841709 mon.0 [INF] osd.0 172.30.250.25:6800/22260 failed (3 reports from 3 peers after 24.846423 >= grace 20.000000)
> 2019-06-26 15:03:21.842286 mon.0 [INF] osd.13 172.30.250.25:6852/26651 failed (3 reports from 3 peers after 24.846896 >= grace 20.000000)
> 2019-06-26 15:03:21.842607 mon.0 [INF] osd.5 172.30.250.25:6820/23661 failed (3 reports from 3 peers after 24.804869 >= grace 20.000000)
> 2019-06-26 15:03:21.842938 mon.0 [INF] osd.10 172.30.250.25:6840/25490 failed (3 reports from 3 peers after 24.805155 >= grace 20.000000)
> 2019-06-26 15:03:21.843134 mon.0 [INF] osd.12 172.30.250.25:6848/26277 failed (3 reports from 3 peers after 24.805329 >= grace 20.000000)
> 2019-06-26 15:03:21.843591 mon.0 [INF] osd.8 172.30.250.25:6832/24722 failed (3 reports from 3 peers after 24.805843 >= grace 20.000000)
> 2019-06-26 15:03:21.849664 mon.0 [INF] osd.21 172.30.250.25:6884/29762 failed (3 reports from 3 peers after 23.497080 >= grace 20.000000)
> 2019-06-26 15:03:21.862729 mon.0 [INF] osd.14 172.30.250.25:6856/27025 failed (3 reports from 3 peers after 23.510172 >= grace 20.000000)
> 2019-06-26 15:03:21.864222 mon.0 [INF] osdmap e91: 30 osds: 29 up, 30 in
> 2019-06-26 15:03:20.336758 osd.11 [WRN] map e91 wrongly marked me down
> 2019-06-26 15:03:23.408659 mon.0 [INF] pgmap v843: 512 pgs: 8 stale+activating+degraded, 8 stale+active+clean, 161 active+degraded, 2 stale+active+recovering+degraded, 33 activating+degraded, 248 active+clean, 45 active+recovering+degraded, 7 stale+active+degraded; 57812 MB data, 5730 GB used, 27752 GB / 33483 GB avail; 27317/43362 objects degraded (62.998%); 61309 kB/s, 14 objects/s recovering
> 2019-06-26 15:03:27.538229 mon.0 [INF] osd.18 172.30.250.25:6872/28632 failed (3 reports from 3 peers after 23.180489 >= grace 20.000000)
> 2019-06-26 15:03:27.539416 mon.0 [INF] osd.7 172.30.250.25:6828/24366 failed (3 reports from 3 peers after 21.900054 >= grace 20.000000)
> 2019-06-26 15:03:27.541831 mon.0 [INF] osdmap e92: 30 osds: 19 up, 30 in
> 2019-06-26 15:03:32.748179 mon.0 [INF] osdmap e93: 30 osds: 17 up, 30 in
> 2019-06-26 15:03:33.678682 mon.0 [INF] pgmap v845: 512 pgs: 17 stale+activating+degraded, 95 stale+active+clean, 55 active+degraded, 13 peering, 18 stale+active+recovering+degraded, 20 activating+degraded, 155 active+clean, 22 active+recovery_wait+degraded, 48 active+recovering+degraded, 69 stale+active+degraded; 57812 MB data, 5734 GB used, 27748 GB / 33483 GB avail; 26979/43362 objects degraded (62.218%); 11510 kB/s, 2 objects/s recovering
> 2019-06-26 15:03:33.775701 osd.1 [WRN] map e92 wrongly marked me down
>
> Has anyone got any thoughts on what might have happened, or tips on how
> to dig further into this?
>
> [1] https://github.com/rongzhen-zhan/myfile/blob/master/osd.0.conf
> [2] https://github.com/rongzhen-zhan/myfile/blob/master/ceph-watch.txt
> [3] https://github.com/rongzhen-zhan/myfile/blob/master/ceph%20osd%20tree