It seems you only have two hosts in your CRUSH map, but the default ruleset places each replica on a different host. With a pool size of 3, the third replica can never be placed, so every PG stays undersized and degraded.
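If you just want the PGs to go active+clean on this two-host cluster, two things come to mind. This is only a rough sketch: replace <pool-name> with your actual pool name, and the rule edit assumes you are still using the stock replicated ruleset.

    # Option 1: lower the replica count to match the number of hosts
    ceph osd pool set <pool-name> size 2

    # Option 2: relax the failure domain from host to osd
    ceph osd getcrushmap -o crushmap.bin
    crushtool -d crushmap.bin -o crushmap.txt
    # in crushmap.txt, change the rule's
    #   step chooseleaf firstn 0 type host
    # to
    #   step chooseleaf firstn 0 type osd
    crushtool -c crushmap.txt -o crushmap.new
    ceph osd setcrushmap -i crushmap.new

Of course, the cleanest fix is to add a third host, along the lines of Goncalo's suggestion of adding more OSDs.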
2016-03-23 20:17 GMT+08:00 Zhang Qiang <dotslash...@gmail.com>:
> And here's the osd tree if it matters.
>
> ID WEIGHT   TYPE NAME        UP/DOWN REWEIGHT PRIMARY-AFFINITY
> -1 22.39984 root default
> -2 21.39984     host 10
>  0  1.06999         osd.0         up  1.00000          1.00000
>  1  1.06999         osd.1         up  1.00000          1.00000
>  2  1.06999         osd.2         up  1.00000          1.00000
>  3  1.06999         osd.3         up  1.00000          1.00000
>  4  1.06999         osd.4         up  1.00000          1.00000
>  5  1.06999         osd.5         up  1.00000          1.00000
>  6  1.06999         osd.6         up  1.00000          1.00000
>  7  1.06999         osd.7         up  1.00000          1.00000
>  8  1.06999         osd.8         up  1.00000          1.00000
>  9  1.06999         osd.9         up  1.00000          1.00000
> 10  1.06999         osd.10        up  1.00000          1.00000
> 11  1.06999         osd.11        up  1.00000          1.00000
> 12  1.06999         osd.12        up  1.00000          1.00000
> 13  1.06999         osd.13        up  1.00000          1.00000
> 14  1.06999         osd.14        up  1.00000          1.00000
> 15  1.06999         osd.15        up  1.00000          1.00000
> 16  1.06999         osd.16        up  1.00000          1.00000
> 17  1.06999         osd.17        up  1.00000          1.00000
> 18  1.06999         osd.18        up  1.00000          1.00000
> 19  1.06999         osd.19        up  1.00000          1.00000
> -3  1.00000     host 148_96
>  0  1.00000         osd.0         up  1.00000          1.00000
>
> On Wed, 23 Mar 2016 at 19:10 Zhang Qiang <dotslash...@gmail.com> wrote:
>
>> Oliver, Goncalo,
>>
>> Sorry to disturb again, but recreating the pool with a smaller pg_num
>> didn't seem to work, now all 666 pgs are degraded + undersized.
>>
>> New status:
>>     cluster d2a69513-ad8e-4b25-8f10-69c4041d624d
>>      health HEALTH_WARN
>>             666 pgs degraded
>>             82 pgs stuck unclean
>>             666 pgs undersized
>>      monmap e5: 5 mons at {1=10.3.138.37:6789/0,2=10.3.138.39:6789/0,3=10.3.138.40:6789/0,4=10.3.138.59:6789/0,GGZ-YG-S0311-PLATFORM-138=10.3.138.36:6789/0}
>>             election epoch 28, quorum 0,1,2,3,4 GGZ-YG-S0311-PLATFORM-138,1,2,3,4
>>      osdmap e705: 20 osds: 20 up, 20 in
>>       pgmap v1961: 666 pgs, 1 pools, 0 bytes data, 0 objects
>>             13223 MB used, 20861 GB / 21991 GB avail
>>                  666 active+undersized+degraded
>>
>> Only one pool and its size is 3. So I think according to the algorithm,
>> (20 * 100) / 3 = 666 pgs is reasonable.
>>
>> I updated health detail and also attached a pg query result on gist
>> (https://gist.github.com/dotSlashLu/22623b4cefa06a46e0d4).
>>
>> On Wed, 23 Mar 2016 at 09:01 Dotslash Lu <dotslash...@gmail.com> wrote:
>>
>>> Hello Gonçalo,
>>>
>>> Thanks for your reminding. I was just setting up the cluster for test,
>>> so don't worry, I can just remove the pool. And I learnt that since the
>>> replication number and pool number are related to pg_num, I'll consider
>>> them carefully before deploying any data.
>>>
>>> On Mar 23, 2016, at 6:58 AM, Goncalo Borges
>>> <goncalo.bor...@sydney.edu.au> wrote:
>>>
>>> Hi Zhang...
>>>
>>> If I can add some more info, the change of PGs is a heavy operation, and
>>> as far as i know, you should NEVER decrease PGs. From the notes in pgcalc
>>> (http://ceph.com/pgcalc/):
>>>
>>> "It's also important to know that the PG count can be increased, but
>>> NEVER decreased without destroying / recreating the pool. However,
>>> increasing the PG Count of a pool is one of the most impactful events in a
>>> Ceph Cluster, and should be avoided for production clusters if possible."
>>>
>>> So, in your case, I would consider in adding more OSDs.
>>>
>>> Cheers
>>> Goncalo
>>>
> _______________________________________________
> ceph-users mailing list
> ceph-users@lists.ceph.com
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>