We investigated the issue: we raised debug_mon to 20 during a small osdmap
change and got many messages like these for every PG of every pool (i.e. for
the whole cluster):
> 2018-12-25 19:28:42.426776 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426776 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426777 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426779 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.1000 [97,812,841]/[] -> [97,812,841]/[97,812,841], priming []
> 2018-12-25 19:28:42.426780 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.0 [84,370,847]/[] -> [84,370,847]/[84,370,847], priming []
> 2018-12-25 19:28:42.426781 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 4.0 [404,857,11]/[] -> [404,857,11]/[404,857,11], priming []

even though no pg_temp entries are created as a result (not a single backfill).

We suppose this behavior changed with commit
https://github.com/ceph/ceph/pull/16530/commits/ea723fbb88c69bd00fefd32a3ee94bf5ce53569c
because, in our view, *OSDMonitor::prime_pg_temp* should return early at
https://github.com/ceph/ceph/blob/luminous/src/mon/OSDMonitor.cc#L1009
the way the jewel code does at
https://github.com/ceph/ceph/blob/jewel/src/mon/OSDMonitor.cc#L1214
I accept that we may be mistaken. (Two small toy sketches of what we mean are
appended after the quoted thread below.)

On Wed, Dec 12, 2018 at 10:53 PM Gregory Farnum <gfar...@redhat.com> wrote:

> Hmm that does seem odd. How are you looking at those sizes?
>
> On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov <palz...@gmail.com> wrote:
>
>> Greg, for example for our cluster of ~1000 OSDs:
>>
>> size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860, modified 2018-12-12 04:00:17.661731)
>> size osdmap.1357882__0_F7FE772D__none = 363KB
>> size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861, modified 2018-12-12 04:00:27.385702)
>> size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB
>>
>> The difference between epochs 1357881 and 1357883 is that the crush weight
>> of one OSD was increased by 0.01, so we get 5 new pg_temp entries in
>> osdmap.1357883, yet inc_osdmap is this huge.
>>
>> On Thu, Dec 6, 2018 at 06:20, Gregory Farnum <gfar...@redhat.com> wrote:
>> >
>> > On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov <palz...@gmail.com> wrote:
>> >>
>> >> Hi guys,
>> >>
>> >> I ran into strange behavior on a crushmap change. When I change the
>> >> crush weight of an OSD, I sometimes get an incremental osdmap (1.2MB)
>> >> that is significantly bigger than the full osdmap (0.4MB).
>> >
>> > This is probably because when CRUSH changes, the new primary OSDs for a
>> > PG will tend to set a "pg temp" value (in the OSDMap) that temporarily
>> > reassigns it to the old acting set, so the data can be accessed while the
>> > new OSDs get backfilled. Depending on the size of your cluster, the number
>> > of PGs on it, and the size of the CRUSH change, this can easily be larger
>> > than the rest of the map because it is data with size linear in the number
>> > of PGs affected, instead of being more normally proportional to the number
>> > of OSDs.
>> > -Greg
>> >
>> >> I use luminous 12.2.8. The cluster was installed long ago; I suppose it
>> >> was initially firefly.
>> >> How can I view the content of an incremental osdmap, or can you give me
>> >> your opinion on this problem?
>> >> I think that the spikes of traffic right after a crushmap change are
>> >> related to this crushmap behavior.
>>
>> --
>> Best regards, Sergey Dolgov

--
Best regards, Sergey Dolgov
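P.S. Here are the two toy sketches mentioned above. Both are our own
simplified paraphrases in C++, not the actual OSDMonitor code: every name,
type, and the exact guard condition are invented for illustration; the real
checks are in the luminous and jewel sources linked above.

The first models the early return we believe was lost: with it, a PG whose
mapping is not really changing queues nothing; without it, the per-PG walk
queues one (empty) new_pg_temp entry for every PG, which is what the
"priming []" lines suggest, even though none of it leads to a backfill.

#include <cstdio>
#include <map>
#include <vector>

using pg_id  = int;
using osdvec = std::vector<int>;

std::map<pg_id, osdvec> new_pg_temp;  // stand-in for pending_inc.new_pg_temp

void prime_pg_temp_toy(pg_id pg,
                       const osdvec& acting,  // "[]" in the log lines above
                       bool mapping_effectively_unchanged,
                       bool with_early_return)
{
    // Shape of the jewel-era behavior as we understand it: a PG whose mapping
    // is not really changing is skipped, so nothing is queued for it.
    if (with_early_return && mapping_effectively_unchanged)
        return;

    // What we observe on 12.2.8: every PG reaches the priming step and an
    // entry lands in the pending incremental, useful or not.
    new_pg_temp[pg] = acting;
}

int main()
{
    const int num_pgs = 8;  // tiny stand-in for "all PGs of each pool"
    for (pg_id pg = 0; pg < num_pgs; ++pg)
        prime_pg_temp_toy(pg, osdvec{}, /*unchanged=*/true, /*early return=*/false);
    std::printf("%zu pg_temp entries queued\n", new_pg_temp.size());
    // Prints 8, one entry per PG; with with_early_return=true it stays empty.
    return 0;
}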
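The second sketch is only back-of-the-envelope arithmetic for Greg's point
that pg_temp data grows linearly with the number of PGs affected rather than
with the number of OSDs. The per-entry byte count is a guess at a pgid plus
empty-vector framing, not the real OSDMap::Incremental encoding, and the PG
count is an illustrative figure, but it lands in the same range as the 1.2MB
inc_osdmap measured above.

#include <cstdio>

int main()
{
    const long pgs_in_cluster  = 65536;  // illustrative PG count, not our exact number
    const long bytes_per_entry = 20;     // guessed size of one empty new_pg_temp entry
    const double inc_mb = pgs_in_cluster * bytes_per_entry / (1024.0 * 1024.0);
    std::printf("~%.2f MB of new_pg_temp data in one incremental\n", inc_mb);
    // ~1.25 MB, while the full osdmap (roughly proportional to the number of
    // OSDs) stays around 0.4MB.
    return 0;
}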