We investigated the issue: we raised debug_mon to 20 during a small osdmap
change and got many messages like these for every PG of every pool (i.e. for
the whole cluster):
> 2018-12-25 19:28:42.426776 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426776 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426777 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_tempnext_up === next_acting now, clear pg_temp
> 2018-12-25 19:28:42.426779 7f075af7d700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.1000 [97,812,841]/[] -> [97,812,841]/[97,812,841], priming []
> 2018-12-25 19:28:42.426780 7f075a77c700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 3.0 [84,370,847]/[] -> [84,370,847]/[84,370,847], priming []
> 2018-12-25 19:28:42.426781 7f075977a700 20 mon.1@0(leader).osd e1373789 prime_pg_temp 4.0 [404,857,11]/[] -> [404,857,11]/[404,857,11], priming []

even though no pg_temp entries are created as a result (not a single backfill).

We suppose this behavior changed with commit
https://github.com/ceph/ceph/pull/16530/commits/ea723fbb88c69bd00fefd32a3ee94bf5ce53569c
because, in our view, *OSDMonitor::prime_pg_temp* should return early at
https://github.com/ceph/ceph/blob/luminous/src/mon/OSDMonitor.cc#L1009
the way the jewel code does at
https://github.com/ceph/ceph/blob/jewel/src/mon/OSDMonitor.cc#L1214
I accept that we may be mistaken. (Two small toy sketches of what we mean are
appended after the quoted thread below.)

On Wed, Dec 12, 2018 at 10:53 PM Gregory Farnum <gfar...@redhat.com> wrote:

> Hmm that does seem odd. How are you looking at those sizes?
>
> On Wed, Dec 12, 2018 at 4:38 AM Sergey Dolgov <palz...@gmail.com> wrote:
>
>> Greg, for example for our cluster of ~1000 OSDs:
>>
>> size osdmap.1357881__0_F7FE779D__none = 363KB (crush_version 9860, modified 2018-12-12 04:00:17.661731)
>> size osdmap.1357882__0_F7FE772D__none = 363KB
>> size osdmap.1357883__0_F7FE74FD__none = 363KB (crush_version 9861, modified 2018-12-12 04:00:27.385702)
>> size inc_osdmap.1357882__0_B783A4EA__none = 1.2MB
>>
>> The difference between epochs 1357881 and 1357883 is that the crush weight
>> of one OSD was increased by 0.01, so we get 5 new pg_temp entries in
>> osdmap.1357883, yet inc_osdmap is this huge.
>>
>> On Thu, Dec 6, 2018 at 06:20, Gregory Farnum <gfar...@redhat.com> wrote:
>> >
>> > On Wed, Dec 5, 2018 at 3:32 PM Sergey Dolgov <palz...@gmail.com> wrote:
>> >>
>> >> Hi guys,
>> >>
>> >> I ran into strange behavior on a crushmap change. When I change the
>> >> crush weight of an OSD, I sometimes get an incremental osdmap (1.2MB)
>> >> that is significantly bigger than the full osdmap (0.4MB).
>> >
>> > This is probably because when CRUSH changes, the new primary OSDs for a
>> > PG will tend to set a "pg temp" value (in the OSDMap) that temporarily
>> > reassigns it to the old acting set, so the data can be accessed while the
>> > new OSDs get backfilled. Depending on the size of your cluster, the number
>> > of PGs on it, and the size of the CRUSH change, this can easily be larger
>> > than the rest of the map because it is data with size linear in the number
>> > of PGs affected, instead of being more normally proportional to the number
>> > of OSDs.
>> > -Greg
>> >
>> >> I use luminous 12.2.8. The cluster was installed long ago; I suppose it
>> >> was initially firefly.
>> >> How can I view the content of an incremental osdmap, or can you give me
>> >> your opinion on this problem?
>> >> I think that the spikes of traffic right after a crushmap change are
>> >> related to this crushmap behavior.
>>
>> --
>> Best regards, Sergey Dolgov

--
Best regards, Sergey Dolgov
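P.S. Here are the two toy sketches mentioned above. Both are our own
simplified paraphrases in C++, not the actual OSDMonitor code: every name,
type, and the exact guard condition are invented for illustration; the real
checks are in the luminous and jewel sources linked above.

The first models the early return we believe was lost: with it, a PG whose
mapping is not really changing queues nothing; without it, the per-PG walk
queues one (empty) new_pg_temp entry for every PG, which is what the
"priming []" lines suggest, even though none of it leads to a backfill.

#include <cstdio>
#include <map>
#include <vector>

using pg_id  = int;
using osdvec = std::vector<int>;

std::map<pg_id, osdvec> new_pg_temp;  // stand-in for pending_inc.new_pg_temp

void prime_pg_temp_toy(pg_id pg,
                       const osdvec& acting,  // "[]" in the log lines above
                       bool mapping_effectively_unchanged,
                       bool with_early_return)
{
    // Shape of the jewel-era behavior as we understand it: a PG whose mapping
    // is not really changing is skipped, so nothing is queued for it.
    if (with_early_return && mapping_effectively_unchanged)
        return;

    // What we observe on 12.2.8: every PG reaches the priming step and an
    // entry lands in the pending incremental, useful or not.
    new_pg_temp[pg] = acting;
}

int main()
{
    const int num_pgs = 8;  // tiny stand-in for "all PGs of each pool"
    for (pg_id pg = 0; pg < num_pgs; ++pg)
        prime_pg_temp_toy(pg, osdvec{}, /*unchanged=*/true, /*early return=*/false);
    std::printf("%zu pg_temp entries queued\n", new_pg_temp.size());
    // Prints 8, one entry per PG; with with_early_return=true it stays empty.
    return 0;
}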
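The second sketch is only back-of-the-envelope arithmetic for Greg's point
that pg_temp data grows linearly with the number of PGs affected rather than
with the number of OSDs. The per-entry byte count is a guess at a pgid plus
empty-vector framing, not the real OSDMap::Incremental encoding, and the PG
count is an illustrative figure, but it lands in the same range as the 1.2MB
inc_osdmap measured above.

#include <cstdio>

int main()
{
    const long pgs_in_cluster  = 65536;  // illustrative PG count, not our exact number
    const long bytes_per_entry = 20;     // guessed size of one empty new_pg_temp entry
    const double inc_mb = pgs_in_cluster * bytes_per_entry / (1024.0 * 1024.0);
    std::printf("~%.2f MB of new_pg_temp data in one incremental\n", inc_mb);
    // ~1.25 MB, while the full osdmap (roughly proportional to the number of
    // OSDs) stays around 0.4MB.
    return 0;
}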