Hey all,
We have been running a Kafka Streams-based service in production for the
last couple of years (we have 4 brokers on this specific cluster).
Among the many things this service does, it maintains 2 GlobalKTables
(backed by 2 compacted topics).

When I restart my app and clean the local state, the restoration of those
topics begins, and the weird thing I am noticing is that one topic takes
*a lot* more time to complete even though it has *far* fewer messages.

Let's call the compacted topics A and B. Both are configured mostly the
same:
Replication 3
Number of Partitions 36

max.message.bytes 25000000
segment.index.bytes 2097152
min.cleanable.dirty.ratio 0.1
cleanup.policy compact
delete.retention.ms 900000
segment.bytes 52428800
segment.ms 3600000

*With the exception that, for topic B, we use:*
cleanup.policy compact,delete
retention.ms 604800000

*The behaviour I've noticed for a single partition:*
Topic A has 62,552 totalRecordsToBeRestored - and it takes around 20s
Topic B has 24,730,506 totalRecordsToBeRestored - and it takes around 1s
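
To put those two numbers side by side, here is a quick back-of-the-envelope
calculation. It is only a Python sketch: the names `restores` and
`records_per_second` are made up for illustration, the inputs are the figures
quoted above, and the durations are approximate.

```python
# Figures quoted in the post for a single partition (durations are approximate).
restores = {
    "A": {"records": 62_552, "seconds": 20.0},
    "B": {"records": 24_730_506, "seconds": 1.0},
}


def records_per_second(records: int, seconds: float) -> float:
    """Average restore throughput over the whole restore."""
    return records / seconds


rates = {
    topic: records_per_second(stats["records"], stats["seconds"])
    for topic, stats in restores.items()
}
ratio = rates["B"] / rates["A"]

for topic, rate in rates.items():
    print(f"topic {topic}: ~{rate:,.0f} records/s")
print(f"B restores ~{ratio:,.0f}x more records per second than A")
```

So the gap is several thousand-fold in reported records per second, which
seems too large to be explained by record size alone.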

It is worth mentioning that the data B holds for each record is *much,
much bigger* (A holds an integer while B holds a big object).

Now, I get the feeling that the reason is that the data in B is always
relatively "fresh", while the data in topic A is mostly stale (the business
logic behind it means that topic A is updated at a very low rate, and a
lot of keys will never be updated).
So, for example, it is probably holding keys that haven't been updated since
2018.
Topic B keeps getting updated every couple of milliseconds.

Another difference that I just realized I should share is that topic A is
"joined" by 6 other streams, while topic B is joined by only 2.

I find it hard to explain how keeping "old" records could relate to the
restore time, given the huge difference in the number of records and
their size.

So I guess I am missing some basic concept when it comes to understanding
compacted topics and the way the broker saves and fetches the data OR maybe
we have some underlying problem which can explain it.
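
For reference, this is my current mental model of the part of compaction
that might matter here, written as a toy Python sketch rather than anything
Kafka actually runs: compaction keeps the latest record per key, and the
surviving records keep their original offsets. The `compact` helper and the
sample keys are hypothetical, and tombstones and segment boundaries are
ignored.

```python
def compact(log):
    """Keep only the newest record per key; survivors keep their offsets."""
    latest = {}
    for offset, key, value in log:
        latest[key] = (offset, key, value)  # a later write replaces an earlier one
    return sorted(latest.values())


# Hypothetical toy log: one key written once and never touched again,
# followed by two "hot" keys that are updated over and over.
log = [(0, "stale", "v0")] + [
    (offset, f"hot{offset % 2}", f"v{offset}") for offset in range(1, 10)
]

compacted = compact(log)
offsets_written = len(log)          # offsets 0..9 were assigned
surviving_records = len(compacted)  # one record per distinct key

print("surviving offsets:", [offset for offset, _, _ in compacted])
print("offsets written:", offsets_written, "surviving records:", surviving_records)
```

In this model the stale key stays at its old offset while the hot keys sit
near the end of the log, so the survivors are spread over a wide offset range.
If the restore counters are derived from offsets (an assumption on my part,
not something I have verified), they would reflect the range of offsets
rather than the number of records that actually remain.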

Let me know if you need some more info

Thanks!

-- 

Nitay Kufert
Backend Team Leader