Just curious,

Can you double check if you have log compaction enabled on your Kafka
brokers?
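
For instance (a quick sketch, assuming 0.8.x-era tooling and a changelog
topic named "my-job-store-changelog" - substitute your real topic name):

    # 1) The log cleaner must be enabled on every broker; in 0.8.x it
    #    defaults to off. In server.properties:
    log.cleaner.enable=true

    # 2) The changelog topic itself needs a compact cleanup policy:
    bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
        --topic my-job-store-changelog
    # look for cleanup.policy=compact in the Configs column

If the cleaner is disabled, a compacted topic is never cleaned and will
grow without bound, which would match what you are seeing.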

On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev <w...@fastmail.fm> wrote:

> Hello,
>
> I have a problem: the changelog of one of my samza jobs grows indefinitely.
>
> The job is quite simple: it reads messages from the input kafka topic and
> either creates or updates a key in a task-local samza store. Once a minute
> the window method kicks in: it iterates over all keys in the store and
> deletes some of them, selecting on the contents of their values.
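>
> For illustration, a minimal sketch of the task (class, store and helper
> names here are made up):
>
>     import org.apache.samza.config.Config;
>     import org.apache.samza.storage.kv.Entry;
>     import org.apache.samza.storage.kv.KeyValueIterator;
>     import org.apache.samza.storage.kv.KeyValueStore;
>     import org.apache.samza.system.IncomingMessageEnvelope;
>     import org.apache.samza.task.*;
>
>     public class CleanupTask implements StreamTask, WindowableTask, InitableTask {
>         private KeyValueStore<String, String> store;
>
>         @Override
>         public void init(Config config, TaskContext context) {
>             store = (KeyValueStore<String, String>) context.getStore("my-store");
>         }
>
>         @Override
>         public void process(IncomingMessageEnvelope envelope,
>                             MessageCollector collector, TaskCoordinator coordinator) {
>             // create or update a key; every put is also written to the changelog
>             store.put((String) envelope.getKey(), (String) envelope.getMessage());
>         }
>
>         @Override
>         public void window(MessageCollector collector, TaskCoordinator coordinator) {
>             // once a minute: scan all keys and delete the ones whose value says so
>             KeyValueIterator<String, String> entries = store.all();
>             try {
>                 while (entries.hasNext()) {
>                     Entry<String, String> entry = entries.next();
>                     if (shouldDelete(entry.getValue())) {
>                         store.delete(entry.getKey()); // writes a tombstone to the changelog
>                     }
>                 }
>             } finally {
>                 entries.close();
>             }
>         }
>
>         // application-specific selection on the value (hypothetical)
>         private boolean shouldDelete(String value) {
>             return false;
>         }
>     }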
>
> The message rate in the input topic is about 3000 messages per second. The
> input topic is partitioned into 48 partitions. The average number of keys
> kept in the store is more or less stable and does not exceed 10000 keys per
> task. The average size of a value is 50 bytes. So I expected the sum of all
> segment sizes in the kafka data directory for the job's changelog topic not
> to exceed 10000*50*48 ~= 24MBytes. In fact it is more than 2.5GB (after 6
> days running from scratch) and it is still growing.
>
> I tried to change the default segment size for the changelog topic in
> kafka, and it helped a bit - instead of 500MByte segments I now have
> 50MByte segments - but it did not cure the indefinite data growth.
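>
> For reference, the segment size can be overridden per topic like this
> (topic name is an example; 52428800 bytes = 50MB):
>
>     bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
>         --topic my-job-store-changelog --config segment.bytes=52428800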
>
> Moreover, if I stop the job and start it again, it cannot restart: it
> breaks right after reading all the records from the changelog topic.
>
> Has anybody had a similar problem? How could it be resolved?
>
> Best regards,
> Vladimir
>
> --
> Vladimir Lebedev
> w...@fastmail.fm
>
>


-- 
Thanks and regards

Chinmay Soman
