Just curious: can you double-check whether you have log compaction enabled on your Kafka brokers?
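If the changelog isn't compacted, every put and delete is retained, which would explain the growth. A quick way to check with the stock 0.8.x tooling (topic name and ZooKeeper address below are placeholders, substitute yours):

    bin/kafka-topics.sh --zookeeper localhost:2181 --describe \
      --topic my-changelog-topic

Look for cleanup.policy=compact under Configs. If it's missing, you can set it with:

    bin/kafka-topics.sh --zookeeper localhost:2181 --alter \
      --topic my-changelog-topic --config cleanup.policy=compact

Also note that on 0.8.x the cleaner thread itself is off by default, so even with cleanup.policy=compact nothing gets compacted unless the brokers have log.cleaner.enable=true in server.properties.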
On Wed, Jul 29, 2015 at 8:30 AM, Vladimir Lebedev <w...@fastmail.fm> wrote:
> Hello,
>
> I have a problem: the changelog of one of my Samza jobs grows
> indefinitely.
>
> The job is quite simple: it reads messages from the input Kafka topic
> and either creates or updates a key in a task-local Samza store. Once a
> minute the window method kicks in; it iterates over all keys in the
> store and deletes some of them, selecting on the contents of their
> values.
>
> The message rate in the input topic is about 3000 messages per second.
> The input topic has 48 partitions. The average number of keys kept in
> the store is more or less stable and does not exceed 10000 keys per
> task. The average value size is 50 bytes, so I expected the sum of all
> segment sizes in the Kafka data directory for the job's changelog topic
> not to exceed 10000*50*48 ~= 24 MB. In fact it is more than 2.5 GB
> (after 6 days running from scratch) and it is growing.
>
> I tried changing the default segment size for the changelog topic in
> Kafka, and it helped a bit - instead of 500 MB segments I now have
> 50 MB segments - but it did not fix the indefinite growth.
>
> Moreover, if I stop the job and start it again, it cannot restart: it
> breaks right after reading all records from the changelog topic.
>
> Has anybody had a similar problem? How can it be resolved?
>
> Best regards,
> Vladimir
>
> --
> Vladimir Lebedev
> w...@fastmail.fm

--
Thanks and regards
Chinmay Soman
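For context, the access pattern described above - upsert in process(), periodic sweep in window() - looks roughly like the sketch below against the 2015-era Samza task API. The class name, store name, String serdes, and the shouldDrop predicate are placeholders, not Vladimir's actual code:

    import java.util.ArrayList;
    import java.util.List;

    import org.apache.samza.config.Config;
    import org.apache.samza.storage.kv.Entry;
    import org.apache.samza.storage.kv.KeyValueIterator;
    import org.apache.samza.storage.kv.KeyValueStore;
    import org.apache.samza.system.IncomingMessageEnvelope;
    import org.apache.samza.task.InitableTask;
    import org.apache.samza.task.MessageCollector;
    import org.apache.samza.task.StreamTask;
    import org.apache.samza.task.TaskContext;
    import org.apache.samza.task.TaskCoordinator;
    import org.apache.samza.task.WindowableTask;

    public class SweepingStoreTask implements StreamTask, WindowableTask, InitableTask {
      private KeyValueStore<String, String> store;

      @Override
      @SuppressWarnings("unchecked")
      public void init(Config config, TaskContext context) {
        // "my-store" is a placeholder; it must match stores.* in the job config.
        store = (KeyValueStore<String, String>) context.getStore("my-store");
      }

      @Override
      public void process(IncomingMessageEnvelope envelope,
                          MessageCollector collector,
                          TaskCoordinator coordinator) {
        // Upsert: every put() is also appended to the changelog topic.
        store.put((String) envelope.getKey(), (String) envelope.getMessage());
      }

      @Override
      public void window(MessageCollector collector, TaskCoordinator coordinator) {
        // Collect keys first, then delete, to avoid mutating while iterating.
        List<String> expired = new ArrayList<>();
        KeyValueIterator<String, String> it = store.all();
        try {
          while (it.hasNext()) {
            Entry<String, String> entry = it.next();
            if (shouldDrop(entry.getValue())) {
              expired.add(entry.getKey());
            }
          }
        } finally {
          it.close(); // store iterators hold native resources; always close them.
        }
        // Each delete() writes a null tombstone to the changelog; only log
        // compaction ever removes tombstones and superseded puts.
        for (String key : expired) {
          store.delete(key);
        }
      }

      // Placeholder for the job's value-based selection logic.
      private boolean shouldDrop(String value) {
        return false;
      }
    }

The store is wired to its changelog topic in the job config, e.g. stores.my-store.changelog=kafka.my-changelog-topic. Every put() and delete() above is mirrored to that topic, which is why the topic must be compacted for its size to stay bounded by the live key set.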