Hi Prateek,

As far as I can gather, we are indeed seeing duplicate keys in the
checkpoint topics. We definitely have cleanup.policy=compact on all of
these topics. When you suggest checking the topic partition size graph,
do you mean just the checkpoint topics, or any topic that has
cleanup.policy=compact?
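
For anyone repeating the duplicate-key check, here is a minimal sketch of
one way to do it. It assumes the topic was dumped with
kafka-console-consumer using --property print.key=true and a tab as
key.separator, so each line is "key<TAB>value". The function name
summarize_keys is hypothetical, and keys are treated as opaque strings.

```python
from collections import Counter
from typing import Iterable


def summarize_keys(lines: Iterable[str], separator: str = "\t") -> dict:
    """Count messages per key in console-consumer output.

    Assumes one message per line, formatted as key + separator + value.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        # Split on the first separator only; the value may contain more.
        key, _, _value = line.partition(separator)
        counts[key] += 1

    total = sum(counts.values())
    return {
        "total_messages": total,
        "distinct_keys": len(counts),
        # Messages that compaction should eventually drop: everything
        # except the latest entry for each key.
        "stale_messages": total - len(counts),
        "top_keys": counts.most_common(5),
    }
```

If distinct_keys is close to the task count but total_messages is far
larger, that points at compaction not running rather than at a second
writer. On a topic this size it is probably worth sampling with
--max-messages instead of reading everything.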

Cheers,
Malcolm McFarland
Cavulus


This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
unauthorized or improper disclosure, copying, distribution, or use of the
contents of this message is prohibited. The information contained in this
message is intended only for the personal and confidential use of the
recipient(s) named above. If you have received this message in error,
please notify the sender immediately and delete the original message.


On Wed, Nov 6, 2019 at 12:00 PM Prateek Maheshwari <prateek...@gmail.com>
wrote:

> Hi Malcolm,
>
> Using cleanup.policy=compact on the Kafka checkpoint topic should be
> sufficient, and is the default when the topic is created by Samza. Under
> normal operations, a checkpoint topic should only hold roughly as many
> messages as there are tasks.
>
> I can suggest the following ways to identify the issue:
> 1. Read the topic contents using kafka-console-consumer and check if the
> extra size is due to incorrect entries (a second / non-samza writer), or
> due to duplicate entries for the same key (log compaction issues).
> 2. If there are duplicate keys, verify whether Kafka's log compaction is
> kicking in and compacting stale entries. One sign that it is working is a
> sawtooth pattern in the Kafka topic partition size graph. You can also
> check the Kafka broker logs for any log compaction related error messages.
> 3. If log compaction isn't working, verify that the related Kafka topic /
> broker configurations are appropriate, e.g., log.cleaner.enable,
> log.cleaner.threads, min.cleanable.dirty.ratio, min/max.compaction.lag.ms,
> delete.retention.ms, etc.
>
> Let us know if you are able to find any more details.
>
> Thanks,
> Prateek
>
>
> On Tue, Nov 5, 2019 at 9:20 AM Malcolm McFarland <mmcfarl...@cavulus.com>
> wrote:
>
> > Hey folks,
> >
> > We have cleanup.policy=compact set on our checkpoint topics. Even with
> > this, we have almost 3 billion messages in some of these topics, and this
> > is causing huge startup times. Are there any other settings we should set
> > to optimize our startup times?
> >
> > Cheers,
> > Malcolm McFarland
> > Cavulus
> >
>
