Hey Dotan,

Another way to create topics with a specific number of partitions is to use
the bin/kafka-topics.sh tool in Kafka's binary distribution. It lets you set
the partition count explicitly for each topic you create.

> Our concern is how it affects the metrics and checkpoint topics.

Good question. In 0.7.0, the checkpoint topic must have the same number of
partitions as the input topic(s). When a job has multiple input topics, Samza
uses the maximum partition count among them (e.g. input topics with 4 and 8
partitions result in an 8-partition checkpoint topic). Samza automatically
creates a checkpoint topic of the appropriate size the first time the job
runs.

If an input topic's partition count changes after the job has been started,
the checkpoint topic will have to be resized. In 0.8.0, the checkpoint topic
has a single partition, so this problem goes away.
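To make the 0.7.0 sizing rule concrete, here is a rough sketch of the
arithmetic. The class and method names (CheckpointSizing,
checkpointPartitionCount, needsResize) are hypothetical and are not part of
Samza's API; they only illustrate "checkpoint partitions = max(input partition
counts)" and when a resize would be needed under that rule.

```java
import java.util.List;

public class CheckpointSizing {
    // Hypothetical helper: under 0.7.0 semantics, the checkpoint topic is
    // sized to the largest partition count among the job's input topics.
    // Returns 0 for an empty list (a job always has at least one input).
    static int checkpointPartitionCount(List<Integer> inputPartitionCounts) {
        int max = 0;
        for (int count : inputPartitionCounts) {
            max = Math.max(max, count);
        }
        return max;
    }

    // Hypothetical helper: the checkpoint topic must match that size, so any
    // mismatch (e.g. after an input topic grew) means it has to be resized.
    static boolean needsResize(int checkpointPartitions, List<Integer> inputPartitionCounts) {
        return checkpointPartitions != checkpointPartitionCount(inputPartitionCounts);
    }

    public static void main(String[] args) {
        // Inputs with 4 and 8 partitions give an 8-partition checkpoint topic.
        System.out.println(checkpointPartitionCount(List.of(4, 8)));
        // If the 8-partition input is later grown to 16, the old
        // 8-partition checkpoint topic no longer fits.
        System.out.println(needsResize(8, List.of(4, 16)));
    }
}
```

Running main prints 8 and then true.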

Keep in mind that resizing any topic (checkpoint or otherwise) has a big
impact if you depend on keyed messages, because it changes the partition that
each key maps to (partition = key.hashCode % partitionCount, so the partition
changes when partitionCount changes).
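A small sketch of why resizing breaks key locality, using the simplified
mapping above. KeyPartitioning and partitionFor are hypothetical names, and
Math.floorMod is an assumption added so that negative hash codes still yield
a valid partition; Kafka's actual partitioner may differ in detail. Integer
keys are used because Integer.hashCode() is just the value, which keeps the
arithmetic easy to check by hand.

```java
public class KeyPartitioning {
    // Simplified mapping: partition = key.hashCode % partitionCount.
    // floorMod (an assumption, not Kafka's exact code) keeps the result
    // in [0, partitionCount) even when hashCode() is negative.
    static int partitionFor(Object key, int partitionCount) {
        return Math.floorMod(key.hashCode(), partitionCount);
    }

    public static void main(String[] args) {
        Integer key = 13; // Integer.hashCode() returns the value itself
        // 13 % 4 = 1 before the resize...
        System.out.println(partitionFor(key, 4));
        // ...but 13 % 8 = 5 after growing the topic to 8 partitions,
        // so new messages for this key land on a different partition.
        System.out.println(partitionFor(key, 8));
    }
}
```

Running main prints 1 and then 5: the same key moves partitions once the
count changes, which is what breaks any per-key state or ordering tied to the
old partition.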

As for the metrics topic, there shouldn't be any issue.

Cheers,
Chris

On Mon, Feb 2, 2015 at 10:00 PM, Dotan Patrich wrote:

> Hi,
>
> We're using Samza with Kafka and we would like to use multiple partitions
> in our topics.
>
> We've noticed that the number of partitions is defined in
> server.properties.
> According to the Kafka documentation
> <http://kafka.apache.org/07/configuration.html>, there are 2 options to
> define the number of partitions:
>
>    - num.partitions - Specifies the default number of partitions per topic.
>    - topic.partition.count.map - Override parameter to control the number
>    of partitions for selected topics. E.g., topic1:10,topic2:20
>
> We thought about using the first option (num.partitions) in order to avoid
> the overhead of adding every new topic to a map. Our concern is how it
> affects the metrics and checkpoint topics.
>
> Can anyone share the best practice for using multiple partitions in
> Kafka?
>
>
> Thanks,
> Dotan
>
