Managing Millions of Partitions in Kafka

Ravindranath Akila Sat, 05 Oct 2013 20:20:01 -0700

Initially, I thought dynamic topic creation could be used to maintain
per-user data on Kafka. Then I read that partitions can and should be used
for this instead.


If each user is mapped to a partition, can there be a million, or even a
billion, partitions in a cluster? How does one go about designing such a
model?

Can the replication tool be used to assign, say, partitions 1-10,000 to
replica 1 and partitions 10,001-20,000 to replica 2?

If not, since there is a ulimit on the file system, should one model it
based on a replica/topic/partition approach? Say users 1-10,000 go on topic
10k-1, which has 10,000 partitions, and users 10,001-20,000 go on topic
10k-2, which also has 10,000 partitions.
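
A rough sketch of the topic/partition scheme described above may make the
question more concrete. The function name and the constant are hypothetical,
and the code only computes the mapping; it does not talk to Kafka. It assumes
user IDs are integers starting at 1 and that partitions are numbered from 0.

```python
USERS_PER_TOPIC = 10_000  # assumption: 10,000 users (and partitions) per topic


def topic_and_partition(user_id: int) -> tuple[str, int]:
    """Map a 1-based user ID to a (topic name, partition number) pair.

    Users 1-10,000 land on topic "10k-1", users 10,001-20,000 on "10k-2",
    and so on. Within a topic, each user gets a dedicated partition.
    """
    if user_id < 1:
        raise ValueError("user_id must be >= 1")
    index = user_id - 1
    topic_number = index // USERS_PER_TOPIC + 1
    partition = index % USERS_PER_TOPIC
    return f"10k-{topic_number}", partition
```

With this layout a million users would need 100 topics of 10,000 partitions
each, so the total partition count is unchanged; only the naming is split up.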

Simply put, how can a million stateful data points be handled? (I deduced
that a mapping from user ID to partition number can be done via a
partitioner, but unless a server can be configured to handle only a given
set of partitions, using a range-based notation, it is almost impossible to
handle a large dataset. Is it that Kafka can only handle a limited set of
stateful data right now?)
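
For reference, the partitioner idea mentioned above could look roughly like
the following. This is a sketch of hash-based partitioning in general, not
Kafka's own partitioner; the function name is hypothetical and CRC32 is used
only because it gives a stable hash across processes. It assumes many users
share a fixed, modest number of partitions instead of each owning one.

```python
import zlib


def partition_for_user(user_id: str, num_partitions: int) -> int:
    """Pick a partition for a user by hashing the user ID.

    The same user ID always maps to the same partition, so per-user
    ordering is kept while the partition count stays bounded.
    """
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # crc32 is deterministic across runs, unlike Python's built-in hash()
    # for strings, which is randomized per process.
    return zlib.crc32(user_id.encode("utf-8")) % num_partitions
```

The trade-off is that a consumer reading one partition sees messages for
many users and has to filter or group them by user ID itself.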

http://stackoverflow.com/questions/17205561/data-modeling-with-kafka-topics-and-partitions

Btw, why does Kafka have to keep each partition open? Can't a partition be
opened for read/write only when needed?

Thanks in advance!
