[ 
https://issues.apache.org/jira/browse/KAFKA-347?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13554062#comment-13554062
 ] 

Jay Kreps commented on KAFKA-347:
---------------------------------

Well i think there are really two mappings here:
key => partition
and
partition => broker

This is the generalization of consistent hashing that most persistent data 
systems use.

In Kafka key=>partition is user-defined (Partitioner interface) and defaults to 
just hash(key)%num_partitions. partition=>broker is assigned at topic creation 
time and from then on is semi-static (changing it is an admin command). So when 
adding a broker we already can move just the number of partitions we need by 
having the tool compute the best set of partitions to migrate or choosing at 
random.

So the idea is that you over-partition and then the partition count doesn't 
change and hence the key=>partition assignment doesn't change.

The question is, do we need to support changing the number of partitions to 
handle the case where you don't over-partition by enough? If you do this then 
the change in mapping would be large. That could be helped a bit by a 
consistent hash partitioner for the key=>partition mapping on the client side, 
but even in that case you would still have lots of values that are now in the 
wrong partition, so any code that depended on the partitioning would be broken.

Alternately you could do the hard work of actually implementing partition 
splitting on the broker by having the broker split a partition into two and 
then migrating the new partitions.

The question I would ask is, is any of this worth it? Many data system don't 
support partition splitting they just say "choose your partition count wisely 
or else delete it and start fresh". Arguably most messaging use cases are 
particularly concerned with recent data so this might be a fine answer. So an 
alternate strategy would just be to spend the time working on scaling the 
number of partitions we can handle and over-partitioning heavily.

                
> change number of partitions of a topic online
> ---------------------------------------------
>
>                 Key: KAFKA-347
>                 URL: https://issues.apache.org/jira/browse/KAFKA-347
>             Project: Kafka
>          Issue Type: Sub-task
>          Components: core
>    Affects Versions: 0.8
>            Reporter: Jun Rao
>              Labels: features
>
> We will need an admin tool to change the number of partitions of a topic 
> online.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

Reply via email to