I'm not sure which design doc you are looking at (v1, probably? v3 is here: https://cwiki.apache.org/KAFKA/kafka-detailed-replication-design-v3.html), but if I understand correctly, consistent hashing for partitioning is mostly about remapping as few keys as possible when adding/deleting partitions, which you can already implement with a custom partitioner that computes partition_id = abs(num_partitions * hash(key) / hash_space). But the added value is undercut by the fact that adding/deleting partitions already breaks your existing key-to-partition mapping, which makes it kind of useless, doesn't it?
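To make that formula concrete, here is a minimal sketch of such a partitioner (the class name, hash function, and 32-bit hash space are my own illustrative assumptions, not Kafka API):

```java
// Minimal sketch of the range-style custom partitioner described above.
// The names and the 32-bit hash space are illustrative assumptions.
public final class RangePartitioner {
    private static final long HASH_SPACE = 1L << 32; // assumed hash range size

    public static int partitionFor(byte[] key, int numPartitions) {
        // Treat the 32-bit hash as unsigned, so it falls in [0, HASH_SPACE)
        // and the abs() from the formula above becomes unnecessary.
        long h = hash32(key) & 0xFFFFFFFFL;
        // Map the hash range proportionally onto [0, numPartitions).
        return (int) ((numPartitions * h) / HASH_SPACE);
    }

    // Any stable hash works here; Arrays.hashCode is just a placeholder.
    private static int hash32(byte[] key) {
        return java.util.Arrays.hashCode(key);
    }
}
```

One property worth noting: with this range-style mapping, going from N to N+1 partitions shifts each key by at most one partition index, whereas mod-based partitioning remaps almost every key.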
However, if you are talking about partition assignment to brokers, that's a different matter, and I believe the current behavior is a simple round robin that assigns partitions to brokers at topic creation (to be confirmed?). It could be interesting to assign partitions to brokers with some consistent hashing, so that adding a broker requires moving as few partitions as possible; a sketch of that idea follows the quoted thread below. As of now that process is done manually using the ReassignPartitions command, and it could be automated with a tool (provided that you over-partition as Jun recommends, so that you have some granularity and the load gets spread evenly over the brokers).

On Mon, Jan 14, 2013 at 5:04 PM, Stan Rosenberg <stan.rosenb...@gmail.com> wrote:

> On Fri, Jan 11, 2013 at 12:37 AM, Jun Rao <jun...@gmail.com> wrote:
>
> > Our current partitioning strategy is to mod the key by the # of
> > partitions, not the # of brokers. For better balancing of partitions
> > over brokers, one simple strategy is to over-partition, i.e., have a
> > few times more partitions than brokers. That way, if one adds more
> > brokers over time, you can just move some existing partitions to the
> > new broker.
> >
> > Consistent hashing requires changing the # of partitions dynamically.
> > However, for some applications, they may prefer not to change
> > partitions.
> >
> > What's your use case for consistent hashing?
>
> My use case is essentially the same as above, i.e., dynamic load
> balancing. I now understand why the current partitioning strategy is
> used as opposed to consistent hashing; partition "stickiness" is
> definitely desirable for the sake of moving computation to data.
> However, the dynamic rebalancing described in "Kafka Replication
> Design", sect. 1.2 looks very similar to what's typically achieved by
> using consistent hashing. Is this rebalancing implemented in 0.8, or am
> I reading now-obsolete documentation? :) (If it is, could you please
> point me to the code?)
>
> Thanks,
>
> stan
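Coming back to the broker-assignment idea above: here is a hypothetical sketch of a consistent-hash ring that maps partitions to brokers (none of this exists in Kafka today; the virtual-node count and hash are assumptions purely for illustration):

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical sketch: assign partitions to brokers via a consistent-hash
// ring with virtual nodes. Not existing Kafka code; it only illustrates why
// adding a broker moves only the partitions landing on its ring segments.
public final class BrokerRing {
    private static final int VNODES_PER_BROKER = 100; // virtual nodes smooth the load
    private final TreeMap<Integer, String> ring = new TreeMap<>();

    public void addBroker(String brokerId) {
        for (int v = 0; v < VNODES_PER_BROKER; v++) {
            ring.put(hash(brokerId + "#" + v), brokerId);
        }
    }

    public void removeBroker(String brokerId) {
        ring.values().removeIf(b -> b.equals(brokerId)); // frees its segments
    }

    // A partition is owned by the first broker at or after its hash,
    // wrapping around to the start of the ring if necessary.
    public String brokerFor(String topic, int partition) {
        if (ring.isEmpty()) throw new IllegalStateException("no brokers");
        SortedMap<Integer, String> tail = ring.tailMap(hash(topic + "-" + partition));
        return tail.isEmpty() ? ring.firstEntry().getValue() : tail.get(tail.firstKey());
    }

    // Placeholder hash; a real ring would want something better distributed.
    private static int hash(String s) {
        int h = s.hashCode();
        return h ^ (h >>> 16); // cheap bit mix to spread clustered inputs
    }
}
```

With N brokers on the ring, adding one claims roughly 1/(N+1) of the segments, so only that fraction of partitions would need to move; combined with the over-partitioning Jun recommends, a tool along these lines could drive ReassignPartitions automatically instead of by hand.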