To answer my own question to an extent, I guess one thing I could do is have a 
supplementary topic with 1/16th as many partitions. That topic is used for 
automatic partition rebalancing, and for each supplementary partition a consumer 
owns, it then subscribes explicitly (manual assignment) to main-topic partitions 
[partition*16, partition*16 + 16). That way blocks of partitions are moved 
around automatically but we still get the IO scale-out. I'm sure I can even 
think of a use for the supplementary topic!
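A minimal sketch of the index arithmetic in this workaround, assuming a group size of 16 and exactly one supplementary-topic partition per group of 16 main-topic partitions. The function name is hypothetical and nothing here is a Kafka API. Note that in the Java client a single KafkaConsumer cannot mix subscribe() and assign(), so this would presumably need one consumer subscribed to the supplementary topic whose rebalance callback drives assign() on a second consumer for the main topic.

```python
GROUP_SIZE = 16  # main-topic partitions per supplementary-topic partition (assumed)


def main_partitions_for(supplementary_partitions, group_size=GROUP_SIZE):
    """Expand each supplementary-topic partition p into the main-topic
    partitions [p * group_size, p * group_size + group_size)."""
    result = []
    for p in sorted(supplementary_partitions):
        result.extend(range(p * group_size, p * group_size + group_size))
    return result


# A consumer that the group coordinator handed supplementary partitions 1 and 3
# would explicitly assign itself main-topic partitions 16-31 and 48-63.
assigned = main_partitions_for([3, 1])
print(assigned[0], assigned[15], assigned[16], assigned[-1], len(assigned))
```

Because the rebalancing unit is a supplementary partition, a group of 16 main-topic partitions can only ever move as a whole.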

-----Original Message-----
From: Young, Ben [mailto:ben.yo...@fisglobal.com]
Sent: 17 May 2017 19:09
To: users@kafka.apache.org
Subject: Partition groups

Hi

I was wondering whether something like this is possible. I'd like to use 
partitions to gain some IO parallelism, but certain sets of partitions should 
not be distributed across different machines. Say I have data that can be 
processed by time bucket, but I'd like each day's data to go to a single 
machine. I'd have four 16-core servers and 64 partitions (for example), and each 
server would get a block of 16 partitions. This could be handled by making the 
hash key the hashed date followed by 4 random low bits. With the range 
assignment strategy this works fine.
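A minimal sketch of that partitioning scheme, assuming 64 partitions in 4 groups of 16 and a date string as the grouping key. zlib.crc32 is only a deterministic stand-in for whatever hash the producer would really use (Kafka's default partitioner uses murmur2 for keyed records, not CRC32), and partition_for is a hypothetical name.

```python
import random
import zlib

NUM_PARTITIONS = 64
GROUP_SIZE = 16
NUM_GROUPS = NUM_PARTITIONS // GROUP_SIZE  # 4 groups, one per server


def partition_for(day, rng=random):
    """High bits: a hash of the day picks the group.
    Low 4 bits: random, so one day's records spread over its group's 16 partitions."""
    group = zlib.crc32(day.encode("utf-8")) % NUM_GROUPS
    low_bits = rng.randrange(GROUP_SIZE)  # 0..15, i.e. 4 random bits
    return group * GROUP_SIZE + low_bits


rng = random.Random(42)
parts = {partition_for("2017-05-17", rng) for _ in range(1000)}
# Every partition chosen for this day falls in the same contiguous block of 16.
print(min(parts) // GROUP_SIZE == max(parts) // GROUP_SIZE, len(parts))
```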

However, if one server dies, its batch of 16 gets split across two of the 
remaining servers, whereas I'd like the whole group of 16 to move to one of them.
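To make the failure mode concrete, here is a sketch that mimics range-style assignment for a single topic (contiguous chunks, with the first consumers in sorted order taking one extra partition when the count does not divide evenly, which is how Kafka's RangeAssignor behaves) next to a hypothetical group-aware assignment that only ever moves whole blocks of 16. Neither function is a Kafka API, and the server names are illustrative stand-ins for consumer member IDs.

```python
GROUP_SIZE = 16


def range_assign(num_partitions, consumers):
    """Range-style assignment: contiguous chunks; earlier consumers
    get one extra partition when the division isn't even."""
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    out, start = {}, 0
    for i, c in enumerate(consumers):
        n = per + (1 if i < extra else 0)
        out[c] = list(range(start, start + n))
        start += n
    return out


def group_assign(num_partitions, consumers, group_size=GROUP_SIZE):
    """Hypothetical group-aware assignment: whole groups of group_size
    partitions are dealt out round-robin, so a group is never split."""
    consumers = sorted(consumers)
    out = {c: [] for c in consumers}
    for g in range(num_partitions // group_size):
        owner = consumers[g % len(consumers)]
        out[owner].extend(range(g * group_size, (g + 1) * group_size))
    return out


def groups_split(assignment, group_size=GROUP_SIZE):
    """Return the group numbers whose partitions ended up on more than one consumer."""
    owners = {}
    for c, parts in assignment.items():
        for p in parts:
            owners.setdefault(p // group_size, set()).add(c)
    return sorted(g for g, cs in owners.items() if len(cs) > 1)


four = ["server-a", "server-b", "server-c", "server-d"]
three = ["server-a", "server-c", "server-d"]  # server-b has died

print(groups_split(range_assign(64, four)))   # prints []
print(groups_split(range_assign(64, three)))  # prints [1, 2]
print(groups_split(group_assign(64, three)))  # prints []
```

With four groups and three servers, the group-aware version necessarily leaves one server with two groups (32 partitions); that imbalance is the price of never splitting a group.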

Is this kind of thing possible at all? Surely it can't be unusual to want some 
kind of affinity between related partitions.

I know I can do this with manual assignment, but is that my only option? The 
other option is to have just 4 partitions and use threads internally, but then 
I won't get the IO performance.

Thanks,
Ben Young
The information contained in this message is proprietary and/or confidential. 
If you are not the intended recipient, please: (i) delete the message and all 
copies; (ii) do not disclose, distribute or use the message in any manner; and 
(iii) notify the sender immediately. In addition, please be aware that any 
message addressed to our domain is subject to archiving and review by persons 
other than the intended recipient. Thank you.