Hey Jay,

Thanks for the comment!

I have not specifically thought about how this works with Streams and
Connect. The current KIP w.r.t. the interface that our producer and
consumer exposes to the user. It ensures that if there are two messages
with the same key produced by the same producer, say messageA and messageB,
and suppose messageB is produced after messageA to a different partition
than messageA, then we can guarantee that the following sequence can happen
in order:

- Consumer of messageA can execute callback, in which user can flush state
related to the key of messageA.
- messageA is delivered by its consumer to the application
- Consumer of messageB can execute callback, in which user can load the
state related to the key of messageB.
- messageB is delivered by its consumer to the application.

So it seems that it should support Streams and Connect properly. But I am
not entirely sure because I have not looked into how Streams and Connect
works. I can think about it more if you can provide an example where this
does not work for Streams and Connect.

Regarding the second question, I think linear hashing approach provides a
way to reduce the number of partitions that can "conflict" with a give
partition to *log_2(n)*, as compares to *n* in the current KIP, where n is
the total number of partitions of the topic. This will be useful when
number of partition is large and asymptotic complexity matters.

I personally don't think this optimization is worth the additional
complexity in Kafka. This is because partition expansion or deletion should
happen infrequently and the largest number of partitions of a single topic
today is not that large -- probably 1000 or less. And when partitions of a
topic changes, each consumer will likely need to query and wait for
positions of a large percentage of partitions of the topic anyway even with
this optimization. I think this algorithm is kind of orthogonal to this
KIP. We can extend the KIP to support this algorithm in the future as well.

Thanks,
Dong

On Thu, Feb 22, 2018 at 5:19 PM, Jay Kreps <j...@confluent.io> wrote:

> Hey Dong,
>
> Two questions:
> 1. How will this work with Streams and Connect?
> 2. How does this compare to a solution where we physically split partitions
> using a linear hashing approach (the partition number is equivalent to the
> hash bucket in a hash table)? https://en.wikipedia.org/wiki/Linear_hashing
>
> -Jay
>
> On Sat, Feb 10, 2018 at 3:35 PM, Dong Lin <lindon...@gmail.com> wrote:
>
> > Hi all,
> >
> > I have created KIP-253: Support in-order message delivery with partition
> > expansion. See
> > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 253%3A+Support+in-order+message+delivery+with+partition+expansion
> > .
> >
> > This KIP provides a way to allow messages of the same key from the same
> > producer to be consumed in the same order they are produced even if we
> > expand partition of the topic.
> >
> > Thanks,
> > Dong
> >
>

Reply via email to