With regard to KIP-478, there is the idea to introduce a `RecordContext` class.
Thus, we could just change the `StreamPartitioner` to take this new class as parameter to `partition()`? This might actually kill two birds with one stone, because I could imagine use cases in which users want to partition based on header information that is currently not exposed either. For this case, we don't even need to provide any default implementation of `StreamPartitioner` but users can just implement it themselves. The use case itself makes sense, but it does not seem to be generic enough that we need to provide an out-of-the-box implementation for it. -Matthias On 9/10/20 3:59 PM, Sophie Blee-Goldman wrote: > Hey Balan, thanks for the KIP! > > The motivation here makes sense to me, but I have a few questions about the > proposed API > > I guess the main thing to point out is that if we just add new addSink() > overloads to Topology, > then only the lower level Processor API will benefit and users of the DSL > won't be able to utilize > this. This seems like a useful feature that we should make available to > anyone. > > We could follow a similar approach and add new toStream overloads to the > KStream class, but > that would expand the surface area of the API pretty significantly. The > additional addSink() > overloads alone would do this. The addSink() methods already have a pretty > large number > of optional parameters which means more and more overloads every time a new > one is added. > We should avoid making this problem worse wherever possible. > > Existing StreamPartitioner in SinkNode will be made null when context >> partition is enabled > > > This line from your KIP gave me some idea that it might be avoidable in > this case. The implication > of this quote is that the StreamPartitioner and useContextPartition > parameter are inherently > incompatible since they are two ways of specifying the same thing, the > target partition. Well, if > that's the case, then we should be able to just leverage the existing > StreamPartitioner in some > way to specify that we want records to end up in the source partition, > without introducing a new > parameter. > > One option would be to just let users pass in a null StreamPartitioner to > mean that it should > use the source partition, but that seems a bit subtle. Maybe a better API > would be to offer > a new out-of-the-box StreamPartitioner called SourceContextPartitioner (or > something), > and then users just have to pass in an instance of this. WDYT? > > On Thu, Sep 10, 2020 at 8:00 AM Balan k <ksbalan2...@gmail.com> wrote: > >> >> Forgot to add the link >> >> https://cwiki.apache.org/confluence/display/KAFKA/KIP-669%3A+Preserve+Source+Partition+in+Kafka+Streams+from+context >> >> >> >> On 2020/09/10 13:40:02, satyanarayan komandur <ksbalan2...@gmail.com> >> wrote: >>> Hi, >>> >>> I have submitted a new KIP for preserving processor record context >> partition from source. I am looking for suggestions/comments. >>> >>> In most use cases where source message is getting transformed and sent >> to a target topic, where >>> 1. number of partitions on source and sink topic are same >>> 2. there is no change to the key >>> 3. more importantly if we would like to preserve the partition as is >> without re-deriving using partition from context would help. >>> >>> I am aware of one caveat where record processor context partition is not >> known in stream punctuation. >>> >>> Please look over the KIP and chime in more ideas >>> >>> Thanks >>> Balan >>> >>> >>> >> >
signature.asc
Description: OpenPGP digital signature