Re: [DISCUSS] KIP-213 Support non-key joining in KTable

Jan Filipiak Fri, 16 Feb 2018 05:01:06 -0800

Update:

I want to give a quick update on what I found while porting the 0.10 version towards 1.0.


1. It is difficult to provide a stock CombinedKey Serde.

We effectively wrap two serdes for the key. We do not have good topic names to feed into the Avro serde for K1 and K2 when both are written to the same topic. We also cannot carry the serdes along from the creation of the table and remember the topic name, because of whitelist subscriptions.
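To make point 1 concrete, here is a minimal sketch assuming a length-prefixed layout (4-byte length of K1, then K1 bytes, then K2 bytes). `TopicSerializer`, `serializeCombined`, `splitCombined` and the topic name are hypothetical, not the serde from the PR. It shows the core problem: both inner serializers are handed the same single topic name, so a registry-backed Avro serde that derives its schema subject from the topic (for example `<topic>-key` under Confluent's default subject naming) has no way to tell K1 and K2 apart.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

public class CombinedKeySerdeSketch {

    /** Hypothetical stand-in for a Kafka Serializer: (topic, data) -> bytes. */
    interface TopicSerializer<T> {
        byte[] serialize(String topic, T data);
    }

    /** Records "label@topic" for every call, to show which topic each inner serializer sees. */
    static final List<String> seenTopics = new ArrayList<>();

    static TopicSerializer<String> loggingStringSerializer(String label) {
        return (topic, data) -> {
            seenTopics.add(label + "@" + topic);
            return data.getBytes(StandardCharsets.UTF_8);
        };
    }

    /** Layout: 4-byte length of K1, K1 bytes, then K2 bytes. */
    static <K1, K2> byte[] serializeCombined(String topic,
                                             TopicSerializer<K1> s1, K1 k1,
                                             TopicSerializer<K2> s2, K2 k2) {
        // Both inner serializers can only be given the one topic the combined key is written to.
        byte[] b1 = s1.serialize(topic, k1);
        byte[] b2 = s2.serialize(topic, k2);
        return ByteBuffer.allocate(4 + b1.length + b2.length)
                .putInt(b1.length).put(b1).put(b2).array();
    }

    /** Splits the bytes back into the raw K1 and K2 parts (decoded as UTF-8 here). */
    static String[] splitCombined(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] b1 = new byte[buf.getInt()];
        buf.get(b1);
        byte[] b2 = new byte[buf.remaining()];
        buf.get(b2);
        return new String[] {
            new String(b1, StandardCharsets.UTF_8), new String(b2, StandardCharsets.UTF_8)
        };
    }

    public static void main(String[] args) {
        byte[] out = serializeCombined("join-topic",
                loggingStringSerializer("K1"), "order-7",
                loggingStringSerializer("K2"), "customer-42");
        System.out.println(out.length);                           // 22
        System.out.println(seenTopics);                           // [K1@join-topic, K2@join-topic]
        System.out.println(String.join(",", splitCombined(out))); // order-7,customer-42
    }
}
```

Any per-part schema lookup would need a second, distinct name per key part, which is exactly what the topic alone cannot provide here.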

2. We should drop the idea of the key splitter and combiner

I cannot seem to find a good place to have a single layer to handle this. It seems to spread everywhere throughout the codebase. I think that is because it is an oddity and a break in the architecture to have something like this. Maybe one introduces it in a later step, but it is very messy to have it in the first step, and it is consuming 80% of the effort put into the KIP.

3. Caching is messing with my head very heavily at the moment.

I have full control over the RocksDB store holding the right side (B), so I can make it not cache, which is good. I do inherit the store of the left side (A), and I have no control over its caching behaviour.

Let me elaborate:

Say a tuple (A, B) got emitted after joining, and the delete for A goes into the cache.

After that, the B record gets deleted as well.

B's join processor would look up A and see `null` while computing the old and new value (at this point we can execute the joiner with A being null and still emit something, but it is not going to represent the actual oldValue).

Then, when A's cache flushes, it does not see B, so it is also not going to emit a proper oldValue.

The output then cannot be used for, say, an aggregate: a delete would not reliably find the old aggregate it needs to be removed from. Filter will also break, as it stops (null, null) changes from propagating. So to me it looks pretty clear that caching with joins breaks KTable semantics, be it my new join or the currently existing ones.
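The sequence in point 3 can be replayed with a toy model. This is a minimal sketch under stated assumptions, not Kafka Streams code: the left store is modeled as a backing map plus an unflushed write cache whose reads already return the buffered tombstone, the right store is uncached, and the join is an inner join. All class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CachedJoinRaceSketch {

    /** (oldValue, newValue) pair, as forwarded between KTable processors. */
    static final class Change {
        final String oldValue;
        final String newValue;
        Change(String oldValue, String newValue) { this.oldValue = oldValue; this.newValue = newValue; }
        @Override public String toString() { return "(" + oldValue + ", " + newValue + ")"; }
    }

    // Left side A: backing store plus a write cache that has not been flushed yet.
    // A cached entry with a null value models a buffered delete (tombstone).
    static final Map<String, String> leftStore = new HashMap<>();
    static final Map<String, String> leftCache = new HashMap<>();
    // Right side B (uncached): bKey -> value, plus bKey -> the A key it references.
    static final Map<String, String> rightStore = new HashMap<>();
    static final Map<String, String> foreignKey = new HashMap<>();
    static final List<Change> emitted = new ArrayList<>();

    /** Reads go through the cache first, so a buffered delete already reads as null. */
    static String leftGet(String key) {
        return leftCache.containsKey(key) ? leftCache.get(key) : leftStore.get(key);
    }

    /** Inner join: null if either side is missing. */
    static String join(String a, String b) {
        return (a == null || b == null) ? null : a + "+" + b;
    }

    /** Delete of a B record: the old join result is computed against A as currently read. */
    static void deleteRight(String bKey) {
        String oldB = rightStore.remove(bKey);
        String a = leftGet(foreignKey.get(bKey));
        emitted.add(new Change(join(a, oldB), null));
    }

    /** Flush of A's cached delete: join the old A against B rows that still reference it. */
    static void flushLeftDelete(String aKey) {
        String oldA = leftStore.remove(aKey);
        leftCache.remove(aKey);
        for (Map.Entry<String, String> b : rightStore.entrySet()) {
            if (aKey.equals(foreignKey.get(b.getKey()))) {
                emitted.add(new Change(join(oldA, b.getValue()), null));
            }
        }
    }

    /** Replays the sequence from the mail and returns everything forwarded downstream. */
    static List<Change> runScenario() {
        leftStore.clear(); leftCache.clear(); rightStore.clear(); foreignKey.clear(); emitted.clear();
        leftStore.put("A", "a1");
        rightStore.put("B", "b1");
        foreignKey.put("B", "A");
        // Earlier, "a1+b1" was emitted as the join result for (A, B).

        leftCache.put("A", null); // 1. the delete for A sits in A's cache
        deleteRight("B");         // 2. B is deleted; the lookup of A returns null
        flushLeftDelete("A");     // 3. A's cache flushes; B no longer exists
        return emitted;
    }

    public static void main(String[] args) {
        System.out.println(runScenario()); // [(null, null)]
    }
}
```

In this model nothing downstream ever receives `a1+b1` as an oldValue, so an aggregate's subtractor never removes it, and the only change forwarded is the `(null, null)` pair that a filter stops.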

4. I further want to propose that I leave out IQ support in the first step. Copy-pasting the `if (storeName == null)` check that is in almost every processor is not ideal. I want to lift it to the topology level in the next step (adding a new processor that maintains the user-provided store as a downstream processor).
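A minimal sketch of the idea in point 4, assuming a toy push-based `Processor` interface rather than the real Kafka Streams Processor API (all names here are hypothetical). The join step never checks for a store; a separate materializing processor is wired in downstream only when the user asks for a queryable store, so the decision lives in one place at topology-building time.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MaterializerSketch {

    /** Hypothetical minimal processor: receives (key, value) and may forward downstream. */
    interface Processor {
        void process(String key, String value);
    }

    /** Join step with no store branching: it only computes the result and forwards it. */
    static Processor joinProcessor(Map<String, String> otherSide, Processor downstream) {
        return (key, value) -> {
            String other = otherSide.get(key);
            downstream.process(key, (value == null || other == null) ? null : value + "+" + other);
        };
    }

    /** Separate downstream step that maintains the user-provided, queryable store. */
    static Processor materializer(Map<String, String> queryableStore, Processor downstream) {
        return (key, value) -> {
            if (value == null) {
                queryableStore.remove(key);
            } else {
                queryableStore.put(key, value);
            }
            downstream.process(key, value);
        };
    }

    /** The one place that decides about materialization, instead of every processor. */
    static Processor build(Map<String, String> otherSide, Map<String, String> userStore, Processor sink) {
        Processor afterJoin = (userStore == null) ? sink : materializer(userStore, sink);
        return joinProcessor(otherSide, afterJoin);
    }

    public static void main(String[] args) {
        Map<String, String> right = new HashMap<>();
        right.put("k", "r1");
        Map<String, String> iqStore = new HashMap<>();
        List<String> sink = new ArrayList<>();

        Processor topology = build(right, iqStore, (k, v) -> sink.add(k + "=" + v));
        topology.process("k", "l1");
        System.out.println(sink);    // [k=l1+r1]
        System.out.println(iqStore); // {k=l1+r1}
    }
}
```

Passing `null` as the user store yields the same forwarded results without any store being written, which is the behaviour the per-processor `storeName == null` checks currently provide.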

That is where I stand currently. I would appreciate feedback on all the points.


Best Jan







On 27.10.2017 06:38, Jan Filipiak wrote:

Hello everyone,

this is the new discussion thread after the ID-clash.

Best
Jan

______


Hello Kafka-users,

I want to continue with the development of KAFKA-3705, which allows
the Streams DSL to perform KTable-KTable joins when the KTables have a
one-to-many relationship.
To make sure we cover the requirements of as many users as possible
and have a good solution afterwards, I invite everyone to read through
the KIP I put together and discuss it here in this thread.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable

https://issues.apache.org/jira/browse/KAFKA-3705
https://github.com/apache/kafka/pull/3720
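For readers new to KAFKA-3705, here is an in-memory sketch of the intended semantics, not the code in the PR. The left (many) side extracts a foreign key from its value, and the joined table stays keyed by the left key. The sketch assumes string keys that contain no `|`, uses a hypothetical sorted index of combined keys `foreignKey|leftKey`, and handles inserts and updates only (no deletes). The sorted index lets an update on the right (one) side find every matching left row with a single prefix range scan, which is roughly the role this sketch assumes a combined key plays.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.NavigableSet;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.function.BiFunction;
import java.util.function.Function;

public class ForeignKeyJoinSketch {

    final Map<String, String> left = new HashMap<>();    // many side: leftKey -> value
    final Map<String, String> right = new HashMap<>();   // one side: rightKey -> value
    final NavigableSet<String> index = new TreeSet<>();  // hypothetical "foreignKey|leftKey" entries
    final Map<String, String> result = new HashMap<>();  // joined table, keyed by leftKey

    final Function<String, String> foreignKeyOf;         // extracts the right key from a left value
    final BiFunction<String, String, String> joiner;

    ForeignKeyJoinSketch(Function<String, String> foreignKeyOf,
                         BiFunction<String, String, String> joiner) {
        this.foreignKeyOf = foreignKeyOf;
        this.joiner = joiner;
    }

    void putLeft(String leftKey, String value) {
        String old = left.put(leftKey, value);
        if (old != null) {
            index.remove(foreignKeyOf.apply(old) + "|" + leftKey); // foreign key may have changed
        }
        String fk = foreignKeyOf.apply(value);
        index.add(fk + "|" + leftKey);
        update(leftKey, value, right.get(fk));
    }

    void putRight(String rightKey, String value) {
        right.put(rightKey, value);
        // Prefix scan: '}' sorts right after '|', so this range holds exactly the "rightKey|..." entries.
        for (String combined : index.subSet(rightKey + "|", rightKey + "}")) {
            String leftKey = combined.substring(rightKey.length() + 1);
            update(leftKey, left.get(leftKey), value);
        }
    }

    private void update(String leftKey, String leftValue, String rightValue) {
        if (leftValue == null || rightValue == null) {
            result.remove(leftKey);
        } else {
            result.put(leftKey, joiner.apply(leftValue, rightValue));
        }
    }

    public static void main(String[] args) {
        // Left values look like "item:customerId"; the foreign key is the part after ':'.
        ForeignKeyJoinSketch join = new ForeignKeyJoinSketch(
                v -> v.substring(v.indexOf(':') + 1),
                (order, customer) -> order + " for " + customer);
        join.putRight("c1", "Alice");
        join.putLeft("order-1", "book:c1");
        join.putLeft("order-2", "pen:c1");
        join.putRight("c1", "Alice B."); // one right-side update fans out to both orders
        System.out.println(new TreeMap<>(join.result));
        // {order-1=book:c1 for Alice B., order-2=pen:c1 for Alice B.}
    }
}
```

The fan-out in `putRight` is the part a plain key-based KTable-KTable join cannot express, since one right-side change has to update many left-keyed results.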

I think a public discussion and vote on a solution is exactly what is
needed to bring this feature into Kafka Streams. I am looking forward
to everyone's opinion!

Please keep the discussion on the mailing list rather than commenting
on the wiki (wiki discussions get unwieldy fast).

Best
Jan
