Update:

I want to give a quick update on what I found while porting the 0.10 version to 1.0.

1. It is difficult to provide a stock CombinedKey Serde.
We effectively wrap two serdes for the key. We do not have good topic names to feed into the Avro Serde for K1 and K2 for the same topic. We also cannot carry the Serdes along from the creation of the table and remember the topic name, because of whitelist subscriptions.
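To make point 1 concrete, here is a minimal toy model, not Kafka code. It assumes a combined key is laid out as a length prefix, the K1 bytes, then the K2 bytes, and that the inner serializer derives a schema subject from the topic name alone. The byte layout, the subject naming, and the names `make_registry_serializer`, `combined_serialize`, and `split_combined` are all hypothetical.

```python
import struct


def make_registry_serializer(registry):
    """Toy stand-in for a schema-registry-backed serializer (hypothetical)."""
    def serialize(topic, value):
        # Assumption: the subject is derived from the topic name alone.
        registry.setdefault(topic + "-key", set()).add(type(value).__name__)
        return repr(value).encode("utf-8")
    return serialize


def combined_serialize(topic, k1, k2, ser1, ser2):
    """Assumed layout: 4-byte big-endian length of k1, k1 bytes, k2 bytes."""
    b1 = ser1(topic, k1)
    b2 = ser2(topic, k2)
    return struct.pack(">i", len(b1)) + b1 + b2


def split_combined(data):
    """Split a combined key back into its two serialized halves."""
    (n,) = struct.unpack(">i", data[:4])
    return data[4:4 + n], data[4 + n:]


registry = {}
ser = make_registry_serializer(registry)
payload = combined_serialize("joined-topic", 42, "order-7", ser, ser)
k1_bytes, k2_bytes = split_combined(payload)
```

Both inner serializers are handed the same topic, so in this model two unrelated key types end up registered under one subject. That is the naming problem described above.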
2. We should drop the idea of a keysplitter and combiner.
I cannot seem to find a good place to have a single layer to handle this. It seems to spread everywhere throughout the codebase. I think that it's due to the fact that it is an oddity and a break in the architecture to have something like this. Maybe one introduces that in a later step, but it's very messy to have that in the first step, and it is really consuming 80% of the effort put into the KIP.
3. Caching is messing with my head very heavily at the moment.
I have full control over the RocksDB holding the right side (B), so I can make it not cache, which is good. I do inherit the store of the left side (A), and I have no control over its caching behaviour.
Let me elaborate:

Say a tuple (A, B) got emitted after joining, and the delete for A goes into the cache.
After that, the B record is deleted as well.
B's join processor looks up A and sees `null` while computing the old and new value (at this point we can execute the joiner with A being null and still emit something, but it is not going to represent the actual oldValue).
Then A's cache flushes;
it doesn't see B, so it is also not going to put a proper oldValue.

The output can then not be used for, say, any aggregate, as a delete would not reliably find the old aggregate it needs to be removed from. Filter will also break, as it stops null,null changes from propagating. So to me it looks pretty clear that caching with join breaks KTable semantics, be it my new join or the currently existing ones.
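The sequence above can be replayed in a small toy model. This is a sketch under stated assumptions, not Kafka Streams code: `CachedStore` is a hypothetical write-back cache whose reads see unflushed writes, the right side is a plain uncached dict, and the join is reduced to a one-to-one join on the same key for brevity.

```python
class CachedStore:
    """Toy write-back cache in front of a store (hypothetical, not Kafka code)."""

    def __init__(self):
        self.flushed = {}  # values whose changes were already forwarded
        self.dirty = {}    # key -> (old value at first write, latest value)

    def put(self, key, value):
        old = self.dirty[key][0] if key in self.dirty else self.flushed.get(key)
        self.dirty[key] = (old, value)

    def get(self, key):
        # Reads see the cached write even though it was not forwarded yet.
        if key in self.dirty:
            return self.dirty[key][1]
        return self.flushed.get(key)

    def flush(self):
        changes = [(k, old, new) for k, (old, new) in self.dirty.items()]
        for k, _, new in changes:
            if new is None:
                self.flushed.pop(k, None)
            else:
                self.flushed[k] = new
        self.dirty.clear()
        return changes


def join(a, b):
    # Inner-join semantics: no result unless both sides are present.
    return None if a is None or b is None else (a, b)


left = CachedStore()  # side A, caching not under our control
right = {}            # side B, uncached
emitted = []          # (old joined value, new joined value)


def on_left_flush():
    for key, old_a, new_a in left.flush():
        b = right.get(key)
        emitted.append((join(old_a, b), join(new_a, b)))


def on_right_update(key, new_b):
    old_b = right.get(key)
    if new_b is None:
        right.pop(key, None)
    else:
        right[key] = new_b
    a = left.get(key)
    emitted.append((join(a, old_b), join(a, new_b)))


left.put("k", "A")
on_left_flush()             # B is not there yet
on_right_update("k", "B")   # the tuple (A, B) is emitted as a new value
left.put("k", None)         # the delete for A sits in the cache
on_right_update("k", None)  # B is deleted; A already reads as null
on_left_flush()             # A's delete is forwarded; B is gone

retractions = [old for old, _ in emitted if old is not None]
```

In this model `retractions` stays empty: the tuple (A, B) is emitted once as a new value, but no later change carries it as an old value, so a downstream aggregate would never be told to remove it.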

4. I further want to propose that I leave out IQ support in the first step. Copy-pasting the `if(storeName == null)` check that is in almost every processor is not ideal. I want to lift it to the topology level in the next step (adding a new processor that will maintain the user-provided store as a downstream processor).

That is where I stand currently. I would appreciate feedback on all the points.

Best Jan







On 27.10.2017 06:38, Jan Filipiak wrote:
Hello everyone,

this is the new discussion thread after the ID-clash.

Best
Jan

______


Hello Kafka-users,

I want to continue with the development of KAFKA-3705, which allows
the Streams DSL to perform KTable-KTable joins when the KTables have a
one-to-many relationship.
To make sure we cover the requirements of as many users as possible
and have a good solution afterwards, I invite everyone to read through
the KIP I put together and discuss it here in this thread.

https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable

https://issues.apache.org/jira/browse/KAFKA-3705
https://github.com/apache/kafka/pull/3720

I think a public discussion and vote on a solution is exactly what is
needed to bring this feature into kafka-streams. I am looking forward
to everyone's opinion!

Please keep the discussion on the mailing list rather than commenting
on the wiki (wiki discussions get unwieldy fast).

Best
Jan



