Update:
I want to give a quick update on what I found while porting the 0.10
version to 1.0.
1. It is difficult to provide a stock CombinedKey Serde.
We effectively wrap two Serdes for the key, but we do not have good
topic names to feed into the Avro Serde for K1 and K2 for the same
topic. We also cannot carry the Serdes along from the creation of the
table and remember the topic name, because of whitelist subscriptions.
2. We should drop the idea of the keysplitter and combiner.
I cannot seem to find a good place to have a single layer to handle
this. It seems to spread everywhere throughout the codebase. I think
that is because it is an oddity and a break in the architecture to
have something like this. Maybe one introduces it in a later step, but
it is very messy to have it in the first step, and it is really
consuming 80% of the effort put into the KIP.
3. Caching is messing with my head very heavily at the moment. I have
full control over the RocksDB holding the right side (B), so I can make
it not cache, which is good. I do inherit the store of the left side
(A), and I have no control over its caching behaviour.
Let me elaborate:
Say a tuple A,B got emitted after joining, and the delete for A goes
into the cache.
After that, the B record gets deleted as well.
B's join processor would look up A and see `null` while computing the
old and new value
(at this point we can execute the joiner with A being null and still
emit something, but it is not going to represent the actual oldValue).
Then A's cache flushes;
it doesn't see B, so it is also not going to put a proper oldValue.
The output can then not be used for, say, any aggregate, as a delete
would not reliably find the old aggregate it needs to be removed from.
Filter will also break, as it stops null,null changes from
propagating. So to me it looks pretty clear that caching with join
breaks KTable semantics, be it my new join or the
currently existing ones.
4. I further want to propose that I leave out IQ support in the first
step. Copy-pasting the `if(storeName == null)` that is in almost every
processor is not ideal. I want to lift it to the topology level in
the next step (adding a new processor that will maintain the
user-provided store as a downstream processor).
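To make point 1 more concrete, here is a minimal sketch of what wrapping
two serdes for one key looks like. It is not the code from the PR:
TopicSerializer, TopicDeserializer, CombinedKey and CombinedKeySerde are
hypothetical names made up for illustration, and the length-prefixed
byte layout is only an assumption. Both inner serializers are handed the
same topic name, which is the problem described in point 1.

```java
import java.nio.ByteBuffer;

// Hypothetical stand-ins for a serializer/deserializer that takes a topic name.
interface TopicSerializer<T> {
    byte[] serialize(String topic, T data);
}

interface TopicDeserializer<T> {
    T deserialize(String topic, byte[] bytes);
}

// Hypothetical holder for the two key parts K1 and K2.
final class CombinedKey<K1, K2> {
    final K1 first;
    final K2 second;

    CombinedKey(K1 first, K2 second) {
        this.first = first;
        this.second = second;
    }
}

// Wraps two inner serdes; assumed layout: [int length of K1][K1 bytes][K2 bytes].
final class CombinedKeySerde<K1, K2> {
    private final TopicSerializer<K1> firstSer;
    private final TopicDeserializer<K1> firstDe;
    private final TopicSerializer<K2> secondSer;
    private final TopicDeserializer<K2> secondDe;

    CombinedKeySerde(TopicSerializer<K1> firstSer, TopicDeserializer<K1> firstDe,
                     TopicSerializer<K2> secondSer, TopicDeserializer<K2> secondDe) {
        this.firstSer = firstSer;
        this.firstDe = firstDe;
        this.secondSer = secondSer;
        this.secondDe = secondDe;
    }

    byte[] serialize(String topic, CombinedKey<K1, K2> key) {
        // Only one topic name is available here, so both parts receive it.
        byte[] a = firstSer.serialize(topic, key.first);
        byte[] b = secondSer.serialize(topic, key.second);
        return ByteBuffer.allocate(4 + a.length + b.length)
                .putInt(a.length).put(a).put(b).array();
    }

    CombinedKey<K1, K2> deserialize(String topic, byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        byte[] a = new byte[buf.getInt()];   // length prefix of the K1 part
        buf.get(a);
        byte[] b = new byte[buf.remaining()]; // the rest is the K2 part
        buf.get(b);
        return new CombinedKey<>(firstDe.deserialize(topic, a),
                                 secondDe.deserialize(topic, b));
    }
}
```

The wrapping itself is simple; the open question from point 1 is what to
pass as `topic` to the two inner serdes.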
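The caching problem from point 3 can be replayed with a toy model. This
is not Kafka Streams code: CachedJoinModel, Change and the count
aggregate are hypothetical, and the model only assumes that a cached
write on the left side (A) is visible to lookups immediately but is
forwarded downstream only on flush, while the right side (B) forwards
immediately.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model only: none of these classes exist in Kafka Streams.
final class Change {
    final String newValue;
    final String oldValue;

    Change(String newValue, String oldValue) {
        this.newValue = newValue;
        this.oldValue = oldValue;
    }
}

final class CachedJoinModel {
    private String a;            // single left-side row, as lookups see it
    private String oldAInCache;  // value of A before the cached write
    private boolean dirty;       // true while a change to A waits in the cache
    private final Map<String, String> b = new HashMap<>(); // right side, uncached
    final List<Change> emitted = new ArrayList<>();
    int count;                   // downstream aggregate: number of live join results

    // Inner-join semantics: no result if either side is missing.
    private static String join(String a, String b) {
        return (a == null || b == null) ? null : a + "+" + b;
    }

    private void emit(String newValue, String oldValue) {
        emitted.add(new Change(newValue, oldValue));
        if (oldValue != null) count--; // retract the old join result
        if (newValue != null) count++; // add the new join result
    }

    // Left side: the write lands in the cache and nothing is forwarded yet.
    void writeA(String value) {
        if (!dirty) {
            oldAInCache = a;
            dirty = true;
        }
        a = value;
    }

    // Flush forwards the change to A and joins it against the current B rows.
    void flushA() {
        if (!dirty) return;
        dirty = false;
        for (String bValue : b.values()) {
            emit(join(a, bValue), join(oldAInCache, bValue));
        }
    }

    // Right side: forwarded immediately, joined against what A's store returns now.
    void writeB(String key, String value) {
        String oldB = (value == null) ? b.remove(key) : b.put(key, value);
        emit(join(a, value), join(a, oldB));
    }
}
```

Replaying writeA("A"), flushA(), writeB("b1", "B"), writeA(null),
writeB("b1", null), flushA() in this model leaves count at 1 although
both records are gone. Flushing right after writeA(null) instead brings
count back to 0.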
That is where I stand currently. I would appreciate feedback on all the
points.
Best Jan
On 27.10.2017 06:38, Jan Filipiak wrote:
Hello everyone,
this is the new discussion thread after the ID-clash.
Best
Jan
______
Hello Kafka-users,
I want to continue with the development of KAFKA-3705, which allows
the Streams DSL to perform KTable-KTable joins when the KTables have a
one-to-many relationship.
To make sure we cover the requirements of as many users as possible
and have a good solution afterwards, I invite everyone to read through
the KIP I put together and discuss it here in this thread.
https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable
https://issues.apache.org/jira/browse/KAFKA-3705
https://github.com/apache/kafka/pull/3720
I think a public discussion and vote on a solution is exactly what is
needed to bring this feature into kafka-streams. I am looking forward
to everyone's opinion!
Please keep the discussion on the mailing list rather than commenting
on the wiki (wiki discussions get unwieldy fast).
Best
Jan